The Prova Method

One method behind every engagement.

Every Read, and the larger work that follows, runs on one method: a disciplined way of turning what a program produces into evidence you can defend, at a cost we are setting out to bring within reach. What follows is how it works, and where we draw its limits.

What AI-native means here

The constraint was always the cost of reading.

“AI-native” has started to blur as a phrase. Here is exactly what it means in our work.

For decades the methodology was settled. The limit on rigorous measurement was the cost of reading: someone has to work through everything a program produces (case notes, interview transcripts, field reports) and turn it into a structured, claim-by-claim account against a defined standard of evidence. That reading is what made serious measurement slow and expensive.

Frontier AI models, inside the right system, can now do that reading, and much of the work around it. We’ve built our method around exactly that: unstructured documents become structured claims, each traceable back to its source, under human supervision. Whether an effect is real, whether a design is sound, what an outcome means for this program in this place: those calls stay with the people who have spent careers earning them, and so does the care owed to the people whose records these are.
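One way to picture a claim that stays traceable to its source is a record that carries a pointer to the exact passage behind it. The sketch below is illustrative only: the `Claim` fields, the `supporting_passage` helper, and the sample case note are all assumptions made for this example, not a description of Prova's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A structured claim plus a pointer to the passage it rests on (hypothetical shape)."""
    text: str       # the claim as stated
    source_id: str  # which document it came from
    start: int      # character offset where the supporting passage begins
    end: int        # character offset where it ends (exclusive)


def supporting_passage(claim: Claim, sources: dict[str, str]) -> str:
    """Return the exact passage a claim points to, or raise if it cannot be traced."""
    document = sources.get(claim.source_id)
    if document is None:
        raise KeyError(f"unknown source: {claim.source_id}")
    if not (0 <= claim.start < claim.end <= len(document)):
        raise ValueError("span falls outside the source document")
    return document[claim.start:claim.end]


# Invented sample data, for illustration only.
note = "Client attended all six sessions and reported stable housing."
sources = {"case-note-017": note}

passage = "Client attended all six sessions"
start = note.index(passage)
claim = Claim(
    text="Participant completed the full course",
    source_id="case-note-017",
    start=start,
    end=start + len(passage),
)
```

Calling `supporting_passage(claim, sources)` returns the quoted passage, so a reviewer can read the claim against its source rather than take it on trust. A claim whose span or source cannot be resolved raises an error instead of passing silently.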

That is what we mean by AI-native. AI does the heavy lifting; the judgment in everything that matters stays human. How far that brings the cost of proof down is what we’re setting out to prove.

Where the line falls

AI does the heavy lifting. The judgment stays human.

[Diagram labels: THE MATERIAL, AI, THE HEAVY LIFTING, HUMAN, EVIDENCE]

Lines the system cannot cross

It cannot move a pre-registered line, soften a finding, lower the bar, or bury a null result. Where it is unsure it says so, and the work stops for a person to look.

The diagram shows where the line sits today. Where a task needs causal judgment, or where handing it to AI would produce evidence that looks rigorous and isn’t, we keep it with a person. We treat the line as a matter of discipline, and move it only as the tools genuinely earn it.

What we use, and what’s built

A method we’ve built, and AI we use inside it.

Prova works this way today. Two things sit behind every engagement, both built in depth, with AI used inside them.

The first is an operational architecture: the factors that decide whether a finding holds or falls apart, mapped across design, measurement, context, and governance. The second is the incentive design beneath it: the arrangement that makes honest evidence the rational outcome rather than a hope. Both are built. AI does the heavy lifting inside them, under human supervision; the judgment in everything that matters stays human.

What we are setting out to prove, engagement by engagement, is how far this brings the cost of proof down. The method is real; the change in what proof costs is the work ahead.

Finding the design

Most programs already contain a credible experiment.

Look for the points where some people, places, or times were treated differently, for reasons that let you read a causal effect from data you already have.

ELIGIBILITY CUTOFF: regression discontinuity
STAGGERED ROLLOUT: difference-in-differences
OVERSUBSCRIPTION: the trial you already ran
THE BORDER: geographic discontinuity

The administrative border

A program stops at a district line. Households just inside and just outside are alike but for access, and the border reads the effect.
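The border comparison can be sketched in a few lines. This is a deliberately naive illustration, assuming each household comes with a signed distance to the border (positive means inside the program area) and a single outcome. The function name, the bandwidth, and the data are invented for the example. A credible geographic discontinuity analysis would also model how outcomes trend with distance and check that households do not sort across the line.

```python
from statistics import mean


def border_contrast(households, bandwidth_km):
    """Mean outcome just inside the border minus mean outcome just outside it.

    households: iterable of (signed_distance_km, outcome) pairs, where a
    positive distance means the household is inside the program area.
    Only households within bandwidth_km of the border are compared.
    """
    inside = [y for d, y in households if 0 < d <= bandwidth_km]
    outside = [y for d, y in households if -bandwidth_km <= d < 0]
    if not inside or not outside:
        raise ValueError("need households on both sides within the bandwidth")
    return mean(inside) - mean(outside)


# Invented data: (distance to border in km, outcome score).
households = [
    (-3.0, 10.0), (-0.8, 11.0), (-0.3, 12.0),  # outside the district
    (0.2, 15.0), (0.9, 16.0), (4.0, 20.0),     # inside the district
]

estimate = border_contrast(households, bandwidth_km=1.0)
```

The bandwidth does the work here. Households far from the line are dropped because they are less likely to be alike but for access, which is the condition the design depends on.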

The Identification Atlas, in full →

How we govern it

Limits the system cannot cross.

Building on AI raises a fair question: how do you know it isn’t cutting corners where no one is looking? The answer is structural.

People hold the judgment, the ethics, and the relationships. We are building the system so it cannot touch the things that define honest evidence: it cannot rewrite a pre-registered analysis, soften a finding, lower the standard of proof, or bury a null result. Where it isn’t sure, it says so rather than fill the gap with a confident guess. When something looks wrong, the work stops and a person looks.
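One structural way to make a pre-registered analysis hard to rewrite quietly is to fingerprint the plan when it is registered and refuse to proceed if the fingerprint later differs. The sketch below only illustrates that idea. The plan fields, function names, and exception are assumptions made for this example, and nothing here describes how Prova's system is actually built.

```python
import hashlib
import json


class PreRegistrationChanged(Exception):
    """Raised when an analysis plan no longer matches its registered version."""


def fingerprint(plan: dict) -> str:
    """Hash a plan in a canonical form so key order does not affect the result."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def require_unchanged(plan: dict, registered_fingerprint: str) -> None:
    """Stop the work if the plan differs from what was registered."""
    if fingerprint(plan) != registered_fingerprint:
        raise PreRegistrationChanged(
            "analysis plan differs from the registered version; stop for human review"
        )


# Hypothetical plan, registered before any outcome data is read.
registered_plan = {
    "design": "regression_discontinuity",
    "outcome": "school_attendance",
    "significance_threshold": 0.05,
}
locked = fingerprint(registered_plan)

# The unchanged plan passes silently.
require_unchanged(registered_plan, locked)
```

If any field is edited after registration, for example loosening the threshold, `require_unchanged` raises and the work stops for a person to look.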

Claim discipline

We match every claim to its evidence.

Evidence comes in grades, from descriptive to experimental, and the method makes the grade explicit. Our job is to find the strongest claim your evidence can honestly support, and to certify it at exactly that grade. Where the data will carry more, we say so; where a claim reaches past its evidence, we find the firmer one beneath it. We certify what the evidence holds, mark what is not yet settled, and keep the standard the same whether the answer is convenient or not, because that is what makes every program stronger.
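The grading rule can be written down as a small ordered scale. The text names only the two ends, descriptive and experimental. The two middle grades below, along with the `certify` function and its wording, are assumptions added to make the example concrete, not the published rubric.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Evidence grades, weakest to strongest. Middle grades are assumed for illustration."""
    DESCRIPTIVE = 1
    CORRELATIONAL = 2       # assumed intermediate grade
    QUASI_EXPERIMENTAL = 3  # assumed intermediate grade
    EXPERIMENTAL = 4


def certify(claimed: Grade, supported: Grade) -> tuple[Grade, str]:
    """Certify a claim at exactly the grade the evidence supports."""
    if claimed > supported:
        return supported, "claim reaches past its evidence; certified at the firmer grade beneath it"
    if claimed < supported:
        return supported, "the data will carry more; certified at the stronger grade"
    return supported, "certified as claimed"


grade, note = certify(claimed=Grade.EXPERIMENTAL, supported=Grade.QUASI_EXPERIMENTAL)
```

The certified grade always equals the supported grade, whichever direction the claim was off. That is the sense in which the standard stays the same whether the answer is convenient or not.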

The Honesty Layer: how we grade a claim →

Method fit

The grade the decision actually needs.

A more elaborate design is not always the right one. The question is what grade of evidence a decision actually needs, what the context allows, and what it costs to get there. We meet the question at the right grade and help you climb only where climbing is worth it. The standard for each grade never moves; what can change is the price of reaching it. And the same method is configured to the setting: a design that works in one place can fail in another, so the approach bends to context while the standard does not.

What you can check

The record holds, and the method sharpens.

Every claim is traceable back to the source it came from, so the reasoning can be checked rather than taken on trust. Behind the work is an explicit architecture of the factors that make evidence credible or fragile, something we keep developing rather than treat as finished. The method is built to get sharper with each engagement.

The architecture: what makes evidence credible or fragile →

Standards

The rubrics we work to.

The methods behind the Reads, published in full: the standards by which we grade a claim, find a credible design, and stress-test a study. Use them, argue with them, hold us to them.

The Honesty Layer Rubric

How a claim is graded against the evidence beneath it, and what it takes to certify, downgrade, or refuse one.

Read it →

The Identification Atlas

The families of natural experiments, and the conditions under which each yields credible causal evidence.

Read it →

The Risk Taxonomy of Evaluation Failure

The predictable ways large studies fail, and how to catch each one early.

Read it →

Start a conversation

Bring a question.

If a decision is coming and you’re not sure the evidence will be there in time, that’s the conversation to have. The first conversation is a fit check.
