Built with AI
Built with AI, from the first day.
AI did not assist this work after the fact. It is how Prova was conceived, specified, and built, and it is how the work gets done.
Why now
Why this is possible now.
The methods of rigorous social measurement were settled decades ago. What kept it expensive was the cognitive work between the judgments: reading everything a program produces, designing the study, configuring the instruments, synthesizing across cases, holding a hundred moving parts in mind at once. That work needed trained people, and a lot of them.
Frontier models have crossed a threshold: long-context reliability, judgment-grade reasoning, and orchestration strong enough to run a real measurement practice on. Together, these let a small team, working with a system of agents, operate something too complex to hold by hand. That is why AI is at the heart of Prova, and why it could not have been a few years ago.
What it built
It helped build the firm itself.
The methods for rigorous evaluation are well established; they live in the literature, not in any one person’s head. What was missing was a structure that runs them at the cost and speed real programs need. Working layer by layer with frontier models, we built that structure: an eight-layer architecture of what makes a finding credible or fragile, with 115 densely interconnected factors spanning design, measurement, context, and governance. Beneath it sits the incentive design that makes honest evidence the rational outcome rather than a hope. Built this way, the structure could lower the cost of rigorous evidence for everyone, not just Prova.
Under the hood
The prior is the product.
AI sits inside an authored, versioned structure that carries the evaluation method: factors, layers, literature, and the rules that keep a finding auditable.
Frozen general model
Reasoning substrate. As frontier models improve, Prova gets a stronger reader, extractor, and design-search engine. The gains compound.
Prova's domain intelligence lives here.
- 115 factors
- 8 layers
- ~50 literature streams
Auditable measurement instrument
Graded causal claim, provenance, and confidence level.
What it changes
For the first time, decision-grade evaluation comes within reach of those who were priced out.
When the cost of reading everything a program produces (case notes, interviews, transcripts, and field reports) comes down, so does the cost of the evaluation built on it. Work that once took a research team six months can be done in weeks, for a fraction of the price.
The standard of rigor does not move, and nothing is lowered to get there.
What keeps improving
The learning compounds outside the model.
As the work runs, Prova can keep what each engagement teaches: calibrated assumptions, recurring risks, and design judgments that make the next evaluation sharper.
- 01 Engagement record: source material, study design, field context, and what the evidence could bear
- 02 Calibrated parameters: the assumptions that sharpen the next design live in Prova’s library
- 03 Next evaluation: a stronger starting prior, still inspected by people and tested against the case
Prova parameter library
- PL.01 Effect sizes
- PL.02 ICCs
- PL.03 Attrition rates
- PL.04 Implementation risks
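Effect sizes, ICCs, and attrition rates are the usual inputs to a sample-size calculation for a clustered study, which is one way calibrated parameters can sharpen the next design. The sketch below is a minimal illustration under stated assumptions, not Prova's actual method: it assumes a two-arm comparison of means, equal cluster sizes, a normal approximation, and the textbook design effect 1 + (m - 1) × ICC. The function name and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_per_arm(effect_size, icc, cluster_size, attrition,
                            alpha=0.05, power=0.80):
    """Participants to enroll per arm (hypothetical helper, illustration only).

    effect_size  -- standardized mean difference the study should detect
    icc          -- intracluster correlation coefficient
    cluster_size -- average number of participants per cluster
    attrition    -- expected share of participants lost before follow-up
    """
    if effect_size == 0:
        raise ValueError("effect_size must be non-zero")
    if not 0 <= attrition < 1:
        raise ValueError("attrition must be in [0, 1)")

    # Two-sided test: critical values from the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Sample size per arm if participants were independent.
    n_independent = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2

    # Clustering inflates the requirement by the design effect.
    design_effect = 1 + (cluster_size - 1) * icc

    # Enroll extra participants to absorb expected attrition.
    n_enrolled = n_independent * design_effect / (1 - attrition)
    return math.ceil(n_enrolled)
```

A better-calibrated ICC or attrition rate changes the answer directly, which is why a stored parameter from a comparable past engagement is worth more than a default guess.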
Frontier model improves
Delegated reading, extraction, and search get stronger
In the work
AI does the heavy lifting. The judgment stays human.
On every engagement, AI works through the material a program produces and turns it into structured, traceable claims, far faster than the same work done by hand. The judgment in everything that matters stays with people: whether an effect is real, what it means in this place, where the evidence runs out, and the care owed to the people whose records these are. We move the line between the two only as the tools earn it.
AI carries
- AI.01 Read the record: case notes, transcripts, field reports, and the materials a program already produces
- AI.02 Structure the claims: turn raw material into traceable statements, each tied back to its source
- AI.03 Search the design space: compare plausible measurement paths under explicit constraints
- AI.04 Surface uncertainty: mark missing context, thin data, contradictions, and places where the work should stop
Humans hold
- HU.01 Judge the effect: decide what the evidence can honestly bear, including when it cannot bear the claim
- HU.02 Hold the context: ask what an effect means in this place, for these people, under these conditions
- HU.03 Carry the ethics: protect consent, dignity, proportionality, and the people inside the records
- HU.04 Own the relationship: stand behind the finding with the funder, operator, and community it affects
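One way to picture "traceable statements, each tied back to its source" is a claim record that carries the exact excerpt it rests on, plus a check that the excerpt really appears in the named source. The sketch below is an assumption made for illustration, not a description of Prova's system: the `Claim` type, its fields, and `split_by_traceability` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A structured statement extracted from program material (hypothetical)."""
    statement: str   # what the claim asserts
    source_id: str   # which document it came from
    excerpt: str     # verbatim passage that supports it


def split_by_traceability(claims, sources):
    """Separate claims whose excerpt is found in its source from those that are not.

    `sources` maps a source id to the full text of that document.
    Claims that cannot be traced are not dropped; they go to people for review.
    """
    traceable, needs_review = [], []
    for claim in claims:
        source_text = sources.get(claim.source_id)
        found = (
            source_text is not None
            and claim.excerpt != ""
            and claim.excerpt in source_text
        )
        (traceable if found else needs_review).append(claim)
    return traceable, needs_review
```

In this sketch the machine only sorts claims by whether they can be traced. Whether a traced claim is true, and what it means in context, remains a human judgment.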
The line, held by design
Some things the system is never allowed to touch.
We built AI into Prova knowing what it can do wrong: it is fast and fluent enough to find a flattering pattern, or tell a better story after the fact, if you let it. So the system has lines it is not allowed to cross, on every engagement, built into how it works.
Take the first rule every evaluator learns: you decide what would count as success before you see the data, and you do not move that line afterward. The system cannot move it either. It will not run a design that fails an ethics review. And when it is unsure, or a study is too small to show what it is looking for, it says so plainly. Where something looks wrong, the work stops and a person looks. People hold the judgment, the ethics, and the relationships. That is the part we will never hand to a model.
- HC.01 Before data. Success is defined first: the question, threshold, and analysis plan are set before the system sees the result
- HC.02 Analysis lock. The line cannot move: the generator cannot lower the standard, rewrite the target, or explain away a miss
- HC.03 Evidence review. Uncertainty stays visible: thin samples, null results, contradictions, and missing context remain in the record
- HC.04 Human stop. A person takes over: where the evidence looks wrong, unethical, or too weak, the work stops for judgment
The system cannot
- move the success line
- bury a null result
- soften a finding
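The constraints above can be sketched in code. What follows is a minimal illustration under stated assumptions, not Prova's implementation: it assumes the analysis plan is fingerprinted with a hash before any data is seen, and that grading refuses to proceed if the plan no longer matches that fingerprint. `AnalysisPlan`, `fingerprint`, `grade`, and `HumanReviewRequired` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AnalysisPlan:
    """What counts as success, fixed before data is seen (hypothetical)."""
    question: str
    outcome: str
    success_threshold: float
    min_sample: int


class HumanReviewRequired(Exception):
    """Raised when the work must stop and a person must look."""


def fingerprint(plan):
    """Stable hash of the plan, recorded at lock time."""
    canonical = json.dumps(asdict(plan), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def grade(plan, locked_fingerprint, estimate, sample_size):
    """Compare a result to the pre-set threshold without moving the line."""
    # The success line cannot move: any change to the plan stops the work.
    if fingerprint(plan) != locked_fingerprint:
        raise HumanReviewRequired("analysis plan changed after lock")
    # A study too small to test the threshold is escalated, not graded.
    if sample_size < plan.min_sample:
        raise HumanReviewRequired("sample too small for the pre-set plan")
    # A miss is reported as a miss; the result is returned either way.
    return {
        "met_threshold": estimate >= plan.success_threshold,
        "estimate": estimate,
        "threshold": plan.success_threshold,
    }
```

The design choice here is that a null result and a changed plan take different paths: a miss is returned as an ordinary, visible result, while any attempt to grade against an altered plan raises an exception that only a person can resolve.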
In short
This is how Prova does what it does.
AI is the reason a small team can offer rigorous evidence at the speed and cost real programs run at, and human judgment is the reason you can trust what comes back. If any of it raises a question, that is a good place to start.
Start a conversation