Built with AI

Built with AI, from the first day.

AI did not assist this work after the fact. It is how Prova was conceived, specified, and built, and it is how the work gets done.

Why now

Why this is possible now.

The methods of rigorous social measurement were settled decades ago. What kept the practice expensive was the cognitive work between the judgments: reading everything a program produces, designing the study, configuring the instruments, synthesizing across cases, holding a hundred moving parts in mind at once. That work needed trained people, and a lot of them.

Frontier models crossed a threshold: long-context reliability, judgment-grade reasoning, and orchestration robust enough to run a real measurement practice on. Together, these let a small team, with a system of agents, operate something too complex to hold by hand. That is why AI is at the heart of Prova, and why it could not have been even a few years ago.

What it built

It helped build the firm itself.

The methods for rigorous evaluation are well established; they live in the literature, not in any one person’s head. What was missing was a structure that runs them at the cost and speed real programs need. Working layer by layer with frontier models, we built that structure: an eight-layer architecture of what makes a finding credible or fragile, with 115 densely interconnected factors across design, measurement, context, and governance. Beneath it sits the incentive design that makes honest evidence the rational outcome rather than a hope. Built this way, it could lower the cost of rigorous evidence for everyone, not just Prova.

The architecture →

Under the hood

The prior is the product.

AI sits inside an authored, versioned structure that carries the evaluation method: factors, layers, literature, and the rules that keep a finding auditable.

Fig. 01 · Compiled prior · System A · Model-swappable

Base model

Frozen general model

Reasoning substrate. As frontier models improve, Prova gets a stronger reader, extractor, and design-search engine. The gains compound.

Authored prior

Prova’s domain intelligence lives here.

115 factors · 8 layers · ~50 literature streams

Finding

Auditable measurement instrument

Graded causal claim, provenance, and confidence level.

The model supplies general reasoning. The prior supplies the evaluative method: the factor map, the boundaries, and the evidentiary standards. That is why Prova improves as models improve: the delegated work gets stronger, while the method and audit trail stay intact.
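The separation described above can be sketched in a few lines of Python. This is an illustration only, not Prova's implementation: every name here (`AuthoredPrior`, `Finding`, `evaluate`, the factor IDs) is hypothetical, and the model is reduced to any callable that maps a prompt to text.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumption: any function from prompt text to response text stands in for the model.
Model = Callable[[str], str]


@dataclass(frozen=True)
class AuthoredPrior:
    """Versioned, human-authored structure; it does not change when the model does."""
    version: str
    factors: Tuple[str, ...]  # illustrative factor IDs, not Prova's real ones


@dataclass(frozen=True)
class Finding:
    claim: str
    prior_version: str               # provenance: which prior shaped this finding
    factors_checked: Tuple[str, ...]


def evaluate(model: Model, prior: AuthoredPrior, record: str) -> Finding:
    """The prior decides what is asked; the model only does the reading."""
    answers = [model(f"Assess factor {f} in: {record}") for f in prior.factors]
    return Finding(
        claim=" | ".join(answers),
        prior_version=prior.version,
        factors_checked=prior.factors,
    )
```

Swapping the `model` argument changes the quality of the answers, while `prior_version` and `factors_checked` on the resulting finding stay the same. That is the sense in which the method and the audit trail sit outside the model.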

What it changes

For the first time, decision-grade evaluation comes within reach of those who were priced out.

When the cost of reading everything a program produces (case notes, interviews, transcripts, and field reports) comes down, so does the cost of the evaluation built on it. Work that once took a research team six months can be done in weeks, for a fraction of the price.

The standard of rigor does not move, and nothing is lowered to get there.

What keeps improving

The learning compounds outside the model.

As the work runs, Prova can keep what each engagement teaches: calibrated assumptions, recurring risks, and design judgments that make the next evaluation sharper.

Fig. 04 · Compounding outside the weights · Parameter library · Model-swappable
  1. 01

    Engagement record

    source material, study design, field context, and what the evidence could bear

  2. 02

    Calibrated parameters

    the assumptions that sharpen the next design live in Prova’s library

  3. 03

    Next evaluation

    a stronger starting prior, still inspected by people and tested against the case

Prova parameter library

  • PL.01 · Effect sizes
  • PL.02 · ICCs
  • PL.03 · Attrition rates
  • PL.04 · Implementation risks

Frontier model improves

Delegated reading, extraction, and search get stronger

The learning Prova owns is not hidden in model weights. It compounds in the authored prior and parameter library, so the next run starts sharper while the model remains replaceable.
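To make the parameter library concrete, here is a hedged sketch of how stored values such as an ICC and an attrition rate could feed the next design. The design effect, 1 + (m - 1) × ICC, is the standard adjustment for cluster designs with equal cluster size m. The library layout, the function name, and every number below are illustrative assumptions, not Prova's calibrated values.

```python
import math

# Hypothetical library entries; the numbers are placeholders for illustration.
parameter_library = {
    "icc": 0.05,             # intracluster correlation from earlier engagements
    "attrition_rate": 0.25,  # share of participants expected to be lost to follow-up
}


def required_enrollment(n_simple: int, cluster_size: int,
                        icc: float, attrition_rate: float) -> int:
    """Inflate a simple-random-sample size for clustering and expected attrition."""
    design_effect = 1 + (cluster_size - 1) * icc
    n_clustered = n_simple * design_effect
    return math.ceil(n_clustered / (1 - attrition_rate))


n = required_enrollment(
    400,
    cluster_size=21,
    icc=parameter_library["icc"],
    attrition_rate=parameter_library["attrition_rate"],
)
print(n)  # 1067
```

With these placeholder values, a study that would need 400 participants under simple random sampling needs 1,067 enrolled once clustering and attrition are accounted for. A better-calibrated ICC or attrition rate changes that number directly, with no change to the model.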

In the work

AI does the heavy lifting. The judgment stays human.

On every engagement, AI works through the material a program produces and turns it into structured, traceable claims, far faster than the same work done by hand. Judgment in everything that matters stays with people: whether an effect is real, what it means in this place, where the evidence runs out, and the care owed to the people whose records these are. We move the line between the two only as the tools earn it.

Fig. 02 · Cognition boundary · Model labor · Human judgment

AI carries

  1. AI.01

    Read the record

    case notes, transcripts, field reports, and the materials a program already produces

  2. AI.02

    Structure the claims

    turn raw material into traceable statements, each tied back to its source

  3. AI.03

    Search the design space

    compare plausible measurement paths under explicit constraints

  4. AI.04

    Surface uncertainty

    mark missing context, thin data, contradictions, and places where the work should stop

Humans hold

  1. HU.01

    Judge the effect

    decide what the evidence can honestly bear, including when it cannot bear the claim

  2. HU.02

    Hold the context

    ask what an effect means in this place, for these people, under these conditions

  3. HU.03

    Carry the ethics

    protect consent, dignity, proportionality, and the people inside the records

  4. HU.04

    Own the relationship

    stand behind the finding with the funder, operator, and community it affects

The line can move only when the tools earn it. Extraction and design search can be delegated; judgment, ethics, context, and care remain held by people.
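One way to picture "traceable statements, each tied back to its source" and "surface uncertainty" is the sketch below. It is an assumption-laden illustration: the `Claim` fields, the flag names, and the routing labels are all hypothetical and are not taken from Prova's system.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Claim:
    text: str
    source_id: str               # hypothetical identifier of the source document
    quote: str                   # verbatim span offered as support
    flags: Tuple[str, ...] = ()  # e.g. "thin_data", "contradiction", "missing_context"


def is_traceable(claim: Claim, sources: Dict[str, str]) -> bool:
    """A claim is traceable only if its quote appears verbatim in the cited source."""
    return bool(claim.quote) and claim.quote in sources.get(claim.source_id, "")


def route(claim: Claim, sources: Dict[str, str]) -> str:
    """Decide where a machine-extracted claim goes next.

    This only sorts claims; judging what the evidence can bear stays with people.
    """
    if not is_traceable(claim, sources):
        return "reject"        # nothing untraceable enters the record
    if claim.flags:
        return "human_review"  # surfaced uncertainty stops for a person
    return "record"
```

The design choice worth noting is that traceability is checked mechanically against the source text, so a fluent but unsupported statement is rejected rather than passed along for a person to catch.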

Where the line falls →

The line, held by design

Some things the system is never allowed to touch.

We built AI into Prova knowing what it can do wrong: it is fast and fluent enough to find a flattering pattern, or tell a better story after the fact, if you let it. So the system has lines it is not allowed to cross, on every engagement, built into how it works.

Take the first rule every evaluator learns: you decide what would count as success before you see the data, and you do not move that line afterward. The system cannot move it either. It will not run a design that fails an ethics review. And when it is unsure, or a study is too small to detect the effect it is looking for, it says so plainly. Where something looks wrong, the work stops and a person looks. People hold the judgment, the ethics, and the relationships. That is the part we will never hand to a model.

Fig. 03 · Honesty by construction · Pre-commitment · Non-overridable line
  1. HC.01 · Before data

    Success is defined first

    the question, threshold, and analysis plan are set before the system sees the result

  2. HC.02 · Analysis lock

    The line cannot move

    the generator cannot lower the standard, rewrite the target, or explain away a miss

  3. HC.03 · Evidence review

    Uncertainty stays visible

    thin samples, null results, contradictions, and missing context remain in the record

  4. HC.04 · Human stop

    A person takes over

    where the evidence looks wrong, unethical, or too weak, the work stops for judgment

The system cannot

  • move the success line
  • bury a null result
  • soften a finding

Honesty is not left to a model instruction. The important constraint is structural: the standard is fixed before the result, and the finding has to survive that line.
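A structural constraint of this kind can be sketched as a plan that is fingerprinted before any outcome data exists and re-checked at grading time. The source does not say how Prova enforces the lock, so the hash check below is one common mechanism offered as an assumption, and `AnalysisPlan`, `lock`, and `grade` are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class AnalysisPlan:
    question: str
    outcome: str
    success_threshold: float  # minimum effect that counts as success


def lock(plan: AnalysisPlan) -> str:
    """Fingerprint the plan before any outcome data is seen."""
    payload = json.dumps(asdict(plan), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def grade(plan: AnalysisPlan, locked_digest: str, observed_effect: float) -> str:
    """Grade a result against the locked plan; refuse any plan edited after the lock."""
    if lock(plan) != locked_digest:
        raise ValueError("analysis plan changed after lock")
    return "met" if observed_effect >= plan.success_threshold else "not met"


plan = AnalysisPlan(
    question="Did attendance improve?",
    outcome="attendance_rate",
    success_threshold=0.10,
)
digest = lock(plan)               # taken before outcome data exists
print(grade(plan, digest, 0.08))  # not met

moved = replace(plan, success_threshold=0.05)
# grade(moved, digest, 0.08) raises ValueError: the lowered line is refused
```

Because the digest is recorded before the result, lowering the threshold afterward does not turn a miss into a success. It produces an error that a person has to look at.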

In short

This is how Prova does what it does.

AI is the reason a small team can offer rigorous evidence at the speed and cost real programs run at, and human judgment is the reason you can trust what comes back. If any of it raises a question, that is a good place to start.

Start a conversation