How consensus works

Imagine a cooking contest. One dish comes out of the kitchen, and a panel of judges tastes it. Each judge fills out the same scorecard, and somehow all those opinions have to become one verdict: does this dish move on, or not?

That is exactly the problem Sapien solves for every piece of work — one answer, one audit finding, one labeled image. A group of independent reviewers each give their scores, and the system turns those scores into a single, trustworthy result. The reviewers are called validators, and the process of turning their scores into one result is consensus.

Crucially, a judge on the panel can be a human expert or an AI agent. Both are first-class validators — they see the same rubric and evidence, score the same way, and are routed identically. A panel can be all human, all AI, or a mix, which is what lets a project move fast on volume and still bring expert judgment where it matters.

Here is the whole idea in five plain steps.

1. The scorecard has several questions

A good judge doesn't just say "I liked it." The scorecard asks separate questions: Is it cooked properly? How is the seasoning? How is the presentation?

In Sapien, each of these questions is a dimension. A security review, for example, scores a finding on its severity, its exploitability, and how certain the validator is. Each dimension is judged on its own, then combined at the end.
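As a rough illustration of "each dimension is judged on its own," the sketch below regroups validator scorecards into one list of scores per dimension. The dictionary layout, the 1 to 5 scale, and the function name `scores_by_dimension` are assumptions made for this example, not Sapien's actual data model.

```python
from collections import defaultdict

# Hypothetical scorecards: one dict per validator, keyed by dimension name.
scorecards = [
    {"severity": 4, "exploitability": 3, "certainty": 5},
    {"severity": 4, "exploitability": 2, "certainty": 4},
    {"severity": 5, "exploitability": 3, "certainty": 4},
]

def scores_by_dimension(cards):
    """Group scores so that each dimension can be judged on its own."""
    grouped = defaultdict(list)
    for card in cards:
        for dimension, score in card.items():
            grouped[dimension].append(score)
    return dict(grouped)

per_dimension = scores_by_dimension(scorecards)
```

Each list in `per_dimension` can then be checked for agreement separately before anything is combined.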

2. Each judge scores, and we measure how much they agree

Once the judges submit their scores for a dimension, we don't just average them blindly. We look at how tightly they cluster. If everyone scored the seasoning 4 out of 5, that's strong agreement. If scores are all over the place, that's weak agreement.

That clustering is the strength of the consensus: a number from 0% (total disagreement) to 100% (perfect agreement). Each dimension has its own bar that the strength must clear, called the required strength (70% is a sensible default). A tricky question can demand tighter agreement than an easy one.
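This page does not give the formula Sapien uses for strength, so the following is only a minimal sketch of one plausible way to turn clustering into a number between 0 and 1. It assumes a 1 to 5 scale and defines strength as 1 minus the mean absolute deviation from the mean, normalized by half the scale width. The names `agreement_strength` and `clears_bar` are hypothetical.

```python
def agreement_strength(scores, lo=1, hi=5):
    """Return agreement in [0, 1].

    Assumed metric (not Sapien's published formula): 1 minus the mean
    absolute deviation from the mean, divided by half the scale width.
    Identical scores give 1.0; an even split between the two ends of
    the scale gives 0.0.
    """
    if not scores:
        raise ValueError("need at least one score")
    center = sum(scores) / len(scores)
    mean_dev = sum(abs(s - center) for s in scores) / len(scores)
    return max(0.0, 1.0 - mean_dev / ((hi - lo) / 2))

def clears_bar(scores, required_strength=0.70):
    """True when the scores agree at least as tightly as the bar demands."""
    return agreement_strength(scores) >= required_strength

tight = agreement_strength([4, 4, 5, 3])   # scores cluster around 4
loose = agreement_strength([1, 3, 5, 2])   # scores are spread out
```

Under this assumed metric, the tight panel clears the 70% default and the loose one does not. A real deployment may measure agreement differently.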

3. If the judges are split, call in the experts

What if the panel can't agree? You don't just flip a coin. You bring in senior judges — fewer of them, but more experienced — and their opinion counts for more.

In Sapien this is escalation, and it works as a ladder. Validators come in classes (think junior and senior, or AI panel and human expert). Each class is human or AI and has its own vote weight. Round one is the base panel. If the item isn't verified, meaning any required question is still below its bar, the next rung of the ladder is added: more judges (often a more senior class) who re-score the whole item. Each step that runs adds its own defined mix of classes, and because senior classes carry a higher vote weight, they can tip a split decision. The ladder keeps climbing until the item is verified or the steps run out.
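The ladder can be sketched as a loop that adds one panel at a time and stops as soon as every dimension clears its bar. This is an illustration under several assumptions: votes from later rungs are pooled with earlier ones, every dimension shown is required, all dimensions share one required strength, and agreement is a weighted mean-deviation metric invented for this example. `weighted_strength` and `run_ladder` are hypothetical names.

```python
def weighted_strength(votes, lo=1, hi=5):
    """votes: list of (score, vote_weight) pairs.

    Assumed metric: 1 minus the weighted mean absolute deviation from
    the weighted mean, divided by half the scale width.
    """
    total = sum(w for _, w in votes)
    if total <= 0:
        raise ValueError("need at least one vote with positive weight")
    center = sum(s * w for s, w in votes) / total
    mean_dev = sum(abs(s - center) * w for s, w in votes) / total
    return max(0.0, 1.0 - mean_dev / ((hi - lo) / 2))

def run_ladder(steps, required_strength=0.70):
    """steps: list of panels, each mapping dimension -> list of votes.

    Returns (verified, rounds_used). Each rung's votes are pooled with
    the earlier ones (an assumption), then every dimension is rechecked.
    """
    pooled = {}
    for round_number, panel in enumerate(steps, start=1):
        for dimension, votes in panel.items():
            pooled.setdefault(dimension, []).extend(votes)
        if all(weighted_strength(v) >= required_strength
               for v in pooled.values()):
            return True, round_number
    return False, len(steps)

# Base panel is split; two senior validators with vote weight 3 settle it.
base_panel = {"severity": [(3, 1), (5, 1), (4, 1)]}
senior_panel = {"severity": [(4, 3), (4, 3)]}
result = run_ladder([base_panel, senior_panel])  # (True, 2)
```

With only the base panel, the same call returns `(False, 1)`: the steps run out before the item is verified.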

4. Combine the questions into one verdict

Each dimension now has its own resolved score. The system rolls them up into a single aggregate — a weighted blend, where the heavier questions pull harder. If every required dimension cleared its bar, the item is accepted. If even one required dimension is still unresolved, it isn't.
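A minimal sketch of the roll-up, assuming the aggregate is a weight-normalized average of the resolved scores and that acceptance depends only on the required dimensions. The `DimensionResult` type, its fields, and the choice to include every dimension in the blend are illustrative assumptions rather than Sapien's actual schema.

```python
from dataclasses import dataclass

@dataclass
class DimensionResult:
    name: str
    score: float     # resolved score for this dimension
    weight: float    # how much the dimension counts in the blend
    required: bool   # must clear its bar for the item to be accepted
    resolved: bool   # True when strength met the required strength

def aggregate(results):
    """Return (weighted_blend, accepted) for one item."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    blend = sum(r.score * r.weight for r in results) / total_weight
    # One unresolved required dimension is enough to block acceptance.
    accepted = all(r.resolved for r in results if r.required)
    return blend, accepted

results = [
    DimensionResult("severity", 4.0, weight=5, required=True, resolved=True),
    DimensionResult("exploitability", 3.0, weight=3, required=True, resolved=True),
    DimensionResult("certainty", 5.0, weight=2, required=False, resolved=False),
]
blend, accepted = aggregate(results)
```

Here the heavier severity score pulls the blend toward 4, and the item is accepted because the only unresolved dimension is not required.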

5. What keeps the scores honest

Every project sets an assurance mode as part of its Assurance Policy — the dial that decides what makes a validator trustworthy. The scores hold up because of who judges and how they're held accountable, not because of any single penalty.

The v1 default is Identity mode:

A vetted workforce. Validators are named and qualified before they review — added to a project's roster, then granted classes (like "senior reviewer" or "dermatologist") with a written justification, so vetting is auditable. Many also calibrate against expert reference items before live work. Accountability rests on who they are and their workforce contract — no funds at risk, no smart contracts.
Rewards for good work. Validators are paid for each completed verification, which aligns the incentive with careful, honest scoring.
Reputation that follows them. Identity mode tracks each validator's record across verifications; landing near the resolved consensus over time builds standing and access to higher-value work.

Put it together

Contest term	Sapien term	What it controls
Scorecard question	Dimension	One thing validators rate
How much a question counts	Weight	Influence on the final verdict
How tightly judges agree	Strength	Whether a dimension resolves
The bar to pass	Required strength	How much agreement is "enough"
Calling in expert judges	Escalation / classes	What happens when judges split
A judge's track record	Reputation	Standing and access over time (Identity mode)
How judges are held accountable	Assurance mode	Identity (default) or optional Collateral

That's the entire mechanism: score each question → measure agreement → escalate if split → combine by weight, with a vetted, well-rewarded, reputation-driven panel keeping it honest.

Want to feel how the knobs interact? Open the consensus simulator and configure your own dimensions, validator classes, and thresholds, then watch the verdict change as you move the votes.

Last updated Jun 16, 2026