Britt Probabilistic Validation Lifecycle — Britt Biocomputing

Architecture · Execution methodology

The Britt Probabilistic Validation Lifecycle

Seven steps that take a probabilistic AI system from intended use to audit-ready evidence. This is the execution methodology that runs inside Layer 3 of the House of AI Trust: it keeps the skeleton of the GAMP 5 V-model but adapts each step to systems whose outputs vary from run to run.

Lives in House of AI Trust · Control Layer (L3) · Rigor scales with risk tier

The seven steps

Seven steps, one continuous cycle

The lifecycle is not a one-time gate. Steps 1–5 establish the system; monitoring and change control feed back into context of use whenever the model, its data, or its operating context shifts.

Define the job

Context of Use

Who uses the model, for what decision, with which failure modes? The context of use fixes the intended purpose and the consequences of getting it wrong — everything downstream is scoped from here.

Asks

Intended use, users, decision, failure modes

Output

A bounded statement of what the system is for
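
As an illustration only, a context-of-use statement can be captured as a small structured record so that gaps are visible before anything downstream is scoped. The class name, field names, and the completeness check below are assumptions made for this sketch, not part of the published lifecycle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContextOfUse:
    """Hypothetical record of a context-of-use statement (field names are assumptions)."""

    intended_use: str
    users: tuple[str, ...]
    decision: str
    failure_modes: tuple[str, ...]

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        values = {
            "intended_use": self.intended_use.strip(),
            "users": self.users,
            "decision": self.decision.strip(),
            "failure_modes": self.failure_modes,
        }
        return [name for name, value in values.items() if not value]

    def is_bounded(self) -> bool:
        """Treat the statement as bounded only when every question has an answer."""
        return not self.missing_fields()
```

A draft with no named users or decision would report `["users", "decision"]` from `missing_fields()`, which is a cheap prompt to finish step 1 before tiering the risk.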

Scale the rigor

Risk Tiering

Risk is tiered by consequence to patient safety and data integrity — not by software category. The tier sets how much evidence each later step must produce, so a high-consequence use earns deeper validation than a low-stakes one.

Asks

What is the consequence if this fails?

Output

A risk tier that drives evidence depth
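
A minimal sketch of consequence-based tiering, assuming a three-level consequence scale and taking the worse of the two consequences as the tier. The level names, the tier labels, and the evidence-depth numbers are placeholders invented for illustration, not recommended values.

```python
from enum import IntEnum


class Consequence(IntEnum):
    """Hypothetical three-level consequence scale."""

    NEGLIGIBLE = 1
    MODERATE = 2
    SEVERE = 3


# Placeholder evidence depth per tier; the numbers are assumptions, not guidance.
EVIDENCE_DEPTH = {
    Consequence.NEGLIGIBLE: {"tier": "low", "min_eval_cases": 50, "repeat_runs": 3},
    Consequence.MODERATE: {"tier": "medium", "min_eval_cases": 200, "repeat_runs": 10},
    Consequence.SEVERE: {"tier": "high", "min_eval_cases": 1000, "repeat_runs": 30},
}


def risk_tier(patient_safety: Consequence, data_integrity: Consequence) -> dict:
    """Tier by the worse consequence; software category is deliberately not an input."""
    worst = max(patient_safety, data_integrity)
    return EVIDENCE_DEPTH[worst]
```

Under these assumptions, a use with negligible patient-safety impact but severe data-integrity impact still lands in the high tier, because the tier follows the worst consequence.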

Build the test

Evaluation Design

Evaluation sets are traceable and domain-specific, with lineage documented to ALCOA+ principles, leakage controls, and real edge cases. The Error Taxonomy maps candidate failure modes to consequence-weighted outcomes so the eval targets what actually matters.

Asks

How will we know it works, on what data?

Output

Golden datasets + a taxonomy-mapped eval plan
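
One simple form a leakage control could take is an exact-duplicate check between training and evaluation records. The sketch below assumes text records and uses a normalised SHA-256 fingerprint; the function names are hypothetical, and the check catches only duplicates that differ in case or whitespace, not paraphrases or near-duplicates.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a case- and whitespace-normalised record so trivial edits do not hide a duplicate."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def leaked_records(training_texts: list[str], eval_texts: list[str]) -> list[str]:
    """Return evaluation records whose fingerprint also appears in the training data."""
    seen = {fingerprint(t) for t in training_texts}
    return [t for t in eval_texts if fingerprint(t) in seen]
```

Any record this returns would be removed from the golden dataset, or the overlap documented, before the evaluation is run.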

Set the bar

Acceptance Criteria

Probabilistic systems don't pass or fail on a single run. Acceptance criteria are defined as bounded-variance thresholds calibrated to the risk tier: what range of behaviour is acceptable, and how confident we must be in it.

Asks

What performance, within what tolerance, counts as validated?

Output

Quantified, defensible acceptance thresholds
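
To make "bounded-variance threshold" concrete, here is one possible reading, sketched under stated assumptions: the system is scored over several repeated runs, the run-to-run standard deviation must stay under a cap, and a lower confidence bound on the mean score must clear a floor. The function name, parameters, and the normal approximation (z = 1.96) are all assumptions; a small number of runs would usually call for a t-distribution instead.

```python
import math
import statistics


def meets_acceptance(
    scores: list[float],
    min_lower_bound: float,
    max_stdev: float,
    z: float = 1.96,
) -> bool:
    """Accept only if the spread is bounded and the lower confidence bound clears the floor."""
    if len(scores) < 2:
        raise ValueError("need at least two repeated runs to estimate variance")
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)  # sample standard deviation across repeated runs
    # Normal-approximation lower bound on the mean (an assumption of this sketch).
    lower_bound = mean - z * stdev / math.sqrt(len(scores))
    return stdev <= max_stdev and lower_bound >= min_lower_bound
```

With a floor of 0.90 and a spread cap of 0.02, five runs scoring 0.91 to 0.95 would pass, while runs swinging between 0.80 and 0.99 would fail on variance alone even though the average looks similar.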

Design the oversight

HITL / HOTL Controls

Human-in-the-loop and human-on-the-loop controls are designed to match the risk tier — where a person reviews, where they supervise, and where the system runs unattended. Oversight is engineered, not assumed.

Asks

Where must a human review, supervise, or intervene?

Output

An oversight design mapped to decision points
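
A hedged sketch of how an oversight design might be expressed as a routing rule. The tier names, the use of a model confidence score, and the 0.9 cut-off are assumptions for illustration; a real design would be derived from the context of use and the risk tier rather than from fixed numbers.

```python
from enum import Enum


class Oversight(Enum):
    HITL = "human reviews each output before it is used"
    HOTL = "human supervises and can intervene"
    UNATTENDED = "system runs without routine review"


def oversight_for(tier: str, model_confidence: float) -> Oversight:
    """Hypothetical routing rule: tier names and the confidence cut-off are assumptions."""
    if tier == "high":
        return Oversight.HITL  # high-consequence decisions always get a reviewer
    if tier == "medium":
        return Oversight.HOTL if model_confidence >= 0.9 else Oversight.HITL
    if tier == "low":
        return Oversight.UNATTENDED if model_confidence >= 0.9 else Oversight.HOTL
    raise ValueError(f"unknown risk tier: {tier!r}")
```

Writing the rule down as code makes each decision point explicit and testable, which is one way to read "engineered, not assumed".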

Watch for drift

Continuous Monitoring

Post-deployment, performance is monitored against the acceptance criteria with drift detection on inputs, outputs, and the operating context. Monitoring turns validation from a moment into a standing state.

Asks

Is the system still inside its validated envelope?

Output

Drift signals + revalidation triggers
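
Input drift can be measured in many ways; the sketch below uses the Population Stability Index (PSI) as one example, which is a choice made here rather than something the lifecycle prescribes. The bin edges are supplied by the caller and must be sorted in ascending order, and the 0.2 trigger is a commonly cited rule of thumb treated as an assumption, not a validated limit.

```python
import math


def population_stability_index(
    baseline: list[float], current: list[float], edges: list[float]
) -> float:
    """PSI between two samples binned on shared, ascending edges."""

    def proportions(values: list[float]) -> list[float]:
        if not values:
            raise ValueError("cannot bin an empty sample")
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1  # index of the bin that v falls into
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def revalidation_triggered(psi: float, threshold: float = 0.2) -> bool:
    """Flag drift when PSI exceeds the (assumed) threshold."""
    return psi > threshold
```

Here `baseline` would be the inputs seen during validation and `current` a recent production window; a triggered flag is the kind of signal that hands off to change control.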

Govern the change

Change Control


When the model, data, or context changes — or a monitoring trigger fires — change control decides what must be re-evaluated and routes the system back to the relevant step. This is the loop that closes back to context of use.

Asks

What changed, and what must be revalidated?

Output

A scoped revalidation, looping back to step 1 (Context of Use)
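
One way to picture a scoped revalidation is a lookup from the kind of change to the earliest step that needs rework. The step names and order come from this page; the change categories and the entry step assigned to each are assumptions invented for this sketch. Context of use is always re-confirmed, matching the loop back to step 1.

```python
STEPS = {
    1: "Context of Use",
    2: "Risk Tiering",
    3: "Evaluation Design",
    4: "Acceptance Criteria",
    5: "HITL / HOTL Controls",
    6: "Continuous Monitoring",
    7: "Change Control",
}

# Hypothetical mapping from kind of change to the earliest step needing rework.
ENTRY_STEP = {
    "context": 1,
    "data": 3,
    "model": 3,
    "monitoring_trigger": 4,
}


def revalidation_scope(change_kind: str) -> list[str]:
    """Re-confirm context of use, then redo every step from the entry point up to monitoring."""
    if change_kind not in ENTRY_STEP:
        raise ValueError(f"unknown change kind: {change_kind!r}")
    entry = ENTRY_STEP[change_kind]
    # Step 7 is excluded because change control is the step doing the routing.
    numbers = sorted({1, *range(entry, 7)})
    return [STEPS[n] for n in numbers]
```

Under this mapping a data change skips risk tiering but repeats evaluation design onward, whereas a change in operating context reruns all six upstream steps.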

One framework, three depths

Rigor scales with the risk tier

The same seven steps apply whether the stakes are exploratory or GxP-critical — what changes is the depth of evidence each step demands.

R&D · lightweight

Discovery sprint

Fast cycle, lighter documentation. Establish context of use and acceptance criteria, evaluate against a representative set, ship with eyes open.

GxP · full V&V

Regulated validation

Complete verification and validation with traceable evaluation sets, ALCOA+ lineage, HITL/HOTL controls, and audit trails for every step.

Production · retainer

Drift watch

Continuous monitoring with drift detection, revalidation triggers, and change control running as an ongoing operational discipline.

Where this sits in the architecture

The lifecycle is the execution engine inside the Control Layer of the House of AI Trust. The Error Taxonomy supplies the failure classification that the evaluation-design and monitoring steps depend on; VALID Trust handles the supplier-qualification thread that runs horizontally through every layer.
