Quality measurement vs guardrails: a different layer of the stack
There’s a category confusion that has cost a lot of teams a lot of debugging hours, and it’s this: people use the word “guardrails” to mean two completely different things. Sometimes they mean content safety — the filter that catches the model emitting a slur or a credit card number. Sometimes they mean behavioural correctness — the check that the model called the right tool with the right arguments in the right sequence. Both are useful. They are not the same problem.
MPL doesn’t ship guardrails. It ships quality measurement, which addresses the second problem. Pulling these apart is the most useful thing we can do in a blog post, because once the categories are clean the architecture becomes obvious.
What a guardrail actually is
A guardrail, in the strict sense, is a reactive filter on a single model output. It looks at the generated text — or the generated tool call — and answers a yes/no question: is this safe to emit? The answer is computed without reference to the wider conversation, the contract the message is supposed to satisfy, or the downstream consequences. That’s not a criticism; it’s the whole point. Guardrails are designed to be cheap, local, and stateless.
Guardrails are good at:
- Catching PII in outputs that aren’t supposed to contain PII.
- Blocking unsafe tool invocations (e.g. rm -rf / arriving at a shell tool).
- Refusing categories of prompt injection.
- Enforcing brand tone or policy at the message boundary.
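To make "cheap, local, and stateless" concrete, here is a minimal sketch of a guardrail as a pure function over one output. The two regular expressions are toy patterns invented for this illustration; a production filter would use far more robust detection.

```python
import re

# Hypothetical patterns for illustration only.
CARD_NUMBER = re.compile(r"\b(?:\d[ -]?){13,16}\b")
DANGEROUS_SHELL = re.compile(r"\brm\s+-rf\s+/(?:\s|$)")


def guardrail_allows(output: str) -> bool:
    """Return True if this single output is safe to emit.

    Stateless: no conversation history, no contract, no downstream context.
    """
    if CARD_NUMBER.search(output):
        return False
    if DANGEROUS_SHELL.search(output):
        return False
    return True
```

Note what the function signature leaves out: there is no parameter for prior turns or for the contract the message should satisfy, which is exactly why it stays cheap.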
Guardrails are not good at, and were never meant to be good at:
- Telling you whether a tool call was correct.
- Telling you whether the agent followed instructions that were given three turns ago.
- Telling you whether claims in the output are grounded in cited sources.
- Telling you whether the same input would have produced the same output yesterday.
The last four are quality questions. They are what MPL’s quality-of-message (QoM) profiles are for.
Six metrics that compose
The MPL registry ships six metrics. You compose them into profiles, and profiles attach to stypes (MPL’s typed message contracts) via policy. The metrics are:
| Metric | What it answers |
|---|---|
| Schema Fidelity | Does the payload structurally match the contract? |
| Instruction Compliance | Did the message respect the constraints in the prompt or upstream instruction? |
| Groundedness | Are claims backed by referenced sources? |
| Determinism | Does the same input produce a stable output? |
| Ontology Adherence | Are domain rules respected (e.g. ICD-10 codes are valid codes)? |
| Tool Outcome | Did the actual side effect match the declared intent? |
None of these is a safety filter. All of them are correctness against a declared contract. Schema fidelity asks “does the wire format hold up”; instruction compliance asks “did you do what you were told”; groundedness asks “are you making this up”; determinism asks “are you stable”; ontology adherence asks “do you know the domain”; tool outcome asks “did the world actually move the way you said it would.”
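As one illustration of how a metric like Determinism could be turned into a number, the sketch below scores a batch of repeated runs by the share that agree with the most common output. This definition, the SHA-256 canonical hash, and the function names are all assumptions made for the example; it is not a description of how MPL computes the metric.

```python
import hashlib
import json
from collections import Counter


def canonical_hash(payload: dict) -> str:
    """Hash a payload after canonicalising key order so equal content hashes equally."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def determinism_score(outputs: list[dict]) -> float:
    """Fraction of outputs identical to the most common one (1.0 means fully stable)."""
    if not outputs:
        raise ValueError("need at least one output to score")
    counts = Counter(canonical_hash(o) for o in outputs)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(outputs)
```

Feeding it the outputs of the same input replayed several times gives a value between 0 and 1 that can be compared with a threshold like any other metric.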
You pick the metrics that matter for your domain. A clinical decision
support agent might run schema_fidelity: 1.0, groundedness: 0.95,
ontology_adherence: 1.0. A creative-writing agent might not run
groundedness at all but might care about determinism for reproducibility
of test fixtures. The point is the choice is yours, and it’s expressed
declaratively in a profile JSON.
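A minimal sketch of what "declarative" buys you, using the clinical thresholds mentioned above. The profile shape (the `name` and `thresholds` fields) and the `evaluate` helper are hypothetical, chosen for illustration rather than taken from the MPL schema.

```python
import json

# Hypothetical profile shape: the field names "name" and "thresholds" are assumptions.
CLINICAL_PROFILE_JSON = """
{
  "name": "clinical-decision-support",
  "thresholds": {
    "schema_fidelity": 1.0,
    "groundedness": 0.95,
    "ontology_adherence": 1.0
  }
}
"""


def evaluate(profile: dict, scores: dict) -> dict:
    """Compare measured scores with the profile's thresholds, metric by metric."""
    results = {}
    for metric, threshold in profile["thresholds"].items():
        value = scores.get(metric)  # None if the metric was never measured
        results[metric] = {
            "value": value,
            "threshold": threshold,
            "cleared": value is not None and value >= threshold,
        }
    return {
        "profile": profile["name"],
        "metrics": results,
        "passed": all(r["cleared"] for r in results.values()),
    }


profile = json.loads(CLINICAL_PROFILE_JSON)
report = evaluate(profile, {"schema_fidelity": 1.0, "groundedness": 0.91, "ontology_adherence": 1.0})
print(report["passed"])  # False: groundedness 0.91 is below the 0.95 threshold
```

Because the thresholds live in data rather than code, a creative-writing profile is a different JSON document, not a different code path.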
Where each layer belongs
Stack it out:
┌─────────────────────────────────────────┐
│ Content safety / guardrails             │  reactive, per-output
├─────────────────────────────────────────┤
│ MPL quality measurement (QoM)           │  scored against contract
├─────────────────────────────────────────┤
│ MPL contracts (stypes)                  │  typed envelope
├─────────────────────────────────────────┤
│ Transport (MCP, A2A, HTTP)              │
└─────────────────────────────────────────┘
Each layer answers a different question. The content-safety layer asks “should this exist?” The QoM layer asks “did this meet the bar we set?” The contracts layer asks “is this a thing we have a name for?” If you try to make one layer do all three jobs, you get a tangled middleware that is hard to tune and harder to audit.
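One way to keep the layers untangled in code is to make each one a separate function that reports under its own name, so a failure is always attributable. The sketch below is illustrative only: the layer functions, the set of known stypes, the 0.9 threshold, and the deny-list are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LayerResult:
    layer: str
    ok: bool
    detail: str = ""


# Each layer is a plain function from message to LayerResult.
Layer = Callable[[dict], LayerResult]


def contract_layer(message: dict) -> LayerResult:
    # "Is this a thing we have a name for?" (hypothetical set of known stypes)
    known = {"org.support.Reply.v1", "org.support.Ticket.v1"}
    ok = message.get("stype") in known
    return LayerResult("contract", ok, "" if ok else "unknown stype")


def qom_layer(message: dict) -> LayerResult:
    # "Did this meet the bar we set?" (hypothetical threshold on one metric)
    ok = message.get("groundedness", 0.0) >= 0.9
    return LayerResult("qom", ok, "" if ok else "groundedness below 0.9")


def safety_layer(message: dict) -> LayerResult:
    # "Should this exist?" (toy deny-list standing in for a real content filter)
    ok = "rm -rf /" not in message.get("text", "")
    return LayerResult("safety", ok, "" if ok else "unsafe content")


def first_failure(message: dict, layers: list[Layer]) -> Optional[LayerResult]:
    """Run layers in order and return the first failing result, or None if all pass."""
    for layer in layers:
        result = layer(message)
        if not result.ok:
            return result
    return None
```

Calling `first_failure(msg, [contract_layer, qom_layer, safety_layer])` walks the stack from the contract upward, and the `layer` field on the result says which question the message failed.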
The “did it meet the bar” output
A guardrail returns a decision: pass or block. That’s a one-bit signal. Useful, but one-bit.
A QoM profile returns a report: each configured metric, its measured
value, whether it cleared the threshold, and the inputs the measurement
was computed on. That’s a multi-dimensional signal you can persist,
aggregate, regress on, and point at. When something drifts — say,
groundedness on org.support.Reply.v1 drops from 0.94 to 0.81 over
two weeks — you see it in the metric. You don’t see it in a
pass/block boolean.
This is the part that maps to the EU AI Act’s transparency requirement in a way a guardrail can’t. The Act doesn’t ask whether you blocked unsafe outputs. It asks whether you can produce evidence of the quality characteristics of your system’s behaviour, over time, with provenance. A pass/block log doesn’t do that. A timestamped QoM report keyed by content hash and provenance does.
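A sketch of what such an evidence record might contain, assuming a simple dictionary layout. The field names are hypothetical, and SHA-256 over canonical JSON is used here only because it is in the Python standard library; it stands in for whatever content hash a deployment actually uses.

```python
import hashlib
import json
from datetime import datetime, timezone


def content_hash(payload: dict) -> str:
    """SHA-256 over canonical JSON; a stand-in for whatever hash the deployment uses."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


def evidence_record(payload: dict, qom: dict, source: str) -> dict:
    """Bundle a QoM report with when it was measured, on what, and where it came from."""
    return {
        "measured_at": datetime.now(timezone.utc).isoformat(),
        "content_hash": content_hash(payload),
        "provenance": source,  # hypothetical: e.g. agent id plus model version
        "qom": qom,
    }
```

Keying by a hash of the canonicalised payload means two records for the same content can be matched later even if key order differed on the wire.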
“Why not just both?”
You should run both. The thing we want to disabuse you of is the idea that they’re substitutes. They are different tools for different failure modes:
- Content safety blocks the agent from emitting a slur.
- Quality measurement catches the agent emitting a structurally valid but factually invented LabResult payload.
The first is a moderation problem and the well-developed solutions live in the guardrail/content-filter ecosystem. The second is an engineering correctness problem and is what MPL is for. Putting both on the same budget — or, worse, asking your guardrail vendor to handle both — collapses two distinct concerns into a single confused middleware.
What it looks like on the wire
When MPL is in the path, every message comes out with a QoM block attached:
{
  "stype": "org.support.Ticket.v1",
  "valid": true,
  "qom": {
    "profile": "qom-strict-argcheck",
    "schema_fidelity": 1.0,
    "instruction_compliance": 0.97,
    "groundedness": 0.92,
    "passed": true
  },
  "sem_hash": "blake3:7f2a..."
}
That block is what you store. That block is what you regress over time. That block is what you hand to compliance. The pass/block boolean from your guardrail is fine too — but it goes in a different column, because it’s answering a different question.
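The "different column" point can be taken literally. The sketch below logs the example message to an in-memory SQLite table with the guardrail decision and the QoM result stored side by side; the table layout and the `log_message` helper are assumptions made for illustration, not part of MPL.

```python
import json
import sqlite3

# The message mirrors the example block above; the table layout is an assumption.
message = {
    "stype": "org.support.Ticket.v1",
    "valid": True,
    "qom": {
        "profile": "qom-strict-argcheck",
        "schema_fidelity": 1.0,
        "instruction_compliance": 0.97,
        "groundedness": 0.92,
        "passed": True,
    },
    "sem_hash": "blake3:7f2a...",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE message_log (
        sem_hash         TEXT,
        stype            TEXT,
        guardrail_passed INTEGER,  -- one-bit safety decision
        qom_passed       INTEGER,  -- did the profile clear its thresholds
        qom_json         TEXT      -- full per-metric report, kept for regression
    )
    """
)


def log_message(msg: dict, guardrail_passed: bool) -> None:
    """Store the safety decision and the QoM report as separate columns."""
    conn.execute(
        "INSERT INTO message_log VALUES (?, ?, ?, ?, ?)",
        (
            msg["sem_hash"],
            msg["stype"],
            int(guardrail_passed),
            int(msg["qom"]["passed"]),
            json.dumps(msg["qom"], sort_keys=True),
        ),
    )


log_message(message, guardrail_passed=True)
row = conn.execute("SELECT guardrail_passed, qom_passed, qom_json FROM message_log").fetchone()
```

With the two signals in separate columns, a query for rows where the guardrail passed but the QoM profile failed becomes a one-line filter.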
The drift-detection bonus
There’s one more reason to keep these layers separate, which is that the
QoM block has an emergent use the guardrail layer can’t replicate: it’s
a time-series. Once you’re scoring every message against a profile, the
scores themselves become observable. You can chart groundedness per
stype over time, alert when a metric drops below a moving baseline, and
correlate the drop with deployments, prompt changes, or model swaps.
Guardrails are stateless by design — they don’t know what last week’s output looked like, so they can’t tell you “this agent has been getting worse at citing sources for ten days.” A QoM stream can. That’s not a moderation capability; it’s an engineering signal about your agents’ behaviour, on the same axis you already think about latency and error rate.
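A minimal sketch of alerting against a moving baseline, assuming one aggregated score per observation period for a given stype and metric. The `DriftDetector` class, the window size, and the tolerance are hypothetical choices for the example, not MPL features.

```python
from collections import deque
from statistics import mean


class DriftDetector:
    """Flags a score that falls more than `tolerance` below a moving-average baseline."""

    def __init__(self, window: int = 5, tolerance: float = 0.05):
        self.window = window
        self.tolerance = tolerance
        self.history = deque(maxlen=window)  # most recent `window` scores

    def observe(self, score: float) -> bool:
        """Record a score; return True if it drifted below the baseline."""
        drifted = False
        if len(self.history) == self.window:
            baseline = mean(self.history)
            drifted = score < baseline - self.tolerance
        self.history.append(score)
        return drifted


detector = DriftDetector(window=5, tolerance=0.05)
scores = [0.94, 0.95, 0.93, 0.94, 0.94, 0.93, 0.81]
flags = [detector.observe(s) for s in scores]
```

One design caveat: a purely moving baseline follows a slow decline downward and may never fire, so a gradual slide is better caught by also comparing against a fixed reference window, such as the scores recorded just after the last known-good deployment.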
This is the most underrated reason to instrument the second layer separately. It’s not just compliance. It’s that you finally have a number you can put on “is this agent doing its job,” and you can watch that number move when you change something upstream.
Guardrails keep your agents safe. Quality measurement keeps them correct. You need both, and you need them in distinct layers, because the day they share a layer is the day you can’t tell which one failed.