Reading MPL audit trails: forensic patterns
The thing about an audit trail is that nobody appreciates it until they need it, and when they need it they need it right now, with someone on the other side of a desk asking pointed questions. This post walks through the shapes of those questions and how an MPL trail answers each one, so that you find out whether your trail is enough before that meeting, not during it.
The model is simple: every message that passes through the MPL proxy
becomes a record. The record contains the envelope (stype, payload,
sem_hash, provenance), the QoM report from whatever profile applied,
and a timestamp. That’s it. The trail is a sequence of those records,
append-only, content-addressed by the BLAKE3 sem_hash. You can store it in
whatever sink you like — object storage, a time-series DB, Loki, plain
NDJSON files. MPL doesn’t prescribe the sink; it prescribes the
record shape.
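To make that shape concrete, here is a minimal Python sketch of one record and an NDJSON sink. It assumes a nested layout (the envelope holds stype, payload, sem_hash and provenance; provenance holds agent_id and intent), an RFC 3339 UTC timestamp string, and a free-form qom dict. Those choices are illustrative, not MPL’s actual wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class Provenance:
    agent_id: str
    intent: str


@dataclass
class Envelope:
    stype: str               # versioned contract name, e.g. "org.calendar.Event.v1"
    payload: dict[str, Any]
    sem_hash: str            # BLAKE3 hex digest of the canonicalised payload
    provenance: Provenance


@dataclass
class TrailRecord:
    envelope: Envelope
    qom: dict[str, Any]      # QoM report from whichever profile applied (assumed free-form)
    timestamp: str           # assumed RFC 3339 UTC, e.g. "2026-05-01T00:12:03Z"


def append_record(path: str, record: TrailRecord) -> None:
    """Append one record as a single NDJSON line (append-only by convention only)."""
    with open(path, "a", encoding="utf-8") as f:
        # asdict recurses into the nested dataclasses, giving plain JSON-able dicts.
        f.write(json.dumps(asdict(record), sort_keys=True) + "\n")


def read_records(path: str) -> list[dict[str, Any]]:
    """Read every record back as plain dicts, in write order."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Opening a file in append mode only makes the trail append-only by convention; a production sink would enforce it with write-once storage or restricted permissions.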
What follows are those questions, each paired with the trail pattern that answers it.
“Show me everything agent X did between two timestamps”
This is the most basic forensic query and the one that justifies the
whole exercise. In the MPL record, the provenance.agent_id field is
what makes it possible. A query against the trail:
provenance.agent_id == "scheduler-prod"
AND timestamp >= 2026-05-01T00:00:00Z
AND timestamp < 2026-05-01T01:00:00Z
returns a chronological sequence of every envelope that agent emitted
in the window. Because the records carry the stype, you can filter
further — stype == "org.calendar.Event.v1" — to scope to a specific
contract. Because they carry sem_hash, two records with the same hash
have identical canonicalised payloads (BLAKE3’s collision resistance makes
an accidental match negligible), so you can dedup repeated payloads with confidence.
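Against plain NDJSON files, the same query fits in a few lines of Python. This sketch assumes records have been loaded as dicts with agent_id at record["envelope"]["provenance"]["agent_id"] and an RFC 3339 timestamp at record["timestamp"]; a real sink (Loki, a time-series DB) would push the filter down into its own query language instead.

```python
from datetime import datetime
from typing import Any, Iterable, Optional


def parse_ts(ts: str) -> datetime:
    # fromisoformat accepts a trailing "Z" only from Python 3.11, so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def agent_activity(
    records: Iterable[dict[str, Any]],
    agent_id: str,
    start: str,
    end: str,
    stype: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Records emitted by agent_id in the half-open window [start, end), oldest first."""
    lo, hi = parse_ts(start), parse_ts(end)
    hits = [
        r for r in records
        if r["envelope"]["provenance"]["agent_id"] == agent_id
        and lo <= parse_ts(r["timestamp"]) < hi
        and (stype is None or r["envelope"]["stype"] == stype)
    ]
    return sorted(hits, key=lambda r: parse_ts(r["timestamp"]))


def distinct_payloads(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first record per sem_hash: equal hashes mean equal canonical payloads."""
    seen: set[str] = set()
    out = []
    for r in records:
        h = r["envelope"]["sem_hash"]
        if h not in seen:
            seen.add(h)
            out.append(r)
    return out
```

Note that distinct_payloads collapses separate emissions of the same payload into one record. That is what you want for “what did the agent say,” and not what you want for “how many times did it say it.”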
What this query doesn’t tell you is whether each of those records is intact. That’s the next pattern.
“Prove this record hasn’t been tampered with”
The sem_hash is BLAKE3 over the canonicalised payload. To verify a
record, you canonicalise its payload field the same way, recompute the
hash, and compare it with the stored sem_hash. If they match, the payload
hasn’t changed since it was written. If they don’t, the record is
either corrupted or has been edited.
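The verification step is short enough to sketch in full. Two assumptions to flag: canonicalisation is approximated here as JSON with sorted keys and no insignificant whitespace (MPL’s real rules may differ), and because Python’s hashlib has no BLAKE3, BLAKE2b-256 stands in. To check real sem_hash values you would pass a BLAKE3 function (for example from the third-party blake3 package) as hash_fn.

```python
import hashlib
import json
from typing import Any, Callable


def canonicalise(payload: Any) -> bytes:
    # ASSUMPTION: canonical form = JSON with sorted keys, compact separators,
    # UTF-8. MPL's actual canonicalisation rules may differ.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def stand_in_hash(data: bytes) -> str:
    # Stand-in only: hashlib has no BLAKE3, so BLAKE2b-256 is used here.
    # Swap in a real BLAKE3 implementation to match MPL's sem_hash values.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def verify_record(record: dict[str, Any],
                  hash_fn: Callable[[bytes], str] = stand_in_hash) -> bool:
    """True if the stored sem_hash still matches the record's payload."""
    env = record["envelope"]
    return hash_fn(canonicalise(env["payload"])) == env["sem_hash"]
```

Canonicalising first is what lets a payload survive a JSON round-trip with reordered keys and still verify; without it, every re-serialisation would look like tampering.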
On its own, this is tamper-evident, not tamper-proof: an attacker with write access to your trail sink can replace both the payload and the hash, and verification still passes. The integrity guarantee comes from what you put around the hash: an append-only store, or a chain of record hashes whose roots are anchored in a separate system (a Merkle log, an external timestamper). With either, edits become detectable. MPL produces the per-record hash; the chaining strategy depends on how paranoid your compliance team is.
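One way to get that chaining, sketched under stated assumptions: fold each record’s sem_hash and timestamp into a running hash, and periodically copy the head of the chain somewhere the trail’s writers cannot touch. The GENESIS value, the field separator, and the use of BLAKE2b are illustrative choices, not something MPL specifies.

```python
import hashlib
from typing import Any

GENESIS = "0" * 64  # hypothetical starting value for an empty trail


def link(prev: str, record: dict[str, Any]) -> str:
    # Bind each record to everything before it: changing any earlier record
    # changes every later link, including the head you anchored elsewhere.
    # BLAKE2b-256 stands in for whatever hash your chaining layer uses.
    data = f'{prev}|{record["envelope"]["sem_hash"]}|{record["timestamp"]}'
    return hashlib.blake2b(data.encode("utf-8"), digest_size=32).hexdigest()


def chain_head(records: list[dict[str, Any]]) -> str:
    """Fold the trail, in write order, into one head hash to anchor externally."""
    head = GENESIS
    for r in records:
        head = link(head, r)
    return head


def trail_matches_anchor(records: list[dict[str, Any]], anchored_head: str) -> bool:
    """Recompute the head from the stored trail and compare with the anchored copy."""
    return chain_head(records) == anchored_head
```

Replacing a payload and its sem_hash together now changes every later link, so the recomputed head no longer matches the anchored copy. Truncating the trail is caught the same way. Pair this with the per-record check, since the chain only covers the hashes, not the payloads behind them.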
For most SOX/GDPR auditors, “this content hash was emitted at this timestamp, and the payload still hashes to that value today” is the answer they’re looking for.
“Trace this outcome back to the agent that caused it”
This is the most useful query and the hardest to answer without a contract layer. The setup: something happened downstream — an invoice got issued, a calendar event got created, a record got updated — and you want the chain of agent activity that led to it.
In an MPL trail, the chain is reconstructable because provenance
records who acted and with what intent at each hop, and every hop has a
hash the next one can point back to. Concretely:
when agent A calls the org.calendar.create tool, the record shows
provenance.agent_id == "A", provenance.intent == "create-event". When
the calendar service writes a record to its own trail, it can include
the upstream sem_hash as a parent reference. You walk the chain:
outcome record --> parent sem_hash --> envelope record
envelope record --> provenance.agent_id, provenance.intent
--> parent of THAT record (if any)
The shape of the chain depends on how rigorously your services thread the parent hash. What MPL guarantees is that every envelope has a canonical hash, so the chain can be threaded at all. Without that, you’re correlating on timestamps and hoping the clocks agreed.
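Walking that chain is a loop over an index keyed by sem_hash. The sketch assumes each downstream record stores its upstream hash under a hypothetical provenance["parent_sem_hash"] field; the field name is not specified by MPL here, and your services may put the reference elsewhere.

```python
from typing import Any, Optional


def trace(outcome: dict[str, Any], records: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Walk parent references from an outcome back towards the first hop.

    Returns (agent_id, intent) per hop, outcome first. ASSUMES each record
    stores its upstream hash at provenance["parent_sem_hash"] (hypothetical
    field name) and that the field is absent or None at the root.
    """
    by_hash: dict[str, dict[str, Any]] = {}
    for r in records:
        # If a payload was emitted more than once, keep the first record seen
        # (the earliest, if records are in write order).
        by_hash.setdefault(r["envelope"]["sem_hash"], r)

    hops: list[tuple[str, str]] = []
    seen: set[str] = set()
    current: Optional[dict[str, Any]] = outcome
    while current is not None:
        prov = current["envelope"]["provenance"]
        hops.append((prov["agent_id"], prov.get("intent", "")))
        seen.add(current["envelope"]["sem_hash"])
        parent = prov.get("parent_sem_hash")
        if parent is None or parent in seen:
            break  # reached the root, or hit a malformed cycle
        current = by_hash.get(parent)  # None if the parent was never sunk
    return hops
```

The cycle guard and the missing-parent stop matter in practice: a trace that ends early usually means some service in the path isn’t threading the parent hash yet.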
“What was the quality score on this message?”
Each record carries the QoM block from whatever profile was applied.
For a forensic query, the useful columns are qom.profile (which
contract you measured against), qom.passed (did it meet the bar),
and the per-metric values.
A pattern we see often: a downstream service complains that “the data from agent X is bad.” With a QoM column in the trail, you can answer concretely. For example: of the last 10,000 messages from agent X on stype Y, 9,872 had schema fidelity 1.0 and groundedness above 0.9. Either the complaint is about the 128 that didn’t, or it is about a metric we aren’t measuring, in which case the conversation becomes “what metric should we add to the profile?” Both conversations are more productive than “the data is bad.”
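The concrete answer in that example is a single aggregation. This sketch assumes per-metric values live in record["qom"]["metrics"] under the hypothetical keys schema_fidelity and groundedness; substitute whatever names your profile actually emits.

```python
from typing import Any, Iterable


def qom_summary(records: Iterable[dict[str, Any]], agent_id: str, stype: str,
                min_groundedness: float = 0.9) -> dict[str, int]:
    """Count messages from agent_id on stype that meet the bar, and those that don't.

    ASSUMES per-metric values live at record["qom"]["metrics"] under the
    hypothetical keys "schema_fidelity" and "groundedness".
    """
    total = good = 0
    for r in records:
        env = r["envelope"]
        if env["provenance"]["agent_id"] != agent_id or env["stype"] != stype:
            continue
        total += 1
        m = r["qom"].get("metrics", {})
        # A missing metric fails the comparison, so it counts as below the bar.
        if (m.get("schema_fidelity") == 1.0
                and m.get("groundedness", 0.0) > min_groundedness):
            good += 1
    return {"total": total, "good": good, "below_bar": total - good}
```

Pass in the last 10,000 matching records, or whatever window the complaint covers. Treating a missing metric as below the bar errs toward surfacing records rather than hiding them.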
“Did policy block this, or did it pass through?”
When the proxy is running in strict mode and a policy denies a message,
MPL writes a record with valid: false and the policy that fired. The
record exists even though the message didn’t propagate, which is the
property you want for forensics — deny events are first-class.
The query is straightforward: valid == false over your window. The
follow-up — why did it fail — is in the policy field. You can
distinguish schema violations from policy violations from QoM threshold
failures, because each writes a different reason code.
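Grouping deny events by reason code is a one-liner over the loaded records. Where the valid flag and the reason code live is an assumption here (a top-level valid and a hypothetical policy["reason"]), and the reason-code strings are whatever your deployment emits.

```python
from collections import Counter
from typing import Any, Iterable


def deny_reasons(records: Iterable[dict[str, Any]]) -> Counter:
    """Count deny events (valid is False) by reason code.

    ASSUMES valid sits at the top level of the record and the reason code at
    record["policy"]["reason"]; both locations are hypothetical.
    """
    return Counter(
        r.get("policy", {}).get("reason", "unknown")  # deny with no reason -> "unknown"
        for r in records
        if r.get("valid") is False
    )
```

Counter.most_common() then gives the ranked breakdown, which is usually enough to tell schema violations from policy denials from QoM threshold failures at a glance.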
The pattern to watch for is a sudden spike in deny events on a single
stype — almost always either a sender drifted or a schema got
tightened without coordinating with its senders. Both are recoverable; you only see
them if the deny events are recorded.
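A rough spike detector compares each stype’s deny rate in a recent window against a baseline window. The thresholds (factor and min_recent) are arbitrary defaults for illustration, and the record layout is assumed: a top-level valid flag, an RFC 3339 timestamp, and the stype under record["envelope"].

```python
from collections import Counter
from datetime import datetime
from typing import Any, Iterable


def _ts(s: str) -> datetime:
    # Accept a trailing "Z" on Python versions before 3.11.
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


def deny_spikes(
    records: Iterable[dict[str, Any]],
    baseline_start: str,
    recent_start: str,
    end: str,
    factor: float = 5.0,
    min_recent: int = 10,
) -> list[str]:
    """stypes whose deny rate in [recent_start, end) exceeds `factor` times
    their rate in [baseline_start, recent_start). Rates are denies per hour;
    both windows must have positive length."""
    b0, r0, e = _ts(baseline_start), _ts(recent_start), _ts(end)
    baseline_hours = (r0 - b0).total_seconds() / 3600
    recent_hours = (e - r0).total_seconds() / 3600
    baseline, recent = Counter(), Counter()
    for r in records:
        if r.get("valid") is not False:
            continue  # only deny events count
        ts, stype = _ts(r["timestamp"]), r["envelope"]["stype"]
        if b0 <= ts < r0:
            baseline[stype] += 1
        elif r0 <= ts < e:
            recent[stype] += 1
    flagged = []
    for stype, n in recent.items():
        if n < min_recent:
            continue  # too few events to call it a spike
        # An stype with no baseline denies has rate 0, so any qualifying burst flags.
        if n / recent_hours > factor * (baseline[stype] / baseline_hours):
            flagged.append(stype)
    return sorted(flagged)
```

This compares raw deny counts, so a traffic surge can look like drift; dividing by the total messages per stype in each window separates the two.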
“Reconstruct the state of the registry at time T”
This one is subtler. The contract a message was validated against can itself change over time. A message that was valid in March might fail validation in June, not because the message changed but because the schema did.
MPL records the versioned stype (e.g. org.calendar.Event.v1) on every
envelope. Combined with a versioned registry — which the MPL registry
service is — you can recover the schema that was current at the time
the record was written, and re-run validation against the historical
schema. This is the difference between “is this record valid now”
and “was this record valid at the time it was emitted.” The second
is the question forensic queries usually want.
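The “valid then versus valid now” distinction comes down to which schema revision you look up. The sketch below uses a toy in-memory registry with a hypothetical publish/schema_at API and treats a schema as nothing more than a list of required payload fields; a real registry client and a real schema validator would replace both.

```python
import bisect
from datetime import datetime
from typing import Any, Optional


def _ts(s: str) -> datetime:
    # Accept a trailing "Z" on Python versions before 3.11.
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


class VersionedRegistry:
    """Toy in-memory stand-in for a versioned schema registry (hypothetical API).

    Each stype keeps its revisions as (effective_from, schema), oldest first.
    """

    def __init__(self) -> None:
        self._history: dict[str, list[tuple[datetime, dict[str, Any]]]] = {}

    def publish(self, stype: str, effective_from: str, schema: dict[str, Any]) -> None:
        revisions = self._history.setdefault(stype, [])
        revisions.append((_ts(effective_from), schema))
        revisions.sort(key=lambda rev: rev[0])

    def schema_at(self, stype: str, when: str) -> Optional[dict[str, Any]]:
        """The revision in force at `when`: the latest one effective at or before it."""
        revisions = self._history.get(stype, [])
        i = bisect.bisect_right([t for t, _ in revisions], _ts(when))
        return revisions[i - 1][1] if i else None


def validates(record: dict[str, Any], registry: VersionedRegistry, as_of: str) -> bool:
    """Validate the record's payload against the schema in force at `as_of`."""
    env = record["envelope"]
    schema = registry.schema_at(env["stype"], as_of)
    if schema is None:
        return False  # no contract existed yet for this stype
    # Toy validation: the schema is just a list of required payload fields.
    return all(field in env["payload"] for field in schema["required"])
```

Calling validates with as_of=record["timestamp"] answers the forensic question; calling it with the current time answers the other one, and a record can legitimately pass the first and fail the second.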
What to do tomorrow
If you don’t have a trail today: turn the proxy on in transparent mode and sink the records anywhere. You’ll likely catch your first schema drift within a week.
If you have a trail today but it’s a stream of free-form log lines:
the gap is not volume, it’s queryability. The records have to be
structured around stype, agent_id, intent, sem_hash, and
timestamp for the questions above to be answerable. MPL gives you
those columns by construction.
And if you have a structured trail and a compliance review coming up:
practice the queries above before the meeting. The trail is only as
good as your ability to run the queries on it, and the worst time to
discover that your agent_id field has been empty for six months is
during the review.
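A cheap way to practise: before the review, run a completeness check that counts, per day, how many records are missing each of the columns the queries above depend on. The nested paths in REQUIRED are an assumed layout; adjust them to however your sink stores the record.

```python
from collections import Counter
from typing import Any, Iterable

# Columns the forensic queries depend on, as paths into an assumed record layout.
REQUIRED = {
    "stype": ("envelope", "stype"),
    "agent_id": ("envelope", "provenance", "agent_id"),
    "intent": ("envelope", "provenance", "intent"),
    "sem_hash": ("envelope", "sem_hash"),
    "timestamp": ("timestamp",),
}


def _get(record: Any, path: tuple[str, ...]) -> Any:
    """Follow a key path through nested dicts; None if any step is missing."""
    for key in path:
        if not isinstance(record, dict):
            return None
        record = record.get(key)
    return record


def empty_columns_by_day(records: Iterable[dict[str, Any]]) -> dict[str, Counter]:
    """Day ("YYYY-MM-DD", or "unknown") -> count of records missing each column."""
    out: dict[str, Counter] = {}
    for r in records:
        ts = _get(r, ("timestamp",))
        day = ts[:10] if isinstance(ts, str) and ts else "unknown"
        missing = [name for name, path in REQUIRED.items() if _get(r, path) in (None, "")]
        if missing:
            out.setdefault(day, Counter()).update(missing)
    return out
```

A long run of days with a nonzero agent_id count is exactly the six-month gap, found while there is still time to fix it.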