
Reading MPL audit trails: forensic patterns

2026-06-02 · Skelf-Research · audit · forensics · provenance

The thing about an audit trail is that nobody appreciates it until they need it, and when they need it they need it right now, in a hurry, with someone on the other side of a desk asking pointed questions. This post walks through the shapes of those questions and how an MPL trail answers them, so that the day of the review is not the day you find out your trail wasn’t enough.

The model is simple: every message that passes through the MPL proxy becomes a record. The record contains the envelope (stype, payload, sem_hash, provenance), the QoM report from whatever profile applied, and a timestamp. That’s it. The trail is an append-only sequence of those records, content-addressed by the BLAKE3 hash carried in sem_hash. You can store it in whatever sink you like: object storage, a time-series DB, Loki, plain NDJSON files. MPL doesn’t prescribe the sink; it prescribes the record shape.
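To make the record shape concrete, here is a minimal sketch of one record as Python types. The field names come from the description above, but the exact nesting, the contents of the qom block, the timestamp format, and all the sample values are assumptions for illustration, not the MPL wire format.

```python
import json
from typing import Any, Dict, TypedDict


class Provenance(TypedDict):
    agent_id: str  # which agent emitted the envelope
    intent: str    # what the agent was trying to do


class TrailRecord(TypedDict):
    stype: str                # contract name and version
    payload: Dict[str, Any]   # the message body
    sem_hash: str             # BLAKE3 hex digest of the canonicalised payload
    provenance: Provenance
    qom: Dict[str, Any]       # QoM report from the applied profile
    timestamp: str            # assumed here to be ISO 8601 in UTC


example: TrailRecord = {
    "stype": "org.calendar.Event.v1",
    "payload": {"title": "Quarterly review", "start": "2026-05-04T09:00:00Z"},
    "sem_hash": "<blake3-hex-digest>",  # placeholder, not a real digest
    "provenance": {"agent_id": "scheduler-prod", "intent": "create-event"},
    "qom": {"profile": "<profile-name>", "passed": True},
    "timestamp": "2026-05-01T00:10:00Z",
}

# One JSON object per line is all an NDJSON sink needs.
ndjson_line = json.dumps(example, sort_keys=True)
```

Whatever sink you pick, the property worth checking is that a record survives the round trip unchanged.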

What follows are the questions, and the trail patterns that answer them.

“Show me everything agent X did between two timestamps”

This is the most basic forensic query and the one that justifies the whole exercise. In the MPL record, the provenance.agent_id field is what makes it possible. A query against the trail:

provenance.agent_id == "scheduler-prod"
AND timestamp >= 2026-05-01T00:00:00Z
AND timestamp <  2026-05-01T01:00:00Z

returns a chronological sequence of every envelope that agent emitted in the window. Because the records carry the stype, you can filter further (stype == "org.calendar.Event.v1") to scope to a specific contract. Because they carry sem_hash, and the hash is computed over the canonicalised payload, two records with the same hash have the same canonical payload; you can dedup with confidence.
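If your sink is plain NDJSON rather than something with a query language, the same filter is a few lines of Python. This is a sketch that assumes records are already parsed into dicts with the field layout used in this post and ISO 8601 timestamps; the helper names are made up for the example.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def agent_activity(records, agent_id, start, end, stype=None):
    """Records emitted by agent_id in the half-open window [start, end), oldest first."""
    hits = [
        r for r in records
        if r["provenance"]["agent_id"] == agent_id
        and start <= parse_ts(r["timestamp"]) < end
        and (stype is None or r["stype"] == stype)
    ]
    return sorted(hits, key=lambda r: parse_ts(r["timestamp"]))


def dedup_by_sem_hash(records):
    """Keep the first record seen for each sem_hash."""
    first_seen = {}
    for r in records:
        first_seen.setdefault(r["sem_hash"], r)
    return list(first_seen.values())


def rec(agent, ts, sem_hash):
    # Sample data only; the hashes are short placeholders.
    return {"stype": "org.calendar.Event.v1", "sem_hash": sem_hash,
            "timestamp": ts, "provenance": {"agent_id": agent}}


records = [
    rec("scheduler-prod", "2026-05-01T00:30:00Z", "h2"),
    rec("scheduler-prod", "2026-05-01T00:10:00Z", "h1"),
    rec("scheduler-prod", "2026-05-01T00:40:00Z", "h1"),  # same payload again
    rec("scheduler-prod", "2026-05-01T01:00:00Z", "h3"),  # end of window is exclusive
    rec("billing-prod", "2026-05-01T00:20:00Z", "h4"),
]

window = agent_activity(
    records, "scheduler-prod",
    parse_ts("2026-05-01T00:00:00Z"), parse_ts("2026-05-01T01:00:00Z"),
)
unique = dedup_by_sem_hash(window)
```

The window is half-open on purpose, matching the query above, so adjacent one-hour windows never count the same record twice.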

The thing this question doesn’t answer is “and is each of those records intact.” That’s the next pattern.

“Prove this record hasn’t been tampered with”

The sem_hash is BLAKE3 over the canonicalised payload. To verify a record, you recompute the hash from the payload field of the record and check it against the stored sem_hash. If they match, the payload hasn’t changed since it was written. If they don’t, the record is either corrupted or has been edited.
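A sketch of that check follows, with two loud assumptions. First, the canonical form here is JSON with sorted keys and compact separators, which stands in for whatever canonicalisation MPL actually applies. Second, Python’s standard library has no BLAKE3, so the default hash is BLAKE2b as a placeholder to keep the example dependency-free; a real verifier has to pass in a BLAKE3 function.

```python
import hashlib
import json


def canonicalise(payload) -> bytes:
    # ASSUMPTION: sorted keys and compact separators stand in for MPL's canonical form.
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def stand_in_hash(data: bytes) -> str:
    # ASSUMPTION: BLAKE2b is a placeholder so the sketch runs on the stdlib alone.
    # Real MPL records use BLAKE3, so swap this out before verifying real data.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def verify_record(record, hash_fn=stand_in_hash) -> bool:
    """Recompute the hash from the payload and compare it to the stored sem_hash."""
    return hash_fn(canonicalise(record["payload"])) == record["sem_hash"]


payload = {"title": "Standup", "start": "2026-05-01T09:00:00Z"}
record = {"payload": payload, "sem_hash": stand_in_hash(canonicalise(payload))}

# Key order does not matter once the payload is canonicalised.
reordered = {"payload": {"start": "2026-05-01T09:00:00Z", "title": "Standup"},
             "sem_hash": record["sem_hash"]}

# Changing the payload without touching the hash is caught.
tampered = {"payload": {**payload, "title": "Standup (moved)"},
            "sem_hash": record["sem_hash"]}

results = (verify_record(record), verify_record(reordered), verify_record(tampered))
```

Passing the hash function as a parameter keeps the verification logic separate from the choice of digest library.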

On its own, this makes a record tamper-evident, not tamper-proof, and only against edits that change the payload without also rewriting the hash. An attacker with write access to your trail sink can replace both the payload and the hash, and the verification still passes. The stronger guarantee comes from chaining: if you sink to an append-only store, or anchor the trail’s roots into a separate system (a Merkle log, an external timestamper), edits become detectable. MPL produces the per-record hash; the chaining strategy depends on how paranoid your compliance team is.
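One simple chaining strategy is a running hash over the sequence of sem_hash values, with the latest head stored somewhere the trail’s writers cannot touch. The sketch below is one possible scheme, not something MPL ships: the link format, the use of SHA-256 for the links, and the genesis value are all assumptions made for the example.

```python
import hashlib

GENESIS = "0" * 64  # hypothetical starting value for an empty chain


def link(prev_link: str, sem_hash: str) -> str:
    """Fold one record's sem_hash into the running chain value."""
    return hashlib.sha256(f"{prev_link}:{sem_hash}".encode("utf-8")).hexdigest()


def chain_head(sem_hashes, start: str = GENESIS) -> str:
    """Hash the whole sequence; any edit, removal, or reorder changes the result."""
    head = start
    for sem_hash in sem_hashes:
        head = link(head, sem_hash)
    return head


trail = ["hash-a", "hash-b", "hash-c"]
anchored = chain_head(trail)  # store this outside the trail sink

# Later: recompute from the sink and compare against the anchored value.
untouched = chain_head(["hash-a", "hash-b", "hash-c"]) == anchored
edited = chain_head(["hash-a", "hash-x", "hash-c"]) == anchored
dropped = chain_head(["hash-a", "hash-c"]) == anchored
reordered = chain_head(["hash-b", "hash-a", "hash-c"]) == anchored
```

An attacker who rewrites a payload and its sem_hash now also has to change the anchored head, which lives in a system they presumably cannot write to.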

For most SOX/GDPR auditors, “this content hash was emitted at this timestamp, and the payload still hashes to that value today” is the answer they’re looking for.

“Trace this outcome back to the agent that caused it”

This is the most useful query and the hardest to answer without a contract layer. The setup: something happened downstream — an invoice got issued, a calendar event got created, a record got updated — and you want the chain of agent activity that led to it.

In an MPL trail, the chain is reconstructable because provenance carries an intent, and intents are linkable across hops. Concretely: when agent A calls the org.calendar.create tool, the record shows provenance.agent_id == "A", provenance.intent == "create-event". When the calendar service writes a record to its own trail, it can include the upstream sem_hash as a parent reference. You walk the chain:

outcome record  -->  parent sem_hash  -->  envelope record
envelope record -->  provenance.agent_id, provenance.intent
                -->  parent of THAT record (if any)

The shape of the chain depends on how rigorously your services thread the parent hash. The discipline MPL imposes is: every envelope has a canonical hash, so the chain can be threaded. Without that, you’re correlating on timestamps and hoping clocks agreed.
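The walk itself is short once the parent reference exists. In this sketch the field name parent_sem_hash is hypothetical (the post only says a service can include the upstream sem_hash as a parent reference), and the agent names other than "A" are invented sample data.

```python
def trace_back(outcome, records):
    """Follow parent references from an outcome record to the start of the chain."""
    by_hash = {}
    for r in records:
        by_hash.setdefault(r["sem_hash"], r)

    chain, seen = [], set()
    current = outcome
    # The seen set guards against a malformed trail that loops back on itself.
    while current is not None and current["sem_hash"] not in seen:
        seen.add(current["sem_hash"])
        chain.append(current)
        current = by_hash.get(current.get("parent_sem_hash"))
    return chain


request = {"sem_hash": "h-request", "parent_sem_hash": None,
           "provenance": {"agent_id": "planner", "intent": "schedule-meeting"}}
envelope = {"sem_hash": "h-envelope", "parent_sem_hash": "h-request",
            "provenance": {"agent_id": "A", "intent": "create-event"}}
outcome = {"sem_hash": "h-outcome", "parent_sem_hash": "h-envelope",
           "provenance": {"agent_id": "calendar-service", "intent": "write-event"}}

chain = trace_back(outcome, [request, envelope, outcome])
agents = [r["provenance"]["agent_id"] for r in chain]
```

If a service in the middle never threads the parent hash, the walk simply stops there, which is itself a useful finding: it tells you which hop to fix.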

“What was the quality score on this message?”

Each record carries the QoM block from whatever profile was applied. For a forensic query, the useful columns are qom.profile (which contract you measured against), qom.passed (did it meet the bar), and the per-metric values.
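A minimal sketch of turning those columns into a summary for one agent and one stype. It assumes the per-metric values sit under qom["metrics"] and uses schema_fidelity and groundedness as illustrative metric names; the real key names depend on your profile.

```python
def qom_summary(records, agent_id, stype, minimums):
    """Count how many of an agent's messages on one stype meet every metric minimum."""
    scoped = [
        r for r in records
        if r["provenance"]["agent_id"] == agent_id and r["stype"] == stype
    ]
    # A missing metric counts as below the bar.
    below = [
        r for r in scoped
        if any(r["qom"]["metrics"].get(name, float("-inf")) < bar
               for name, bar in minimums.items())
    ]
    return {
        "total": len(scoped),
        "meeting_bar": len(scoped) - len(below),
        "failed_profile": sum(1 for r in scoped if not r["qom"]["passed"]),
        "below_bar": below,
    }


def msg(agent, fidelity, grounded, passed):
    # Sample data only; "Y" is a stand-in stype.
    return {"stype": "Y", "provenance": {"agent_id": agent},
            "qom": {"profile": "<profile-name>", "passed": passed,
                    "metrics": {"schema_fidelity": fidelity, "groundedness": grounded}}}


records = [
    msg("X", 1.0, 0.95, True),
    msg("X", 1.0, 0.97, True),
    msg("X", 0.8, 0.95, False),
    msg("other", 0.1, 0.1, False),
]

summary = qom_summary(records, "X", "Y", {"schema_fidelity": 1.0, "groundedness": 0.9})
```

Returning the below-bar records alongside the counts means the follow-up question, which messages exactly, does not need a second query.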

A pattern we see often: a downstream service complains that “the data from agent X is bad.” With a QoM column in the trail, you can answer concretely: of the last 10,000 messages from agent X on stype Y, 9,872 had schema fidelity 1.0 and groundedness above 0.9. Either the complaint is about the 128 that didn’t, or the complaint is about a metric we aren’t measuring, in which case the conversation is “what metric should we add to the profile.” Both conversations are more productive than “the data is bad.”

“Did policy block this, or did it pass through?”

When the proxy is running in strict mode and a policy denies a message, MPL writes a record with valid: false and the policy that fired. The record exists even though the message didn’t propagate, which is the property you want for forensics — deny events are first-class.

The query is straightforward: valid == false over your window. The follow-up — why did it fail — is in the policy field. You can distinguish schema violations from policy violations from QoM threshold failures, because each writes a different reason code.

The pattern to watch for is a sudden spike in deny events on a single stype. The cause is almost always that a sender drifted or a schema was tightened without coordination. Both are recoverable, but you only see them if the deny events are recorded.
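Spotting that spike can be as simple as comparing deny counts per stype between two windows. The sketch below assumes deny records carry valid and policy as top-level fields; the reason-code strings, the 3x factor, and the floor of five events are invented for the example, so tune them to your traffic.

```python
from collections import Counter


def deny_reasons(records):
    """Count deny events per (stype, policy) pair."""
    return Counter(
        (r["stype"], r["policy"]) for r in records if r.get("valid") is False
    )


def spiking_stypes(previous, current, factor=3.0, floor=5):
    """Stypes whose deny count in the current window jumped past factor x the previous one."""
    before = Counter(r["stype"] for r in previous if r.get("valid") is False)
    now = Counter(r["stype"] for r in current if r.get("valid") is False)
    return sorted(
        stype for stype, count in now.items()
        if count >= floor and count > factor * before.get(stype, 0)
    )


def deny(stype, policy):
    # Sample data only; the reason codes are hypothetical.
    return {"stype": stype, "valid": False, "policy": policy}


last_hour = [deny("org.calendar.Event.v1", "schema-violation")]
this_hour = (
    [deny("org.calendar.Event.v1", "schema-violation") for _ in range(6)]
    + [deny("org.billing.Invoice.v1", "qom-threshold")]
    + [{"stype": "org.calendar.Event.v1", "valid": True, "policy": None}]
)

spikes = spiking_stypes(last_hour, this_hour)
reasons = deny_reasons(this_hour)
```

The floor keeps a jump from zero to one deny from paging anyone; the per-reason counts then tell you whether you are looking at sender drift or a tightened schema.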

“Reconstruct the state of the registry at time T”

This one is subtler. The contract a message was validated against can itself change over time. A message that was valid in March might fail validation in June, not because the message changed but because the schema did.

MPL records the schema version (org.calendar.Event.v1) on every envelope. Combined with a versioned registry — which the MPL registry service is — you can recover the schema that was current at the time the record was written, and re-run validation against the historical schema. This is the difference between “is this record valid now” and “was this record valid at the time it was emitted.” The second is the question forensic queries usually want.
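The lookup behind that is an “as of” query over the registry’s revision history. The sketch below does not use the MPL registry API: the history is a hand-written list of (effective_from, schema) pairs for one stype, and validation is reduced to a required-fields check, both purely for illustration.

```python
from bisect import bisect_right
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def schema_as_of(history, when: str):
    """Return the schema in force at `when`; history must be sorted by effective date."""
    times = [parse_ts(effective_from) for effective_from, _ in history]
    index = bisect_right(times, parse_ts(when))
    if index == 0:
        raise LookupError(f"no schema registered at or before {when}")
    return history[index - 1][1]


def is_valid(payload, schema) -> bool:
    # Stand-in for real schema validation: only checks required fields.
    return all(field in payload for field in schema["required"])


# Hypothetical revision history for one stype.
history = [
    ("2026-01-10T00:00:00Z", {"required": ["title", "start"]}),
    ("2026-05-20T00:00:00Z", {"required": ["title", "start", "timezone"]}),
]

record = {
    "timestamp": "2026-03-14T12:00:00Z",
    "payload": {"title": "Planning", "start": "2026-03-16T10:00:00Z"},
}

valid_then = is_valid(record["payload"], schema_as_of(history, record["timestamp"]))
valid_now = is_valid(record["payload"], history[-1][1])
```

The same payload gives two different answers, which is exactly why the forensic question has to name the point in time.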

What to do tomorrow

If you don’t have a trail today: turn the proxy on in transparent mode and sink the records anywhere. You’ll catch the schema drift in the first week.

If you have a trail today but it’s a stream of free-form log lines: the gap is not volume, it’s queryability. The records have to be structured around stype, agent_id, intent, sem_hash, and timestamp for the questions above to be answerable. MPL gives you those columns by construction.

And if you have a structured trail and a compliance review coming up: practice the queries above before the meeting. The trail is only as good as your ability to run the queries on it, and the worst time to discover that your agent_id field has been empty for six months is during the review.
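A quick way to practise is to count how many records are missing the fields the queries depend on. This is a sketch assuming the dotted field layout used in this post; the field list and sample records are illustrative.

```python
REQUIRED = ("stype", "sem_hash", "timestamp",
            "provenance.agent_id", "provenance.intent")


def lookup(record, dotted):
    """Resolve a dotted path like 'provenance.agent_id'; None if any step is missing."""
    value = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def missing_field_counts(records, fields=REQUIRED):
    """Count records where each field is absent, null, or an empty string."""
    return {
        field: sum(1 for r in records if lookup(r, field) in (None, ""))
        for field in fields
    }


base = {"stype": "org.calendar.Event.v1", "sem_hash": "h1",
        "timestamp": "2026-05-01T00:10:00Z"}
records = [
    {**base, "provenance": {"agent_id": "scheduler-prod", "intent": "create-event"}},
    {**base, "provenance": {"agent_id": "", "intent": "create-event"}},
    {**base},  # provenance block missing entirely
]

gaps = missing_field_counts(records)
```

Run something like this on a schedule and a six-month gap turns into a one-day gap.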

