Why agent-to-agent calls need contracts, not just JSON

2026-05-06 · Skelf-Research · contracts · stypes · protocol

The default way two agents talk to each other today is the same way two microservices talked to each other in 2014: a POST, a JSON body, a hopeful handler at the other end. The handler reads a few fields, ignores the ones it doesn’t recognise, occasionally throws a 500, and the whole thing mostly works until it doesn’t.

That arrangement has a name: it’s an implicit contract. The shape of the payload, the meaning of each field, the units, the enumerations, the optionality — all of it lives in the head of whoever wrote the sender and whoever wrote the receiver. When those are the same person on the same afternoon, this is fine. When they are two different agents, generated by two different prompt chains, on two different days, it is the place your incident postmortems are going to come from.

“JSON Schema is enough” is not enough

The honest objection is: we already validate with JSON Schema. Good. JSON Schema covers the structural layer — “is this an object with a title string and a start ISO timestamp?” That is necessary and not sufficient. A contract for an agent message has to answer four questions, not one:

  1. What is this thing? Not the shape of the bytes; the semantic type. org.calendar.Event.v1 is not the same as org.communication.Invite.v1, even if their payloads look similar. The type identifier is what lets you route, lets you key audit records, and lets you evolve.
  2. What does it have to contain? This is where JSON Schema does its job.
  3. What has to be true of it? Cross-field invariants (“end is after start”), domain rules (“priority critical requires assignee”), external references (“account_id resolves in this tenant”).
  4. Who said it and why? Provenance: which agent emitted this, under what intent, in response to what upstream call. Without this, the payload is anonymous.
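Question 3 is the one a structural schema typically cannot express on its own. Here is a minimal sketch in Python of the two local rules quoted above. The check_invariants helper and the field names (start, end, priority, assignee) are hypothetical, chosen for illustration and not taken from MPL. The external-reference rule is left out because it needs a lookup against a live system.

```python
from datetime import datetime


def check_invariants(event: dict) -> list[str]:
    """Return the violated rules; an empty list means the payload passes.

    Hypothetical helper: field names and rules are illustrative only.
    """
    violations = []

    # Cross-field invariant: end must be strictly after start.
    start = datetime.fromisoformat(event["start"])
    end = datetime.fromisoformat(event["end"])
    if end <= start:
        violations.append("end must be after start")

    # Domain rule: a critical priority needs someone assigned.
    if event.get("priority") == "critical" and not event.get("assignee"):
        violations.append("priority critical requires assignee")

    return violations


# Structurally fine (right fields, right types), semantically wrong.
bad = {
    "title": "Incident review",
    "start": "2026-05-06T15:00:00",
    "end": "2026-05-06T14:00:00",
    "priority": "critical",
}
print(check_invariants(bad))
# → ['end must be after start', 'priority critical requires assignee']
```

A structural validator would accept this payload: every field is present and well-typed. Only the semantic checks catch it.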

MPL bundles all four into an envelope: an stype identifier, the payload, a sem_hash, and a provenance block. The payload alone is one of four fields. The other three are how you stop hoping.
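As a rough illustration of the four-field envelope, the sketch below models it as a Python dataclass. Only the field names stype, payload, sem_hash and provenance come from the description above. How the hash is computed is an assumption (SHA-256 over key-sorted JSON), as are the helper names and the keys inside the provenance block.

```python
import hashlib
import json
from dataclasses import dataclass


def compute_sem_hash(payload: dict) -> str:
    """Assumed scheme: SHA-256 over JSON with sorted keys and no extra whitespace."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Envelope:
    stype: str        # semantic type identifier, e.g. "org.calendar.Event.v1"
    payload: dict     # the message body itself
    sem_hash: str     # content hash of the payload
    provenance: dict  # who emitted it, under what intent, in reply to what


def wrap(stype: str, payload: dict, provenance: dict) -> Envelope:
    """Build an envelope, hashing the payload at the moment it is sent."""
    return Envelope(stype, payload, compute_sem_hash(payload), provenance)


def is_intact(env: Envelope) -> bool:
    """Recompute the hash and compare it with the one recorded in the envelope."""
    return compute_sem_hash(env.payload) == env.sem_hash


env = wrap(
    "org.calendar.Event.v1",
    {"title": "Incident review", "start": "2026-05-06T14:00:00"},
    # Hypothetical provenance keys, for illustration only.
    {"agent": "scheduler-7", "intent": "book-review", "in_reply_to": "call-123"},
)
print(is_intact(env))  # → True
```

Sorting keys before hashing means two payloads with the same content but different key order produce the same hash, which is what lets a later audit recompute it from a stored copy.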

The two failure modes

The cost of skipping contracts shows up in two distinct failures, and teams usually only notice the first.

The loud failure is the malformed payload. An agent generates "priority": "P0" when the receiver expected "priority": "high". The handler raises, the trace shows up in your error tracker, someone fixes the prompt or adds a synonym map. This is annoying but cheap: the system told you it was broken.

The quiet failure is more expensive. An agent generates a payload that parses, validates against a permissive schema, passes through three more hops, triggers a real-world side effect — a calendar invite sent, a ticket assigned, a payment authorised — and only later does someone notice the description field was hallucinated, or the attendees list silently dropped half the people, or the amount was in the wrong currency unit. There is no exception in your logs. The only artefact is the outcome, and by then it is downstream.

The quiet failure is the one contracts are really for. Tightening the schema reduces it. Tightening the schema, attaching a quality score, and anchoring the payload to a content hash gives you something you can go back to and check.

What an “stype” buys you that a route doesn’t

A REST route says “this is a POST to /calendar/events.” An stype says “this payload claims to be an org.calendar.Event.v1.” Those two statements differ in three useful ways.

First, the stype is carried with the message. Once it leaves the sender, it doesn’t need the URL context to mean something. An auditor reading a log line a year later can recover the contract by lookup. A router can dispatch on it. A downstream agent that receives the same envelope through a different transport can still validate.

Second, the stype is versioned. v1 and v2 of the same logical type can coexist. Senders that have been upgraded can emit v2; older receivers reject it cleanly instead of misinterpreting it. The version is part of the identity, not a parallel header that might or might not match.
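To make “reject it cleanly” concrete, here is one way a receiver might treat the version as part of the type's identity. The grammar assumed below (dotted namespace, then a name, then a v-prefixed integer) is inferred from the example identifiers in this post. The SType tuple, parse_stype, accepts and the SUPPORTED table are all hypothetical names.

```python
from typing import NamedTuple


class SType(NamedTuple):
    namespace: str  # e.g. "org.calendar"
    name: str       # e.g. "Event"
    version: int    # e.g. 1


def parse_stype(raw: str) -> SType:
    """Split 'org.calendar.Event.v1' into namespace, name and version."""
    head, _, version = raw.rpartition(".")
    namespace, _, name = head.rpartition(".")
    if not (namespace and name and version.startswith("v") and version[1:].isdecimal()):
        raise ValueError(f"not a valid stype: {raw!r}")
    return SType(namespace, name, int(version[1:]))


# Versions this hypothetical receiver understands, keyed by (namespace, name).
SUPPORTED = {("org.calendar", "Event"): {1}}


def accepts(raw: str) -> bool:
    """True only if this receiver knows this exact type at this exact version."""
    st = parse_stype(raw)
    return st.version in SUPPORTED.get((st.namespace, st.name), set())


print(accepts("org.calendar.Event.v1"))  # → True
print(accepts("org.calendar.Event.v2"))  # → False
```

An older receiver that only lists version 1 refuses a v2 message up front, before any field is read, rather than guessing at a payload whose meaning may have changed.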

Third, the stype is namespaced. org.calendar.Event.v1, org.finance.Invoice.v1, org.healthcare.LabResult.v1 live in distinct trees with distinct ownership. Policy can match a glob — org.finance.* — and apply rules to a whole domain at once, without a code change in every service that handles finance messages.
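Glob matching over stypes can be sketched with Python's standard fnmatch module. The policy table, its keys and the policy_for helper are hypothetical, and first-match-wins ordering is an assumption made for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical policy table: the first matching pattern wins.
POLICIES = [
    ("org.finance.*", {"mode": "strict", "audit": True}),
    ("org.calendar.*", {"mode": "transparent", "audit": True}),
    ("*", {"mode": "transparent", "audit": False}),  # catch-all default
]


def policy_for(stype: str) -> dict:
    """Return the policy of the first pattern that matches the stype."""
    for pattern, policy in POLICIES:
        # fnmatchcase is case-sensitive on every platform; '*' also matches dots.
        if fnmatchcase(stype, pattern):
            return policy
    raise LookupError(f"no policy matches {stype!r}")


print(policy_for("org.finance.Invoice.v1"))
# → {'mode': 'strict', 'audit': True}
```

A new finance type such as org.finance.Refund.v1 would pick up the strict policy without any change to the services that handle it.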

“But our agents will produce whatever they produce”

The pushback we hear most often is: the LLM is going to generate what it generates; you can’t make it conform. That is true and also beside the point. The contract layer doesn’t make the model produce valid output. It tells you, at the boundary, whether the output it produced is valid. The decision of what to do next — retry, repair, escalate, drop — is yours. What you get is the signal, with a timestamp, a hash, and a quality score.

In transparent mode, the proxy doesn’t even block anything. It logs. That alone is usually enough to surface the prompt drift, the silent schema breaks, the agents that have been emitting subtly wrong payloads for three weeks. You graduate to strict mode when you trust the contracts more than the senders.
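The difference between the two modes comes down to one branch at the boundary. The sketch below is not MPL's proxy. The enforce function, its parameters and the log format are assumptions made to show the idea.

```python
import logging
from typing import Callable

log = logging.getLogger("contract-boundary")


def enforce(envelope: dict, validate: Callable[[dict], list[str]], mode: str) -> bool:
    """Return True if the message should be forwarded.

    transparent: log every violation, always forward.
    strict:      log every violation, forward only when there are none.
    """
    violations = validate(envelope)
    for violation in violations:
        log.warning("stype=%s violation=%s", envelope.get("stype"), violation)

    if mode == "transparent":
        return True
    if mode == "strict":
        return not violations
    raise ValueError(f"unknown mode: {mode!r}")


def always_broken(envelope: dict) -> list[str]:
    """Stand-in validator that reports one violation for every message."""
    return ["end must be after start"]


msg = {"stype": "org.calendar.Event.v1", "payload": {}}
print(enforce(msg, always_broken, "transparent"))  # → True
print(enforce(msg, always_broken, "strict"))       # → False
```

Both modes emit the same log lines, so moving from transparent to strict changes what happens to a bad message without changing what you learn about it.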

The cost of “we’ll add it later”

There is a predictable arc to agent systems that skip contracts and add them later. It goes: ship the prototype, ship the next prototype, glue them together, notice the first quiet failure, write a one-off validator for one endpoint, notice another quiet failure, write another one-off validator, eventually pull all the one-offs into a “schemas” repo, discover the schemas don’t agree with each other, refactor the schemas, discover the schemas drift from the actual senders, write a “schema drift checker,” and at some point realise what you’ve built is the front half of a protocol layer, badly. The work that would have been a few days at the start is now a six-month migration.

The case for adopting contracts before you hit that arc is not that the prototype is wrong. It’s that the value-of-information of “we know what shape these messages are supposed to be” compounds. Each agent you add to the fabric raises the cost of not having that information. Each external integration raises it further. Each compliance review raises it more.

The cheap moment to introduce the contract layer is the first moment you have two agents that talk to each other. Past that, it’s a refactor.

What we ship

The stype envelope is the smallest thing MPL adds to your stack. The rest — quality metrics, policies, audit records — is what you can do because every message has a typed, hashed, attributed contract attached to it. None of that works if the wire format is “anonymous JSON.”

If you have agents talking to agents today, the question to ask is not “do my messages validate against a schema.” It is: if a regulator asked, could you produce the payload, prove it wasn’t tampered with, identify the agent that emitted it, and show the contract it claimed to satisfy? If the answer involves grepping log files and crossing your fingers, you don’t have contracts. You have JSON.

