The Verification Gap in Inference Billing

Verification requires evidence the verifier did not produce, cannot modify, and does not need permission to access. That is what the word means. Run the test against any usage-based invoice in your stack. Nothing passes.

An AI application company opens the monthly invoice from its inference provider. The number is twelve percent higher than what its internal usage tracker recorded. Its engineers pull logs. The provider's dashboard shows a different total from the tracker. Both numbers came from the same vendor's systems. The customer's tracker was reading API responses. The provider's billing dashboard counts a category of tokens (reasoning tokens, in this case) that the customer's tracker did not know to count.

Nobody is wrong. The customer's tracker did what the documentation said. The provider's billing did what the rate card said. The two systems disagree about what was consumed, and the only way to settle the disagreement is for the customer to ask the provider to check the provider's own work.
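Here is a minimal sketch of how the two counts diverge. It assumes a hypothetical usage record that reports reasoning tokens in a separate field the tracker ignores. The field names and numbers are illustrative, not any provider's schema, and real APIs differ in where, and whether, they report reasoning tokens.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    # Hypothetical fields; real provider schemas differ.
    input_tokens: int
    output_tokens: int      # visible completion tokens returned to the caller
    reasoning_tokens: int   # assumed separate field the tracker does not read


def tracker_count(records: list[UsageRecord]) -> int:
    """Counting rule the customer's tracker follows: input plus visible output."""
    return sum(r.input_tokens + r.output_tokens for r in records)


def billed_count(records: list[UsageRecord]) -> int:
    """Counting rule the rate card follows: reasoning tokens are billed too."""
    return sum(r.input_tokens + r.output_tokens + r.reasoning_tokens for r in records)


def discrepancy_pct(records: list[UsageRecord]) -> float:
    """How far the billed count sits above the tracked count, as a percentage."""
    tracked = tracker_count(records)
    return 100.0 * (billed_count(records) - tracked) / tracked


# Illustrative numbers only.
records = [UsageRecord(1_000, 200, 150), UsageRecord(2_000, 300, 0)]
print(tracker_count(records), billed_count(records), round(discrepancy_pct(records), 2))
# prints: 3500 3650 4.29
```

Each function is correct against its own specification. The disagreement lives in the specifications, so neither side's logs can settle it.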

This is not a billing bug. It is a property of how every usage-based invoice in the world is currently produced.

The Test

Set everything aside for a moment. Consider what the word “verification” actually requires.

Verification requires evidence the verifier did not produce, cannot modify, and does not need permission to access.

That sentence is analytic. It follows from what verification means. Each of the three failure modes (production, modification, access) makes the evidence inseparable from a party to the dispute, and inseparability is the negation of what verification is. To deny this, you have to argue that “verification” means something else. At that point you are using a different word for a different concept and the conversation has stopped being about verification.

Three contamination points. Production, modification, access. There is no fourth. There is also no way to remove any of the three without reopening the gap. Try.

Read the test again. Now read it as a question. Where in your stack does anything satisfy this for usage-based invoices?

What Fails the Test

Five things show up in any conversation about this. Each fails at a specific point.

Your billing platform fails Modification. Stripe, Metronome, Lago, Orb, your in-house billing service. They are excellent at metering, rating, and producing invoices. But they are the vendor's system. The customer (the verifier) is being asked to check a record that the vendor produced and that still sits where the vendor can edit, re-rate, or regenerate it. When the customer questions the bill, the system being questioned is the source of the proof. That is not verification. That is asking the witness to serve as the judge.

Your dashboard exports fail Modification too. A CSV is data. It is a serialized view of the same source the invoice came from. The vendor can present that data selectively, regenerate it, or modify the underlying source, and every export downstream changes with it. The customer cannot tell. They received an artifact produced on demand by the system whose accuracy is in question.

Customer-side reconciliation fails Access. The customer can run their own tracker, count their own API calls, build their own evidence. But for inference, the customer cannot count tokens consumed inside the provider's model, GPU minutes consumed inside the provider's compute, or tool-use tokens that the API response did not return. The metering layer is inside the vendor's infrastructure. The customer has no parallel observation. To verify, they would need permission to see what the vendor saw. That is not verification either. That is permission.

Audit rights fail Access. The contract may give the customer the right to demand an audit, a right worth invoking only when the discrepancy is large enough to justify the fight. Invoking that right routes the verification path back through the party being verified. The auditor sees what the vendor produces in response to the audit. The vendor controls timing, scope, and what is shown. Audit rights are useful protection against catastrophic disputes. They are not verification of routine ones.

Trust fails everything. Most B2B commerce runs on trust today, and a great deal of money changes hands successfully on that basis. That is the diagnosis, not a defense. Trust is what you have when you do not have verification. It works at the volumes and relationship scales where disputes resolve through goodwill, and it stops working at the volumes and contract scales where disputes resolve through procurement. A $500,000 monthly invoice is not a relationship. It is a contract. And contracts are read by procurement teams whose job is to verify what the contract says is true. “We trust our vendor” is not a verification statement. It is the absence of one. The argument of this piece is that the absence is starting to matter at the volumes where invoices actually live now.

If you accept this argument, you accept that nothing in your current stack verifies a usage-based invoice. If you reject it, you are using a definition of “verification” other than the one the word carries.

It Is Written Into the Terms

This is not hypothetical. It is the ordinary structure of usage-based contracts. Take one published example, Groq's Services Agreement. On how usage is measured:

“Groq's metering or measurement tools ... will be used to determine Customer's usage of the Cloud Services or AI Model Services.”

And on disputes:

“Any payment dispute must be submitted in good faith before the date on which a payment is due. If Groq, having reviewed the dispute in good faith, determines that certain billing inaccuracies are attributable to Groq, Groq will either issue a corrected invoice or a credit.... Nothing in this Agreement obligates Groq to extend credit to any party.”

Groq Services Agreement, sections 5.2 and 5.4.

Read structurally: one party measures the usage, issues the invoice, and reviews any dispute about it. The dispute must be raised before payment is due, whether or not the customer has had any way to verify the charge. This is not unique to Groq. Materially the same language appears across the major providers. It is simply what self-attestation looks like once it is written down. The provider here is acting in good faith, by the contract's own words, and that is the point: good faith is the assurance the structure offers. Verification is the assurance it cannot.

What Passes the Test

The test does not require new technology. It requires a specific structural property: the record cannot be controlled by either party after it is made.

Notice what the test does not require. It does not require the vendor's claim to be correct. It requires the claim to be testable by parties other than the vendor. The test is about transparency, not correctness. The vendor's claim may still be wrong. With the test passing, it is checkable.

Here is the architectural shift. Self-attestation does not go away. The vendor still records, because the vendor is the only party who can credibly observe events that happen inside their infrastructure. What changes is whether the record stays under the vendor's control after recording.

In the old world, the record is fluid. The vendor records, and afterward can edit, regenerate, or selectively present. The act of attesting never ends. The answer can change between one asking and the next.

In the new world, the record is committed. The act of recording publishes the claim to a destination the vendor no longer controls. The record becomes evidence the moment it is made. The vendor still claims. The vendor no longer revises.

That is what passes the test. The verifier did not produce the record (the vendor did). It cannot be modified after creation (the destination does not allow it). It does not need the vendor's permission to access (the destination is not the vendor's). All three contamination points close.
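The source does not prescribe a mechanism for the committed record. One common technique is a hash-chained, append-only log held by a party other than the vendor. The sketch below assumes such a destination. The `AppendOnlyLog` class and its methods are hypothetical. The sketch shows that the customer can recompute the chain without asking the vendor, and that an edited copy no longer verifies.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    # Canonical JSON so the same record always serializes, and hashes, the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AppendOnlyLog:
    """Hypothetical destination run by a party other than the vendor.

    It supports append and read only; there is no update or delete.
    """

    def __init__(self) -> None:
        self._entries: list[tuple[dict, str]] = []

    def append(self, record: dict) -> str:
        prev = self._entries[-1][1] if self._entries else GENESIS
        h = entry_hash(prev, record)
        self._entries.append((dict(record), h))  # store a copy, not the caller's dict
        return h

    def read_all(self) -> list[tuple[dict, str]]:
        """Anyone can read; readers get copies, so they cannot alter the log."""
        return [(dict(r), h) for r, h in self._entries]


def verify_chain(entries: list[tuple[dict, str]]) -> bool:
    """Recompute every hash in order; editing any record breaks the chain."""
    prev = GENESIS
    for record, h in entries:
        if entry_hash(prev, record) != h:
            return False
        prev = h
    return True


# The vendor records each event at the moment it happens.
log = AppendOnlyLog()
log.append({"request_id": "r1", "tokens": 1350})
log.append({"request_id": "r2", "tokens": 2300})

# The customer reads straight from the destination and checks the chain.
entries = log.read_all()
print(verify_chain(entries))  # True

# A revised copy (say, re-rated after the fact) no longer verifies.
entries[0][0]["tokens"] = 1200
print(verify_chain(entries))  # False
```

The hash chain by itself only makes edits detectable. A vendor holding the log could regenerate the whole chain with fresh hashes. What prevents that is custody: the copy the customer checks comes from the destination, not from the vendor.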

The architecture works because of one structural fact: the party that records is not the party that holds the record. The destination plays the role of a notary inside the self-attested loop. A notary does not verify truth. A notary verifies that signing happened at a moment, and that act produces evidence the signer cannot retract. The vendor is the signer. The notary is somebody else. Notaries are generally barred from notarizing their own signatures. For the same structural reason, self-verification has no place in a verification architecture. The whole point of the notary is that the notary is not the signer.
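The notary's role can be sketched as a timestamped receipt issued by a separate party. This is an illustration under stated assumptions. The `Notary` class is hypothetical, and HMAC from the Python standard library stands in for a public-key signature. With HMAC, only the notary can check a receipt. A real deployment would more likely use an asymmetric signature, so anyone holding the public key could check it without asking anyone.

```python
import hashlib
import hmac
import secrets
import time


class Notary:
    """Hypothetical third-party notary. Only the notary holds the key.

    HMAC stands in for a public-key signature in this sketch.
    """

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def _mac(self, digest: str, timestamp: int) -> str:
        # Bind the record digest to the moment it was presented.
        msg = f"{timestamp}|{digest}".encode("utf-8")
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def notarize(self, digest: str) -> dict:
        """Attest that `digest` was presented at this moment."""
        ts = time.time_ns()
        return {"digest": digest, "timestamp": ts, "mac": self._mac(digest, ts)}

    def check(self, receipt: dict) -> bool:
        expected = self._mac(receipt["digest"], receipt["timestamp"])
        return hmac.compare_digest(expected, receipt["mac"])


def record_digest(record: bytes) -> str:
    return hashlib.sha256(record).hexdigest()


notary = Notary()
# The vendor (the signer) hashes its usage record and has it notarized.
receipt = notary.notarize(record_digest(b"request r1: 1350 tokens"))
print(notary.check(receipt))  # True

# A later, revised record does not match the original receipt.
revised = dict(receipt, digest=record_digest(b"request r1: 1500 tokens"))
print(notary.check(revised))  # False
```

The vendor never holds the key, so it cannot issue itself a receipt for a revised record or a different moment. That is the notary-is-not-the-signer property in code.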

Why the Test Will Be Required

The picture above is sharp today. The trajectory makes it sharper.

Inference rates already change frequently. They will change continuously. Inference routing platforms already select between providers minute to minute based on price, capacity, and latency. The endpoint is auction-based pricing in real time between machines. The conditions are already in place. Inference supply is uneven. Demand is volatile. Providers are competing on price. Buyers are increasingly automated procurement systems that select inference routes algorithmically. The same forces that pushed display advertising and cloud compute toward real-time auctions are pushing inference toward the same shape.

In a static rate card, the rate is knowable in advance. In a continuous-pricing model, the rate is at least logged after the fact, even if reconstructing it is painful. In a real-time auction model, the rate active for one specific request at one specific millisecond was the result of a price discovery that happened among multiple parties under conditions that existed only in that moment. A CSV export from any single party's logs cannot reconstruct that. An audit right cannot reconstruct that. Only a verified record produced at the moment of each auction outcome, by infrastructure that none of the auction participants control, can.
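The following sketch shows a record produced at the moment of an auction outcome. It assumes a simple rule in which the lowest ask wins and sets the price. The names (`Bid`, `run_auction`, `commit`, `participant_log`) and the price unit are hypothetical. The sketch is meant to show the contrast: the outcome record holds every bid and the clearing price, while any single participant's log holds only its own slice.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Bid:
    provider: str
    price_per_mtok: float  # hypothetical unit: price per million tokens


def run_auction(request_id: str, bids: list[Bid]) -> dict:
    """Assumed rule for illustration: the lowest ask wins and sets the price."""
    winner = min(bids, key=lambda b: b.price_per_mtok)
    return {
        "request_id": request_id,
        "timestamp_ns": time.time_ns(),
        "bids": [asdict(b) for b in bids],
        "winner": winner.provider,
        "clearing_price": winner.price_per_mtok,
    }


def commit(outcome: dict) -> str:
    """Digest of the full outcome, to publish to a destination no participant controls."""
    payload = json.dumps(outcome, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def participant_log(outcome: dict, provider: str) -> dict:
    """What one participant's own logs would plausibly hold: its bid and whether it won."""
    own = [b for b in outcome["bids"] if b["provider"] == provider]
    return {
        "request_id": outcome["request_id"],
        "own_bid": own[0] if own else None,
        "won": outcome["winner"] == provider,
    }


bids = [Bid("provider-a", 0.42), Bid("provider-b", 0.39), Bid("provider-c", 0.45)]
outcome = run_auction("req-001", bids)
digest = commit(outcome)
print(outcome["winner"], outcome["clearing_price"])  # provider-b 0.39
print(participant_log(outcome, "provider-a"))  # no competing bids, no clearing price
```

The digest is only useful if it is published at that moment to a destination none of the participants control. Computing it later from one party's logs recovers nothing, because those logs never held the full outcome.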

The verification gap stops being a two-party problem. It becomes an N-party problem. The architecture that solves it is the same. The need is sharper.

Procurement Teams Are Starting To Notice

SOC 2 became table stakes for enterprise SaaS through procurement, not regulation. A few large buyers required it. Smaller buyers copied. Within a few cycles, B2B SaaS vendors selling to enterprises needed a Type II report or their deals stalled in security review. Nobody legislated this. The market did.

The same shape is forming in inference. The volumes are large enough that procurement teams are noticing the verification gap. The buyers are sophisticated enough to recognize the structural problem when it is named. The economics are real enough that even small drift adds up to meaningful money at enterprise scale.

Within twenty-four months, enterprise inference contracts will require independent usage verification as a standard procurement term, in the same procurement surface where SOC 2 Type II reports are requested today. The first wave will be Fortune 500 buyers who already have mature vendor security review processes and whose auditors are starting to flag self-attested billing as an open question. The second wave will be the mid-market, which copies enterprise procurement playbooks with a lag of roughly twelve to eighteen months. By the time the third wave hits, every enterprise inference invoice will be expected to prove itself.

Procurement teams will not need to know any of the architecture above. They will need to know one thing.

Can your invoice produce evidence the customer did not produce, cannot modify, and does not need permission to access?

When that becomes a contract requirement, the companies that already pass will close. The ones that do not will negotiate.

The verification gap is structural. The fix is structural. The market is going to discover this on its own timeline, and the companies paying attention now are the ones who get to shape what discovery looks like.