why the agent-to-agent economy will explode

june 6, 2028

every week a new language model comes out, more powerful than the last. we see it with gpt 5.6 and with the mythos/fable class (assuming we still have access). what almost nobody looks at is the price. and here’s a misconception worth dismantling right away: the cost per token is collapsing, not rising. gpt-4-level performance cost around twenty dollars per million tokens in late 2022. today the same thing costs a few cents. it’s one of the fastest price drops in the history of computing. so problem solved? no.

this is exactly where it gets interesting. because while the cost of a single token plummets, total spend explodes. the reason is simple: an agent doesn’t make one call, it makes dozens. it reasons, corrects itself, calls tools, verifies, second-guesses. an agentic task burns 5 to 30 times more tokens than a single question to a chatbot. consumption grows faster than the price falls. so the bottleneck is no longer how much a token costs.

it’s predictability. it’s knowing, before you launch a workflow, what it will actually cost you. and there’s a second problem, twin to the first: we use enormous models for tasks that don’t call for them. horizontal models, trained to know a little of everything, put to work on something vertical and repetitive. it’s like keeping a whole orchestra to play a single, specific note. big models, on top of that, tend to overthink. they burn hundreds of tokens reasoning over things that don’t deserve it, and they make cost impossible to estimate. the direction i see is this:

big models for decisions, orchestration, architectural design (obviously under the control of human taste and common sense).

small models, light, a few billion active parameters per token, but extremely specific, for everything else. except nobody can build all of them themselves. vertical cases number in the hundreds of thousands. no company has the resources to train and maintain thousands of different capabilities. it’s division of labor, exactly like in the economy we already know: you don’t grow the wheat, you don’t weave the clothes, you don’t smelt the steel. you buy them from whoever does them better than you. that’s why i believe in the birth of an economy between agents. agent-to-agent.

i picture it like this: hundreds of thousands of developers and companies making their own vertical agents available to the world, for a few cents per call. imagine someone building an ocr agent specialized in reading handwritten medical prescriptions in italian. they publish it. anyone, anywhere, can use it for two cents, instead of taking a guess with a generalist model that gets that specific task wrong. and imagine a company that needs fifteen hundred vertical capabilities, and that above all needs reliability and predictable costs. the interaction of the future, in my view, looks like this: i give my agent a wallet in usdc. it hires more specialized, more efficient sub-agents for the pieces of work that aren’t worth doing in-house, and it focuses on what matters.
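to make the "predictable cost" point concrete, here is a minimal sketch in python. it is not swarmwage code and it touches no chain: `Wallet`, `SubAgent`, `metered_cost_range`, the agent name and the token price are all hypothetical, picked only for illustration. the only numbers taken from the text are the two cents per call and the 5x to 30x token multiplier. amounts are kept in integer micro-usdc (usdc has 6 decimals) to avoid float rounding.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubAgent:
    """Hypothetical vertical agent sold at a flat price per call."""
    name: str
    price_micro_usdc: int  # 20_000 micro-USDC = 0.02 USDC = two cents


@dataclass
class Wallet:
    """Toy stand-in for a USDC wallet: tracks a balance only, no on-chain calls."""
    balance_micro_usdc: int
    ledger: list = field(default_factory=list)

    def pay(self, agent: SubAgent) -> None:
        # Refuse the call up front if it would exceed the remaining budget.
        if agent.price_micro_usdc > self.balance_micro_usdc:
            raise ValueError(f"budget exhausted before paying {agent.name}")
        self.balance_micro_usdc -= agent.price_micro_usdc
        self.ledger.append((agent.name, agent.price_micro_usdc))


def metered_cost_range(base_tokens: int, usd_per_million: float) -> tuple[float, float]:
    """Cost range (in USD) when an agentic task burns 5x to 30x the tokens
    of a single chatbot question. The token price is an assumed input."""
    low = base_tokens * 5 * usd_per_million / 1_000_000
    high = base_tokens * 30 * usd_per_million / 1_000_000
    return low, high


# Flat pricing: three calls to a two-cent agent cost exactly six cents.
ocr = SubAgent("prescription-ocr-it", 20_000)  # hypothetical agent name
wallet = Wallet(balance_micro_usdc=1_000_000)  # 1 USDC budget
for _ in range(3):
    wallet.pay(ocr)
print("remaining micro-usdc:", wallet.balance_micro_usdc)

# Metered pricing: same task, but the bill depends on how much the model "thinks".
low, high = metered_cost_range(base_tokens=2_000, usd_per_million=0.50)
print("metered cost range in usd:", low, high)
```

the flat-price path is known before the workflow starts, and the wallet refuses a call it can’t afford. the metered path has a 6x spread between best and worst case, whatever the token price is.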

sounds great, right? one problem remains. the biggest one.

reliability.

how do i know an agent will produce an output i can trust? how do i know the payment actually arrived? it’s to answer this that i’m building an open-source protocol: [swarmwage](https://github.com/Swarmwage/swarmwage).

i’ll be honest: today it doesn’t yet have reliability data on many agents, for the simple reason that there still aren’t many agents out there selling their compute. it’s a piece of infrastructure that arrives a moment before the market it’s meant to serve. i’m aware of that. but i’m convinced it’s a necessary piece. without a way to measure reliability and guarantee payment, the economy between agents doesn’t scale. it stays an experiment, and we’ve already seen other experiments, like moltbook.

if you feel like looking at the specs and tearing them apart, serious criticism is welcome. you can drop a comment in the HN thread. and if you have an agent that does something useful, try running a paid call through the facilitator: that’s exactly the signal i’m looking for.

for anyone in milan, the coffee’s on me. let’s talk about where this is going.

in the next few days i’ll write a more technical post, where i explain the architectural choices (eip-3009 on base, x402, the facilitator). that’s the fun part.