What AI agents actually cost, and where the money goes
The agent bill surprises teams, and the model price is rarely the reason. Most of the cost is context you pay to reprocess on every call: the same docs reread each session, the whole history carried forward. AI isn't mainly a cost-cutting play, but the waste inside the bill is real and mostly avoidable.
tsukumo
Short version: The agent bill comes in higher than teams plan for, and the model's per-token price is almost never the reason. You pay for context, every token of it, on every single call. The same handful of docs reread each session to figure out what's current. The whole conversation history dragged forward step after step. That is where the money goes. Two honest things follow. AI agents are not mainly a way to do the same work cheaper, the return is capability. And the part of the bill that is genuine waste, paying to reprocess context the agent never needed, is real, measurable, and mostly avoidable. This page maps the cost picture and links the evidence for each piece.
More than the sticker price implies, because the unit you actually buy is tokens of context, not answers. Every call sends the model some context to process and pays for all of it. The model's per-token rate is the small, visible number. The large, hidden one is how many tokens you push through on every call, and for an agent doing real work that number is big and it repeats.
So the first move in understanding agent cost is to stop looking at the model price and start looking at the token volume per call, multiplied by every call, every session, every agent, every teammate. That product is your bill.
Because the bill is driven by context volume, and the defaults waste it. Two patterns dominate.
The first is rereading. Every session, an agent rereads several files to work out which one is current, then answers from a guess, and you pay for that scan on every session, every agent, every teammate. The token cost of agents rereading your docs shows where that cost comes from and how to measure it on your repo.
The second is carrying everything forward. The full conversation history rides along on each step, so the agent reprocesses a growing pile every turn. The broader breakdown of what's actually driving your agent token bill, and which levers move it, is in why your AI coding agents cost so much.
Usually that is the wrong question, and starting there leads teams to the wrong expectations. If the goal is to do the same work for less money, agents tend to disappoint. The return that holds up is capability: production-grade output, and a more capable version of the team you already trust. AI isn't cheaper, it's capability makes that case in full.
Why does this belong on a page about cost? Because the frame changes what you optimize. If you adopted agents to be cheaper, every token feels like failure. If you adopted them for capability, cost discipline becomes a separate, tractable problem: not "how do I make this free," but "how do I stop overpaying for the capability I actually want." That second question has clean answers.
Stop paying for context the agent does not need. The waste is concentrated in rereading and in carrying everything forward, so the fixes target both: prune stale history, summarize old turns instead of keeping them verbatim, and serve one canonical document per query rather than making the agent reread and rank the whole pile to reconstruct an answer. The mechanics of selecting context live in context engineering for AI agents, the hub for that quadrant.
This is the principle our own tooling is built on, and we measured it. When an agent resolves a question to the one canonical doc instead of rereading the top candidate files, it spends about 60% fewer tokens per markdown lookup. That figure is measured on our own repo at equal task-success: a median of 69% across 26 queries, range 41 to 81%, judged by an LLM with no gold answers. We publish about 60% as the conservative floor, and you can run the same benchmark on your repo with one command. We benchmarked it and shipped the command has the method and the reproduction steps.
~60% fewer tokens
per markdown lookup when an agent reads one canonical doc instead of rereading candidate files, the conservative floor we publish
measured on our own repo at equal task-success: median 69% across 26 queries, range 41 to 81%, LLM-judged; reproduce it on your repo at tsukumo.ch/measure
Source: trovex token-cost benchmark, run uvx trovex bench
Put together, controlling agent cost is the same discipline as controlling context, viewed through the bill:
Measure tokens per call, not the model price. The model rate is the small number. Token volume per call, times every call, is the bill. Instrument that.
Cut the rereading. Serve one canonical, current answer per query instead of making the agent scan and guess every session.
Don't carry everything forward. Prune and summarize history so each step reads what it needs, not the whole transcript.
Adopt for capability, optimize for waste. Expect agents to buy capability, not to be free, then drive the avoidable token waste out of the bill.
This is the operating model we build with teams whose agent bill is climbing faster than the value, and the one we run ourselves: a canonical document layer our agents read instead of rereading the repo, and context selection treated as a cost decision. We shipped the measurement as trovex precisely because "your tokens are wasted" is a claim you should be able to check on your own repo, not take on faith.
If your agent token bill keeps climbing and you can't say where it goes, that's not a model to swap. It's a cost you haven't instrumented. That's the work we do with teams.
We find where your agents pay to reprocess context they don't need, measure it on your stack, and put the cost discipline in place that holds the bill down without losing the capability.
More than the model price suggests, because you pay per token of context on every call, not just for the answer. The dominant cost is usually the context an agent reprocesses each step: rereading the same docs, carrying the full history. The model's per-token rate is the small number; how many tokens you send it every call is the large one.
Why are AI coding agents so expensive?
Because the token bill is driven by context volume, not model choice. Agents reread the same files every session to work out what's current, and carry long histories forward step after step. You pay for all of it on every call. The waste is in how much context you send, which is an engineering decision, not a fixed cost.
Is using AI agents cheaper than hiring a team?
Usually that's the wrong frame. Treating agents as a way to do the same work cheaper tends to disappoint. The real return is capability: production-grade output and a more capable version of the team you have. Cost discipline matters, but as a way to stop overpaying for capability, not as the reason to adopt agents.
How do you reduce AI agent token costs?
Stop paying for context the agent doesn't need. Prune stale history, summarize old turns, and serve one canonical document per query instead of making the agent reread and rank the whole pile. On our own repo, resolving to one canonical doc cut about 60% of the tokens per markdown lookup at equal task-success.