17 June 2026, 3 min read
Why your AI coding agents cost so much (and how to cut it)
If your agent token bill keeps climbing, the model price usually isn't the problem. The waste is in how much context you pay for on every call. Here are the real cost drivers, and the levers that actually move them.

Short version: when an agent token bill climbs, the model's per-token price is rarely the real problem. The cost is in context: how much an agent has to read, and reread, to do anything. The biggest driver is agents rereading large parts of the codebase every session instead of being served the current canonical answer. After that come bloated context on every call, and rework when an agent acts on stale context and has to redo the task. Run a fleet and each of those multiplies. The levers that actually cut the bill are about context and rework, not switching to a cheaper model.
The model price is the small number
It's natural to look at the per-token rate and shop for a cheaper model. That's usually optimizing the small number. What you actually pay for is volume: how many tokens move through the agent to get a unit of work done. A cheaper model that needs more retries, or that you still feed the same bulky context, can cost more, not less. Start with the volume, not the rate.
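To make the rate-versus-volume point concrete, here is a minimal sketch of the arithmetic. Every number in it is a made-up illustration, not a real model price or a measured workload, and the names (`Run`, `cost_per_task`) are hypothetical. It also assumes cost is dominated by input tokens and that each retry replays the full context, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One way of running a task. All values are illustrative assumptions."""
    price_per_million_tokens: float  # USD per 1M input tokens (hypothetical rate)
    context_tokens_per_call: int     # tokens sent to the model on each call
    calls_per_task: int              # model calls needed for one attempt
    attempts_per_task: float         # average attempts, counting retries and rework


def cost_per_task(run: Run) -> float:
    """Cost of one finished task: token volume multiplied by the rate."""
    tokens = run.context_tokens_per_call * run.calls_per_task * run.attempts_per_task
    return tokens * run.price_per_million_tokens / 1_000_000


# Baseline: the pricier model, bulky context, gets it right the first time.
baseline = Run(3.0, 50_000, 10, 1.0)
# Half the rate, same bulky context, but more retries on average.
cheaper_model = Run(1.5, 50_000, 10, 2.5)
# Same pricier model, with context trimmed to what the task needs.
trimmed_context = Run(3.0, 15_000, 10, 1.0)

if __name__ == "__main__":
    for name, run in [("baseline", baseline),
                      ("cheaper model", cheaper_model),
                      ("trimmed context", trimmed_context)]:
        print(f"{name}: ${cost_per_task(run):.3f} per task")
```

With these assumed numbers, halving the rate while retries rise makes each task more expensive than the baseline, while trimming context on the original model cuts the cost by more than two thirds. The exact figures do not matter; the point is that volume terms multiply together, so they tend to move the total more than the rate does.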