Why are my AI coding agents so expensive?

Usually context, not the model. Agents that reread large parts of the repo every session, carry bloated context into each call, and redo work after acting on stale context burn tokens fast. The model price is a small part; the waste is in how much context you pay for per call, multiplied across a fleet.

How do I reduce AI agent token costs?

Serve agents the current canonical answer instead of letting them reread the codebase, give each call only the context its task needs, and cut rework by fixing the context that makes agents confidently wrong. Measure cost per lookup so you can see what's working. A cheaper model is the last lever, not the first.

Does using a cheaper model lower agent costs?

A little, and often at the cost of more retries, which can erase the saving. The larger waste is paying to feed the same bulky context into every call. Fix that first; it cuts cost without trading away reliability.

How much can better context cut agent costs?

A lot, because context is the biggest line. As one concrete data point, serving agents the canonical answer instead of rereading the repo cuts the tokens per lookup by roughly 60% in our own tooling. Your number depends on your codebase, but context is where the savings live.

tsukumo

Why AI coding agents cost so much · tsukumo

tsukumo

Agency17 June 20264 min read

Why your AI coding agents cost so much (and how to cut it)

If your agent token bill keeps climbing, the model price usually isn't the problem. The waste is in how much context you pay for on every call. The real cost drivers, and the levers that actually move them.

tsukumo

The short answer

Mostly context, not the model price. The biggest driver is agents rereading the codebase every session instead of being served the current canonical answer; serving it instead cuts the tokens per lookup by roughly 60% in our own tooling. Bloated context and rework from stale context multiply the bill across a fleet.

Short version: when an agent token bill climbs, the model's per-token price is rarely the real problem. The cost is in context, how much an agent has to read, and reread, to do anything. The biggest driver is agents rereading large parts of the codebase every session instead of being served the current canonical answer. Then bloated context on every call, and rework when an agent acts on stale context and has to redo it. Run a fleet and each of those multiplies. The levers that actually cut the bill are about context and rework, not switching to a cheaper model.

Where the tokens go, and which lever moves them

Lever	What it attacks	Why it pays
Durable canonical context	Rereading the repo every session	Biggest line; serving the answer instead of the files cut tokens per lookup ~60% in our own tooling
Right-size each call	Bloated context stuffed in just in case	Less to ingest, and the signal stops getting buried
Cut rework	Confident mistakes from stale context	You stop paying twice: wrong work, then the redo
Measure cost per unit	Guessing at the wrong fix	You attack the real driver, not a hunch
Cheaper model	The per-token rate	Smallest line; can backfire when retries climb

Why your AI coding agents cost so much (and how to cut it)

The model price is the small number#

Driver 1: rereading the codebase#

Driver 2: bloated context on every call#

Driver 3: rework from bad context#

Driver 4: the fleet multiplies everything#

You can't cut what you can't see#

The order that actually works#

How we help#

AI agents in production: the four operating problems that decide it

Your AI agent didn't fail the deploy. It stopped itself.

How we run a 9-agent growth team on wrai.th (and what broke)

Want this running on your team?