How much do AI coding agents actually cost to run?

The model API bill is the cost you see; the one that surprises you is context. An agent re-reads and re-derives your codebase every session, so most of the spend is tokens paid to re-learn what it already knew. Control it by giving agents a durable source of truth instead of re-reading the repo, and by making cost observable per task, so you tune a number, not a guess.

Updated 19 June 2026

Go deeper: read the full write-up on the blog.

Where the spend actually goes

Not the clever output, the context. Every agent, every session, re-reading the same files to rebuild understanding it had yesterday. A 2026 study of agentic coding runs (Bai et al., arXiv:2604.22750) found input tokens, not output, drive the cost, and the same task can swing up to 30x in tokens from one run to the next. At fleet scale that repetition is the bulk of the bill, and it's invisible until you measure per-task cost.

The fix is a context layer, not a cheaper model

Serve agents the current, canonical answer about your codebase on demand (this is what trovex does, a measured ~60% fewer tokens per lookup at equal task-success; methodology at trovex.dev/measure) instead of having them re-derive it. Cheaper models help at the margin; cutting re-derivation moves the number.

Make cost a dial, not a surprise

Observability per task (what each run cost and produced) turns spend into something you tune. Without it, you only learn the bill at the end of the month and can't attribute it.

Straight answers.

Is a bigger model cheaper overall?: Not usually. The driver is how much context the agent re-reads, not the per-token price. A cheaper model that still re-derives the repo every session can cost more in total.
What's the ~60% number?: Two ways, one honest answer. Modeled: the tokens an agent avoids by reading one canonical doc instead of re-reading the repo, about 60% fewer per lookup at the median (~38-79% by repo). And now measured at equal task-success: a median 69% across a 26-query benchmark on our own repo (range 41-81%, LLM-judged, losses disclosed). We headline a conservative ~60%. Full methodology and a reproduce-it-yourself harness at trovex.dev/measure; trovex also ships a savings view to check the number on your repo.
How do we even see agent cost?: Instrument per-task: cost and outcome per run. That observability is the difference between tuning spend and guessing at a monthly bill.

Keep reading.