The token cost of coding agents rereading your docs
Every session, your coding agent rereads several markdown files to work out which one is current, then answers from a guess. You pay for that on every session, every agent, every teammate. Here is where the cost comes from, and how to measure it on your own repo.
tsukumo
Short version: every session, your coding agent rereads several markdown files to work out which one is the current source of truth, then answers from a guess. You pay for those reads on every session, every agent, every teammate. Serve the agent the one canonical doc instead of the pile and a typical doc lookup drops from roughly 720 tokens to roughly 280, about 60% fewer tokens for the same answer. This post shows where that number comes from and how to reproduce it on your own repo.
Why does my coding agent reread the same docs every session?
Because it has no way to know which doc is current.
A real repo accumulates overlapping markdown: a deploy/runbook.md, an older wiki/old-deploy.md, an ops/postmortem-0420.md that mentions rollback in passing. When your agent needs to answer "how do we roll back a deploy?", it does what you would do without an index. It lists the candidates, opens the likely ones, and reads enough of each to decide which to trust.
It does this from a cold start every session, because the context window does not persist. So the same lookup gets paid for again tomorrow, and again by the next agent, and again by your teammate's agent on the same repo.
CLAUDE.md, AGENTS.md, and .cursorrules help a little. They pin a single static blob into context. But one blob goes stale, does not scale past a handful of topics, and cannot route a specific question to the specific doc and section that answers it. It is a sticky note, not an index. We went deeper on that in our post on whether AGENTS.md and context files actually help coding agents.
Let's price one lookup. The numbers below are illustrative; your repo and agent will differ, but the shape holds, and you can measure your own at the end.
Take the deploy-rollback question against the three candidate files above. Without an index, the agent reads enough of each candidate to judge which is canonical:
Without an index: tokens spent reading every candidate to find the current one

| File | Read to decide | ~Tokens |
|---|---|---|
| deploy/runbook.md (current) | relevant section | ~280 |
| wiki/old-deploy.md (stale duplicate) | enough to reject | ~240 |
| ops/postmortem-0420.md (stale) | enough to reject | ~200 |
| Total | | ~720 |
The agent spent ~440 tokens reading two files it ended up discarding. That is the tax. The cost is not the answer; it is the guessing.
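The arithmetic behind the table fits in a few lines. What follows is a minimal Python sketch that uses the illustrative counts above. `lookup_cost` and `daily_tax` are hypothetical helpers, and the multipliers passed to `daily_tax` are made-up placeholders, not measurements.

```python
# Illustrative token counts from the table above, not measurements.
READS = {
    "deploy/runbook.md": 280,       # current: the section that answers
    "wiki/old-deploy.md": 240,      # stale duplicate, read only to reject it
    "ops/postmortem-0420.md": 200,  # stale, read only to reject it
}
CANONICAL = "deploy/runbook.md"


def lookup_cost(reads: dict[str, int], canonical: str) -> dict[str, float]:
    """Price one lookup with and without canonical-answer serving."""
    without_index = sum(reads.values())  # agent reads every candidate
    with_index = reads[canonical]        # only the answering section enters the window
    wasted = without_index - with_index  # tokens spent on files that get discarded
    return {
        "without_index": without_index,
        "with_index": with_index,
        "wasted": wasted,
        "reduction": wasted / without_index,
    }


def daily_tax(wasted_per_lookup: int, lookups_per_session: int,
              sessions_per_day: int, agents: int) -> int:
    """Each session starts cold, so the waste is paid again per session and per agent."""
    return wasted_per_lookup * lookups_per_session * sessions_per_day * agents


cost = lookup_cost(READS, CANONICAL)
print(f"{cost['without_index']} -> {cost['with_index']} tokens, "
      f"{cost['wasted']} wasted, {cost['reduction']:.0%} fewer")
# prints: 720 -> 280 tokens, 440 wasted, 61% fewer

# Hypothetical multipliers; substitute your own team's numbers.
print(daily_tax(cost["wasted"], lookups_per_session=10, sessions_per_day=5, agents=4))
# prints: 88000
```

The second function is the part a bigger context window does not touch. Every cold start multiplies the per-lookup waste again.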
With one canonical answer, the agent asks once and gets back a single pointer: deploy/runbook.md:42, marked canonical, updated 3 days ago, plus just the section that answers.
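With one canonical answer: only the doc that answers enters the window

| File | Read | ~Tokens |
|---|---|---|
| deploy/runbook.md:42 (canonical) | the section that answers | ~280 |
| Total | | ~280 |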
The ~60% figure is the average of that same arithmetic across many lookups, not a single cherry-picked one.
Per-lookup savings vary. A question with one obvious doc saves little. A question that touches a thicket of overlapping, half-stale files saves a lot. Across a real session, with many lookups of mixed difficulty, the reduction on doc reads lands around 60% for the repos we have measured.
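A session-level figure can be computed by summing tokens rather than averaging per-lookup percentages. The sketch below assumes that method, which matches the would-have-read versus actual comparison described later in this post. `Lookup` and `session_reduction` are hypothetical names. The second lookup is a made-up "one obvious doc" case, included only to show how an easy lookup dilutes the total.

```python
from dataclasses import dataclass


@dataclass
class Lookup:
    would_have_read: int  # tokens to read every candidate and pick one
    actual: int           # tokens read when served only the canonical section


def session_reduction(lookups: list[Lookup]) -> float:
    """Token-weighted reduction: 1 - total actual / total would-have-read."""
    total_would = sum(lk.would_have_read for lk in lookups)
    total_actual = sum(lk.actual for lk in lookups)
    return 1 - total_actual / total_would


session = [
    Lookup(would_have_read=720, actual=280),  # the deploy-rollback lookup above
    Lookup(would_have_read=280, actual=280),  # hypothetical: one obvious doc, nothing saved
]
print(f"{session_reduction(session):.0%}")  # prints 44%
```

Weighting by tokens lets the expensive, tangled lookups dominate the total. A session full of thickets lands near the 720 → 280 case; a session of obvious lookups drifts toward zero.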
Two honest caveats:
It is the doc-lookup portion of the bill, not your whole bill. Your agent also reads code, runs tools, and reasons. This cuts the markdown-rereading slice. On a doc-heavy repo with a lot of agent traffic that slice is large; on a tiny repo it is small.
The real number is yours, not ours. ~60% is representative, not a promise. The point of measuring is that you do not have to take the figure on faith.
You point a context server at your repo, and it does three things that map directly to the costs above.
One canonical answer instead of a pile. It indexes your markdown and exposes a single tool over MCP. Your agent asks a question in plain language and gets back the one current doc that answers it, as a path:line pointer with a freshness marker (canonical, stale, or duplicate), not a list of candidates to rank. The guessing step disappears.
Section-level reads. When the answer is two paragraphs, the agent gets back those two paragraphs, not the 400-line file they live in. A short answer costs short-answer tokens.
A shared write path. Agents also save what they learn (an incident, a decision, the rollback steps that actually worked) back through one shared point. The next agent, and your teammate, read it instead of re-deriving it. That is one source of truth for a fleet of agents, by construction, with no copies left to drift.
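The first two points, a single pointer and a section-sized read, can be sketched in Python. `CanonicalAnswer`, `heading_level`, and `extract_section` are hypothetical illustrations of the idea. They are not trovex's actual API or response format. The freshness value is passed in by hand, because the post does not say how it is assigned. The toy runbook is a placeholder.

```python
from dataclasses import dataclass
from datetime import date
from typing import Literal

Freshness = Literal["canonical", "stale", "duplicate"]


@dataclass
class CanonicalAnswer:
    """Hypothetical response shape: one pointer, not a list of candidates."""
    path: str
    line: int             # 1-based line where the answering section starts
    freshness: Freshness  # explicit marker, so the agent does not have to judge
    updated: date
    section: str          # just the section that answers, not the whole file

    def pointer(self) -> str:
        return f"{self.path}:{self.line}"


def heading_level(line: str) -> int:
    """Number of leading '#' characters (0 if the line is not a heading)."""
    stripped = line.lstrip()
    return len(stripped) - len(stripped.lstrip("#"))


def extract_section(markdown: str, heading: str) -> tuple[int, str]:
    """Return (1-based start line, text) of the section under `heading`.

    The section runs until the next heading of the same or higher level.
    Simplification: '#' lines inside fenced code blocks are not special-cased.
    """
    lines = markdown.splitlines()
    for i, line in enumerate(lines):
        level = heading_level(line)
        if level and line.lstrip()[level:].strip() == heading:
            end = next((j for j in range(i + 1, len(lines))
                        if 0 < heading_level(lines[j]) <= level), len(lines))
            return i + 1, "\n".join(lines[i:end])
    raise KeyError(f"heading not found: {heading!r}")


# Toy runbook; the contents are placeholders.
runbook = "# Deploy\n\n...\n\n## Rollback\n\n1. ...\n2. ...\n\n## Monitoring\n\n...\n"
line, text = extract_section(runbook, "Rollback")
answer = CanonicalAnswer("deploy/runbook.md", line, "canonical", date.today(), text)
print(answer.pointer())  # prints deploy/runbook.md:5
```

Only `answer.section` enters the agent's window. The Monitoring section and the rest of the file never do.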
This is the layer trovex handles, and it runs locally: vectors in SQLite, embeddings via ONNX, no cloud, no API keys, your code never leaves the machine. The mechanism is the one we covered in our post on managing context for AI coding agents; this post is just the arithmetic underneath it.
How is this different from RAG or dumping the repo?
Three ways to give an agent context, and what each one costs

| Approach | What the agent gets |
|---|---|
| Dumping the repo (repomix, files-to-prompt) | The whole corpus floods the window to use a fraction of it: the opposite of token-efficient |
| Plain RAG / context servers | A handful of candidate chunks ranked by similarity, with no freshness signal: a smaller pile, still left for the agent to rank |
| Canonical-answer serving | The one current doc with an explicit freshness marker, plus a write path to keep it canonical. The unit of output is an answer, not a candidate set |
A stale chunk and a current one look equally relevant to a similarity score. That is the gap RAG leaves, and it is the gap that costs you on every lookup. We pulled the three patterns apart in our post on MCP context patterns for coding agents.
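The tie is easy to demonstrate in Python. In this minimal sketch a bag-of-words vector stands in for a real embedding model. That is a deliberate simplification, not what trovex or any particular RAG stack uses. The stale wiki page is assumed to be a verbatim copy of the current runbook section, and the chunk text and freshness values are hypothetical. How a freshness marker gets assigned is outside this sketch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Bag-of-words stand-in for an embedding model (simplification for illustration)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


ROLLBACK = "To roll back a deploy, redeploy the previous release tag."  # hypothetical text
chunks = [
    {"path": "wiki/old-deploy.md", "text": ROLLBACK, "freshness": "stale"},
    {"path": "deploy/runbook.md", "text": ROLLBACK, "freshness": "canonical"},
]
query = embed("how do we roll back a deploy?")

# Plain RAG: rank by similarity alone. Identical text gives identical scores.
scores = {c["path"]: cosine(query, embed(c["text"])) for c in chunks}

# Canonical-answer serving (sketch): filter on the explicit freshness marker first.
best = max((c for c in chunks if c["freshness"] == "canonical"),
           key=lambda c: cosine(query, embed(c["text"])))
print(scores, best["path"])
```

With a real embedding model the two chunks would rarely be byte-identical. Still, a stale copy that has drifted by a sentence scores close to the current one and can outrank it. Only metadata outside the vector breaks that tie.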
Don't trust the ~60%. Reproduce it. trovex is open source and in public beta, so this runs locally in about a minute:
```bash
uv sync                                  # install dependencies into the project environment
uv run trovex index /path/to/your/repo   # index the repo's markdown
uv run trovex serve                      # MCP at /mcp, dashboard at /
```
Point your agent (Claude Code, Cursor, Windsurf, Zed, any MCP client) at the trovex MCP endpoint, then work a normal session. Open the Savings tab. It shows would-have-read versus actual tokens on doc lookups: your real reduction, on your real docs.
If the number is small, your repo probably does not have a doc-rereading problem, and you should keep your setup. If it is large, you were paying that tax on every session.
We run this layer on our own products before we put it anywhere near a client codebase. When a team is rolling agents out and the doc-rereading tax is a daily line item, we fit the context and observability layers we run ourselves to your repo, then train your people to keep the bill honest as the fleet grows.
We map the doc-rereading slice of your bill, show what serving canonical context instead of rereading the repo would save on your codebase, and leave you with a number you can reproduce.
Why does my coding agent reread the same docs every session?
Because it has no way to know which doc is current. A real repo accumulates overlapping markdown, so the agent lists candidates, opens the likely ones, and reads enough of each to decide which to trust. The context window does not persist, so it does that from cold every session, and the next agent pays the same tax tomorrow.
How much does rereading docs actually cost?
On a single deploy-rollback lookup against three overlapping files, an agent reads roughly 720 tokens to find and trust the current one, of which about 440 are spent on files it discards. Served the one canonical section instead, the same answer costs about 280 tokens, roughly 60% fewer. Across a session of mixed lookups the reduction lands near 60% on doc reads.
Does a bigger context window fix this?
No. A bigger window does not make the reread cheaper; it only gives the reread more room, and the cost compounds across every session, agent, and teammate. A large window is also not a current one: it will hold three conflicting docs and answer from the wrong one. Window size does not fix correctness.
How do I measure the savings on my own repo?
Do not trust a representative number; reproduce it. Index your repo with a local tool that serves the canonical doc per query, run a normal agent session, and compare would-have-read versus actual tokens on doc lookups. If the gap is small, your repo has no rereading problem; if it is large, you were paying that tax every session.