Why does my coding agent reread the same docs every session?

Because it has no way to know which doc is current. A real repo accumulates overlapping markdown, so the agent lists candidates, opens the likely ones, and reads enough of each to decide which to trust. The context window does not persist, so it does that from cold every session, and the next agent pays the same tax tomorrow.

How much does rereading docs actually cost?

On a single deploy-rollback lookup against three overlapping files, an agent reads roughly 720 tokens to find and trust the current one, of which about 440 are spent on files it discards. Served the one canonical section instead, the same answer costs about 280 tokens, roughly 60% fewer. Across a session of mixed lookups the reduction lands near 60% on doc reads.

Does a bigger context window fix this?

No. A bigger window does not make the reread cheaper; it makes it bigger, and the cost compounds across every session, agent, and teammate. A large window is also not a current one: it will hold three conflicting docs and answer from the wrong one. Window size does not fix correctness.
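To make the compounding concrete, here is a rough back-of-the-envelope sketch. It reuses the ~440 discarded tokens from the rollback example; the workload figures (lookups per session, sessions per day, agents, days) are purely hypothetical, and it assumes every lookup wastes the same amount, which a real repo will not.

```python
# ~440 tokens go to discarded files in the article's rollback example.
WASTED_PER_LOOKUP = 440


def wasted_tokens(lookups_per_session, sessions_per_day, agents, days):
    """Tokens spent on discarded files, assuming every lookup wastes the same amount."""
    return WASTED_PER_LOOKUP * lookups_per_session * sessions_per_day * agents * days


# Hypothetical workload: 10 doc lookups per session, 4 sessions a day,
# 5 agents, 20 working days. None of these numbers come from the article.
total = wasted_tokens(lookups_per_session=10, sessions_per_day=4, agents=5, days=20)
print(f"~{total:,} tokens on discarded files")  # ~1,760,000 tokens on discarded files
```

Swap in your own workload numbers; the point is only that the waste scales with every factor at once.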

How do I measure the savings on my own repo?

Do not trust a representative number; reproduce it. Index your repo with a local tool that serves the canonical doc per query, run a normal agent session, and compare would-have-read versus actual tokens on doc lookups. If the gap is small, your repo has no rereading problem; if it is large, you were paying that tax every session.
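The comparison itself is simple arithmetic once you have the two token counts per lookup. A minimal sketch, assuming your tooling can log them: the `Lookup` record and `doc_read_reduction` function are hypothetical names, and only the first row of the sample session uses figures from this article.

```python
from dataclasses import dataclass


@dataclass
class Lookup:
    query: str
    would_have_read: int  # tokens across every candidate file the agent would have opened
    actual: int           # tokens in the section that was actually served


def doc_read_reduction(lookups):
    """Return the fractional token reduction on doc reads (0.0 if nothing was read)."""
    baseline = sum(item.would_have_read for item in lookups)
    actual = sum(item.actual for item in lookups)
    if baseline == 0:
        return 0.0
    return 1 - actual / baseline


# Hypothetical session log. Only the first row uses the article's figures;
# the other two are made up to show a mixed session.
session = [
    Lookup("how do I roll back a deploy?", would_have_read=720, actual=280),
    Lookup("where are feature flags configured?", would_have_read=500, actual=220),
    Lookup("what is the release checklist?", would_have_read=300, actual=290),
]

print(f"doc-read reduction: {doc_read_reduction(session):.0%}")
```

Summing before dividing weights each lookup by its size, so one large wasteful lookup is not hidden by several small efficient ones.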

The token cost of agents rereading your docs

tsukumo


Without an index: tokens spent reading every candidate to find the current one

File	Read to decide	~Tokens
deploy/runbook.md (current)	relevant section	~280
wiki/old-deploy.md (stale duplicate)	enough to reject	~240
ops/postmortem-0420.md (stale)	enough to reject	~200
Total		~720

Served	~Tokens
deploy/runbook.md § "Rolling back a deploy"	~280
Total	~280
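The arithmetic behind the two tables can be checked in a few lines. This sketch only restates the approximate token counts listed above; the variable names are illustrative.

```python
# Approximate token counts copied from the two tables above.
without_index = {
    "deploy/runbook.md": 280,       # current: relevant section read
    "wiki/old-deploy.md": 240,      # stale duplicate: read enough to reject
    "ops/postmortem-0420.md": 200,  # stale: read enough to reject
}
served = {'deploy/runbook.md § "Rolling back a deploy"': 280}

total_without = sum(without_index.values())
total_served = sum(served.values())
# Everything read from files other than the current one is discarded.
discarded = total_without - without_index["deploy/runbook.md"]
reduction = 1 - total_served / total_without

print(f"without index: ~{total_without} tokens")  # ~720
print(f"discarded:     ~{discarded} tokens")      # ~440
print(f"served:        ~{total_served} tokens")   # ~280
print(f"reduction:     ~{reduction:.0%}")         # ~61%
```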

Three ways to give an agent context, and what each one costs

Approach	What the agent gets
Dumping the repo (repomix, files-to-prompt)	The whole corpus, which floods the window even though the agent uses only a fraction of it: the opposite of token-efficient
Plain RAG / context servers	A handful of candidate chunks ranked by similarity, with no freshness signal: a smaller pile, but one the agent still has to rank
Canonical-answer serving	The one current doc with an explicit freshness marker, plus a write path to keep it canonical. The unit of output is an answer, not a candidate set
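The difference between a candidate set and an answer can be sketched in code. This is a toy model under stated assumptions, not how any particular tool works: the `DocSection` fields, the `canonical` flag, the `last_verified` dates, the topic key, and the section names of the stale files are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DocSection:
    path: str
    section: str
    topic: str
    canonical: bool      # hypothetical flag maintained by the write path
    last_verified: date  # hypothetical freshness marker


def serve_canonical(index: List[DocSection], topic: str) -> Optional[DocSection]:
    """Return the single canonical section for a topic, or None if there is none."""
    matches = [d for d in index if d.topic == topic and d.canonical]
    if not matches:
        return None
    # If several sections claim to be canonical, prefer the most recently verified.
    return max(matches, key=lambda d: d.last_verified)


def mark_canonical(index: List[DocSection], topic: str, path: str, verified_on: date) -> None:
    """Write path: make exactly one doc canonical for a topic and stamp its freshness."""
    for d in index:
        if d.topic == topic:
            d.canonical = d.path == path
            if d.canonical:
                d.last_verified = verified_on


# File paths come from the example above; dates and stale section names are made up.
index = [
    DocSection("deploy/runbook.md", "Rolling back a deploy", "deploy-rollback", True, date(2025, 1, 15)),
    DocSection("wiki/old-deploy.md", "Rollback", "deploy-rollback", False, date(2023, 6, 1)),
    DocSection("ops/postmortem-0420.md", "Rollback steps", "deploy-rollback", False, date(2024, 4, 20)),
]

answer = serve_canonical(index, "deploy-rollback")
if answer is not None:
    print(f"{answer.path} § {answer.section} (verified {answer.last_verified.isoformat()})")
```

A similarity-ranked retriever would return all three entries and leave the choice to the agent; here the choice is made once, on the write path, and every later read gets one section plus its freshness marker.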

The token cost of coding agents rereading your docs

Why does my coding agent reread the same docs every session?

What does the rereading actually cost?

Where does the ~60% number come from?

How do you fix it?

How is this different from RAG or dumping the repo?

How do I measure this on my own repo?

How we help
