More context isn't better: what the ETH Zurich AGENTS.md study means
A 2026 ETH Zurich study found that adding repository context files to coding agents usually makes them worse and more expensive. The lesson isn't 'no context.' It's the right context, which is a harder engineering problem than writing a long AGENTS.md.
tsukumo
Short version: there's a comforting idea that you fix a flaky coding agent by writing it a better AGENTS.md, a long, careful overview of your repo so it finally "understands" the project. A 2026 ETH Zurich study put that idea on a benchmark, and it lost. Adding repository context files tended to make agents less likely to succeed and reliably more expensive. We've been saying context is the operating problem for a while, so this is the part where we get to point at someone else's numbers instead of our own.
Attention is finite, and a context window is not free storage. When you paste a big project overview in front of every task, the few lines that actually matter for this change get buried under the lines that don't. This is the well-documented "lost in the middle" effect: models use information near the start and end of a long context more reliably than information stranded in the middle.
So the agent reads more, wanders into files it didn't need, runs more steps, and pays for every token of it. You bought worse behavior and a bigger bill with the same document. The intuition that "the model will just use the parts it needs" is exactly the intuition the study breaks.
Here's the part people skip when they share the headline. The study didn't find that context is useless. It found that the obvious way to provide it, a comprehensive, auto-generated dump, backfires, while a short human-written file with only the real constraints helps a little.
That distinction is the whole game, and it's the one most teams get wrong. The failure mode is:
Static. A file written once and never updated is wrong the week after you write it. The agent then grounds itself in a stale description of a codebase that has moved.
Comprehensive. "Document everything" is the instinct, and it's the thing the study penalizes hardest. Coverage is not relevance.
Undifferentiated. The same overview is injected into every task, whether the agent is fixing a typo or refactoring auth. Most of it is noise for any given job.
A good AGENTS.md is short, current, and about constraints, not architecture tourism. But even a good static file is a blunt instrument, because the right context for a task isn't a fixed document. It's whatever is true and relevant right now, for this change.
If a static file is the wrong tool, the right one is retrieval: serve the currently-correct code and decisions for the task in front of the agent, scoped tightly and kept fresh, instead of a frozen overview that ages the day it ships. Less in the window, more of it relevant.
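To make "scoped tightly" concrete, here is a minimal sketch of per-task selection. It is not how trovex or any particular tool works: `Snippet` and `select_context` are hypothetical names, the file paths are invented, keyword overlap stands in for a real index or embedding search, and whitespace word count stands in for a real tokenizer.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    path: str   # where the snippet came from (invented paths in the demo)
    text: str   # the content that would be placed in the window


def _words(s: str) -> set[str]:
    # Lowercased alphanumeric tokens; a stand-in for a real relevance model.
    return set(re.findall(r"[a-z0-9_]+", s.lower()))


def select_context(task: str, snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Pick the snippets most relevant to the task that fit in `budget` words."""
    task_words = _words(task)
    scored = []
    for s in snippets:
        overlap = len(task_words & _words(s.text))
        if overlap > 0:  # drop anything with no connection to the task
            scored.append((overlap, s))
    # Highest overlap first; ties broken by path so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1].path))

    chosen, used = [], 0
    for _, s in scored:
        cost = len(s.text.split())  # crude size estimate: whitespace words
        if used + cost <= budget:
            chosen.append(s)
            used += cost
    return chosen


repo = [
    Snippet("auth/session.py", "session tokens expire after 30 minutes; refresh in auth middleware"),
    Snippet("docs/history.md", "the project started in 2019 as a weekend prototype"),
    Snippet("billing/invoice.py", "invoice totals are rounded per line item"),
]
picked = select_context("fix auth session refresh bug", repo, budget=20)
print([s.path for s in picked])
```

For the auth task, only the session snippet shares any words with the request, so the history and billing notes never reach the window. The same repo would yield a different slice for a billing task, which is the point: the selection is a function of the task, not a fixed document.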
This is the exact problem trovex exists to solve, and it's where our one hard number lives: trovex cuts roughly 60% of the tokens per lookup by serving the right context instead of stuffing the window. The ETH Zurich result is, from where we sit, third-party confirmation of the thing we built a product around. Naive context costs money and success. Served context buys both back.
It also reframes a cost conversation we have constantly. Teams treat their agent token bill as the price of capability. A good chunk of it is the price of context done the lazy way, paid on every single run.
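The "paid on every single run" point is simple arithmetic. As a rough sketch, with every number below a placeholder rather than a measured value, and ignoring prompt caching and any extra exploration the file might trigger:

```python
def preamble_cost(preamble_tokens: int, calls: int, usd_per_million_tokens: float) -> float:
    """Cost in USD of re-sending a fixed preamble on every model call."""
    return preamble_tokens * calls * usd_per_million_tokens / 1_000_000


# Placeholder inputs: an 8,000-token overview vs. an 800-token constraints file,
# 5,000 model calls, and an assumed price of $3 per million input tokens.
long_file = preamble_cost(8_000, 5_000, 3.0)
short_file = preamble_cost(800, 5_000, 3.0)
print(long_file, short_file)
```

Swap in your own file size, call volume, and price. The shape of the result is what matters: the preamble is a fixed tax multiplied by every call, so trimming it scales linearly into the bill.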
Stop growing your AGENTS.md. If yours is long, it's probably costing you. Cut it to the constraints a new senior hire would actually need, and nothing else.
Measure it. Run a set of real tasks with and without the file and look at success and cost. The study's method is reproducible on your own repo, and the answer might surprise you.
Treat context as a system, not a document. The durable fix is retrieval that serves the right slice per task, which is engineering, not authorship.
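The "measure it" step can be sketched as a small comparison script. This is not the study's harness: `RunResult` and `summarize` are hypothetical names, and the numbers in the demo are placeholders, not results from the study or from any real run. You would fill `results` from your own agent logs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunResult:
    task_id: str
    with_file: bool   # True if the context file was injected for this run
    succeeded: bool   # did the task pass its check (for example, the tests)?
    tokens: int       # total tokens billed for the run


def summarize(results):
    """Return {condition: (success_rate, mean_tokens)} for each condition present."""
    summary = {}
    for flag, label in ((True, "with_file"), (False, "without_file")):
        group = [r for r in results if r.with_file == flag]
        if group:
            rate = sum(r.succeeded for r in group) / len(group)
            summary[label] = (rate, mean(r.tokens for r in group))
    return summary


# Placeholder data only: two tasks, each run once per condition.
results = [
    RunResult("t1", True, True, 12_000),
    RunResult("t2", True, False, 15_000),
    RunResult("t1", False, True, 9_000),
    RunResult("t2", False, True, 10_000),
]
report = summarize(results)
for label, (rate, tokens) in report.items():
    print(f"{label}: success={rate:.0%} mean_tokens={tokens}")
```

Run the same task set under both conditions, and use enough tasks that a single flaky run does not decide the comparison. Two tasks, as in the demo, is far too few to conclude anything.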
We run agents in production to ship our own software, so we paid for this lesson before there was a paper to cite. The teams that get agents working don't have better context files. They have a better context system: current, scoped, served on demand. The model is maybe 10% of a working setup. A surprising amount of the other 90% is just not drowning it in your own docs.
If your agents are slow, expensive, and confidently off, the context layer is the first place we look. Talk to us about your team.
Does more repository context make a coding agent better?
Usually the opposite, per the ETH Zurich study. A long, auto-generated context file dilutes the model's attention and pushes it to explore more, which lowers success and raises cost. Short, human-written files that state only the real constraints can help slightly; comprehensive dumps hurt.
Why does more context make an agent worse?
Attention is finite. Fill the window with a big project overview and the signal the agent actually needs gets buried. This is the long-context "lost in the middle" effect. The agent reads more, wanders more, and pays for every token, so you get lower success at higher cost.
So should I give my coding agent no context?
No. The study's own result is that minimal, human-written context can give a small gain. The failure mode is naive context, the whole repo stuffed in, not context itself. The goal is to serve the right slice at the right moment, which is the hard part.
How do you give an agent the right context instead of all of it?
Retrieval, not stuffing. Serve the currently-correct code and decisions for the task at hand, scoped and fresh, instead of a static overview that ages the day it's written. trovex does exactly this, and cuts roughly 60% of the tokens per lookup by serving the right context rather than the whole window.