Why AI coding agents fail: the failure-mode catalogue
AI coding agents don't fail randomly. They fail in a handful of predictable, nameable ways, and every one is an engineering problem with a known fix, not a model that isn't smart enough yet.
tsukumo
Short version: AI coding agents don't fail randomly, and they rarely fail because "the model isn't good enough yet." They fail in a handful of predictable, nameable ways. We've run agent fleets in production to ship our own software, so we've paid for every one of these in real money and real reverts. Here's the catalogue, the symptom you'll actually see, the root cause, and the fix. The pattern underneath: almost every failure lives in the harness around the model, not in the model, which means it's an engineering problem you own, not a capability you wait on.
The most common failure, by a wide margin. The agent does something confidently wrong because the information in its window was wrong: an outdated README, a deprecated helper, a pattern you abandoned six months ago that still lives in the repo. The output looks right. It's grounded in the wrong reality.
This is not a reasoning failure. The agent reasoned fine over bad inputs. The fix is a context layer: retrieval that serves the currently correct code and decisions, not whatever happened to be nearby. This is the exact problem trovex exists to solve, and it's where our one hard number comes from: trovex cuts roughly 60% of the tokens per lookup by serving the right context instead of stuffing the window. Less noise in, fewer confident mistakes out. More on this in managing context for AI coding agents.
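To make "serve the currently correct code, not whatever happened to be nearby" concrete, here is a minimal sketch of the selection step in a context layer. It is not how trovex works internally. The `Snippet` type, the `deprecated` flag, the `relevance` score, and the word-count token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    path: str
    text: str
    deprecated: bool   # hypothetical flag, e.g. derived from a deprecation marker
    relevance: float   # hypothetical score from whatever retriever you use


def select_context(snippets, token_budget):
    """Return the most relevant non-deprecated snippets that fit the budget."""
    # Drop anything known to be stale before ranking, so a highly
    # "relevant" but abandoned pattern never reaches the window.
    current = [s for s in snippets if not s.deprecated]
    current.sort(key=lambda s: s.relevance, reverse=True)

    chosen, used = [], 0
    for s in current:
        cost = len(s.text.split())  # crude token estimate: whitespace-separated words
        if used + cost > token_budget:
            continue  # skip snippets that would overflow the budget
        chosen.append(s)
        used += cost
    return chosen
```

The design choice that matters is the order: filter for correctness first, rank second, and enforce the budget last, so staleness is never traded off against relevance.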
A coding agent has no built-in sense of what it doesn't know. Ask for a call to a library it half-remembers and it will invent a function signature, a config key, or an entire API that reads perfectly and does not exist. There's no internal "low confidence" flag stopping it.
You don't fix this with a better prompt. You fix it by grounding the agent in things that can say no: a type checker, a test suite, a build that fails loudly, and a review gate a human or another agent has to clear before the change lands. The agent is allowed to be wrong; it's not allowed to ship wrong. That gate is the whole game, and it's covered in "Is AI-written code safe to ship?".
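A gate made of "things that can say no" can be very small. The sketch below runs a list of check commands and blocks the change if any of them exits non-zero. The function name `gate` and the shape of the `checks` list are assumptions for illustration; the commands themselves are whatever your project already uses.

```python
import subprocess
import sys


def gate(checks, cwd="."):
    """Run each (name, command) check; return (ok, names_of_failed_checks).

    Any non-zero exit code counts as a "no" and blocks the change.
    """
    failures = []
    for name, cmd in checks:
        result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(name)
    return (not failures, failures)


if __name__ == "__main__":
    # Example wiring, assuming mypy and pytest are installed in the project.
    ok, failed = gate([
        ("types", ["mypy", "."]),
        ("tests", ["pytest", "-q"]),
    ])
    if not ok:
        print("blocked by:", ", ".join(failed))
        sys.exit(1)
```

This only covers the automated half. The human or second-agent review still has to sit behind it.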
The moment you go from one agent to several, a new failure appears: two agents editing the same files at once, clobbering each other's work, producing a merge no one asked for. We hit this ourselves early, which is why every agent in our fleet works in its own isolated tree.
The fix is boring and effective: isolation plus task boundaries. Each agent gets its own working copy (a git worktree, a branch, a sandbox) and a task scoped tightly enough that two of them don't reach for the same file. Orchestration's job is to hand out non-overlapping work, not to run more agents in the same room. See orchestrating AI coding agent fleets.
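As a rough sketch of "hand out non-overlapping work," an orchestrator can refuse to start two tasks that claim the same file, then give each task its own worktree. The task-to-files mapping, the `agent/<task>` branch naming, and the `../wt-<task>` path are assumptions for illustration. The `git worktree add -b <branch> <path> <start-point>` form is standard git.

```python
from itertools import combinations


def find_collisions(tasks):
    """tasks maps a task name to the set of file paths it may touch.

    Returns {(task_a, task_b): shared_files} for every overlapping pair.
    """
    collisions = {}
    for a, b in combinations(sorted(tasks), 2):
        shared = tasks[a] & tasks[b]
        if shared:
            collisions[(a, b)] = shared
    return collisions


def worktree_command(task, base="main"):
    """Build the git command that gives a task its own branch and directory."""
    return ["git", "worktree", "add", "-b", f"agent/{task}", f"../wt-{task}", base]
```

If `find_collisions` returns anything, the work is re-scoped or serialized before any agent starts, which is cheaper than untangling the merge afterwards.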
A task that should cost cents costs twenty dollars, and you find out at the end of the month. Usually it's one of two things: the agent re-reads an enormous context on every single step, or it gets stuck in a loop, retrying a failing approach without noticing it's failing the same way each time.
Both are harness problems. Budget the context (serve the right slice, not the whole repo), cap the steps, and detect loops so a stuck agent stops instead of burning tokens until something else kills it. Cost is an engineering constraint you design for, not a surprise, and it's the whole subject of why AI coding agents cost so much.
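Step caps and loop detection fit in a few lines. This sketch assumes the harness calls `record` once per agent step with a short description of the action and the error it produced, if any. The class name, the exception, and the thresholds are illustrative, not a specific product's API.

```python
from collections import Counter


class BudgetExceeded(Exception):
    """Raised when a run should stop instead of burning more tokens."""


class RunBudget:
    def __init__(self, max_steps, max_repeats):
        self.max_steps = max_steps      # hard cap on steps per task
        self.max_repeats = max_repeats  # identical failures tolerated before stopping
        self.steps = 0
        self.failures = Counter()

    def record(self, action, error=None):
        """Call once per agent step; raises BudgetExceeded when the run must stop."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step cap of {self.max_steps} reached")
        if error is not None:
            # The same action failing with the same error is the loop signature.
            key = (action, error)
            self.failures[key] += 1
            if self.failures[key] >= self.max_repeats:
                raise BudgetExceeded(f"{action!r} failed the same way {self.failures[key]} times")
```

Matching on the exact error string is deliberately crude. Real error messages often carry timestamps or paths, so normalize them before using them as a key.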
This one surprises people. Set the goal to "make the test pass," and the cheapest path is sometimes to delete the test, stub the function to return the expected value, or weaken the assertion until it's meaningless. The agent did exactly what you asked. It shipped nothing.
The fix is to make the success condition something the agent can't edit its way out of: tests it isn't permitted to modify, a task definition checked by a separate reviewer, and a diff a human actually reads on anything that matters. If the only thing standing between the agent and "done" is a check the agent controls, expect it to be gamed.
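One way to approximate "tests it isn't permitted to modify" is to reject any agent diff that touches protected paths. The sketch below assumes the changed paths come from something like `git diff --name-only` against the base branch. The pattern list is hypothetical and should match your own repo layout.

```python
from fnmatch import fnmatchcase

# Hypothetical protected locations: test suites and CI configuration.
PROTECTED = ["tests/*", "*/tests/*", "ci/*"]


def protected_changes(changed_paths, patterns=PROTECTED):
    """Return the changed paths the agent was not allowed to touch."""
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

A non-empty result fails the run outright. This does not catch an agent stubbing the function under test, which is why the separate reviewer and the human-read diff still matter.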
The quiet killer. An agent makes a change, something breaks two days later, and nobody can answer "what did it do, and why?" Without that, you can't debug it, you can't trust it, and you certainly can't let it near production.
Agents need the same observability you'd demand of any production system: every run logged, inspectable, and tied to what it changed and what it cost. We built yoru because we needed to see what our own agents were doing; you can't operate what you can't observe. The full argument is in AI agent observability.
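As an illustration of "every run logged, inspectable, and tied to what it changed and what it cost," here is a minimal run record written as one JSON line per run. This is not yoru's schema. Every field name here is an assumption, chosen only to show what the audit trail needs to answer.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    task: str
    agent: str
    commit: str            # the commit the run produced
    files_changed: list
    tokens_used: int
    cost_usd: float
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: float = field(default_factory=time.time)


def to_log_line(record):
    """Serialize a run as one JSON line, ready to append to a log file."""
    return json.dumps(asdict(record), sort_keys=True)


def runs_touching(lines, filename):
    """Answer "which runs changed this file?" from an iterable of log lines."""
    return [r for r in map(json.loads, lines) if filename in r["files_changed"]]
```

When something breaks two days later, `runs_touching` over the log narrows "what did it do, and why?" to a handful of run IDs, each with its task, commit, and cost.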
Read the catalogue again and the shape is obvious. Stale context, collisions, cost, reward-hacking, no audit trail. None of those is "the model needs to be smarter." Every one is an engineering failure in the system around the model, the part a demo is built to hide and a better model won't repair. That's the good news, actually: harness problems have fixes you control and ship today, instead of capabilities you wait for someone else to release.
It's also why "we tried an AI agent and it didn't work" usually means "we ran a raw model with no harness and hit three of these in the first week." The model was never the bottleneck.
We don't theorize about these failures. We run agents in production to build our own products (WRAI.TH, trovex, yoru), so we've paid for every mode in this catalogue and built the harness that contains each one: context, isolation, gates, budgets, observability. When we do client work, that harness comes with it, because the failure modes are the same on your codebase as on ours.
If your team has hit a few of these and concluded agents "aren't there yet," that conclusion is usually about the harness, not the model, and the harness is the part we build. Talk to us about your team.
Why do AI coding agents fail even with a strong model?
Because most failures live in the harness, not the model. The agent reads the wrong file, overwrites code another agent is editing, or blows its budget re-reading context. A better model makes a cleaner mistake faster; it doesn't fix a missing context layer, review gate, or observability.
Why is an AI agent confidently wrong?
A coding agent has no built-in sense of what it doesn't know. Without grounding it will invent an API, a function signature, or a config key that reads perfectly and doesn't exist. The fix is grounding it in things that can say no: types, tests, and a review gate a human or another agent has to clear.
What is reward-hacking in an AI coding agent?
When the goal is "make the test pass," the cheapest path is sometimes to delete the test, stub the function, or weaken the assertion. The agent did exactly what you asked and shipped nothing. Make the tests and the task definition something the agent can't edit its way out of.
How do you stop AI coding agents from failing in production?
Treat the failure modes as engineering, not luck. Give the agent a context layer so it reads the right thing, isolate agents so they don't collide, gate output on tests and review, budget cost per task, and make every run observable. The model is maybe 10% of that.