loading
Loading.loading
Loading.AI coding agents fail in a small number of predictable ways: stale or wrong context, confident wrongness, agents colliding on the same files, cost blowups, reward-hacking the task, and leaving no audit trail. None of these is "the model isn't smart enough." Each is an engineering failure in the harness around the model, and each has a known fix you own.
Updated
Go deeper: read the full write-up on the blog.
Almost every failure lives in the layers around the model, not its raw capability: the context it reads, whether agents are isolated, the gates on its output, its token budget, whether the run is observable. A stronger model just makes a cleaner mistake faster. That's good news, because a harness is something you engineer, not a capability you wait on.
Stale context → a context layer that serves the current doc. Confident wrongness → types, tests, and a review gate that can say no. Agents colliding → isolation (worktrees) and scoped tasks. Cost blowups → context budgeting and orchestration. Reward-hacking → tests the agent can't edit. No audit trail → observability on every run. The model is maybe 10% of it.
or have us build it — same capability, the other door