Why are AI coding agents unreliable in production?

Not because the model is bad, but because nothing around it is set up. By default an agent has broad access, no gate before its changes land, no record of what it did, and weak context, so it acts confidently on wrong information. Reliability comes from the engineering that constrains and checks it, not from the model alone.

Can you trust an AI agent with commit access?

Only with scoped permissions and a gate. An agent should touch only what its task needs, and its changes should pass a human review or an automated check before they merge, the same bar you'd hold a new engineer to. Public incidents where agents deleted data happened precisely where those guardrails were missing.
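A minimal sketch of those two guardrails, assuming a hypothetical task definition that lists the path patterns an agent may change. The names `ALLOWED_PATTERNS`, `out_of_scope`, and `may_merge` are illustrative and do not belong to any particular tool:

```python
from fnmatch import fnmatch

# Hypothetical scope for one task: the only paths this agent may change.
# Note that fnmatch's "*" also matches "/" so nested files are covered.
ALLOWED_PATTERNS = ["src/billing/*", "tests/billing/*"]


def out_of_scope(changed_paths, allowed_patterns=ALLOWED_PATTERNS):
    """Return the changed paths that match none of the allowed patterns."""
    return [
        path
        for path in changed_paths
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


def may_merge(changed_paths, human_approved, checks_passed):
    """Gate: every change is in scope AND a human or a required check signed off."""
    in_scope = not out_of_scope(changed_paths)
    return in_scope and (human_approved or checks_passed)
```

With this sketch, a change set that touches `infra/prod.tf` is rejected even if someone approved it, because the scope check runs before the approval is considered.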

What makes a reliable agent different from an impressive demo?

A demo runs once, on a clean example, with a human quietly fixing what breaks. A reliable agent runs repeatedly, on your real codebase, inside your standards, with guardrails and observability instead of a person rescuing it. The difference is all the unglamorous engineering a demo is built to hide.

Do better or bigger models make agents reliable?

They help at the margin but don't fix the problem. A stronger model still needs scoped permissions, review gates, observability, and good context to be trustworthy in production. Reliability is an operating problem, not a model upgrade.
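One way to picture this is a preflight check that looks only at the operating controls and ignores the model entirely. The sketch below is an assumption for illustration: `AgentSetup`, its field names, and the model labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentSetup:
    """Hypothetical description of how one agent is deployed."""
    model: str
    scoped_permissions: bool = False
    review_gate: bool = False
    observability: bool = False
    trusted_context: bool = False


# The four controls named in the text, as attribute names on AgentSetup.
CONTROLS = ("scoped_permissions", "review_gate", "observability", "trusted_context")


def missing_controls(setup):
    """List the operating controls that are still switched off."""
    return [name for name in CONTROLS if not getattr(setup, name)]


# Two setups that differ only in the (made-up) model label.
small = AgentSetup(model="small-model", review_gate=True)
large = AgentSetup(model="large-model", review_gate=True)
```

Here `missing_controls(small)` and `missing_controls(large)` return the same list, because the model field never enters the check.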

How to make AI coding agents reliable in production

tsukumo


The demo you saw vs the agent you run

Property	Impressive demo	Reliable agent
Access	broad, standing	scoped to the task
Before merge	nothing	human review or a required check
When it breaks	a person quietly fixes it	a record shows what it did and what it cost
Context	a clean, curated example	a current, canonical answer about your codebase
Runs	once	repeatedly, on your real repo
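The "When it breaks" row depends on the agent leaving a record behind. A minimal sketch of such a record follows, assuming an in-memory list as the log and a cost tracked in US dollars. The field names, the agent name `refactor-bot`, and the cost figures are all made up for illustration:

```python
import json
import time


def record_action(log, agent, action, target, cost_usd):
    """Append one structured entry describing what the agent did and what it cost."""
    entry = {
        "ts": time.time(),      # when the action happened (Unix seconds)
        "agent": agent,         # which agent acted
        "action": action,       # what it did
        "target": target,       # what it acted on
        "cost_usd": cost_usd,   # assumed unit: US dollars
    }
    log.append(entry)
    return entry


def total_cost(log):
    """Sum the recorded cost across all entries."""
    return sum(entry["cost_usd"] for entry in log)


# Illustrative usage with made-up actions and costs.
log = []
record_action(log, "refactor-bot", "edit_file", "src/billing/invoice.py", 0.04)
record_action(log, "refactor-bot", "run_tests", "tests/billing", 0.01)
print(json.dumps(log, indent=2))
```

In a real deployment the entries would go to durable storage rather than a list, but the shape is the point: each action is attributable, timestamped, and priced.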


Why agents are unreliable by default

The four things that make an agent reliable

1. Scoped permissions

2. Review gates

3. Observability

4. Context the agent can trust

The operating model: a human at the gate

How we do it

