How do you actually make AI work for a dev team?

Run it as an operating model, not a tool. Point AI at the right tasks, keep change sets small behind real review, measure outcomes over output, serve trusted context, and raise codebase quality. The independent research shows these five levers, not the model choice, decide whether AI helps or hurts.

What is an AI operating model?

The set of practices around the tool that determine its results: which work it does, how that work gets reviewed and shipped, what context it runs on, and how you measure it. The model writes code; the operating model decides whether that code makes your team faster or slower.

Why isn't a better model enough to get value from AI?

Because the studies that found AI slowing teams down used frontier models. The bottleneck wasn't intelligence; it was oversized batches, weak review, missing context, and messy codebases. A smarter model amplifies whatever operating model it lands in, good or bad.

How does a team start fixing its AI operating model?

Measure first. Look at PR size, rework rate, and where AI is pointed. Then fix the cheapest broken lever, usually batch size or context. A scoped assessment maps which of the five levers is costing you most before you invest in the others.
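As a rough illustration of that "measure first" step, here is a minimal Python sketch. The `PullRequest` record, its field names, and the definition of rework (a PR whose code was rewritten or reverted soon after merge) are assumptions made for this example, not something the research or any particular tool prescribes. Swap in whatever your own PR data exposes.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical record; field names are assumptions for illustration.
    lines_changed: int   # additions + deletions
    reworked: bool       # assumed: rewritten or reverted soon after merge
    ai_assisted: bool    # assumed: the team tags AI-assisted PRs
    task_kind: str       # e.g. "tests", "migration", "core"


def baseline(prs: list[PullRequest]) -> dict:
    """Summarise PR size, rework rate, and where AI is being pointed."""
    if not prs:
        raise ValueError("need at least one PR to compute a baseline")
    ai_kinds = Counter(pr.task_kind for pr in prs if pr.ai_assisted)
    return {
        "median_pr_size": median(pr.lines_changed for pr in prs),
        "rework_rate": sum(pr.reworked for pr in prs) / len(prs),
        "ai_prs_by_task": dict(ai_kinds),
    }
```

Run over a few weeks of merged PRs, a summary like this gives a starting point for deciding which lever is the cheapest one to fix.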

tsukumo

20 June 2026 · 4 min read

How to actually make AI work for your dev team

The research is clear that AI underdelivers by default. It's just as clear about why, and that points straight at the fix. Five operating levers separate the teams that get real gains from the ones that get rework. This is the model we install, and none of it is about a better model.

Short version: if AI is underwhelming on your team, the instinct is to wait for a smarter model. Don't. The independent research that found AI slowing teams down was already using frontier models. The problem was never intelligence. It was everything around the model: what it works on, how its output gets reviewed, what context it sees, and the state of the code it touches. Fix those and the same model that disappointed you starts paying off. That set of fixes has a name. It's an operating model, and it's the actual product.

We laid out the problem, with sources, in what the research says about AI coding agents. This is the other half: what the teams getting real value do differently. Five levers.

Lever 1: Point AI at the right work

Stanford's data is blunt about this: AI delivered 35-40% gains on greenfield, low-complexity work and single digits on the complex, brownfield code most teams live in. So the first decision is where to point it. Boilerplate, tests, migrations, scaffolding, the high-volume, low-context work: that is where the gains are real. The gnarly core a senior holds in their head is where AI burns the time you thought you saved.
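One way to make that decision explicit is a small routing rule applied when work is planned. The sketch below is a hypothetical illustration in Python: the task kinds come from the examples above, while `route_task`, the `needs_deep_context` flag, and the two labels are assumptions, not a method taken from the research.

```python
# Task kinds the text names as high-volume, low-context work.
AI_FRIENDLY_KINDS = frozenset({"boilerplate", "tests", "migration", "scaffolding"})


def route_task(kind: str, needs_deep_context: bool) -> str:
    """Return "ai-first" or "human-led" for a planned task.

    needs_deep_context is an assumed flag a lead sets when the work
    touches complex, brownfield code that mostly lives in someone's head.
    """
    if kind in AI_FRIENDLY_KINDS and not needs_deep_context:
        return "ai-first"
    return "human-led"
```

For example, `route_task("tests", needs_deep_context=False)` returns `"ai-first"`, while anything flagged as needing deep context stays `"human-led"` regardless of its kind.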
