20 June 2026 · 3 min read
The AI operating-model scorecard: where your team is losing the gains
The research says AI helps or hurts depending on five operating levers, not the model. Here's a short self-scorecard for each one. Most teams are strong on a couple and quietly bleeding the gains on the rest. Your lowest score is where to fix first.
Short version: the research is consistent that AI helps or hurts based on how you run it, not which model you bought. We turned the five levers that decide it into a quick scorecard. Run each section honestly. Most teams find they're solid on one or two levers and bleeding the gains on the others, and the lowest score is exactly where to start.
Score each lever 0-2: 0 = not happening, 1 = partly, 2 = solid.
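If you want to tally the scorecard rather than eyeball it, here is a minimal sketch in Python. It assumes one 0-2 score per lever, as described above. The function name `weakest_levers` and the example scores are hypothetical, and the levers after the first are placeholders because they haven't been named yet at this point in the article.

```python
def weakest_levers(scores):
    """Return the lever(s) with the lowest score, i.e. where to fix first.

    `scores` maps a lever name to 0 (not happening), 1 (partly), or 2 (solid).
    Ties are all returned, in the order they were entered.
    """
    for name, score in scores.items():
        if score not in (0, 1, 2):
            raise ValueError(f"{name}: score must be 0, 1, or 2, got {score!r}")
    lowest = min(scores.values())
    return [name for name, score in scores.items() if score == lowest]


# Made-up example scores; placeholder names stand in for levers 2-5.
example = {
    "Lever 1: Task selection": 2,
    "Lever 2": 1,
    "Lever 3": 0,
    "Lever 4": 2,
    "Lever 5": 1,
}

print(weakest_levers(example))
```

Returning every tied lever, rather than just one, is a deliberate choice here: if two levers both score 0, the scorecard alone doesn't say which to tackle first.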
Lever 1: Task selection
Where you point AI decides most of the result. Stanford measured 35-40% gains on simple, greenfield work and near-zero on the complex code most teams live in.
- We've named the work AI is good at here (boilerplate, tests, migrations) and the work it isn't.
- Developers aren't reaching for an agent on the gnarly core that needs deep system context.