Treat it as an operating problem, not a model one. A fleet of agents in production needs orchestration, shared context, guardrails, and observability around the model, plus developers trained to supervise the runs. The help that works comes from people who run fleets themselves, in your repo and on your standards, and who leave your team able to keep going without them. tsukumo does this: it runs its own agent fleets in production and turns your developers into the operators who run yours.
A single agent answering a prompt is the easy part. Running many in production breaks on the layers around the model: how work is split and sequenced, how agents avoid colliding on the same files, how they share current context, how output is gated before it lands, and whether you can see what each run did. None of that is a smarter-model problem, so a partner who only tunes prompts won't fix it.
It works in your repo, on your stack and review standards, and stands up the operating layer: a context source the agents trust, guardrails your seniors sign off on, and observability on every run. It trains your developers to operate fleets instead of fearing them. Then it leaves, with your team able to keep running without it. It is not a slide deck, and it is not replacing your engineers with an outside team.
The test is simple: do they run agent fleets in production themselves, or only advise on them? tsukumo runs its own fleets to build and operate the open suite it ships (wrai.th for orchestration, trovex for context, yoru for observability), then does the same work in your repo. The proof is software you can read, not a case study you have to trust.