How we ship our own product with a fleet of AI agents
Most 'we use AI' stories are an autocomplete in someone's editor. Ours is an org chart: a CTO agent, domain leads, a coordination layer, tickets claimed off a board, one isolated worktree per agent, and a review gate nothing skips. Here's how it actually runs.
tsukumo
Short version: when we say we build our software with AI, we don't mean autocomplete. We mean an engineering org made of agents. A coordinating agent breaks work into tickets. Specialist agents claim them off a shared board. Each runs in its own isolated git worktree so they never overwrite each other. Nothing merges without passing a review gate. And the humans operate the fleet, setting goals and reviewing, instead of typing every line. This is the part most "we use AI" stories skip, so here is how it actually runs.
What does "build with an agent fleet" actually mean?#
It means many agents working at once, coordinated like a team, not one agent prompted in a chat window.
The popular version of "we use AI to code" is a single developer with a copilot suggesting the next line. That is genuinely useful, and it is also not what we are describing. A fleet is several agents working concurrently on different parts of the same product, each owning a unit of work from ticket to reviewed change. The constraint that matters stops being how fast anyone types. It becomes how well the work is split, coordinated, and reviewed.
That difference is not cosmetic. It changes the org chart, the tooling, and the job description of the humans involved.
The fleet has a shape, and the shape looks a lot like a company.
A coordinating agent (think of it as a CTO role) holds the goals, breaks them into tickets, and decides what gets worked on next.
Specialist agents run domains: backend, frontend, infrastructure, data, content. Each has a focused brief and the context for its area.
A coordination layer lets them message each other, hand off work, and share memory, so a decision one agent makes is visible to the others instead of being re-derived and contradicted.
If that reads like an org chart, that is the point. The reliability of a fleet comes from the same things that make a human team work: clear ownership, a way to communicate, and a shared source of truth. We had to build a fair amount of our own tooling to get there, because off-the-shelf single-agent tools don't address coordination at all.
Agents pull tickets off a shared board rather than being handed tasks one by one.
Each unit of work is a ticket with a clear definition of done. An agent running the matching specialty claims it, moves it to in-progress, does the work, and marks it for review. This is deliberate. A pull model means the fleet stays busy without a human dispatching every item, and it makes the state of the work legible at a glance: what is pending, what is in flight, what is blocked, who owns what.
It also means throughput scales with how many agents you can keep usefully fed, which is an operating question, not a model question.
Isolation is what keeps parallel work from turning into a pile of merge corruption.
Each agent works in its own git worktree, a separate checkout of the repo on its own branch. Two agents editing nearby code never see each other's half-finished changes, because they are physically in different working directories. Their work only meets at merge time, on a branch, behind review. This one practice removes an entire category of failure that shows up the moment you run more than one agent against a shared checkout.
A review gate sits between any agent's work and the main branch, every time.
Before a change lands, it goes through a review pass: does it do what the ticket asked, is it correct, does it fit the standards of the codebase. Some of that review is itself done by agents pointed at the diff; the irreversible call to merge stays with a human or a trusted gate. The rule is simple and absolute: an agent's confidence is not a merge criterion. Passing the gate is.
This is the same reason we tell client teams that observability and review are the load-bearing parts of running AI in production. A fleet without a gate is just a faster way to ship mistakes.
Because the limit on a copilot is one human's attention, and the limit on a fleet is how well you orchestrate.
A better autocomplete makes one developer a bit faster. A fleet changes what a small team can take on, because the work happens in parallel and the humans spend their time on direction and judgment rather than mechanics. The catch, and it is a real one, is that a fleet is only as good as the operating model around it. Without coordination, isolation, and a review gate, more agents just means more chaos, faster. The upside is real, and so is the discipline it demands.
The job is to set goals and guardrails, decide what the fleet works on, review at the right altitude, and step in when an agent is confidently heading the wrong way. It is a genuine skill, and it is different from writing code by hand. It is also the exact capability we install when we work with a client team: not "here are some AI tools," but "here is how your developers operate a fleet in your environment, on your standards, and keep that capability after we leave."
We run our own product this way because we needed to before we could honestly help anyone else do it. If your team wants to get from copilots to an operating model that actually ships, that is the work we do. Talk to us about your team.
What does it mean to ship software with an agent fleet?
It means several AI coding agents work in parallel on the same codebase, coordinated like a team rather than prompted one at a time. Each owns a slice of the work, claims tickets off a shared board, and runs in its own isolated git worktree so they don't collide. Humans set direction and review, instead of typing every line.
How is this different from using Copilot or Claude in an IDE?
A copilot autocompletes for one developer in one editor. A fleet is an operating model: many agents working concurrently, coordinated, reviewed, and observed, shipping whole units of work. The bottleneck moves from typing speed to how well you can orchestrate and review. It's the difference between a power tool and a crew.
How do you stop multiple agents from colliding in the same codebase?
Two things: isolation and coordination. Each agent works in its own git worktree (a separate checkout), so their changes never overwrite each other mid-flight. A coordination layer assigns ownership, so two agents don't claim the same work. Conflicts surface at merge time, behind a review gate, where they're cheap to resolve.
Do AI agents replace your developers?
No. The developers become operators: they set goals and guardrails, review at the right altitude, and intervene when an agent is heading the wrong way. The skill shifts from writing every line to running a fleet that writes lines. That shift is the whole job, and it's the capability we install in client teams.