Does adding more AI agents make a system perform worse?

It can, and scale matters more than task difficulty. A 2026 enterprise benchmark (arXiv:2606.20058) tested two orchestration approaches across 208 scenarios at up to 200 agents and found that fleet scale, not task complexity, was the dominant performance factor. Both architectures held up at small scale and degraded at enterprise scale. The cause was discovery noise: the overhead of each agent finding the right peer and the right context.

What is the main bottleneck in large multi-agent systems?

Agent-discovery noise, the overhead of agents finding the right peer and the right context to act on. The 2026 enterprise benchmark in arXiv:2606.20058 found this becomes the primary bottleneck at the 200-agent scale, while smaller fleets held up. It is a coordination problem, not a reasoning one, which is why a larger model does not fix it and a better orchestration layer does.
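One way to see why discovery overhead compounds with fleet size is to count how many agent pairs could talk to each other. The sketch below is a counting model we are assuming for illustration, not the benchmark's definition of discovery noise. The function names and the group size of 8 are hypothetical.

```python
def broadcast_pairs(n_agents: int) -> int:
    """Distinct agent pairs when any agent may contact any other."""
    return n_agents * (n_agents - 1) // 2


def scoped_pairs(n_agents: int, group_size: int) -> int:
    """Distinct pairs when agents only discover peers inside fixed-size groups.

    Assumption: cross-group routing is ignored, so this is a lower bound
    on real traffic, not a measurement.
    """
    full_groups, remainder = divmod(n_agents, group_size)
    return full_groups * broadcast_pairs(group_size) + broadcast_pairs(remainder)


for n in (8, 50, 200):
    print(n, broadcast_pairs(n), scoped_pairs(n, group_size=8))
# prints:
# 8 28 28
# 50 1225 169
# 200 19900 700
```

Under this model the candidate pairs grow roughly with the square of the fleet size when everyone can reach everyone, and roughly linearly when handoffs are scoped to small groups.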

How do I fix multi-agent orchestration at scale?

Build the orchestration layer rather than swapping the model. The same benchmark showed a task-manager component (priority inference, related-event merging, preemption) cut high-priority queue latency by 14-75% and improved related-event correctness by over 20 percentage points at enterprise scale. The transferable moves are explicit priority handling, scoped handoffs so agents do not broadcast to everyone, and a shared log for context.

How many agents can a fleet run before it needs orchestration?

Earlier than most teams expect. The 2026 benchmark grouped scales as Persona (under 10 agents), Department (20-80), and Enterprise (200). Both orchestration approaches held under 10 agents and degraded toward 200. Coordination overhead starts mattering in the Department tier, so build the orchestration layer before you cross 20 agents rather than retrofitting it once the fleet stalls.

What breaks when you scale a multi-agent fleet?

tsukumo

What actually drives the wall

Lever	Task complexity	Fleet scale
Was it the dominant factor in the benchmark?	no	yes
What it stresses	a single agent's reasoning	coordination between agents
The fix it points to	a bigger or better model	the orchestration layer
How the cost changes as you add agents	roughly flat	compounds

Scaling an agent fleet breaks on discovery, not difficulty

The benchmark: scale beat complexity, cleanly

Why complexity is the wrong thing to worry about

The by-tier playbook: what breaks and what to build

Persona tier (under 10 agents): build the habits while they're cheap

Department tier (20-80 agents): discovery noise starts billing you

Enterprise tier (200 agents): orchestration is the system

What we build the coordination layer out of

How we think about it
