Why do agents fail with many tools?

Not because they cannot call a tool. They fail at planning a path through a large tool set when tools break mid-task. PlanBench-XL (Liu et al., 2026) found agents are especially fragile when a failure carries no explicit error signal, or when recovery requires a longer alternative tool-use path. An agent cannot recover from a failure it cannot see, so it keeps planning against a tool that is already dead.

How many tools should an agent have?

Fewer than you think, and scoped to the task in front of it. The exact number matters less than the discipline: expose the smallest tool set the current goal needs, retrieve tools on demand rather than mounting all of them, and prefer one current answer over the whole catalogue. Agents in PlanBench-XL's 1,665-tool environment degrade sharply, which is the case against mounting everything an agent might ever use.
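One way to picture "smallest tool set the current goal needs" is a filter that sits between the catalogue and the agent. The sketch below is illustrative only: the `Tool` class, the tag-overlap scoring, the `limit` parameter, and the example tool names are all assumptions, not anything PlanBench-XL or a particular agent framework prescribes. A real system might retrieve by embedding similarity instead of tags.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tool:
    # Hypothetical tool record; the fields are illustrative.
    name: str
    description: str
    tags: frozenset[str] = field(default_factory=frozenset)


def scope_tools(catalogue: list[Tool], goal_tags: set[str], limit: int = 5) -> list[Tool]:
    """Return at most `limit` tools whose tags overlap the goal, best match first."""
    scored = [(len(tool.tags & goal_tags), tool) for tool in catalogue]
    # Drop tools with no overlap so they are never mounted at all.
    relevant = [(score, tool) for score, tool in scored if score > 0]
    # Highest overlap first; ties broken by name for a stable order.
    relevant.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [tool for _, tool in relevant[:limit]]


catalogue = [
    Tool("lookup_order", "Fetch an order by id", frozenset({"orders", "read"})),
    Tool("refund_order", "Refund a paid order", frozenset({"orders", "refund", "write"})),
    Tool("search_products", "Search the product catalogue", frozenset({"products", "read"})),
    Tool("update_inventory", "Adjust stock levels", frozenset({"inventory", "write"})),
]

scoped = scope_tools(catalogue, goal_tags={"orders", "refund"}, limit=2)
print([tool.name for tool in scoped])  # ['refund_order', 'lookup_order']
```

The agent only ever sees `scoped`, so the two unrelated tools never enter its planning space for this task.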

What happens when a tool fails silently?

The agent keeps going as if it succeeded. A silent failure returns no error signal, so the planner has nothing to react to and builds the next steps on a result that never came. PlanBench-XL (Liu et al., 2026) names this as one of the conditions agents handle worst. The fix is to force explicit error signals and to log which tool failed and why, so both you and the agent can see it.
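Forcing an explicit error signal can be as simple as routing every tool call through one wrapper. This is a minimal sketch under stated assumptions: `ToolError`, `call_tool`, and the rule that a `None` or empty result counts as a silent failure are all hypothetical choices made for illustration. Some tools legitimately return empty results, so the emptiness check would need tuning per tool.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("tools")


class ToolError(Exception):
    """Raised when a tool fails, including when it fails without saying so."""

    def __init__(self, tool_name: str, reason: str) -> None:
        super().__init__(f"{tool_name} failed: {reason}")
        self.tool_name = tool_name
        self.reason = reason


def call_tool(name: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call a tool and turn every failure, silent or not, into a logged ToolError."""
    try:
        result = fn(*args, **kwargs)
    except Exception as exc:
        # Record which tool failed and why before re-raising.
        logger.error("tool=%s failed: %s", name, exc)
        raise ToolError(name, str(exc)) from exc
    # Assumption: None or an empty container/string means the tool failed silently.
    is_empty = hasattr(result, "__len__") and len(result) == 0
    if result is None or is_empty:
        logger.error("tool=%s failed: empty result", name)
        raise ToolError(name, "empty result")
    return result


def broken_inventory_lookup(sku: str) -> None:
    """Simulates a tool that fails without raising anything."""
    return None


try:
    call_tool("inventory_lookup", broken_inventory_lookup, "SKU-1")
except ToolError as err:
    print(err)  # inventory_lookup failed: empty result
```

The planner now receives an exception that names the tool and the reason, and the same information lands in the log for later review.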

Do more tools make an AI agent better?

No. More tools widen the planning space the agent has to reason over, and past a point that hurts. PlanBench-XL (Liu et al., 2026) tested 10 models across 1,665 tools on 327 retail tasks and found performance falls sharply as the environment gets harder to predict. GPT-5.4 dropped from 51.90% in clean conditions to 11.36% under heavy tool blocking. The tool count is a liability, not a feature.
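To put the GPT-5.4 figures in proportion, the drop can be worked out directly from the two scores quoted above. The variable names are illustrative; only the two percentages come from the text.

```python
clean = 51.90    # GPT-5.4 score in clean conditions (%), as quoted in the text
blocked = 11.36  # GPT-5.4 score under heavy tool blocking (%), as quoted in the text

absolute_drop = clean - blocked        # percentage points lost
relative_drop = absolute_drop / clean  # share of the clean score lost

print(f"{absolute_drop:.2f} points, {relative_drop:.1%} of the clean score")
# 40.54 points, 78.1% of the clean score
```

In other words, roughly four fifths of the clean-condition score disappears under heavy tool blocking.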


Two ways to give an agent its tools

Property	Huge ecosystem, silent failures	Scoped set, explicit errors
Tools in scope per task	Everything mounted	Smallest set the goal needs
When a tool fails	Returns nothing, agent plans on	Throws an explicit error signal
Recovery path	Long, often never found	Short, the failure is visible
What you can review	A confident wrong answer	Which tool failed and why

More tools do not make an agent more capable

