Fewer than you can mount. Past a small working set, adding tools lowers task success, because the agent has to pick the right one, fill its arguments, and sequence the call correctly, and every extra option is a new way to get that wrong. PlanBench-XL (arXiv 2606.22388) ran 10 models across 1,665 tools and watched accuracy fall from 51.9% to 11.4% as the tool space grew. Give the agent the few tools the task needs.
Every tool you mount widens the space the agent has to plan over. It reads more options and gets more chances to call the wrong one or mis-fill an argument. The work the agent does to choose grows faster than the value of having one more tool available, so past a small set the extra reach costs you accuracy.
PlanBench-XL (Liu et al., 2026; arXiv 2606.22388) tested 10 models across 1,665 tools on 327 retail tasks. Accuracy fell from 51.9% to 11.4% as the tool space grew and tools failed without a clear error. Agents do worst when a failure carries no explicit signal, or when recovery needs a longer alternative path. After a silent failure, the agent builds its next step on a result that never came.
Scope the mounted toolset to the task instead of exposing the whole catalogue, force tools to return explicit errors, and log which tool failed and why so the agent can react. It is the same discipline that fixes context bloat: serve the right slice, not everything. trovex does that for your docs (one current answer, ~60% fewer tokens per lookup); the tool equivalent is a scoped per-task toolset. tsukumo builds both with your team.
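One way to picture the three steps above (scope, explicit errors, logging) is the minimal Python sketch below. Everything in it is an illustrative assumption rather than any framework's API: the `CATALOGUE` and `TASK_TOOLS` tables, the tool names, the `ToolResult` type, and the rule that a `None` return counts as a silent failure.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tools")


@dataclass
class ToolResult:
    """Uniform return shape so a failure is never silent."""
    ok: bool
    value: Any = None
    error: Optional[str] = None  # set whenever ok is False


# Hypothetical full catalogue: tool name -> plain callable.
CATALOGUE: Dict[str, Callable[..., Any]] = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
    "refund_order": lambda order_id: None,  # simulates a tool that fails silently
    "search_products": lambda query: [],
    "update_address": lambda order_id, address: True,
}

# Hypothetical mapping from task to the few tools that task needs.
TASK_TOOLS: Dict[str, List[str]] = {
    "order_status": ["lookup_order"],
    "refund": ["lookup_order", "refund_order"],
}


def explicit(name: str, fn: Callable[..., Any]) -> Callable[..., ToolResult]:
    """Wrap a tool so every failure is logged and returned as an explicit error."""
    def wrapped(**kwargs: Any) -> ToolResult:
        try:
            value = fn(**kwargs)
        except Exception as exc:  # bad arguments, runtime errors, etc.
            log.error("tool=%s failed: %s", name, exc)
            return ToolResult(ok=False, error=f"{name}: {exc}")
        if value is None:
            # Assumption for this sketch: no result means the call failed.
            log.error("tool=%s failed: returned no result", name)
            return ToolResult(ok=False, error=f"{name}: returned no result")
        return ToolResult(ok=True, value=value)
    return wrapped


def mount(task: str) -> Dict[str, Callable[..., ToolResult]]:
    """Expose only the tools scoped to this task, not the whole catalogue."""
    return {name: explicit(name, CATALOGUE[name]) for name in TASK_TOOLS[task]}


if __name__ == "__main__":
    tools = mount("refund")
    print(sorted(tools))
    print(tools["refund_order"](order_id="A1"))
```

In this sketch the agent handling a refund sees two tools instead of four, and the silent `refund_order` failure comes back as `ok=False` with a named error it can react to instead of a missing result.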