How much agent compute is wasted on runs that fail?

A lot of it is spent after the failure was already detectable. In a 2026 study on 165 GAIA traces, among failed runs that triggered an early warning, 58.1% of tokens were spent after the first warning on average. The run was visibly in trouble and kept spending, which is exactly the compute early intervention can recover.
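To make the 58.1% figure concrete, here is a minimal sketch of how a post-warning token share could be computed for a single run. The `Step` record and the `post_warning_fraction` function are hypothetical names for illustration, not the study's code, and the choice to count the warned step itself as pre-warning spend is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Step:
    tokens: int           # tokens consumed by this step
    warned: bool = False  # True if any warning signal fired on this step


def post_warning_fraction(steps: Sequence[Step]) -> Optional[float]:
    """Share of a run's tokens spent after the first warned step.

    Returns None if no warning fired or the run used no tokens.
    Assumption: the warned step itself counts as spent before the warning.
    """
    total = sum(s.tokens for s in steps)
    first = next((i for i, s in enumerate(steps) if s.warned), None)
    if first is None or total == 0:
        return None
    after = sum(s.tokens for s in steps[first + 1:])
    return after / total


# A toy failed run: the warning fires on step 2, then 300 more tokens are spent.
run = [Step(100), Step(100, warned=True), Step(300)]
print(post_warning_fraction(run))  # 0.6
```

Averaging this value over the failed runs that triggered a warning would give a figure of the same kind as the one reported above.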

Can you tell an agent run is failing before it finishes?

Often yes, from the trace rather than the final answer. The 2026 study converts execution events into online signals for loops, budget pressure, low information gain, and tool instability. Those fire while the run is still going, before final-answer evaluation can explain what went wrong, which is what makes early intervention possible.

What are the early warning signs of a failing agent run?

The study uses four online signals: looping, budget pressure as the run burns through its token cap, low information gain where new steps add nothing, and tool instability. Each is computed from the structured trace as the run proceeds, so a run accumulating them is flagged before it reaches a final answer.
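As a rough illustration of how the four signals could be computed online, here is a minimal sketch. Everything in it is an assumption made for the example: the `Event` fields (including the `new_facts` count), the `RunMonitor` class, the 4-step window, and the thresholds are hypothetical, and the study's actual definitions may differ.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    tool: str       # tool name called at this step
    args: str       # serialized arguments of the call
    tokens: int     # tokens consumed by this step
    new_facts: int  # hypothetical count of new evidence items returned
    error: bool     # True if the tool call failed


class RunMonitor:
    """Tracks a sliding window of events and reports warnings per step."""

    def __init__(self, token_cap: int, window: int = 4):
        self.token_cap = token_cap
        self.spent = 0
        self.recent: deque[Event] = deque(maxlen=window)

    def observe(self, ev: Event) -> set[str]:
        self.spent += ev.tokens
        self.recent.append(ev)
        warnings: set[str] = set()
        full = len(self.recent) == self.recent.maxlen

        # Loop: the same call appears at least 3 times in the window.
        calls = [(e.tool, e.args) for e in self.recent]
        if calls.count((ev.tool, ev.args)) >= 3:
            warnings.add("loop")
        # Budget pressure: 80% or more of the token cap is already spent.
        if self.spent >= 0.8 * self.token_cap:
            warnings.add("budget_pressure")
        # Low information gain: a full window that produced nothing new.
        if full and sum(e.new_facts for e in self.recent) == 0:
            warnings.add("low_information_gain")
        # Tool instability: at least half of the window's calls failed.
        if full and 2 * sum(e.error for e in self.recent) >= len(self.recent):
            warnings.add("tool_instability")
        return warnings
```

Calling `observe` once per trace event keeps the check incremental: the monitor never needs the final answer, only the events seen so far.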

Does catching failures early actually save tokens?

In a small pilot, yes. On a 10-task set, using the warnings to diversify the search or require evidence cut the fraction of tokens spent after a warning from 0.638 to 0.304. With only 10 tasks, treat the result as directional, but the mechanism is clear: act on the warning instead of letting a doomed run finish.
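The pilot's two figures can be restated as an absolute and a relative reduction. The short calculation below uses only the numbers quoted above; the function name is just a label for the arithmetic.

```python
def reductions(before: float, after: float) -> tuple[float, float]:
    """Return (absolute, relative) reduction going from before to after."""
    absolute = before - after
    relative = absolute / before
    return absolute, relative


absolute, relative = reductions(0.638, 0.304)
print(f"absolute: {absolute:.3f}")  # absolute: 0.334
print(f"relative: {relative:.1%}")  # relative: 52.4%
```

In other words, the post-warning share of tokens fell by roughly half in the pilot, which is still a 10-task result and should be read with that sample size in mind.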

tsukumo

Catching wasted agent compute early (58.1% is spent after the warning) · tsukumo


Agency · 3 July 2026 · 4 min read

Your failing agents waste most of their tokens after the warning signs

When an agent run is going to fail, it usually shows signs early, then keeps spending. A 2026 study found that on failed runs that had triggered a warning, 58.1% of tokens were spent after the first warning on average. That's compute you paid for on a run you could already have known was doomed.


Short version: When an agent run is headed for failure, it rarely fails out of nowhere. It shows signs first: it loops, burns budget, and stops making progress. Then it keeps spending tokens until it gives up or returns something useless. A 2026 study put a number on the waste: on failed runs that had triggered a warning, 58.1% of the tokens were spent after that first warning, on average. More than half the cost of a doomed run is incurred after the point you could have known it was doomed. The fix is not a better final-answer eval. It is watching the run as it happens and acting on the warning.

You have paid this tax. An agent goes in circles, re-runs the same failing tool, drifts off the task, and your token meter keeps climbing the whole time. At the end you get a bad answer and a bill for the full run, including the long tail after it was clearly lost.

Final-answer monitoring vs failure-aware observability

What it watches	Final-answer evaluation	Failure-aware observability
When it knows	After the run ends	While the run is still going
What it sees	The output	Loops, budget pressure, low info gain, tool instability
What you can do	Log the failure	Intervene before the tokens are spent
The waste	Paid in full	Recoverable
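The right-hand column of the table amounts to checking each step as it is produced and giving a handler the chance to act. A minimal sketch of that wiring follows. The names `guarded_run`, `detect`, and `on_warning` are hypothetical, the step format is an assumption, and stopping the run is only one possible intervention (the handler could instead change strategy and return True to continue).

```python
from typing import Callable, Iterable, Iterator, Set


def guarded_run(
    steps: Iterable[dict],
    detect: Callable[[dict], Set[str]],
    on_warning: Callable[[Set[str]], bool],
) -> Iterator[dict]:
    """Yield steps one by one, checking each for warnings as it arrives.

    detect returns the set of warnings raised by a step (empty if none).
    on_warning returns True to keep going or False to stop the run early.
    """
    for step in steps:
        yield step
        warnings = detect(step)
        if warnings and not on_warning(warnings):
            break  # stop consuming steps, so no further tokens are spent


# Toy trace: the second step fails, and the handler chooses to stop.
trace = [
    {"tokens": 10, "error": False},
    {"tokens": 10, "error": True},
    {"tokens": 10, "error": False},
]
executed = list(
    guarded_run(
        trace,
        detect=lambda s: {"tool_instability"} if s["error"] else set(),
        on_warning=lambda w: False,
    )
)
print(len(executed))  # 2
```

Because `steps` is consumed lazily, passing a generator that actually executes the agent means the third step is never run once the handler returns False.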


Where does wasted agent compute actually go?

Can you tell an agent run is failing before it finishes?

What are the early warning signs of a failing run?

Does catching it early actually save tokens?

How do you wire failure-aware observability?
