How do I monitor AI agents in production?

Monitoring an AI agent isn't uptime monitoring. Agents are non-deterministic and multi-step, so you trace each run: every step, tool call, input/output, token cost, and failure mode — and you alert on cost spikes, error-rate spikes, and silent wrong-answers, not just crashes. Treat an agent run like a distributed trace. yoru is an open-source, self-host observability layer for agent fleets (public beta).

Updated 19 June 2026

Why agent monitoring is different

Traditional APM watches a deterministic service: latency, errors, throughput. An agent run is a sequence of decisions — it picks tools, retries, loops, and can return a confidently wrong answer with a 200 and no error. So the unit of observability is the run/trace, not the request.

What to observe, per run

Capture the full trace — every step, prompt, tool call, and intermediate output, so you can replay why the agent did what it did. Token cost, per run and per step, because cost spikes are the earliest signal something's looping or re-reading. Failure modes — wrong tool, infinite loop, truncated output, hallucinated result, stuck retries — which rarely throw, so you detect them by watching outcomes. Latency per step, to find the slow tool or the runaway chain. And the outcome itself: did the run actually accomplish the task? Pair the trace with whatever gate verifies the output.

How to think about it

Capture inputs, tool calls, and outputs for every run; keep a human at anything irreversible; alert on cost and failure-rate, not just exceptions. The goal is to answer “what is my fleet doing, what's it costing, and where is it quietly failing?”

Where yoru fits

yoru is an open-source, self-host observability project for agent fleets — you run it yourself (public beta, in active development). It's the observability pillar of a self-host suite; there's no hosted version. Use the concepts above with whatever tools you have; yoru is one OSS option you can run on your own infra.

Straight answers.

Is agent observability just APM for LLMs?: No. APM watches a deterministic service; agent observability watches non-deterministic, multi-step runs where the failure is often a wrong answer with no error. The unit is the run/trace, plus token cost and outcome.
What should I log for each agent run?: The full trace (steps, prompts, tool calls, outputs), token cost per step, latency, and the final outcome — enough to replay the run and spot silent failures.
Do I need this for a single agent?: Less so. It pays off across a fleet or long-running agents, where cost compounds and silent failures hide.
Is yoru hosted?: No. yoru is open source and self-host — you run it yourself. (Public beta, in active development.)

Keep reading.