Your agents' coordination failure happened before the pull request
A 2026 study found AI coding agents open PRs faster but get them accepted less often, and pull-request telemetry can't explain why. With a shared coordination log, the share of work that re-did a teammate's task fell from 78% to 0% and useful throughput more than tripled.
tsukumo
Short version: by the time an agent opens a pull request, the coordination already failed. Two agents claimed the same file. One quietly redid work another finished an hour ago. A third raced to close before the others noticed. The PR is the scene of the accident, not the cause. And PR-level telemetry, your reviews, your merge graph, your CI dashboards, can only ever show you the wreck. A 2026 study put numbers on this, and the most useful one is brutal: agents that didn't share a coordination log spent 78% of their effort redoing each other's work.
Autonomous coding agents now open PRs by the million. You'd expect that volume to convert into shipped work. The data says otherwise. In "Before the Pull Request: Mining Multi-Agent Coordination" (Sarkar, 2026), large-scale studies find agent PRs are produced faster but accepted less often. More throughput at the top of the funnel, worse conversion at the bottom.
That gap is the tell. If the problem were code quality, your review tooling would catch it and name it. It doesn't, because the failure isn't in the diff. It's in how the agents arrived at the diff, upstream, where they claimed tasks, divided the work, and collided. The PR records the outcome of that coordination. It records nothing about the coordination itself.
The study's framing is sharp: the missing signal lives in how concurrent agents claim, divide, and collide over shared work. Those are real, recurring failure modes, and most of them never reach a human's eyes because they resolve, badly, before anyone opens a tab.
The paper enumerates four that a coordination log makes recoverable, with provenance:
Conflicting edits. Two agents touch the same surface from different assumptions. The merge looks clean; the intent collided.
Lock starvation. One agent holds a claim it isn't working, and others stall waiting.
Redundant rediscovery. An agent re-derives, from scratch, something a teammate already solved.
Race-to-close. Agents sprint to finish first, shipping the fast answer over the right one.
Several of these are, in the authors' words, invisible in pull-request history. You can stare at the merge graph forever and never see lock starvation, because a lock that starved left no PR.
Here's the result worth pinning up. The study uses an open-source coordination substrate called grite that needs no central server and stores its records inside git itself, so its append-only, signed event log captures the coordination process directly. With that shared substrate in place, the share of work that merely re-did a teammate's task fell from 78% to 0%, and useful throughput more than tripled.
Share of agent work that re-did a teammate's task
Without a shared coordination log78 %
With a shared coordination log0 %
Source: Sarkar 2026 (arXiv:2606.19616)
Read 78% slowly. That's not a tuning inefficiency. In an unshared setup, the majority of the fleet's effort went into work a teammate had already done. The agents weren't worse at coding. They were blind to each other. Give them a shared, convergent log and the redundancy doesn't shrink, it disappears, and the throughput you were paying for finally shows up.
To be precise about whose number this is: grite is the study's tool, not ours. We cite the 78%→0% and the tripled throughput as independent evidence for the principle. It's the principle our own fleet is built on, not a claim about our software.
The reflex fix is a shared file: a TASKS.md, a coordination doc, a sheet the agents read and write. It fails in exactly the way you'd fear under concurrency. The study is blunt about it: a file-based tracker loses concurrent writes. Two agents update it at once, one write wins, the other vanishes, and now your coordination layer is itself a source of collisions.
The substrate in the paper doesn't have that hole. Every agent's copy of the log converges to the same state with no write silently dropped. That convergence is the whole game. A coordination layer you can't trust to record every claim is worse than none, because it gives you false confidence the agents are aligned when one of them is working from a half-written truth.
Where each multi-agent failure is visible
Failure mode
PR history
Pre-PR coordination log
Conflicting edits
hidden until merge
recoverable with provenance
Lock starvation
invisible
recoverable
Redundant rediscovery
invisible
recoverable
Race-to-close
invisible
recoverable
Dropped concurrent write
n/a
prevented by convergence
This isn't the observability conversation. It's upstream of it.#
If you've been investing in agent observability, good. But be honest about what it is. Post-run observability is forensics on what already merged: it tells you, after the fact, what broke once it was live. That's necessary. It is also the wrong layer for this problem, because the coordination failure happened before the run produced anything observable in the usual sense.
The same caveat applies to fleet ops in general. Running agent fleets in production gives you the operational surface, but the observability layer is necessary and insufficient without pre-PR coordination capture. You can watch every container and still miss the two agents that claimed the same task, because that event never became a metric you were scraping.
And the failures aren't novel. The deadlocks and handoff degradation we wrote about in going from developer to agent orchestrator are pre-PR failure modes, made visible the moment you have a log that records the claim and the handoff. MAST made the broader point earlier: multi-agent failures live in specification and coordination, not capability. The agents are fine. The seam between them is where the work leaks.
We run a growth fleet of agents to build and ship our own software, so we hit all of this in production before we read a paper about it. Our answer is wrai.th, the relay our fleet actually runs on: an owned, observable coordination layer where agents claim tasks, hand off, and message each other over an append-only log. It's our implementation of pre-PR observability, built on the same principle the study measures, capturing the coordination before it becomes a PR you can only autopsy.
I'll be straight about the evidence. The 78%→0% and the tripled throughput are the paper's, on the paper's tool. We don't have a published wrai.th number to put next to them, and I'm not going to invent one. What we have is the conviction, earned by getting it wrong first, that the coordination log has to be owned, signed, convergent, and mineable, or it's theatre. Where this fits the bigger picture: the multi-agent orchestration research all points at the same seam.
If you're scaling an agent fleet and your PRs are arriving faster but landing softer, don't tune the review. Look upstream, at the layer where the agents coordinate, because that's where your acceptance gap was born.
We map where your agents collide before the PR, and what an owned coordination layer should record.
Why do AI coding agents open PRs faster but get them accepted less often?
Because speed and coordination are different problems. A 2026 study (Sarkar, arXiv:2606.19616) found large-scale agent PR volume is high but acceptance is lower, a trust gap that pull-request telemetry can't explain. The reason sits upstream: concurrent agents claim, divide, and collide over shared work before any PR exists. The PR is where the damage surfaces, not where it happened.
What is a pre-PR coordination layer for AI agents?
It's an owned, observable place where agents claim tasks, hand off, and message each other over an append-only, signed event log, before any pull request opens. Unlike a PR dashboard, it captures the coordination process itself: who claimed what, when, and what collided. In the 2026 study that log was a mineable artefact for conflicting edits, lock starvation, redundant rediscovery, and race-to-close.
How is pre-PR coordination different from agent observability?
Post-run observability is forensics on what already merged. It tells you what broke after it shipped. Pre-PR coordination capture records how agents divided and collided over work before a PR existed, where the failure actually originated. Both matter, but an observability layer without pre-PR capture is blind to the most common multi-agent failures.
Does a shared coordination log actually reduce wasted agent work?
In the 2026 study, yes, by a large margin. With a shared coordination substrate, the share of work that merely re-did a teammate's task fell from 78% to 0%, and useful throughput more than tripled. Every agent's copy of the log converged to the same state with no write silently dropped, where a file-based tracker loses concurrent writes.