Agency · 1 July 2026 · 4 min read
A keyword search beats most AI agent memory
Agent memory looks solved until you put it in a real group chat. A 2026 benchmark found the strongest memory system scored 46.0% average accuracy, and a plain BM25 keyword search matched or beat most of them. Memory is a retrieval-quality problem, not a vector DB you bolt on.
Short version: Agent memory looks solved in a one-on-one demo and falls apart in a real group chat. A 2026 benchmark tested memory systems in multi-party conversations, the way actual deployments run, with several people talking to the agent and to each other. The strongest system scored 46.0% average accuracy. A plain BM25 keyword search, a retrieval method that predates the entire agent era, matched or beat most of the dedicated memory systems. So the expensive memory stack you bolted on may not be earning its keep. Memory accuracy is a retrieval-quality problem, not a storage problem, and the added complexity mostly hides that.
You have probably seen the impressive version: a single user, a long chat, the agent recalls something from twenty messages ago. Then you put it in a shared channel where four people talk past each other, and it starts attributing statements to the wrong person and answering from the old version of a fact that was updated yesterday. The benchmark is about that second world, which is the one production lives in.
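For a sense of how small the keyword baseline is, here is a minimal sketch of BM25 over a group-chat log. It is not the benchmark's implementation. The `Message` fields, the toy log, and the `k1` and `b` values are assumptions chosen for illustration, and the IDF uses the common "+1 inside the log" variant. Each hit keeps its speaker and day, which are the two things the shared-channel failures above depend on.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Message:
    speaker: str  # who said it (hypothetical field name)
    day: int      # coarse timestamp, larger is newer (hypothetical)
    text: str


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    def __init__(self, messages, k1=1.5, b=0.75):
        # k1 and b are commonly used defaults, assumed here.
        self.messages = messages
        self.k1 = k1
        self.b = b
        self.term_counts = [Counter(tokenize(m.text)) for m in messages]
        self.doc_lens = [sum(c.values()) for c in self.term_counts]
        self.avg_len = sum(self.doc_lens) / len(messages)
        # Document frequency: number of messages containing each term.
        self.doc_freq = Counter()
        for counts in self.term_counts:
            self.doc_freq.update(counts.keys())

    def idf(self, term):
        n = len(self.messages)
        df = self.doc_freq.get(term, 0)
        return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

    def score(self, query_terms, i):
        counts = self.term_counts[i]
        # Length normalisation: longer messages need more matches.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[i] / self.avg_len)
        total = 0.0
        for term in query_terms:
            tf = counts.get(term, 0)
            if tf:
                total += self.idf(term) * tf * (self.k1 + 1) / (tf + norm)
        return total

    def search(self, query, top_k=3):
        terms = tokenize(query)
        scored = [(self.score(terms, i), m) for i, m in enumerate(self.messages)]
        hits = [(s, m) for s, m in scored if s > 0]
        # Highest score first; newer message wins an exact tie.
        hits.sort(key=lambda sm: (-sm[0], -sm[1].day))
        return hits[:top_k]


# Toy four-person channel (made-up data).
log = [
    Message("ana", 1, "The launch date is March 3"),
    Message("cy", 1, "I will book the venue for the offsite"),
    Message("ben", 2, "Update: the launch date moved to March 10"),
    Message("dee", 2, "Lunch order is pizza again"),
]

index = BM25Index(log)
hits = index.search("when is the launch date")
for score, m in hits:
    print(f"{score:.2f}  day {m.day}  {m.speaker}: {m.text}")
```

In this toy log both launch-date messages come back at the top, but the older one ranks first because it overlaps the query more. Keyword scoring knows nothing about which statement is current, so whatever reads the hits still has to use the speaker and day to pick the right one.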
