Agency · 17 June 2026 · 4 min read
RAG in production over regulations and client documents
Production RAG over regulations isn't embed-and-retrieve. It's a layered pipeline (hybrid search, reranking, hierarchical summaries, graph context), and the failure that bites hardest is a silent embedding-dimension mismatch that returns confident garbage.

Short version: retrieval-augmented generation looks trivial in a demo and gets hard the moment the corpus is real. Over a fiduciary's regulations, internal procedures, and client documents, naive embed-and-retrieve returns passages that are semantically close and factually wrong. What holds in production is a layered pipeline: hybrid search, a reranking pass, hierarchical summaries, and graph context. And the failure that cost us the most was silent: an embedding-dimension mismatch that returned confident garbage without throwing a single error.
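To make the silent failure concrete, here is a minimal sketch of one way a dimension mismatch can slip through without an error. It is an illustration under assumptions, not a reconstruction of the incident described above: the function names, the `expected_dim` parameter, and the `DimensionMismatchError` class are all hypothetical. The only hard fact it relies on is that Python's `zip()` stops at the shorter input, so a hand-rolled similarity score over vectors of different lengths still returns a number.

```python
import math


class DimensionMismatchError(ValueError):
    """Hypothetical error: a vector does not match the index dimension."""


def naive_dot(a, b):
    # zip() stops at the shorter input, so a 4-d query scored against a
    # 3-d stored vector still yields a number instead of an error.
    return sum(x * y for x, y in zip(a, b))


def checked_cosine(query, stored, expected_dim):
    # Fail loudly before scoring if either vector has the wrong length.
    for name, vec in (("query", query), ("stored", stored)):
        if len(vec) != expected_dim:
            raise DimensionMismatchError(
                f"{name} vector has {len(vec)} dims, index expects {expected_dim}"
            )
    dot = sum(x * y for x, y in zip(query, stored))
    norm = math.sqrt(sum(x * x for x in query)) * math.sqrt(
        sum(y * y for y in stored)
    )
    # A zero vector has no direction; treat its similarity as 0.0.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    query_4d = [1.0, 0.0, 0.0, 5.0]   # e.g. produced by a newer embedding model
    stored_3d = [1.0, 0.0, 0.0]       # e.g. indexed with an older model

    # The naive score looks plausible and raises nothing.
    print(naive_dot(query_4d, stored_3d))

    # The guarded version refuses to score mismatched vectors.
    try:
        checked_cosine(query_4d, stored_3d, expected_dim=3)
    except DimensionMismatchError as exc:
        print(f"rejected: {exc}")
```

The design choice is to validate dimensions once at the boundary (at index time and at query time) instead of trusting the scoring code to notice. On Python 3.10 and later, `zip(a, b, strict=True)` gives a similar guarantee by raising `ValueError` when the lengths differ.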
Why naive RAG fails on a real corpus
Because pure vector search optimizes for "sounds similar," and regulations punish that.