We Taught the AI to Say 'I Don't Know': The Hallucination That Almost Cost a Client (and the RAG Guardrails That Stopped It)

Three minutes before the memo went out, a senior partner at a UAE law firm caught it. The RAG assistant had cited a precedent with perfect confidence: correct formatting, a plausible case name, the works. The case did not exist. Here is why that happens, what a production guardrail stack actually looks like, and how we test for the failure you cannot see. The uncomfortable part is the conclusion. If your RAG system cannot refuse to answer, it is not a research tool. It is a liability generator with good grammar, and the math on fixing that is not close.

Why RAG Systems Hallucinate Even With Source Documents

Most people assume retrieval-augmented generation fixes hallucination by grounding the model in real documents. It doesn't. And understanding why is the whole game if you want to build something that holds up inside a law firm or a clinic.

Here is what actually happens. When you query a RAG system, the retriever returns the top-k chunks ranked by embedding similarity. If the right document isn't in the corpus (never ingested, chunked badly, or the question simply sits outside the domain), the retriever doesn't come back empty. It returns the closest thing it found. The model then receives that marginally relevant chunk and writes an answer that looks exactly like a correct one, because it learned from a world where legal citations and clinical references are produced fluently and with total confidence. The model cannot tell high-confidence retrieval from low-confidence retrieval. To the generator, both look the same.

A 2025 Google research finding made this worse than we assumed. RAG paradoxically reduces abstention: more retrieved context makes models more confident, not more cautious, even when that context is only loosely related to the question. So the default behavior of an unguarded RAG system, asked something it cannot answer from its corpus, is to confidently invent a plausible-sounding fiction. In legal AI, that fiction is a case citation. In clinical AI, it is a dosage recommendation. Same mechanism, very different consequences.
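To make the "never comes back empty" point concrete, here is a minimal sketch of top-k retrieval over toy vectors. The three-dimensional "embeddings", the corpus titles, and the `top_k` helper are all hypothetical and stand in for a real embedding model and vector store. The only thing it shows is that an out-of-domain query still gets a chunk back, just with a low score.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, corpus, k=1):
    """Return the k best (score, text) pairs. Never empty for a non-empty corpus."""
    scored = [(cosine(query_vec, vec), text) for text, vec in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Toy vectors, invented for illustration (not produced by a real model).
CORPUS = [
    ("Lease termination notice periods", [0.9, 0.1, 0.0]),
    ("Employment contract probation rules", [0.1, 0.9, 0.0]),
]

in_domain = top_k([0.85, 0.15, 0.0], CORPUS)     # close to the first document
out_of_domain = top_k([0.2, 0.1, 0.97], CORPUS)  # points somewhere else entirely

for label, hits in (("in-domain", in_domain), ("out-of-domain", out_of_domain)):
    score, text = hits[0]
    print(f"{label}: {text!r} (similarity {score:.2f})")
```

Both calls return exactly one chunk. Unless something downstream inspects the score, the generator receives the weak match in the same shape as the strong one.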

The Guardrail Stack We Actually Deploy

There is no single guardrail that prevents hallucination in production. It takes a layered pipeline, where each layer catches a failure mode the others miss. Four gates do most of the work.

The first is retrieval confidence thresholding. Every chunk the vector store returns carries a cosine similarity score against the query, and we reject any query whose top chunk scores below 0.70. The data backs that cutoff: at 0.70, roughly 98.6% of irrelevant queries fall below threshold while 88.9% of relevant ones clear it. Pushing to 0.80 backfires. The retriever starts rejecting valid queries, and the system quietly degrades into an ungrounded generator, which is exactly the thing you were trying to prevent.

The second gate is mandatory verbatim citation. The system prompt forces the model to quote its source chunk directly whenever it makes a factual claim. If it can't quote, because the chunk doesn't actually say what it's about to assert, the generation fails on inspection. This isn't bulletproof. A capable model can hallucinate a quote that sounds like the chunk. But it stops most low-effort fabrication and, more importantly, it makes hallucination auditable instead of invisible.

The third gate is abstention triggering. When the top similarity score drops below threshold, the system returns a structured non-answer that names the gap rather than papering over it. Short, firm, and logged every time.

The fourth gate is human escalation routing. Queries we classify as high-stakes (contract interpretation, treatment decision support) get flagged before the response reaches anyone, not after the damage is done.
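The four gates can be sketched in a few dozen lines. This is an illustration under stated assumptions, not our production code: `guarded_answer`, the `Response` shape, the keyword-based high-stakes check, and the `generate` callback (which stands in for the LLM call and returns an answer plus the quote it claims to rely on) are all hypothetical. Only the 0.70 cutoff comes from the text above.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

log = logging.getLogger("rag_guardrails")

SIMILARITY_THRESHOLD = 0.70                  # cutoff described in the text
HIGH_STAKES_TERMS = ("contract", "treatment")  # hypothetical stand-in for a real classifier

@dataclass
class Response:
    status: str                  # "answered", "abstained", or "escalated"
    text: str
    quote: Optional[str] = None

def _normalize(s: str) -> str:
    """Lowercase and collapse whitespace so line wraps don't break matching."""
    return " ".join(s.lower().split())

def quote_is_verbatim(quote: str, chunk: str) -> bool:
    """Gate 2 check: the quoted span must appear word for word in the chunk."""
    q = _normalize(quote)
    return bool(q) and q in _normalize(chunk)

def abstain(query: str, reason: str) -> Response:
    """Gate 3: structured non-answer that names the gap, logged every time."""
    log.info("abstained query=%r reason=%s", query, reason)
    return Response("abstained", f"I don't know: {reason}.")

def guarded_answer(
    query: str,
    retrieved: List[Tuple[float, str]],            # (score, chunk), best first
    generate: Callable[[str, str], Tuple[str, str]],  # returns (answer, quote)
) -> Response:
    # Gate 1: retrieval confidence threshold on the top chunk.
    if not retrieved or retrieved[0][0] < SIMILARITY_THRESHOLD:
        return abstain(query, "no source in the corpus is close enough to this question")
    _, chunk = retrieved[0]
    text, quote = generate(query, chunk)
    # Gate 2: the claimed quote must really be in the source chunk.
    if not quote_is_verbatim(quote, chunk):
        return abstain(query, "the draft answer could not be tied to a verbatim source quote")
    # Gate 4: hold high-stakes answers for human review before delivery.
    if any(term in query.lower() for term in HIGH_STAKES_TERMS):
        return Response("escalated", text, quote)
    return Response("answered", text, quote)
```

The substring check catches only fabricated quotes. A real quote attached to a claim it does not support still passes, which is part of why the escalation gate exists.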

Testing What You Cannot See: Adversarial Corpus Evaluation

A RAG system that aces the questions it can answer tells you nothing about what it does when it can't. That second behavior is the one that gets you sanctioned, and most teams never test for it.

The test that matters is the adversarial unanswerable set: questions with no correct answer anywhere in the corpus. We build these on purpose. Cases that aren't in the legal database, drug interactions absent from the clinical formulary, regulations not yet ingested. Then we measure whether the system correctly abstains instead of guessing. Our bar before go-live is 90% correct abstention on that adversarial set, and I'll be blunt about why the number is that high. Below 90%, the system fabricates at a rate that surfaces in client-facing use within weeks, not months. Correct abstention means returning the structured non-answer, not a confident wrong one.

For faithfulness, we use the RAGAS framework, and a faithfulness score of 0.75 is the floor for production. Under that, users hit hallucinations or context drift often enough that it becomes operationally visible. Healthcare RAG systems in demanding deployments have reached 0.995 faithfulness. Legal systems land more realistically between 0.80 and 0.92, depending on corpus quality and how varied the queries are.

One thing teams forget: the test set needs maintenance. As the corpus grows, the line between answerable and unanswerable moves. A question that was unanswerable in month one may be perfectly answerable by month three. Adversarial testing is a recurring discipline, not a launch-day checkbox you tick once and retire.
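The abstention check itself is simple arithmetic, sketched below. The `ask` callback is a hypothetical wrapper around the deployed system that returns a status string such as "abstained" or "answered"; the questions and the stub are invented for illustration. Only the 90% bar comes from the text.

```python
from typing import Callable, Iterable

ABSTENTION_BAR = 0.90  # go-live bar from the text

def abstention_rate(questions: Iterable[str], ask: Callable[[str], str]) -> float:
    """Fraction of unanswerable questions on which the system abstained."""
    questions = list(questions)
    if not questions:
        raise ValueError("the adversarial set must not be empty")
    abstained = sum(1 for q in questions if ask(q) == "abstained")
    return abstained / len(questions)

def passes_go_live(rate: float, bar: float = ABSTENTION_BAR) -> bool:
    """True when the measured abstention rate meets or exceeds the bar."""
    return rate >= bar

# Hypothetical adversarial set: none of these can be answered from the corpus.
UNANSWERABLE = [f"Summarise the holding in invented case number {i}" for i in range(9)]
UNANSWERABLE.append("What is the dosage of a drug missing from the formulary?")

def stub_system(question: str) -> str:
    """Stand-in for the real pipeline: it wrongly answers the dosage question."""
    return "answered" if "dosage" in question else "abstained"

rate = abstention_rate(UNANSWERABLE, stub_system)
print(f"abstention rate: {rate:.0%}, go-live: {passes_go_live(rate)}")
```

Because answerability shifts as the corpus grows, each question in the set is worth re-checking against the current corpus before every run.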

The Business Case Is Not Optional: Liability, PDPL, and the Cost of One Wrong Answer

U.S. courts have been quietly generating the dataset on what happens when this guardrail stack is missing. In February 2025, Morgan & Morgan attorneys were sanctioned after their internal AI tool inserted eight fabricated case citations into court motions, with financial penalties landing on three named attorneys. In May 2025, Ellis George LLP and K&L Gates faced roughly USD 31,000 in sanctions after 9 of 27 citations in a supplemental brief turned out to be wrong, at least two of them citing cases that did not exist. Legal analytics now track more than 1,000 court cases involving AI-generated hallucinations. In Q1 2026 alone, U.S. courts imposed over USD 145,000 in sanctions for AI hallucinations in legal filings.

The UAE adds a regulatory layer on top of the liability one. UAE Federal Decree-Law No. 45 of 2021 (the PDPL) establishes in Article 18 the right to object to automated decision-making that carries legal or serious practical consequences. It also requires organizations to provide meaningful information about the logic involved and to allow human intervention on request. A RAG assistant that produces an unverified legal brief or clinical summary is making exactly that kind of decision, and PDPL Article 18 creates an obligation to explain it when someone asks. The Dubai Health Authority's AI Policy goes further for its sector: AI systems must support clinical decision-making, not replace it, with mandatory human oversight.

Now weigh that against the cost of the fix. The guardrail stack described above is four to six hours of engineering configuration per deployment. On the other side of the ledger sits unlimited professional liability and live regulatory exposure. The engineering hours have never once been the expensive option.

Have questions about your setup?

We help small and medium-sized businesses in the UAE build AI systems that are compliant, local, and genuinely effective. The first conversation is free.