Introduction
Retrieval-Augmented Generation (RAG) is transforming how Saudi enterprises handle internal knowledge. But rushed implementations lead to hallucinations, security gaps, and user distrust. Here are the 7 mistakes we see most often — and how to avoid them.
Mistake #1: No Hallucination Testing Framework
The Problem
Many teams deploy RAG systems without systematic hallucination detection. Users lose trust after a few confidently wrong answers.
How It Manifests
- AI cites documents that don't exist
- AI mixes information from multiple sources incorrectly
- AI invents statistics or policy details
- AI gives outdated information as current
The Fix: Golden Set + Continuous Evals
- Create a Golden Set: 50–100 questions with verified correct answers
- Run automated evals: Test coverage, accuracy, and hallucination rate
- Set thresholds: Define acceptable error rates (e.g., <5% hallucination)
- Monitor continuously: Run evals weekly, not just at launch
Key Metric: Hallucination Rate = (Incorrect answers with high confidence) ÷ (Total answers)
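As a rough sketch of what a scheduled eval could look like, assuming a hypothetical ask_rag(question) client that returns an answer plus a confidence score, and a simple grading function — both are placeholders for your own RAG endpoint and grading logic, not any specific library:

```python
from dataclasses import dataclass

@dataclass
class GoldenCase:
    question: str
    expected_answer: str

def run_golden_set_eval(cases, ask_rag, is_correct, high_confidence=0.8):
    """Run the Golden Set and compute accuracy plus hallucination rate
    (wrong answers delivered with high confidence)."""
    correct = hallucinations = 0
    for case in cases:
        answer, confidence = ask_rag(case.question)   # placeholder RAG client
        if is_correct(answer, case.expected_answer):
            correct += 1
        elif confidence >= high_confidence:
            hallucinations += 1                        # confidently wrong
    total = len(cases)
    return {
        "accuracy": correct / total,
        "hallucination_rate": hallucinations / total,  # keep below your threshold
    }

if __name__ == "__main__":
    golden_set = [
        GoldenCase("How many annual leave days do employees get?", "30 days"),
        # ... 50-100 verified question/answer pairs
    ]
    fake_rag = lambda q: ("30 days", 0.92)                    # stand-in client
    contains = lambda got, want: want.lower() in got.lower()  # stand-in grader
    print(run_golden_set_eval(golden_set, fake_rag, contains))
```

Wire a job like this into your weekly schedule and alert when the hallucination rate crosses the threshold you defined.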
Mistake #2: Skipping Access Controls (RBAC)
The Problem
RAG systems that index sensitive documents without RBAC let any user access any information. This violates Saudi Personal Data Protection Law (PDPL) principles and creates legal risk.
How It Manifests
- Junior employees access HR policies meant for managers
- Sales team sees pricing strategies meant for finance
- Contractors access internal memos
The Fix: RBAC from Day One
- Map document permissions: Mirror your existing folder/share permissions
- Scope by user role: Define what each role can query
- Enforce at retrieval: Filter results before showing to user
- Audit access: Log who accessed what and when
Key Metric: Access Violation Rate = (Unauthorized retrievals caught) ÷ (Total retrievals)
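A minimal sketch of retrieval-time enforcement, assuming a hypothetical role-to-group permission map and retrieved chunks that carry acl_group and doc_id metadata; the names are illustrative and not tied to any specific vector store:

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("rag.audit")

# Illustrative role -> allowed document groups, mirroring existing share permissions.
ROLE_PERMISSIONS = {
    "hr_manager": {"hr_policies", "general"},
    "sales":      {"sales_playbooks", "general"},
    "contractor": {"general"},
}

def filter_by_role(retrieved_chunks, user_id, role):
    """Enforce RBAC at retrieval time: drop chunks the role may not see,
    and audit both allowed and blocked access."""
    allowed_groups = ROLE_PERMISSIONS.get(role, set())
    permitted = []
    for chunk in retrieved_chunks:            # each chunk carries source metadata
        if chunk["acl_group"] in allowed_groups:
            permitted.append(chunk)
            audit_log.info("ALLOW user=%s role=%s doc=%s at=%s",
                           user_id, role, chunk["doc_id"],
                           datetime.now(timezone.utc).isoformat())
        else:
            audit_log.warning("BLOCK user=%s role=%s doc=%s",
                              user_id, role, chunk["doc_id"])
    return permitted
```

In practice the same filter is best pushed down into the vector store query itself, so documents a user may not see are never retrieved in the first place.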
Mistake #3: Indexing Everything Without Scoping
The Problem
Teams try to index every document "just in case." This creates noise, slows retrieval, and introduces contradictory information.
How It Manifests
- Search results include outdated drafts
- Conflicting answers from different document versions
- Slow query times from bloated index
- Low-quality documents reduce overall accuracy
The Fix: Curated Source Selection
- Start with approved sources: SharePoint folders, official policies, SOPs
- Exclude drafts and personal files: Create clear inclusion criteria
- Set freshness rules: Auto-expire documents older than X months
- Iterate based on usage: Add sources when users request them
Key Metric: Source Utilization = (Documents actually cited) ÷ (Documents indexed)
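One way to encode the inclusion criteria as code, using illustrative approved paths, exclusion markers, and a 12-month freshness rule; all three are assumptions to replace with your own policy:

```python
from datetime import datetime, timedelta

APPROVED_PREFIXES = ("/sharepoint/policies/", "/sharepoint/sops/")
EXCLUDED_MARKERS = ("draft", "personal", "archive")
MAX_AGE_DAYS = 365  # freshness rule: exclude documents older than ~12 months

def should_index(doc_path: str, last_modified: datetime) -> bool:
    """Apply inclusion criteria before a document ever reaches the index."""
    path = doc_path.lower()
    if not path.startswith(APPROVED_PREFIXES):
        return False                      # only approved source locations
    if any(marker in path for marker in EXCLUDED_MARKERS):
        return False                      # drafts and personal files stay out
    if datetime.now() - last_modified > timedelta(days=MAX_AGE_DAYS):
        return False                      # stale content waits for review
    return True

def source_utilization(cited_doc_ids: set, indexed_doc_ids: set) -> float:
    """Key metric: share of indexed documents that answers actually cite."""
    return len(cited_doc_ids & indexed_doc_ids) / max(len(indexed_doc_ids), 1)
```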
Mistake #4: Ignoring Arabic Retrieval Quality
The Problem
RAG systems tuned for English often perform poorly on Arabic documents. Arabic's rich morphology, right-to-left script, and mixed Arabic-English content create retrieval challenges that English-tuned pipelines miss.
How It Manifests
- Arabic queries return English results
- Arabic documents are chunked incorrectly
- Arabic synonyms and variants are missed
- Mixed-language documents break retrieval
The Fix: Arabic-Specific Tuning
- Test on Arabic Golden Set: Verify retrieval quality on Arabic queries
- Tune chunking: Adjust for Arabic sentence boundaries
- Use multilingual embeddings: Models trained on Arabic
- Handle code-switching: Support Arabic-English mixed queries
Key Metric: Arabic Coverage = (Correct Arabic answers) ÷ (Total Arabic questions in Golden Set)
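A sketch of one common pre-processing step: light Arabic normalization applied to both documents and queries before embedding, so spelling variants (diacritics, alef forms, taa marbuta) map to the same surface form. Whether you need this depends on your embedding model, so validate it against the Arabic Golden Set before adopting it:

```python
import re

# Arabic diacritics (tashkeel) and tatweel, which multiply surface variants.
DIACRITICS = re.compile(r"[\u0617-\u061A\u064B-\u0652\u0670\u0640]")
ALEF_VARIANTS = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ٱ": "ا"})

def normalize_arabic(text: str) -> str:
    """Light normalization applied to documents and queries alike,
    so spelling variants map to the same form before embedding."""
    text = DIACRITICS.sub("", text)        # strip tashkeel and tatweel
    text = text.translate(ALEF_VARIANTS)   # unify alef forms
    text = text.replace("ة", "ه")          # taa marbuta -> haa
    text = text.replace("ى", "ي")          # alef maqsura -> yaa
    return text

def arabic_coverage(correct_arabic: int, total_arabic: int) -> float:
    """Key metric from the Arabic Golden Set."""
    return correct_arabic / max(total_arabic, 1)
```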
Mistake #5: No Feedback Loop for Continuous Improvement
The Problem
RAG systems degrade over time as documents change and user needs evolve. Without feedback, you don't know what's breaking.
How It Manifests
- Accuracy declines silently
- Users stop trusting the system
- New topics aren't covered
- Outdated answers persist
The Fix: User Feedback + Automated Monitoring
- Add thumbs up/down: Simple feedback on every answer
- Track "I don't know" rates: High rates indicate coverage gaps
- Review escalations: Learn from questions sent to humans
- Weekly evals: Compare current performance to baseline
Key Metric: User Trust Score = (Positive ratings) ÷ (Total rated answers)
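A minimal in-memory sketch of the bookkeeping behind these metrics; a production system would persist each rating next to the question, the answer, and the cited sources:

```python
from collections import Counter

class FeedbackLog:
    """Stand-in feedback store for illustration only."""

    def __init__(self):
        self.counts = Counter()

    def record(self, rated: bool, positive: bool = False, dont_know: bool = False):
        """Log one answer: whether the user rated it, how, and whether
        the system declined to answer."""
        self.counts["answers"] += 1
        if dont_know:
            self.counts["dont_know"] += 1
        if rated:
            self.counts["rated"] += 1
            if positive:
                self.counts["positive"] += 1

    def user_trust_score(self) -> float:
        return self.counts["positive"] / max(self.counts["rated"], 1)

    def dont_know_rate(self) -> float:
        return self.counts["dont_know"] / max(self.counts["answers"], 1)
```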
Mistake #6: Missing Confidence Thresholds
The Problem
RAG systems that always answer — even when unsure — produce confident-sounding hallucinations. Users can't distinguish reliable from unreliable answers.
How It Manifests
- AI answers questions outside its knowledge
- AI guesses instead of escalating
- Users trust wrong answers
- Support burden increases from AI errors
The Fix: Confidence Scores + Escalation
- Calculate confidence: Score based on retrieval similarity and answer coherence
- Set thresholds: <70% → "I don't know" + escalation
- Show uncertainty: "I'm not sure, but..." for medium confidence
- Train users: Help them understand confidence indicators
Key Metric: False Confidence Rate = (Wrong answers with confidence >80%) ÷ (All answers with confidence >80%)
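A sketch of the routing logic, assuming the upstream pipeline already produces a confidence score between 0 and 1 (for example a blend of retrieval similarity and answer coherence); the 0.70 and 0.85 cutoffs are illustrative starting points to tune against your Golden Set:

```python
from dataclasses import dataclass

@dataclass
class RagResponse:
    text: str
    confidence: float  # 0..1, produced by the upstream pipeline

LOW_CONFIDENCE = 0.70
MEDIUM_CONFIDENCE = 0.85

def present_answer(response: RagResponse) -> str:
    """Route the answer based on confidence instead of always answering."""
    if response.confidence < LOW_CONFIDENCE:
        # Below threshold: admit uncertainty and hand off to a human.
        escalate_to_human(response)
        return "I don't know. I've forwarded your question to the support team."
    if response.confidence < MEDIUM_CONFIDENCE:
        return f"I'm not sure, but: {response.text}"
    return response.text

def escalate_to_human(response: RagResponse) -> None:
    # Placeholder: in practice, create a ticket or route to a support queue.
    print(f"[escalation] low-confidence answer queued for review: {response.text!r}")
```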
Mistake #7: No Version Control for Source Documents
The Problem
Documents change, but RAG systems often keep old versions indexed. This leads to outdated answers and legal risk from citing superseded policies.
How It Manifests
- AI cites old policy versions
- Conflicting answers from different versions
- Compliance risk from outdated guidance
- User confusion about "which answer is right"
The Fix: Source-of-Truth Management
- Establish authoritative sources: One folder/system per document type
- Version metadata: Track version numbers in index
- Retrieval preference: Always prefer latest approved version
- Expiration rules: Auto-flag old versions for review
Key Metric: Version Accuracy = (Answers citing current version) ÷ (All answers citing versioned docs)
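A sketch of version-aware filtering after retrieval, assuming each indexed chunk carries doc_id, version, and approval metadata as suggested above:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Chunk:
    doc_id: str       # stable identifier of the underlying policy or SOP
    version: int      # version metadata carried in the index
    approved: bool
    effective: date
    text: str

def prefer_latest_versions(retrieved: list[Chunk]) -> list[Chunk]:
    """Collapse retrieval results so only the latest approved version of each
    document can be cited; superseded versions are dropped before answering."""
    latest: dict[str, Chunk] = {}
    for chunk in retrieved:
        if not chunk.approved:
            continue
        current = latest.get(chunk.doc_id)
        if current is None or chunk.version > current.version:
            latest[chunk.doc_id] = chunk
    return list(latest.values())
```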
Implementation Checklist
Use this checklist to audit your RAG implementation:
- Hallucination testing: Golden Set created and evals scheduled
- Access control: RBAC implemented and tested
- Source scoping: Approved sources defined, exclusions documented
- Arabic quality: Arabic Golden Set tested, tuning applied
- Feedback loop: User feedback collection active
- Confidence thresholds: Escalation rules defined and implemented
- Version control: Source-of-truth rules established
Conclusion
Enterprise RAG isn't just about connecting documents to an LLM. It's about building a trusted, accurate, and governed knowledge system. Avoid these 7 mistakes, and you'll deploy RAG that users actually trust.
LeenAI's OpsRAG pilot addresses all 7 challenges with a structured approach: scoped sources, RBAC, Arabic tuning, continuous evals, and an Acceptance Pack that proves readiness for production.