
2024

Beyond Basic RAG: What You Need to Know

The Real World of RAG Systems

📒 Picture this: You're a developer who just deployed your first RAG system. Everything seems perfect in testing. Then reality hits - users start complaining about irrelevant results, not being able to do "basic stuff" and occasional hallucinations. Welcome to the world of real-world RAG systems.

The Problem With "Naive RAG"

Let's start with a truth bomb: dumping documents into a vector database and hoping for the best is like trying to build a search engine with just a dictionary - technically possible, but practically useless.

Here's why:

  1. The Embedding Trap: Think embedding similarity is enough? Here's a fun fact - in many embedding models, "yes" and "no" have a similarity of 0.8-0.9. Imagine asking for "yes" and getting a "no" instead in a legal search 😅

  2. The Context Confusion: Large Language Models (LLMs) get surprisingly confused when you give them unrelated information. They're like that friend who can't ignore an app notification while telling a story - everything gets mixed up.

  3. Length Effect: Just like humans tend to get worse at noticing details the longer a story is, LLMs with large context windows get worse at noticing details the longer the information is.

The Three Pillars of Production RAG

1. Query Understanding 🎯

The first step to better RAG isn't about better embeddings - it's about understanding what your users are actually asking for. Here are the basics:

  • Query Classification: Before rushing to retrieve documents, classify the query type. Is it a simple lookup? A comparison? An aggregation? Each needs different handling.
    • NIT: Navigational, Informational, Transactional are the 3 very broad types.
  • Metadata Extraction: Time ranges, entities, filters - extract these before retrieval. Think of it as giving students sample questions before an exam: it tells the retriever what to pay attention to at query time, so it finds the important material faster and more accurately.
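
To make the two bullets above concrete, here is a minimal sketch of rule-based query classification plus metadata extraction. Everything in it is an assumption for illustration: the cue lists, the `ParsedQuery` class and the `parse_query` function are hypothetical, and a production system would more likely use a trained classifier or an LLM call than keyword matching.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword cues per query type (illustrative, not exhaustive).
QUERY_TYPE_CUES = {
    "comparison": ("compare", " vs ", "versus", "difference between"),
    "aggregation": ("total", "average", "how many", "sum of"),
}

# Matches four-digit years such as 1999 or 2024.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


@dataclass
class ParsedQuery:
    text: str
    query_type: str
    filters: dict = field(default_factory=dict)


def parse_query(text: str) -> ParsedQuery:
    """Classify the query type and pull out simple metadata filters."""
    lowered = f" {text.lower()} "
    query_type = "lookup"  # fall back to a simple lookup
    for candidate, cues in QUERY_TYPE_CUES.items():
        if any(cue in lowered for cue in cues):
            query_type = candidate
            break

    filters = {}
    year_match = YEAR_PATTERN.search(text)
    if year_match:
        # Keep only the first year mentioned; good enough for a sketch.
        filters["year"] = int(year_match.group(0))
    return ParsedQuery(text=text, query_type=query_type, filters=filters)


parsed = parse_query("Compare revenue in 2022 vs 2023")
print(parsed.query_type, parsed.filters)
```

The useful part is the shape of the output: a query type you can route on and a set of filters you can hand to the retriever, both computed before any document is fetched.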

Metadata Queries

The CEO of a company asks for "last year's revenue"

The CFO asks for "revenue from last year"

The CMO asks for "revenue from the last fiscal year"

Do all these queries mean different things? Not on the surface. But the asker's role, i.e. query metadata, changes the query intent.

2. Intelligent Retrieval Strategies 🔍

Here's where most systems fall short. Instead of one-size-fits-all retrieval:

  • Hybrid Search: Combine dense (embedding) and sparse (keyword) retrieval. You can rerank using late interaction, use an LLM as a reranker, or even use both in a cascade. I can probably write a whole blog post on this, but the tl;dr is that you can combine several retrieval strategies to get the best trade-off between precision, recall, cost and latency.
  • Query Expansion: Don't just search for what users ask - search for what they mean. Example: "Q4 results" should also look for "fourth quarter performance."
  • Context-Aware Filtering: Use metadata to filter before semantic search. If someone asks for "last week's reports," don't rely on embeddings to figure out the time range.
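
Here is a rough sketch of the context-aware filtering idea: resolve "last week" into a date range, filter on metadata first, and only then rank what is left. The `Doc` class, the helper names and the toy keyword-overlap score are all hypothetical stand-ins. In a real system the ranking step would be your BM25 or embedding search, and "last week" is assumed here to mean the seven days before today.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Doc:
    doc_id: str
    text: str
    published: date


def last_week_range(today: date) -> tuple[date, date]:
    """Return (start, end) for the 7 days before `today`, both ends inclusive."""
    return today - timedelta(days=7), today - timedelta(days=1)


def keyword_score(query: str, text: str) -> int:
    """Toy relevance score: number of distinct query words present in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def filtered_search(query: str, docs: list[Doc], start: date, end: date, top_k: int = 3) -> list[Doc]:
    """Apply the metadata filter first, then rank only the surviving documents."""
    candidates = [d for d in docs if start <= d.published <= end]
    ranked = sorted(candidates, key=lambda d: keyword_score(query, d.text), reverse=True)
    return ranked[:top_k]


docs = [
    Doc("a", "weekly sales report", date(2024, 6, 5)),
    Doc("b", "weekly sales report archive", date(2024, 1, 5)),
    Doc("c", "engineering report", date(2024, 6, 8)),
]
start, end = last_week_range(date(2024, 6, 10))
print([d.doc_id for d in filtered_search("sales report", docs, start, end)])
```

Document "b" matches the keywords just as well as "a", but it never reaches the ranking step because its date is outside the range. That is the point: the time constraint is enforced by the filter, not left to similarity scores.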

3. Result Synthesis and Validation ✅

The final piece is making sure your responses are accurate and useful:

  • Cross-Validation: For critical information (dates, numbers, facts), validate across multiple sources at ingestion time. It's possible that your ingestion pipeline is flawed and you don't know it.
  • Readability Checks: Use tools like the Flesch-Kincaid score to ensure responses match your user's expertise level.
  • Hallucination Detection: Implement systematic checks for information that isn't grounded in your retrieved documents. Consider evaluating the pipeline using offline tools like Ragas.
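
As one small example of a grounding check, the sketch below flags numbers in a generated answer that do not appear in any retrieved document. It is deliberately narrow: it only looks at numeric tokens, the function names are hypothetical, and it is not a substitute for a full evaluation framework like Ragas.

```python
from __future__ import annotations

import re

# Digits with optional thousands separators or decimals, e.g. 24, 1,200, 4.5
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")


def extract_numbers(text: str) -> set[str]:
    """Return the numbers found in text, normalised by dropping commas."""
    return {match.replace(",", "") for match in NUMBER_PATTERN.findall(text)}


def ungrounded_numbers(answer: str, sources: list[str]) -> set[str]:
    """Numbers that appear in the answer but in none of the source documents."""
    source_numbers: set[str] = set()
    for source in sources:
        source_numbers |= extract_numbers(source)
    return extract_numbers(answer) - source_numbers


answer = "Employees get 24 days of leave and 10 sick days."
sources = ["Annual leave: 24 days per year.", "Sick leave is 8 days."]
print(ungrounded_numbers(answer, sources))
```

A non-empty result does not prove a hallucination (the model may have done legitimate arithmetic), but it is a cheap signal for routing an answer to a stricter check or a human.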

Real-World Example: The Leave Policy Fiasco

Here's a real story that illustrates why naive RAG fails:

Company X implemented a RAG system for HR queries. When employees asked about leave policies, the system kept using the entire company's wiki -- including the sales team's pages. And the sales pages "ranked" higher because they contained similar keywords.

The result? The entire company was getting sales team vacation policies instead of their own 🤦‍♂️

The solution? They implemented:

  1. Role-based filtering

  2. Document source validation

  3. Query intent classification
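
A minimal sketch of the first two fixes, role-based filtering and document source validation, might look like this. The `PolicyDoc` fields, the source names and the `eligible_docs` function are hypothetical. The story doesn't say how Company X actually implemented them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDoc:
    doc_id: str
    text: str
    source: str           # where the document came from, e.g. "hr-handbook"
    audiences: frozenset  # teams the document applies to; "all" means everyone


def eligible_docs(docs, team, allowed_sources):
    """Keep documents from an allowed source that apply to the asker's team."""
    return [
        doc
        for doc in docs
        if doc.source in allowed_sources
        and ("all" in doc.audiences or team in doc.audiences)
    ]


docs = [
    PolicyDoc("company-leave", "Company-wide leave policy", "hr-handbook", frozenset({"all"})),
    PolicyDoc("sales-leave", "Sales team leave policy", "sales-wiki", frozenset({"sales"})),
    PolicyDoc("blog-leave", "Thoughts on leave policy", "random-blog", frozenset({"all"})),
]
allowed = {"hr-handbook", "sales-wiki"}
print([d.doc_id for d in eligible_docs(docs, "engineering", allowed)])
print([d.doc_id for d in eligible_docs(docs, "sales", allowed)])
```

Run this filter before retrieval and the sales policy can no longer outrank the company policy for a non-sales employee, however similar the keywords are.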

Making Your RAG System Production-Ready

Here's your action plan:

  1. Query Understanding: Implement basic query type classification
  2. Ingestion: Extract key metadata (dates, entities, filters)
  3. Retrieval: Begin with metadata filtering
  4. Retrieval: Add keyword-based search or BM25
  5. Retrieval: Top it off with semantic search
  6. Synthesis: Combine results intelligently using a good re-ranker or fusion e.g. RRF
  7. Validation: Cross-check extracted dates and numbers
  8. Validation: Implement a RAG metrics system e.g. Ragas
  9. Validation: Monitor user feedback e.g. using A/B tests and adapt

Reciprocal Rank Fusion

Reciprocal Rank Fusion (RRF) is a technique that combines the results of multiple retrieval systems. It's a powerful way to improve the quality of your search results by leveraging the strengths of different retrieval methods.
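
For reference, RRF scores each document as the sum of 1 / (k + rank) over every ranked list it appears in, where rank starts at 1. A minimal sketch follows. The function name and sample lists are illustrative, and k=60 is a commonly used default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document's score is the sum of 1 / (k + rank) over the lists that
    contain it, with rank starting at 1. Ties are broken by document id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_results = ["a", "b", "c"]
dense_results = ["b", "c", "d"]
print(reciprocal_rank_fusion([bm25_results, dense_results]))
# → ['b', 'c', 'a', 'd']
```

Documents that appear in both lists ("b" and "c") rise above documents that top only one list. RRF uses only ranks, so you don't need to normalise BM25 scores against cosine similarities.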

But it's NOT a silver bullet.

The Challenge

Stop thinking about RAG as just "retrieve and generate."

Start thinking about it as a complex system that needs to understand, retrieve, validate, and synthesize information intelligently.

Your homework: Take one query type that's failing in your system. Implement query classification and targeted retrieval for just that type. Measure the improvement. You'll be amazed at the difference this focused approach makes.


Remember: The goal isn't to build a perfect RAG system (that doesn't exist). The goal is to build a RAG system that improves continuously and fails gracefully.

Your Turn

What's your biggest RAG challenge? Let's solve it together. Let me know on Twitter or email.

Meetup Parameters

This is based on organising GenerativeAI Meetups in Bengaluru, India. This is a living document and will be updated as we learn more.

Venue

  1. Date & Time & Duration: Choose suitable timing and duration. Pick something that works for your community. Consider weekday and weekend meetups both.
    • Example: A 4:00-5:00pm start on a Saturday works great in BLR! Chennai did a GenAI meetup on Saturday morning -- since that city wakes up early, it worked well for them.
  2. Camera: Consider the A/V requirements if you plan to do talks, streaming, or recording. Who is going to record? What kind of camera do we need? GenerativeAI outsources this to Hasgeek. Camera set-up for meetups is reduced to iPhone capture, with a Pivo Pod and tripod.
  3. Format: Define the structure of the meetup - is it just for drinks? Will there be talks? Is food provided? Is it open or by invitation only? Are plus ones allowed? Examples: GenerativeAI has never served alcohol; we often have 1-3 talks with snacks, and attendance is by invitation only. We also have a Code of Conduct that we share with attendees.
  4. Speakers: If there will be talks, secure 1-3 speakers in advance. It's also fine to have a meetup with no speakers.
  5. Security: Keep in mind that some venues might require pre-registration for security. Different venues enforce security with varying degrees of strictness. Some venues didn't allow anyone in without registration, while others let in folks we didn't want attending. Discuss this upfront with your venue's security in-charge.

Theme

  • Select a theme. This is crucial as it shapes the shared identity of all attendees and influences the discussions they initiate with strangers
  • Narrower themes are better than wider ones e.g. "DevTools" is better than "Enterprise tools for Devs"
  • Choose a catchy name to attract more attendees -- your naming is the most important branding
  • There are icebreaker lists on the Internet which you can use for more intimate meetings
  • Name tags: They're great! I've seen them work well at other meetups, but I've tried and failed to enforce them at the GenerativeAI Meetups.

Photo and Video Documentation

  • Encourage attendees to take photos, tag you, or send them to you for sharing on social media
  • Consider recording the talks and posting them on YouTube afterwards to provide value to the community and allow great talks to live on. We use Hasgeek for this.
  • Note that live-streaming is not a must, but if done, it can add an extra layer of engagement for those who can't attend in person.

Function, Industry, Geography: Career Framework

Your career is a combination of Function, Industry, Geography.

That's it.

That's the framework.

You can change one of these. Not all three at a time.

Why only one at a time?

If you want to change your function, you need to learn new skills.

If you want to change your industry, you need to understand the new industry and the skills required to be successful in it.

If you want to change your geography, you need to uproot your life and move to a new place.

But what if I want to change all three?

You can, but it's difficult and perhaps needs a lot of thinking cycles and internal conviction.

Cheat code: Higher Education

MS/PhD/MBA helps folk change 2 of these at a time:

  1. You might get into a PhD program which changes your function and industry both
  2. MS abroad program which changes your function and geography
  3. MBA program which changes your industry (e.g. IT services to consulting) and geography (e.g. India to US)

Clarity Helps a Lot

The more granular you get, the easier it is to convert your wants into actionable steps.

Painful: I want to be a Machine Learning Engineer.

Tolerable: I want to be a Machine Learning Engineer at a Big Tech company.

Acceptable: I want to be a Machine Learning Engineer at Google.

Good: I want to be a Machine Learning Engineer at Google in New York.

Great: I want to be a Machine Learning Engineer working on the problems around generating human-like speech at Google in New York.

What if I am not clear about what I want?

If you are not clear about what you want, that's totally fine.

You can start with the most granular level and work your way up.

Start with the job description of the role you want. Talk to at least 12 people who are in that role.

Ask them what they do on a day-to-day basis. Ask how they got to where they are today. Ask what they ask when they are hiring or interviewing.

People on the Internet call this thing informational interviews and there's plenty of decent advice out there.

  1. The Antidote to I'm Feeling Stuck? from Swanand
  2. Act Like You're 35 from Nirant