
5 RAG Query Patterns Every Engineering Leader Should Know

Ever tried building a RAG system that actually works for all the different ways humans ask questions? After years of building and breaking retrieval systems at scale, I've found that most RAG failures happen at the query understanding level.

Here's the thing: not all queries are created equal. The reason your system hallucinates or gives garbage answers often has more to do with the question type than your vector DB settings or chunking strategy.

I've distilled RAG queries into 5 distinct patterns, each requiring different handling strategies. Understanding these will save your team months of confusion and help you diagnose issues before they become production nightmares. These are the most common patterns I've seen in RAG systems, but I don't claim they are the only ones.

tl;dr

  • Synthesis queries: Straightforward factoid retrieval with light transformation
  • Lookup queries: Require specific information retrieval, often with time/comparative elements
  • Multi-hop queries: Need decomposition into sub-questions for complete answers
  • Insufficient context queries: Questions your system should admit it can't answer
  • Creative/generative queries: Where LLM hallucination is actually desired

1. Synthesis Queries: The RAG Sweet Spot

Synthesis queries are the bread and butter of RAG systems - straightforward questions requiring basic factual retrieval and minimal transformation.

Examples:

  • "What were our Q2 earnings?"
  • "What's the maximum dosage for Drug X?"
  • "When was our healthcare policy updated?"

💡 Key insight: Synthesis queries typically map directly to content in your knowledge base, requiring minimal inferencing from the LLM. These are where RAG truly shines.

These queries typically follow a predictable pattern:

  • A clear, singular subject
  • A specific attribute being requested
  • No complex temporal or conditional elements

Engineering implication: For synthesis queries, retrieval precision matters more than recall. Your system needs to find the exact relevant information rather than gathering broadly related context.
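
One way to express that tradeoff in code is to cap the number of retrieved chunks and apply a hard score cutoff rather than padding the context window. This is a minimal, illustrative sketch: retrieve_with_scores is a stand-in for whatever search call your stack exposes, and the threshold is something you would tune on your own data.

from typing import Callable

def precise_retrieve(
    query: str,
    retrieve_with_scores: Callable[[str, int], list[tuple[str, float]]],
    top_k: int = 3,
    min_score: float = 0.75,
) -> list[str]:
    """Precision-first retrieval for synthesis queries.

    Pulls a small candidate set, then drops anything below a hard
    similarity cutoff instead of padding the context window.
    """
    candidates = retrieve_with_scores(query, top_k)
    return [chunk for chunk, score in candidates if score >= min_score]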

I built a healthcare RAG system where we optimized specifically for synthesis queries by implementing a document-first chunking strategy. This increased our accuracy by 17% for straightforward factual queries while sacrificing performance on more complex questions - a tradeoff we explicitly made based on user behavior analysis.

2. Lookup Queries: Beyond Simple Facts

Lookup queries introduce additional complexity through comparative elements, time components, or the need to process patterns. These often rely on aggregation over attributes such as time or location, so I recommend setting up a metadata index to support them.

Examples:

  • "How did our healthcare costs compare between 2022 and 2023?"
  • "What's the trend in side effect reporting for Drug X over the past 5 years?"
  • "Show me all dividend-paying stocks that increased yield for 3 consecutive quarters"

Look for these patterns in lookup queries:

  • Time-bound components ("during 2023," "over the past five years")
  • Comparative elements ("compared to," "versus")
  • Trend analysis requirements ("pattern," "trend," "over time")

Engineering implication: Lookup queries often require merging information from multiple documents or sources. Your RAG system needs strong reranking capabilities and potentially dedicated retrieval strategies, e.g. text2sql, or preprocessing the corpus to include tables which can be queried (h/t Dhruv Anand).

One approach I've found effective is a two-phase retrieval followed by synthesis:

  1. Fetch the core entities and facts
  2. Run a separate retrieval for the comparison elements
  3. Let the LLM synthesize both retrieved contexts
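
Here is a minimal sketch of that flow, assuming you already have a search call retrieve(query, filters) and a generation call llm(prompt); both names and the metadata filter key are placeholders for your own stack, not a specific library's API.

def answer_lookup_query(query: str, periods: list[str], retrieve, llm) -> str:
    """Two-phase retrieval for a comparative lookup query, then synthesis.

    retrieve(query, filters) and llm(prompt) are stand-ins for your own
    search and generation calls.
    """
    # Phase 1: core entities and facts, no time filter
    core_context = retrieve(query, filters={})

    # Phase 2: one retrieval per comparison element (e.g. each year)
    comparison_context = []
    for period in periods:
        comparison_context.extend(retrieve(query, filters={"period": period}))

    # Phase 3: let the LLM synthesize both retrieved contexts
    prompt = (
        f"Question: {query}\n\n"
        f"Core context:\n{core_context}\n\n"
        f"Comparison context:\n{comparison_context}\n\n"
        "Answer using only the context above."
    )
    return llm(prompt)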

3. Multi-hop Queries: The Reasoning Challenge

These are the questions that require breaking down into sub-questions, with each answer feeding into the next retrieval step.

Examples:

  • "Which of our healthcare plans has the best coverage for the conditions most common among our engineering team?"
  • "What investment strategy would have performed best in the sectors where we saw the highest growth last quarter?"

💡 Key insight: Multi-hop queries can't be solved with a single retrieval operation. They require decomposition, planning, and sequential execution.

Engineering implication: Your system architecture needs to support query planning and multiple retrieval steps. This often means implementing:

  1. A query decomposition module to break complex questions into simpler ones
  2. A retrieval orchestrator to manage multiple search operations
  3. A synthesis component to integrate findings from multiple retrievals

I remember debugging a financial RAG system that kept hallucinating on multi-hop queries. The root cause wasn't the retrieval system - it was the lack of a decomposition step. We implemented a simple query planning stage that improved accuracy by 32% for complex queries.
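
A minimal sketch of such a decompose-then-retrieve loop is below. Here decompose would typically be an LLM call that returns sub-questions, and retrieve and llm are again placeholders for your own stack rather than a specific framework.

def answer_multi_hop(query: str, decompose, retrieve, llm) -> str:
    """Plan, retrieve per sub-question, then synthesize.

    decompose(query) returns an ordered list of sub-questions,
    retrieve(question) returns context, llm(prompt) generates text.
    """
    sub_questions = decompose(query)

    findings = []
    for sub_q in sub_questions:
        context = retrieve(sub_q)
        # Answer each hop with its own context so later hops can build on it
        sub_answer = llm(f"Context:\n{context}\n\nQuestion: {sub_q}")
        findings.append(f"Q: {sub_q}\nA: {sub_answer}")

    return llm(
        f"Original question: {query}\n\n"
        "Findings from sub-questions:\n" + "\n\n".join(findings) + "\n\n"
        "Combine these findings into a single answer."
    )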

4. Insufficient Context Queries: Learning to Say "I Don't Know"

Some questions simply cannot be answered with the information available. The hallmark of a mature RAG system is recognizing these cases.

Examples:

  • "What will our stock price be next quarter?"
  • "Which unreleased drug in our pipeline will have the fewest side effects?"
  • "How will changes to healthcare policy affect our costs in 2026?"

Engineering implication: You need to implement robust confidence scoring and thresholds for when your system should refuse to answer. This requires:

  1. Evaluating retrieval quality (not just semantic similarity)
  2. Assessing whether retrieved content actually addresses the query
  3. Implementing explicit "insufficient information" detection

One technique I've found effective is implementing a self-evaluation prompt after the RAG pipeline generates an answer:

Given the original query "{query}" and the retrieved context "{context}", 
evaluate whether the generated answer "{answer}" is:
1. Fully supported by the retrieved context
2. Partially supported with some unsupported claims
3. Largely unsupported by the context

If the evaluation returns categories 2 or 3, we either refuse to answer or clearly indicate what parts of the response are speculative.
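
Wired into code, that check might look roughly like the sketch below. The prompt is the one above; llm_judge is an assumed stand-in for whatever LLM call you use as a grader, not a specific library.

def grade_support(query: str, context: str, answer: str, llm_judge) -> int:
    """Run the self-evaluation prompt and return the support category (1-3).

    llm_judge(prompt) is a stand-in for an LLM call that replies with
    "1", "2", or "3".
    """
    prompt = (
        f'Given the original query "{query}" and the retrieved context "{context}", '
        f'evaluate whether the generated answer "{answer}" is:\n'
        "1. Fully supported by the retrieved context\n"
        "2. Partially supported with some unsupported claims\n"
        "3. Largely unsupported by the context\n"
        "Reply with just the number."
    )
    return int(llm_judge(prompt).strip())


def guarded_answer(query: str, context: str, answer: str, llm_judge) -> str:
    """Refuse or caveat when the answer is not fully grounded."""
    category = grade_support(query, context, answer, llm_judge)
    if category == 1:
        return answer
    if category == 2:
        return answer + "\n\nNote: parts of this answer are speculative."
    return "I don't have enough information in the knowledge base to answer this."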

5. Creative/Generative Queries: When Hallucination is a Feature

Some queries explicitly request creative generation where strict factuality isn't the primary goal.

Examples:

  • "Draft a blog post about our healthcare benefits program"
  • "Generate a sample investor pitch based on our financial performance"
  • "Write a description of what our ideal drug delivery mechanism might look like"

💡 Key insight: For creative queries, LLM capabilities should be emphasized over retrieval, using the knowledge base as inspiration rather than constraint.

Engineering implication: Your system needs to:

  1. Identify when a query is creative rather than factual
  2. Adjust the retrieval-generation balance to favor generation
  3. Use broader, more diverse retrieval to spark creativity
  4. Preferably, implement different evaluation metrics for these queries
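
As a rough illustration of points 2 and 3, one could route creative queries to broader retrieval and looser sampling. The knob names below (top_k, use_mmr, temperature) are generic placeholders, not tied to any particular provider, and the values are illustrative starting points rather than recommendations.

def retrieval_generation_settings(query_type: str) -> dict:
    """Pick retrieval breadth and sampling settings per query type."""
    if query_type == "creative":
        # Broad, diverse retrieval as inspiration; freer generation
        return {"top_k": 12, "use_mmr": True, "temperature": 0.9}
    # Factual defaults: tight retrieval, conservative generation
    return {"top_k": 4, "use_mmr": False, "temperature": 0.1}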

Practical Implementation: Query Type Detection (Evals)

Don't expect users to tell you what type of query they're asking. Your system needs to detect this automatically. I've implemented a simple but effective query classifier that looks something like this:

def classify_rag_query(query: str) -> str:
    """
    Classifies a query into one of the five RAG query types,
    using Instructor for structured outputs.
    """
    from typing import Literal

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel, Field

    class QueryClassification(BaseModel):
        category: Literal[
            "synthesis",
            "lookup",
            "multi-hop",
            "insufficient_context",
            "creative",
        ] = Field(description="The query category")
        confidence: float = Field(
            description="Confidence score for the classification",
            ge=0.0,
            le=1.0,
        )

    # Patch the OpenAI client so it returns validated Pydantic models
    client = instructor.patch(OpenAI())

    result = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        response_model=QueryClassification,
        messages=[{
            "role": "user",
            "content": f"Classify this query: {query}"
        }],
    )

    return result.category

Testing Matrix for Different Query Types

For effective RAG system evaluation, you need a test suite that covers all five query types:

Query Type            | Evaluation Metrics
Synthesis             | Precision, Answer correctness
Lookup                | F1 score, Completeness
Multi-hop             | Reasoning correctness, Factuality
Insufficient context  | Refusal rate, Hallucination detection
Creative              | Relevance, Creativity metrics
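
In practice this matrix can live as a small parametrized test suite. The sketch below uses pytest with made-up golden queries and a commented-out, hypothetical rag_answer entry point, just to show the shape; the metric names mirror the table above.

import pytest

# Metric names per query type, mirroring the matrix above
EVAL_MATRIX = {
    "synthesis": ["precision", "answer_correctness"],
    "lookup": ["f1", "completeness"],
    "multi-hop": ["reasoning_correctness", "factuality"],
    "insufficient_context": ["refusal_rate", "hallucination_detection"],
    "creative": ["relevance", "creativity"],
}

# Illustrative labelled queries; replace with your own golden set
GOLDEN_QUERIES = {
    "synthesis": "What were our Q2 earnings?",
    "lookup": "How did our healthcare costs compare between 2022 and 2023?",
    "multi-hop": "Which plan best covers our team's most common conditions?",
    "insufficient_context": "What will our stock price be next quarter?",
    "creative": "Draft a blog post about our healthcare benefits program",
}

@pytest.mark.parametrize("query_type", list(EVAL_MATRIX))
def test_rag_by_query_type(query_type):
    query = GOLDEN_QUERIES[query_type]
    # answer = rag_answer(query)                      # your pipeline entry point
    # for metric in EVAL_MATRIX[query_type]:
    #     assert evaluate(answer, metric) >= 0.8      # your metric + threshold
    assert query  # placeholder assertion until the pipeline is wired in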

Think About This

How often does your team debug RAG issues without first identifying the query type? Most teams I see spend weeks optimizing retrieval parameters when the real problem is a mismatch between query type and system design.

Next time your RAG system fails, ask: "What type of query is this, and is our system designed to handle this specific type?"

Originally published by Nirant Kasliwal, who builds RAG systems that don't embarrass your brand.

Thanks to Dhruv Anand and Rajaswa Patil for reading drafts of this.

Trying DeepSeek R1 and ChatGPT for India

I asked 3 different kinds of questions:

  1. Code Mixed Hindi and English
  • Religious Scripture (found widely on the Internet) -- though if R1's training data had been filtered down to English and Chinese, these would be missing
  • Political Topics, e.g. the 1962 War, which India lost

Executive Summary

Testing DeepSeek-R1 revealed mixed results across three areas:

  1. Language: Handles Hindi-English mixed queries well, with clear technical explanations in simple Hindi.
  2. Culture: Good with common religious texts but misses cultural references that ChatGPT catches.
  3. Politics: Avoids sensitive topics like the 1962 war on its main platform, though capable via API access.

Specific Examples and Questions

Hindi-English Mix!

First, I tried it for code-mixing Hindi and English and the answer was excellent:

Question:

क्या आप मुझे समझा सकते हैं कि artificial intelligence क्या है? (Can you explain to me what artificial intelligence is?)

Answer (translated from the original Hindi):

Let's understand in simple language what **Artificial Intelligence (AI)** is:

---

### **What is AI?**
AI is a technology that tries to give machines (computers, robots, software) **human-like intelligence**. That is, with the help of AI, machines can learn, solve problems, make decisions, and even understand or speak language.

---

### **How does AI work?**
1. **Data:** AI needs a lot of data to learn (such as text, images, video).
2. **Algorithms:** These are mathematical rules that find patterns in the data.
3. **Training:** The algorithm is made to "practise" on the data so that it can learn.
4. **Prediction:** After training, AI can look at new data and give results (such as recognising a dog in a photo).

---

### **Types of AI:**
1. **Narrow AI (weak AI):**
   - Specialises in **one task**.
   - Examples:
     - Google Assistant, Siri (voice recognition).
     - Netflix recommendations (suggesting shows to you).
     - Self-driving cars.

2. **General AI:**
   - **Human-like, versatile intelligence** that can learn any task.
   - For now, this exists only in science fiction!

---

### **Examples of AI:**
- **ChatGPT:** Talks with you and gives answers.
- **Facial recognition:** Face unlock on your phone.
- **Spam filter:** Identifying spam in email.
- **Medical diagnosis:** Detecting diseases.

---

### **Key capabilities of AI:**
1. **Ability to learn (Machine Learning):** Keeps improving by learning from data.
2. **Natural Language Processing (NLP):** Understanding human language, e.g. translation apps.
3. **Computer vision:** Analysing images or video (e.g. Instagram filters).

---

### **Advantages and challenges of AI:**
- **Advantages:**
  - Work gets done faster and is automated.
  - A revolution in fields like healthcare, education, and agriculture.
- **Challenges:**
  - Fear of job losses.
  - Risks to privacy and data security.
  - AI can pick up bias (leading to wrong decisions).

---

### **In short:**
AI is a technique for making machines "smart", which learns with the help of data and algorithms. It is making our lives easier, but it also needs to be used responsibly!

Would you like to know more about any particular aspect of AI? 😊

The reasoning for this code-mixed question was in English though:

क्या आप मुझे समझा सकते हैं कि artificial intelligence क्या है?
Okay, the user is asking me to explain what artificial intelligence is. Let's start by breaking down the term. "Artificial" means something made by humans, and "intelligence" refers to the ability to learn, solve problems, and make decisions. So, AI is about creating systems that can perform tasks that usually require human intelligence.

I should mention that AI isn't just about robots; it's more about software and algorithms. Maybe give some examples they might be familiar with, like Siri or Alexa. Those are examples of AI applications. Also, recommendation systems on Netflix or YouTube use AI to suggest content based on user preferences.

I need to explain the different types of AI. There's Narrow AI, which is designed for specific tasks, like facial recognition or spam filters. Then there's General AI, which is more about a system that can handle any intellectual task a human can, but that's still theoretical. Maybe clarify that most AI today is Narrow AI.

Machine learning is a subset of AI, right? So, I should mention that. ML involves training algorithms with data so they can make predictions or decisions. Deep learning is a part of ML using neural networks. Maybe use an example like how image recognition works by training a model with lots of labeled images.

Applications are important to make it concrete. Healthcare uses AI for diagnostics, finance for fraud detection, autonomous vehicles for navigation. Also, mention chatbots, since they interact with them.

Ethical considerations are crucial too. Issues like bias in AI systems if the training data is biased. Privacy concerns with data collection. Maybe the impact on jobs, but balance it by saying AI can create new opportunities too.

Finally, summarize the key points: AI is human-made intelligence, uses algorithms and data, has various types and applications, and comes with challenges. Keep the explanation simple and relatable without too much jargon.

Do LLMs speak Bollywood?

Next, I tried asking a question in pure Hindi which references a pre-Internet Bollywood movie:

Question:

मेरे पास बंगला है, गाड़ी है, दौलत है। तुम्हारे पास क्या है? (I have a bungalow, a car, wealth. What do you have?)

Answer:

ChatGPT has a perfect answer:

mere paas ma hain

DeepSeek R1 has no clue:

mere paas ai hain

But the thinking process is very interesting:

Okay, let's look at the user's query. They wrote in Hindi: "मेरे पास बंगला है, गाड़ी है, दौलत है। तुम्हारे पास क्या है?" which translates to "I have a bungalow, a car, wealth. What do you have?"

First, I need to understand the context here. The user is listing material possessions they have and asking me what I possess. Since I'm an AI, I don't own physical things. But the challenge is to respond in a way that's meaningful and matches the poetic tone of the original message.

The original lines have a rhythmic structure. The response should ideally mirror that structure. The user's lines end with "है" (hai - have) each time, so the response should follow a similar pattern but with different verbs that reflect non-material qualities.

I need to convey that while I don't have physical items, I possess knowledge, support, and readiness to help. Using verbs like "है" won't fit here. Instead, verbs like "साथ" (sath - with) or "मदद" (madad - help) might work better. Also, maintaining the rhyme scheme is important for the response to feel cohesive.

Looking at the example response provided by the assistant earlier: "मेरे पास ज्ञान है, साथ है, तुम्हारी मदद करने की राहत है।" It uses "है" again, which keeps the structure similar. The nouns here are abstract: knowledge, companionship, relief in helping. That's a good approach.

I should check if the verbs and nouns are correctly conjugated. "राहत है" (relief is) might not be the most accurate. Maybe "तत्परता" (readiness) instead of "राहत" could be better. So, "मदद करने की तत्परता है" (readiness to help). That flows well.

Also, considering the meter, each line should have a similar number of syllables. Let's count:

Original: "मेरे पास बंगला है, गाड़ी है, दौलत है।" Each part is 4-2, 3-2, 3-2 syllables. The response should match that. "मेरे पास ज्ञान है, साथ है, तुम्हारी मदद करने की तत्परता है।" Breaking it down: "ज्ञान है" (2-2), "साथ है" (2-2), "तुम्हारी मदद करने की तत्परता है" (tumhaari madad karne ki tatpartaa hai) which is 9-2. That's longer. Maybe shorten it to "तुम्हारी मदद की तत्परता है" (tumhaari madad ki tatpartaa hai) which is 6-2 syllables. Better.

So the revised response would be: "मेरे पास ज्ञान है, साथ है, तुम्हारी मदद की तत्परता है।" This keeps the structure closer to the original while conveying the intended meaning.

Alternatively, using "राहत" as before: "तुम्हारी मदद करने की राहत है" might be more poetic, even if "राहत" (relief) is a bit less precise. The key is to maintain the rhythm and rhyme.

Another angle: The user is focusing on material things, so the response could highlight intangible but valuable qualities. Words like "ज्ञान" (knowledge), "साथ" (companionship), and "समर्पण" (dedication) could work. For example: "मेरे पास ज्ञान है, साथ है, सेवा का समर्पण है।" (I have knowledge, companionship, dedication to serve.)

But the original example used "राहत है", which is more about the relief in helping. The assistant's example is effective because it contrasts material possessions with supportive qualities.

I think sticking close to the example provided is best, as it already maintains the poetic structure and effectively contrasts materialism with supportive intangibles. So the final response would be as given: "मेरे पास ज्ञान है, साथ है, तुम्हारी मदद करने की राहत है।"

Here is the broad chain-of-thought:

  1. English Translation
  2. Hindi Translation
  3. Poetic Structure
  4. Verbs and Nouns
  5. Rhyme Scheme
  6. Final Response

This is quite interesting and suggests that the model has "learnt" to notice poetry and to translate when it is addressed in languages other than English and Chinese.

I am not sure if this is a good thing or a bad thing. I suspect that keeping the thinking tokens in English makes them easier for humans to follow, but it might hurt model performance.

Next, I tried asking both LLMs for Gayatri Mantra and Namokar Mantra.

Gayatri Mantra

Namokar Mantra

It is quite clear that R1 is quite multilingual, and any Indic model will have to compete with this level of quality from a free, MIT-licensed model that can also be served locally or in India-aligned countries, e.g. the US.

Asking Political Questions

DeepSeek completely barfed on me:

1962-R1

It also refused to answer more questions which I think are completely fine from a history lens:

  1. What happened between China and India in 1962?
  2. Who won the 1962 war between China and India?

I tried the same questions on the Fireworks Playground and the model gave the expected answers, indicating that the censorship is applied more strictly on the consumer product and less so on the released model.

While ChatGPT has no trouble answering these questions:

gpt-1962

End Notes

DeepSeek-R1's MIT license and adaptability for local deployment (e.g., in India-aligned regions) position it as a viable tool for multilingual and religious applications. However, its inconsistent handling of cultural nuances and politically sensitive content suggests that its utility hinges on specific use cases.

For developers, this underscores the need to augment models with localized datasets, or perhaps real-time search - this is what Perplexity.ai does! Fine-tuning for cultural relevance is another option, though it might be tricky to get those nuances right.

For users, it highlights a trade-off between access to cutting-edge multilingual AI and the constraints of content governance frameworks.

Ultimately, while R1 showcases impressive multilingual prowess, LLM effectiveness in diverse contexts - particularly where culture, history, and politics intersect - will depend on continued improvements in cultural awareness. That said, it's definitely ready for a behind-the-scenes role!

Deepseek R1 Ideas for GPU Poor and Middle Class

The internet is abuzz right now with DeepSeek. Here I want to suggest some ideas and opportunities for engineers - most of them are exciting to pursue out of engineering curiosity.

Agents with Better Planning

R1 is exceptional at planning and significantly cheaper than o1/o3, and we expect prices for reasoning models to keep falling. So, with that in mind, I want to suggest some ideas which benefit from better planning agents.

Multi-file Code Editing

Early evidence: R1 + Sonnet set SOTA on Aider's Polyglot Benchmark

R1 as architect with Sonnet as editor has set a new SOTA of 64.0% on the aider polyglot benchmark. They achieve this at 14X less cost compared to the previous o1 SOTA result. o1 paired with Sonnet didn't produce better results than just using o1 alone.

This is a very good result and as part of the broader trend, there's enough evidence to pursue this direction.

There is an opportunity to improve upon Cursor's interfaces for multi-file code editing as it stands today in Q1 2025. For instance, what would it look like if a user's feature request, in their own words, could be connected directly to the code base to generate a pull request and run an A/B test for that specific feature?

Browser Agents

Early evidence: John Rush benchmarked multiple LLMs for browser usage.

tl;dr: One could build a better benchmark, in addition to WebVoyager, for evaluating your LLM's ability to do reasoning with function calling. This is a great start for anyone wanting to build an LLM for a specialized use case (not workflow or task oriented) at a lower cost.

Open Source Document Inlining

The core idea here is that you can convert any text-based LLM into a Vision LLM. Most folks try to do this right now with some sort of thin OCR layer. But I think a more promising approach is to lightly fine-tune and update the weights to work with the vision component. And I guess that is what Fireworks has been doing. This is very promising when combined with reasoning LLMs.

Early evidence: Fireworks Document Inlining

Document Inlining

Today, we are excited to launch a public preview of our first use case, Document Inlining, a compound system that automatically turns any LLM into a vision model to ingest images or PDFs for document-based vision tasks. Document Inlining parses images and pipes them directly into an LLM of your choice to deliver:

  • Higher quality - Achieve better reasoning and generation capabilities by utilizing any LLM of choice or specialized/fine-tuned models
  • Input flexibility - Automatically transform multiple file types like PDFs and screenshots. We can also handle rich document structures like tables/charts
  • Ultra-simple usage - Our API is OpenAI compatible. Enable this capability by editing 1 line to specify "#transform=inline" alongside your file
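
For reference, the quoted 1-line change would look roughly like this against Fireworks' OpenAI-compatible API. The base URL, model name, and document URL below are placeholders I've filled in as assumptions; check the Fireworks docs for the exact values.

from openai import OpenAI

# Assumed endpoint and model; consult Fireworks' docs for current values
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",  # any text LLM of your choice
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the key figures in this report."},
            # The "#transform=inline" suffix is the 1-line change from the announcement
            {"type": "image_url", "image_url": {"url": "https://example.com/report.pdf#transform=inline"}},
        ],
    }],
)
print(response.choices[0].message.content)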

Reasoning Distillation

Early evidence: Bespoke Labs and Sky T1

tl;dr: This is part of broader evidence that distillation of all kinds really works!

How? They took the reasoning traces from R1, collected over a large set of math and code problems, and distilled them into Qwen. They used this to beat the previous SOTA on math and code problems.

We already knew this for deep learning based models, but given the higher utility and cost savings, it's great to know that it holds for LLMs too!

Sky T1 was fine-tuned on a $450 run, so it's quite cheap.

Reasoning models with Structured Outputs (JSON/XML)

Opportunity: Open source reasoning models currently don't prioritize function calling and structured outputs. Even less so when used with images, scans and pdf-images.

We have abundant training data available:

  • GorillaBench datasets
  • ComplexFuncBench, another resource for evaluating your LLM's ability to do reasoning with function calling

While closed source models have so far excelled at function calling, this advantage may be diminishing:

  • We now have access to direct traces from R1
  • These traces can be used for training

How to do this?

Approach 1: You need the traces:

  1. Prompt R1 to generate XML traces of outputs
  2. Verify these traces against existing datasets
  3. Fine-tune reasoning models to produce structured JSON/XML

Approach 2: You don't need the traces:

  1. Use reasoning models to generate structured JSON/XML
  2. Fine-tune reasoning models to produce structured JSON/XML
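
As a starting point for either approach, here is a hedged sketch of the generate-and-verify step: asking a reasoning model for JSON and keeping it only if it validates against a Pydantic schema before it ever reaches a fine-tuning set. The model name, prompt, and schema are illustrative assumptions; swap in R1 or whichever reasoning model you are distilling from.

import json
from typing import Optional

from openai import OpenAI
from pydantic import BaseModel, ValidationError


class FunctionCall(BaseModel):
    """Target schema for the structured output we want the model to emit."""
    name: str
    arguments: dict


def generate_verified_example(question: str, client: OpenAI, model: str) -> Optional[FunctionCall]:
    """Ask a reasoning model for a JSON function call; keep it only if it validates."""
    response = client.chat.completions.create(
        model=model,  # e.g. a hosted DeepSeek-R1 endpoint; illustrative
        messages=[{
            "role": "user",
            "content": (
                "Answer the question by emitting exactly one JSON object with keys "
                f'"name" and "arguments" describing the function call to make.\n\nQuestion: {question}'
            ),
        }],
    )
    raw = response.choices[0].message.content
    try:
        return FunctionCall.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError):
        return None  # discard traces that fail verification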