Skip to content

2023

Retrieval Augmented Generation Best Practices

Retrieval and Ranking Matter!

Chunking

  1. Including section title in your chunks improves that, so does keywords from the documents
  2. Different token-efficient separators in your chunks e.g. ### is a single token in GPT

Examples

  1. Few examples are better than no examples
  2. Examples at the start and end have the highest weight, the middle ones are kinda forgotten by the LLM

Re Rankers

Latency permitting — use a ReRanker — Cohere, Sentence Transformers and BGE have decent ones out of the box

Embedding

Use the right embedding for the right problem:

GTE, BGE are best for most support, sales, and FAQ kind of applications.

OpenAI is the easiest for Code Embedding to use.

e5 family does outside English and Chinese

If you can, finetune the embedding to your domain — takes about 20 minutes on a modern laptop or Colab notebook, improves recall by upto 30-50%

Evaluation

Evaluation Driven Development makes your entire "dev" iteration much faster.

Think of these as the "running the code to see if it works"

Strongly recommend using Ragas for something like this. They've Langchain and Llama Index integrations too which are great for real world scenarios.

Scaling

LLM Reliability

Have a failover LLM for when your primary LLM is down, slow or just not working well. Can you switch to a different LLM in 1 minute or less automatically?

Vector Store

When you're hitting latency and throughput limits on the Vector Store, consider using scalar quantization with a dedicated vector store like Qdrant or Weaviate

Qdrant also has Binary Quantization which allows you to scale 30-40x with OpenAI Embeddings.

Finetuning

LLM: OpenAI GPT3.5 will often be as good as GPT4 with finetuning.

Needs about 100 records and you get the 30% latency improvements for free

So quite often worth the effort!

This extends to OSS LLM models. Can't hurt to "pretrain" finetune your Mistral or Zephyr7B for $5

Tax Tips for Consultants with Foreign Income

Managing taxation and financial compliance as a consultant in India involves several considerations: GST to the benefits of Section 44ADA and the complexities of foreign entity considerations, consultants must juggle multiple aspects to ensure compliance and tax optimization. I'm sharing what I know in the hopes it's helpful to others.

1. Compliance: GST, LUT and FIRC

  • Threshold: Acquiring a GST number becomes almost mandatory once your income surpasses some X=20 lakhs INR in a given financial year. This step legitimizes your income and ensures compliance with Indian taxation laws. The X limits keeps changing and I'd recommend you check the latest one with a CA and NOT rely on Google for this. Some online services like ClearTax offer GST registration services, but I'd recommend you get it done via a CA. It's a one-time process and you don't need to renew it every year.

  • Sole Proprietorship: If you are an individual, you get a GST number as a sole proprietor. This is the simplest way to get a GST number and is recommended for most consultants. You can also get a GST number as a private limited company, but that might be overkill for most consultants. My CA also recommended that I get an Udyog Certificate, so I did get one but I'm not sure if it's necessary.

  • Quarterly Filing: If you are a sole proprietor or operating as an individual consultant, quarterly GST filing can be more manageable than doing it monthly. This approach helps streamline administrative tasks and keeps compliance straightforward.

  • FIRC: Every time that you receive a payment from a foreign entity, you will need to get a Foreign Inward Remittance Certificate (FIRC) from your bank. This document is a proof of payment and is required for filing taxes. It is advisable to get a FIRC for each payment, as it can be difficult to get one later. Salt is a great tool for automating this process.

  • Accounting & Compliance: Using a CA to file for managing both your GST and Letter of Undertaking (LUT) is advisable. It serves as an effective tool for official documentation and ensures a smoother filing process.

  • Current Account: Having a current account separate from your personal finances helps in meticulous financial management. It allows you to distinctly separate your business income and expenses, making accounting more cleaner. E.g. you can use this current to pay for your AWS bills. This can help you save on GST, as you can claim GST input credit on these expenses.

  • Letter of Undertaking (LUT): For consultants involved in international transactions, an LUT is crucial. This legal document clarifies that all relevant taxes will be paid exclusively in India, thereby simplifying compliance requirements for cross-border business. It is advisable to get an LUT even if you are not sure if you will be involved in international transactions. You need to mention the LUT in your invoices to foreign entities.

  • GST on Foreign Transactions: GST is not applicable on foreign transactions. However, you will need to mention the LUT in your invoices to foreign entities. This is to ensure that the GST authorities are aware that you are not liable to pay GST on these transactions.

2. Taxation

2.1 Personal Income Tax: Section 44ADA

  • 50% Rule: Section 44ADA offers a benefit by allowing you to assume 50% of your income upto 75 lakhs as an expense for tax filing. This helps significantly in reducing your taxable income, which can be advantageous for many consultants.

Example Math:

If you make 100 lakhs INR in a year, you can assume 37.5 lakhs INR (half of 75 lakhs) as your expenses and pay taxes on the remaining 100-37.5=62.5 lakhs INR. At 30% income tax rate, you'd pay 19.3 lakhs INR in taxes. Effectively, you'd be paying 19% of your income in taxes.

💡 You do NOT need to maintain books of expenses for Section 44ADA. This is a huge benefit as it simplifies the process of filing taxes!

There is another catch on 100 lakhs INR: 15% surcharge on income tax. Source: ClearTax -- I'm unclear if Section 44ADA applies to this surcharge.

  • >50% Expenses: If your actual expenses amount to more than half of your income, then Section 44ADA may not offer you the best tax advantage. This is because you can only claim 50% of your income as an expense, even if your actual expenses are higher. You will have to get audited by a CA and file your taxes later (usually in September) if you want to claim more than 50% of your income as expenses.

2.2 Foreign LLC or C-Corp

  • LLC Abroad: If you are working with a foreign entity, it is sometimes useful to have a Limited Liability Company (LLC) in the country of the entity. E.g. you can open a Delaware LLC via Stripe Atlas and use that to invoice your clients. You will then pay taxes via this entity in the source country. It's only for the amount that you move to India, that you will pay taxes in India.

If I remember correctly, you will be paying 8.7% corporate tax in Delaware, and similarly 8-9% in Dubai. With all of these, you pay additional 30% personal income tax in India. So if you make 100 lakhs INR in a year, you will pay 8.7 lakhs INR in the US. And then you pay personal income tax in India on your living expenses: So say, you move 18 lakhs INR to India, you will pay 0.3*18 = 5.4 lakhs INR in taxes in India. Effectively, you'd be paying 14.1% of your income in taxes. If you are making more than 100 lakhs INR in a year, this might be a good option for you.

This can help simplify the process of receiving payments. However, it is important to note that this option is not always the most tax-efficient. It is advisable to consult a CA to understand the implications of this option for your specific amount, transaction frequency and expected personal expenses.

RemoteIndian has a similar Tax Guide that you might find useful. I'm not a CA, so please consult a CA before making any decisions. I'm sharing what I know in the hopes it's helpful to others. If you have any questions, please feel free to reach out to me on Twitter.

AI4Humans aka Software x LLMs

AI4Bharat, IIT Madras, July 2023

Namaste! 🙏 I'm Nirant and here's a brief of what we discussed in our session.

Why You Should Care?

I have a track record in the field of NLP and machine learning, including a paper at ACL 2020 on Hinglish, the first Hindi-LM, and an NLP book with over 5000 copies sold. I've contributed to IndicGlue by AI4Bharat, built and deployed systems used by Nykaa, and consulted for healthcare enterprises and YC companies. I also manage India’s largest GenAI community with regular meetups since February 2023.

Here's my Github.

AI4Humans: Retrieval Augmented Generation for India

We dived into two main areas:

  1. Retrieval Augmented Generation: Examples of RAG for India, engineering choices, open problems, and how to improve it
  2. LLM Functions: Exploring tool augmentation and "perfect" natural language parsing

Retrieval Augmented Generation (RAG)

RAG is a popular pattern in AI. It's used in various applications like FAQ on WhatsApp, customer support automation, and more. It's the backbone of services like Kissan.ai, farmer.chat and Bot9.ai.

However, there are several open problems in RAG, such as text splitting, improving ranking/selection of top K documents, and embedding selection.

Adding Details to RAG

We can improve RAG by integrating models like OpenAI's GPT4, Ada-002, and others. We can also enhance the system by adding a Cross-Encoder and 2 Pass Search.

RAG Outline

Despite these improvements, challenges remain in areas like evaluation, monitoring, and handling latency/speed. For instance, we discussed how to evaluate answers automatically, monitor model degradation, and improve system latency.

Using LLM to Evaluate

An interesting application of LLM is to use it for system evaluation. For example, we can use LLM to auto-generate a QA test set and auto-grade the results of the specified QA chain. Check out this auto-evaluator as an example.

Addressing Open Problems

We discussed the best ways to improve system speed, including paged attention, caching, and simply throwing more compute at it. We also touched on security concerns, such as the need for separation of data and the use of Role Based Access Control (RBAC).

LLM “Functions”

We explored how LLMs can be used for tool augmentation and converting language to programmatic objects or code. The Gorilla LLM family is a prime example of this, offloading tasks to more specialized, reliable models.

In the context of AgentAI, we discussed how it can help in converting text to programmatic objects, making it easier to handle complex tasks. You can check out the working code here.

Thank you for attending the session! Feel free to connect with me: Twitter, LinkedIn or learn more about me here.

References

Images in this blog are taken from the slides presented during the talk.