About

I’m a Machine Learning Consultant and AI Engineer. I help startups and founders build AI products and services. This includes things like LLMs and Vector Databases: powerful tools that unlock a lot of new value but also have their own limits.

I recently built FastEmbed, an embedding library built for speed using ONNX. It is maintained with Qdrant, a vector search engine.

Trivia:

  1. Dr. Andrew Ng recommends Awesome NLP, a repo I’ve maintained since 2018, in Stanford’s Deep Learning course CS 230.
  2. Named among the Top 5 GenAI Scientists in India by Analytics India Magazine.
  3. I have written a book on Natural Language Processing.

Background

Machine Learning and AI Engineer with 7+ years of experience in chatbots, retrieval, ranking and language modeling.

As a Machine Learning Engineer, I

  • Trained the first Hindi LM
  • Deployed Sentence Transformers and Annoy (a vector search library) in production in 2018 for cosine-similarity-powered search
  • Managed a team of 3 engineers to build a support chatbot handling 1M chat messages per month

As an AI Engineer, I

  • Built and deployed Question Answering systems for 3+ years, including 2 projects with OpenAI LLMs, e.g. text-davinci-003, GPT-3.5 and GPT-4
  • Built hallucination-free summarization and question answering systems

PS: What is an AI Engineer? Here you go: https://latent.space/p/ai-engineer

Book

Book: NLP in Python: Quickstart Guide

Code: GitHub

I wrote this book in 2018 to make Natural Language Processing more accessible for software engineers and programmers. It took a design-first and code-first view of tools and their limitations. Today, most of it is outdated, and I do not recommend buying it.

Papers and Open Source Contributions

  1. Hinglish (github, paper): work focused on code-mixed languages, published at ACL 2019.

  2. Awesome Project Ideas - Curated list of machine learning (mostly deep learning) project ideas with datasets. The ideas span Vision, Text, Forecasting and Recommender Systems.

  3. Awesome NLP - Curated list of Natural Language Processing resources. I’ve been the primary maintainer of Awesome-NLP.

    • Recommended by Dr. Andrew Ng’s (Stanford) CS 230
    • Featured in GitHub’s Official Machine Learning Collection since 2016
  4. State-of-the-art language modeling in Hindi, plus new datasets; the code is available at hindi2vec

  5. Comparative Study of Preprocessing and Classification Methods in Character Recognition of Natural Scene Images. In: Machine Intelligence and Signal Processing. Advances in Intelligent Systems and Computing, vol 390. Springer, New Delhi (2015). https://doi.org/10.1007/978-81-322-2625-3_11

Talks

  1. Fifth Elephant MLOps Conf 2021: Slides
  2. PyCon India 2019: Slides and Youtube
  3. inMobi Tech Talks: A Nightmare on the LM Street; Slides
  4. Wingify DevFest: NLP for Indian Languages; Slides, Youtube
  5. PyData Bengaluru Inaugural Talk: Quiz Generation with spaCy; Youtube

Web Mentions - My 5 minutes of Internet fame

  1. Search and Informational Retrieval Ranking Challenge hosted by Bing AI Team (2019)

  2. Won the Kaggle Kernel Prize (2019)

  3. Exploratory Programming Notes found helpful by Nobel Laureate (2018)

    • Tips, Tricks, Best Practices for working with Jupyter Notebooks was appreciated by Dr. Paul Romer, 2018 Nobel Laureate in Economics
  4. FactorDaily’s piece on The great rush to data sciences in India ends with a direct quote from me.

    • FactorDaily is a new-age news company that sits at the intersection of technology with life, culture and society in India.
  5. First Runner-Up at the Future Group Datathon (March 2019)

    • Two-stage Machine Learning hackathon called Tathastu, working on recommendation systems and item information extraction problems
  6. Opened AI Hackathon (2019)

    • Awesome NCERT won the Best Use of IBM Watson API award; blog
    • Idea: Find recent and relevant news articles for any NCERT chapter in sciences and social studies