About
I'm a Machine Learning Consultant and AI Engineer. I help startups and founders build AI products and services. This includes work with LLMs and vector databases: powerful tools that unlock a lot of new value but also have their own limits.
I recently created FastEmbed, an embedding library built for speed using ONNX. It is maintained by Qdrant, a vector search engine.
Trivia:
- Dr. Andrew Ng recommends Awesome NLP, a repo I've maintained since 2018, in Stanford's Deep Learning course CS 230.
- Named among the Top 5 GenAI Scientists in India by Analytics India Magazine
- I have written a book on Natural Language Processing
Background
Machine Learning and AI Engineer with 7+ years of experience in chatbots, retrieval, ranking and language modeling.
As a Machine Learning Engineer, I
- Trained the first Hindi LM
- Deployed Sentence Transformers and Annoy (a vector search library) in production in 2018 for cosine-similarity-powered search
- Managed a team of 3 engineers to build a support chatbot handling 1M chat messages per month
As an AI Engineer, I
- Have built and deployed question answering systems for 3+ years, including 2 projects with OpenAI LLMs, e.g. text-davinci-003, GPT-3.5 and GPT-4
- Have built hallucination-free summarization and question answering systems
PS: What is an AI Engineer? Here you go: https://latent.space/p/ai-engineer
Book
Book: NLP in Python: Quickstart Guide
Code: Github
I wrote this book in 2018 to make Natural Language Processing more accessible for software engineers and programmers. It took a design-first and code-first view of tools and their limitations. Today, most of it is outdated, and I do not recommend buying it.
Papers and Open Source Contributions
- Hinglish (github, paper): work on code-mixed languages, published at ACL 2019.
- Awesome Project Ideas: a curated list of machine learning (mostly deep learning) project ideas with datasets. The ideas range from vision, text, and forecasting to recommender systems.
- Awesome NLP: a curated list of Natural Language Processing resources. I've been the primary maintainer of Awesome-NLP.
    - Recommended in Dr. Andrew Ng's (Stanford) CS 230
    - Featured in GitHub's official Machine Learning Collection since 2016
- State-of-the-art language modeling in Hindi, plus new datasets; the code is at hindi2vec.
- Comparative Study of Preprocessing and Classification Methods in Character Recognition of Natural Scene Images. In: Machine Intelligence and Signal Processing. Advances in Intelligent Systems and Computing, vol 390. Springer, New Delhi (2015). https://doi.org/10.1007/978-81-322-2625-3_11
Talks
- Fifth Elephant MLOps Conf 2021: Slides
- PyCon India 2019: Slides and Youtube
- inMobi Tech Talks: A Nightmare on the LM Street; Slides
- Wingify DevFest: NLP for Indian Languages; Slides, Youtube
- PyData Bengaluru Inaugural Talk: Quiz Generation with spaCy; Youtube
Web Mentions - My 5 minutes of Internet fame
- Search and Informational Retrieval Ranking Challenge hosted by Bing AI Team (2019)
- Won the Kaggle Kernel Prize (2019)
    - The Hitchhiker's Guide to NLP in spaCy won the first-ever NLP-themed Kaggle Kernel award. The prize was $500 in cash and a licensed copy of Prodi.gy worth $390.
- Exploratory Programming Notes found helpful by a Nobel Laureate (2018)
    - Tips, Tricks, Best Practices for working with Jupyter Notebooks was appreciated by Dr. Paul Romer, 2018 Nobel Laureate in Economics:
Nirant, this looks very helpful.
— Paul Romer (@paulmromer) April 15, 2018
Re your recommendation to use f-strings, do you know a good place to learn about them for someone new to Python?
Everything I’ve found seems to be for someone making the transition from older ways that a newbie doesn’t need to learn.
- FactorDaily's piece on The great rush to data sciences in India ends with a direct quote from me.
    - FactorDaily is a new-age news company that covers the intersection of technology with life, culture and society in India.
- First Runner-Up at the Future Group Datathon (March 2019)
    - Tathastu was a two-stage machine learning hackathon on recommendation systems and item information extraction problems
- Opened AI Hackathon (2019)
    - Awesome NCERT won Best Use of the IBM Watson API; blog
    - Idea: find recent and relevant news articles for any NCERT chapter in the sciences and social studies