Below you will find pages that utilize the taxonomy term “tech”
Writing
Airbnb's Metric Store: Minerva
Data lineage is a problem because, at most companies, data passes through several tables and queries before humans consume it!
This has well-known challenges: changes do not propagate downstream from the source, and reliable (fresh, updated or complete) data is not always available.
What does Minerva do? I was expecting Minerva to be a database (a collection of tables), but it turns out that Minerva is what I’ll call a Data Transformation Manager.
It overlaps quite a bit with dbt but it’s not a pure execution layer.
Writing
Data Science Org Design for Startups
While there is plenty of good advice on making ML work and on making a career as a Data Scientist, I think very little discussion happens on the organization design for Data Science itself.
This blog will hopefully help folks not just build their team, but also understand the ecosystem from which they are hiring.
Organization Design is determined by these 3 broad categories:
Writing
How to Read a Deep Learning Paper
Who is this for? Practitioners who are looking to level up their game in Deep Learning.
Why Do We Need Instructions on How to Read a Deep Learning Paper? Quantity: There are more papers than we can humanly read, even within our own niche. For instance, consider EMNLP, arguably the most popular Natural Language Processing conference, which selects more than 2K papers across a variety of topics. And NLP is just one area!
Writing
First 90 Days - Software Engineer Version
Aditya Ankur asked me:
I know that there is a book for the first 90 days as an executive. Is there something similar for programmers?
I don’t quite know of a book/essay which covers this yet sticks to the question. So I am writing one for him.
The First 90 Days for a New Engineer
I expect each step to take roughly between 10 and 30 days, depending on the pace of your project + size of the team.
Writing
Building a Data Science Team at a Startup
Hello!
If we are meeting for the first time, a short version of my story so far: After doing research engineering for almost 4 years across startups and a BigCo, I joined Verloop.io, a B2B startup that makes customer support automation SaaS, as an early machine learning engineer in 2019. I was there until April 2021.
We were directly responsible for most Natural Language Processing needs within the business.
Writing
Verloop NLP Interview Prep Guide
Update, September 2021: This guide is a little outdated, but not obsolete. I no longer work at Verloop.io.
Preparation Guide
I’ve been an early Machine Learning Engineer at Verloop.io for almost 1.5 years, primarily working on NLP problems and now more in an Engineering Manager-ish role.
This is the guide which I sometimes send to our candidates after they submit the Programming Challenge. If a candidate has relevant open source code samples, especially contributions to other repositories, we may choose to waive the Programming Challenge completely.
Writing
Math for Machine Learning
Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning: https://www.cis.upenn.edu/~jean/math-deep.pdf
Mathematics for Machine Learning: https://mml-book.github.io/book/mml-book.pdf
http://d2l.ai/chapter_appendix_math/index.html
Writing
ML Model Monitoring
Mayank asked on Twitter:
Some ideas/papers/tools on monitoring models in production. A use case would be, say, a classification task over large inputs. I want to visualise how the predicted values or even confidence scores vary over time. (paraphrased)
Quick Hacks
pandas-profiling
If you are logging confidence scores, you can begin there. The quickest hack is to visualize with pandas-profiling: https://github.com/pandas-profiling/pandas-profiling/
Rolling means
Calculate rolling aggregates (e.g. mean, variance) of your confidence scores.
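A minimal sketch of the rolling-aggregate idea, using only the standard library. The scores, the window size, and the `rolling_stats` helper are all hypothetical, made up for illustration. If your scores already live in a pandas Series, `Series.rolling(window).mean()` and `.var()` give you the same kind of view with less code.

```python
from collections import deque
from statistics import fmean, pvariance


def rolling_stats(scores, window):
    """Yield (mean, variance) over the last `window` scores, once the window is full."""
    recent = deque(maxlen=window)  # old scores fall off the left automatically
    for score in scores:
        recent.append(score)
        if len(recent) == window:
            yield fmean(recent), pvariance(recent)


# Hypothetical confidence scores, logged in time order.
confidence_scores = [0.9, 0.9, 0.9, 0.5, 0.5, 0.5]
stats = list(rolling_stats(confidence_scores, window=3))

for mean, variance in stats:
    print(f"mean={mean:.3f} variance={variance:.4f}")
```

A rolling mean that drifts down, or a rolling variance that spikes, is a cheap first signal that the model or its inputs have changed.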
Writing
Best of Python 3 f-strings
This piece is primarily meant for those new to Python. These include mathematicians, economists, and so on who want to use Python within a Jupyter environment. Here is a quick guide on how to make the best of Jupyter.
Quick Primer
If you are familiar with earlier Python versions, here are my top picks on how to move from .format() to f-strings:
_fstring = f'Total: {one + two}' # Go f-string!
Writing
The Silent Rise of PyTorch Ecosystem
While TensorFlow has made peace with Keras as its high-level API and MXNet now supports Gluon, PyTorch is the bare matrix love.
PyTorch has seen rapid adoption in academia and in all the industrial labs that I have spoken to as well. One of the reasons people (especially engineers doing experiments) like PyTorch is the ease of debugging.
What I don’t like about PyTorch is its incessant need for debugging because of inconsistent dimension problems.
Writing
Tech Talk Tips
A collection of the best advice on the Internet that I know of on giving a tech talk. Based on responses to my question on Twitter.
Meghana gave a talk based on these tips at PyData Bengaluru.
You: Hey, I know something better!
Me: Please tell me about it! Raise a PR. Or reply to the tweet above!
The Mindset
Presentations Skills Considered Harmful
tl;dr: Do not block the message with your actions.
Writing
How to prepare for a Data Science job from college?
A Getting Started Guide
Let us get our facts straight, shall we?
I am writing from my non-existent but probably relevant experience. I worked in a Machine Learning role at Samsung Research, Bengaluru. It is one of only 4 research enterprises which hire Machine Learning researchers from Indian colleges; the others are Microsoft, Xerox, and IBM Watson.
I am now in an even more Computer Vision focused role at a small enterprise tech company.
Writing
The Case Against Work Life Balance
Note from Nirant: This is an archived blog post from here by Syam Sankar. I am not the author of this post. I’m keeping it here since the original source is no longer available.
I’ve taken the liberty to copy-paste the raw text:
Given my journey, you can imagine my first reaction to questions of work-life balance is fairly unsympathetic. I want to protest that, by legitimizing such a false dichotomy, you’re pre-empting a much more meaningful conversation.