2019

September 24, 2019
in niranting
2 min read

Strong Beliefs, Loosely Held

On Death

Death is usually better the sooner it happens. Just as it’s best to retire from your cricket career at your peak, for your own sake, it’s best to die when you are at your peak. You just might become Father of the Nation.
No one on their deathbed wishes that they’d accomplished less. You might hear that no one wishes that they’d worked more, either. That’s right too. And therein lies the great human stupidity. You can’t hope to accomplish more without working more. And you might accomplish less despite working more.

This is also my answer to the question of what I think is true but almost no one agrees with me on (from Peter Thiel’s Zero to One).

No one (with as many safety nets as me) would wish that they’d taken fewer risks in their 20s. Cool people (like cool companies) are polite, go against less empathetic institutions, and take risks (h/t Jeff Bezos).

On Purpose of Life

There is no higher purpose. Why would a microscopic life on a floating blue speck in dark, cold space have a higher purpose? It’s arrogant to assume otherwise.
Serve the man next to you, then the one next to him, and so on.
Follow the Paradoxical Commandments. Not because they help; follow them anyway.

On Philosophy of Life

Options and choices are net bad. Decisions are good. Defaults are better. Burn your bridges. All-you-can-eat buffets are scams. Having good looks or being visibly rich is a dating disadvantage because it creates options.
When in doubt, look for answers from the dead: Sufis, Stoics, Mystics, and Buddhists.
Eastern philosophy is more tolerable than Western philosophy. Here is how they contrast
Simple systems that fail at the edges beat complicated systems that can handle edge cases. As in engineering, so in life. The regret minimization framework (h/t Jeff Bezos) seems like the best candidate, and I am testing it as I write this uncomfortably honest piece.

On Money

The time value of money is underrated.
Money is like sex. It’s good to have a lot of it when you are young. That still won’t fill the hole left in your heart by not having a philosophy.
If you have money, invest in your learning by paying for coffees with smarter people. Books, blogs, and those Coursera courses are a slow learning curve.

On Work-Life Balance

The Case Against Work Life Balance is stronger than ever.
- Related: Keep looking for your Gamma Radiation

An older version of this lives on Medium

September 21, 2019
in tech, machine-learning, production
4 min read

ML Model Monitoring

Mayank asked on Twitter:

Some ideas/papers/tools on monitoring models in production? A use case would be, say, a classification task over large inputs. I want to visualise how the predicted values or even confidence scores vary over time. (paraphrased)

Quick Hacks

pandas-profiling

If you are logging confidence scores, you can begin there. The quickest hack is to visualize with pandas-profiling: https://github.com/pandas-profiling/pandas-profiling/

Rolling means

Calculate rolling aggregates (e.g. mean, variance) of your confidence scores. pandas has these built in, and they are quick to compute. Add them to your set of monitoring and alerting product metrics.

A better version of this would be to do it at the cohort level. Actually, doing all of the following analysis at the cohort level makes sense.
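Here is a minimal sketch of the cohort-level rolling aggregates. The column names (`cohort`, `confidence`), the toy log, and the window size are assumptions for illustration; swap in whatever your prediction log actually records.

```python
import pandas as pd

# Hypothetical prediction log: one row per prediction, in time order within each cohort.
log = pd.DataFrame(
    {
        "cohort": ["web", "web", "web", "web", "app", "app", "app", "app"],
        "confidence": [0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.9, 0.9],
    }
)

WINDOW = 2  # assumed window size, counted in predictions


def rolling_confidence(log: pd.DataFrame, window: int = WINDOW) -> pd.DataFrame:
    """Add per-cohort rolling mean and variance of the confidence score."""
    out = log.copy()
    grouped = out.groupby("cohort")["confidence"]
    # transform keeps the original row order, so results line up with the log
    out["rolling_mean"] = grouped.transform(lambda s: s.rolling(window).mean())
    out["rolling_var"] = grouped.transform(lambda s: s.rolling(window).var())
    return out


monitored = rolling_confidence(log)
print(monitored)
```

The first `window - 1` rows of each cohort are NaN because the window is not full yet. In this toy log, the `web` cohort's rolling mean drifts steadily downward, which is the kind of trend you would want an alert on.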

Confidence Scores and Thresholds

One of the most common mistakes is to use static thresholds on confidence scores.

If you hear someone say that they do not use thresholds for a classification problem, stop and think. They are using a threshold, usually the default of 0.5 from within the ML library that they are using.

This is suboptimal. The better option is to use a holdout validation set and determine the threshold from that.
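A rough sketch of picking the threshold from a holdout set, for a binary classifier. Maximizing F1 is my assumption here; use whatever metric your product actually cares about. The scores and labels are made up.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the positive class when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(scores, labels):
    """Try every observed score as a candidate; keep the best F1 (ties go to the lowest)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Made-up holdout set: model confidence for the positive class, and the true label.
holdout_scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80, 0.90]
holdout_labels = [0, 0, 1, 0, 1, 1, 1, 1]

best = pick_threshold(holdout_scores, holdout_labels)
print(best, f1_at_threshold(holdout_scores, holdout_labels, best))
print(0.5, f1_at_threshold(holdout_scores, holdout_labels, 0.5))
```

On this toy data the chosen threshold is 0.30, and it beats the library default of 0.5 on F1. Re-run the selection whenever you retrain, since the score distribution moves with the model.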

Tagging Data

It is obvious that you will tag the predictions for which the model is least confident -- so that the model can learn.

What you should also do is this:

Find samples which have high confidence and tag them first. This is a form of negative sample mining.
For multi-class classification: find samples which did not clear your threshold even though the prediction is correct. Add these back to your new training+validation set.
Tag samples which are too close to the threshold. This will help you understand the margin of separation of your model and dataset better.
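A small sketch of how the tagging queues could be pulled out of a prediction log. The function name, the `margin`, and `top_k` are hypothetical knobs, not anything standard. The multi-class case (below threshold but correct) needs labels, so it is left out here.

```python
def tagging_buckets(predictions, threshold, margin=0.05, top_k=2):
    """Split unlabeled predictions into tagging queues.

    predictions: list of (sample_id, confidence) pairs.
    """
    by_confidence = sorted(predictions, key=lambda p: p[1])
    return {
        # the obvious queue: where the model is least sure
        "least_confident": [sid for sid, _ in by_confidence[:top_k]],
        # high-confidence samples, to catch confidently wrong predictions
        "most_confident": [sid for sid, _ in by_confidence[-top_k:]],
        # samples sitting right next to the decision threshold
        "near_threshold": [
            sid for sid, conf in predictions if abs(conf - threshold) <= margin
        ],
    }


# Made-up prediction log
predictions = [("a", 0.99), ("b", 0.52), ("c", 0.10), ("d", 0.47), ("e", 0.95), ("f", 0.30)]
buckets = tagging_buckets(predictions, threshold=0.5)
print(buckets)
```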

Training-Serving

The most common cause of trouble in production ML models is training-serving skew, that is, differences between what the model saw in training and what it sees when serving.

The differences can be at three levels: data, features, and predictions.

Data Differences

Data differences come in several types. The most frequent are these:

Schema change - someone dropped a column!
Class distribution change - when did this 10% training class start getting 20% of the predictions?
Data input drift - users have started typing instead of copy-pasting!

Schema skew (from Google's ML Guide)

Training and serving input data do not conform to the same schema. The format of the serving data changes while your model continues to train on old data.

Solution? Use the same schema to validate training and serving data. Ensure you separately check for statistics not checked by your schema, such as the fraction of missing values.
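To make this concrete, here is a toy sketch in plain Python. The schema, column names, and rows are all made up; in practice you would likely reach for a validation library rather than hand-rolling this.

```python
# Hypothetical schema shared by training and serving: column name -> expected type.
EXPECTED_SCHEMA = {"text": str, "source": str, "length": int}


def schema_errors(rows, schema=EXPECTED_SCHEMA):
    """Return human-readable schema violations for a list of dict rows."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            errors.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col, expected_type in schema.items():
            value = row.get(col)
            # None is treated as a missing value, tracked separately below
            if value is not None and not isinstance(value, expected_type):
                errors.append(
                    f"row {i}: {col} is {type(value).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return errors


def missing_fraction(rows, column):
    """Fraction of rows where the column is absent or None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) is None) / len(rows)


training_rows = [
    {"text": "hi", "source": "web", "length": 2},
    {"text": "hello", "source": "app", "length": 5},
]
serving_rows = [
    {"text": "hi", "length": 2},  # someone dropped a column!
    {"text": "yo", "source": "web", "length": None},
]

print(schema_errors(training_rows))
print(schema_errors(serving_rows))
print(missing_fraction(serving_rows, "length"))
```

The point of the second function is that a row can pass the schema check and still be mostly empty, so the missing-value fraction is worth tracking as its own metric.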

Class Distribution check with Great Expectations

Training and serving input data should conform to the same class frequency distribution. Confirm this. If not, update the model by training with updated class frequency distribution.

For monitoring these first two, check out: https://github.com/great-expectations/great_expectations
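If you want a feel for the class distribution check before wiring up a tool, here is a sketch in plain Python. This is not the Great Expectations API. Total variation distance and the alert threshold are my assumptions; any distribution distance would do.

```python
from collections import Counter


def class_frequencies(labels):
    """Map each class to its share of the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two class distributions."""
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


# Made-up example: a 10% training class that now gets 20% of the predictions.
training_labels = ["spam"] * 10 + ["ham"] * 90
serving_predictions = ["spam"] * 20 + ["ham"] * 80

drift = total_variation_distance(
    class_frequencies(training_labels),
    class_frequencies(serving_predictions),
)

ALERT_THRESHOLD = 0.05  # assumed value; tune it on your own history
should_alert = drift > ALERT_THRESHOLD
print(drift, should_alert)
```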

To understand data drift, you need to visualize the data itself. This is too data-domain specific (e.g. text, audio, image) to generalize. More often than not, it is just as good to visualize features or vectors.

Feature Viz for Monitoring

Almost all models for high-dimensional data (images or text) vectorize the data. I am using "features" and "vectorized embeddings" as loosely synonymous here.

Let's take text as an example:

Class Level with umap

Use any dimensionality reduction like PCA or umap (https://github.com/lmcinnes/umap) for your feature space. Notice that these are on class level.

[Figure: umap-tweet-plots]

Make the same plots for both training and test data, and see if they have similar distributions.

Prediction Viz for Monitoring

Here you can get lazy, but I'd still recommend that you build data-domain specific explainers

Sample Level with LIME

Consider this for text:

[Figure: lime-viz]

Check out other black box ML explainers: https://lilianweng.github.io/lil-log/2017/08/01/how-to-explain-the-prediction-of-a-machine-learning-model.html by the amazing @lilianweng

Class Level

You can aggregate your predictions across multiple samples on a class level:

[Figure: agg-lime-viz]

Training Data Checks

Expanding on @aerinykim's tweet

Robustness

Adding in-domain noise or perturbations should not materially change either model training or inference.
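A toy sketch of the inference half of this check: perturb the inputs and measure how often the prediction stays the same. The keyword model and the typo-style noise are stand-ins I made up; plug in your own model and whatever noise is realistic for your domain.

```python
import random


def keyword_model(text):
    """Stand-in classifier (hypothetical): flags text mentioning 'refund'."""
    return "complaint" if "refund" in text.lower() else "other"


def swap_adjacent_chars(text, rng):
    """In-domain noise for typed text: swap one random pair of adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def prediction_stability(model, texts, perturb, trials=20, seed=0):
    """Fraction of perturbed inputs whose prediction matches the unperturbed one."""
    rng = random.Random(seed)
    agree = 0
    total = 0
    for text in texts:
        baseline = model(text)
        for _ in range(trials):
            agree += model(perturb(text, rng)) == baseline
            total += 1
    return agree / total


texts = ["I want a refund now", "Where is my order?", "Great service, thanks"]
stability = prediction_stability(keyword_model, texts, swap_adjacent_chars)
print(stability)
```

A brittle model like this keyword matcher loses stability whenever a typo lands inside the keyword. Tracking this number across model versions tells you whether robustness is getting better or worse.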

Citations and Resources

[1] Machine Learning Testing in Production: https://developers.google.com/machine-learning/testing-debugging/pipeline/production

[2] Recommended by DJ Patil as "Spot On, Excellent": http://www.unofficialgoogledatascience.com/2016/10/practical-advice-for-analysis-of-large.html

[3] Practical NLP by Ameisen: https://bit.ly/nlp-insight. The images for umap, LIME, and aggregated LIME are all from nlp-insight

[4] Machine Learning: The High-Interest Credit Card of Technical Debt: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43146.pdf

September 18, 2019
in careers
6 min read

Don't Bask in Reflected Glory

Irshad Kamil

ये जो लोग-बाग हैं, जंगल की आग हैं
Ye jo log baag hain, jungle ki aag hain
The people are a jungle fire

क्यूँ आग में जलूँ...
Kyun aag mein jalun
Why should I burn in the fire?

ये नाकाम प्यार में, खुश हैं ये हार में
Ye nakaam pyaar mein, khush hai yeh haar mein
Defeated in love, they're happy in defeat

इन जैसा क्यूँ बनूँ...
Inn jaisa kyun banun
Why should I be like them?

Identity

I am here to tell you how to keep your identity small. I'll share what I think is the bare minimum context you need to have:

Why should I keep my Identity small?

Paul Graham explains how keeping your identity small can be a competitive advantage:

The most intriguing thing about this theory, if it's right, is that it explains not merely which kinds of discussions to avoid, but how to have better ideas. If people can't think clearly about anything that has become part of their identity, then all other things being equal, the best plan is to let as few things into your identity as possible.

If you know what is part of someone’s identity, you can avoid inflammatory discussions. This can be as simple as avoiding discussions about which shoe brand is the best.
The bigger advantage is this: better ideas. The ability to think with lucidity, without biases, is an unfair advantage. If you act on rational decisions without attacking someone's identity, it's an even bigger advantage, one which few can compete with.

What do you mean by Identity?

Identity, in the crudest sense, is the set of words (and ideas) that you use to describe yourself in your head. For instance, while you might be someone who builds rocket ships, your identity might be "I am a family man". Or “I am a craftsman engineer making ships for space travel”.

Here is a short mental exercise. Fill this blank with at least 3 small phrases:

Your Identity

I am ___

Done?

Awesome. Let's carry on:

Here is what I came up with a few years ago:

My Identity

I am {Indian, guy, techie}

The labels in {} are (in the very crude sense) - my identity. Notice that these labels do not always have to be professional, personal or even Truth in the strictest sense.

They don't have to align with reality in any meaningful way. They simply describe how you perceive yourself.

Want another example? Here is what a friend filled:

My Identity

I am {smart, get things done, make no excuses}

If you have not filled in your three, please pause. Take a quick second and write them down somewhere. I'll wait.

Cool, I hope you have your three written or typed somewhere accessible. Let's carry on. We'll come back to it, I promise.

Don't Bask in Reflected Glory

That's it. That is the only thing I want you to do, and you will be on your way to answering: how do I avoid these labels?

Reflected Glory is the invisible demon which will lead you to the butcher, chop you up into pieces, then slow-barbecue you in the open air. All this while you and the demon both laugh and make merry.

Reflected glory is how Satan wins over honest, smart and hardworking people. It is insidious. Sinister. Subtle.

In Professional Circles

Even in certain careers, labeling yourself can be a career-limiting move:

Don't Call Yourself a Programmer

If you call yourself a programmer, someone is already working on a way to get you fired - from Don't Call Yourself a Programmer by @patio11

And why are otherwise honest people so deeply tempted to label themselves professionally?

The best people in most professions are incredibly talented and (often enough) make good money. The professional label is aspirational.

The worse your professional skills are, the more you need to riff on the shared professional identity of the amazing folk in your profession.

Here is an important nuance which I constantly remind myself of:

Stop calling yourself a programmer

Don't bask in Reflected Glory

Keep your identity small

In Politics

Politician's Trick

"People like us ... " - wait, go deeper and think. Your mental dialog will begin: "What does he mean by us? I am nobody. Why does a nobody matter? Is he using my emotions as a stepping stone to get where he wants to go?"

The politician is tricking you into filling in the gaps he left. What will you fill them with? Whatever makes the most sense to you, drawn from your own identity.

He is hacking your identity to make the most profit for himself. Don't waste your energy. When making rich people richer, get something in return.

But why does this trope work in the first place?

Well, because somewhere in your identity - you allowed words like Indian, American, Patriot or Nationalist to creep in.

Why did you allow them in the first place? Because there are genuinely positive connotations to being a part of the world's largest and most powerful democracies, respectively.

Stop

You walked into my trap.

You basked in the reflected glory of your ancestors.

Both America and India are countries born of blood, sweat and tears. Not enough of which is yours. You haven't done anything meaningfully large to contribute to democracy.

You are basking in the reflected glory of your dead ancestors

Be grateful. Use them as your ideals.

For the sake of the good that is oft interred with the bones, don't bask in the glory of your dead ancestors

Do the work such that when you meet your ideals, they can be proud

Do the work such that you can be proud

Don't bask in reflected glory

Keep your identity small

In Casual Conversation

Bragging

"Look at Michael, an Olympic athlete, I knew him in college" - wait, and think. You might end up going this way: "Why is that relevant in this conversation? Is this an offer to make an intro to the Olympian, so that I improve my game? Or is this just banter-bragging? And why brag?"

Brag about achievements of someone you know. Or worked with. Or went to school with.

Everyone around you looks at you with a brief moment of envy, or pride if they love you.

They could have been proud of something you did. And instead they're proud that you know people who do things. You are basking in the reflected glory of others.

The first time I internalized this perspective, I was deeply embarrassed.

Given that you've read this far, I have a hope for you.

I hope that when you go to sleep tonight, you will sleep with the resolution to not bask in the reflected glory of your friends and family.

No matter how much you love them. You will be proud of them. And you will not bask in their glory. Don't bask in their reflected glory.

It's their spotlight, and you will not even light a candle to steal it

Your Three

Remember the three blanks we filled? Those are your borrowed labels. No matter how innocent, they cloud your judgment.

Those are the ideas, and the people, whose reflected glory you are stealing unknowingly.

Even the Get things done is basking under the reflected glory of people who actually get things done. It clutters your thinking by making you swing for the extremes. By not letting you slow down.

It might be quite some time before my friend sees that it's about getting the right things done.

That we are human beings, not human doings.

The moon waxes and wanes because it basks in reflected glory - but even the darkest clouds cannot hide the glory of the sun

Borrowed Glory

When your contributions are meaningful enough, you don't need borrowed glory. The sun shines its own light; it's the moon that steals light
