Writing

ML Model Monitoring

Mayank asked on Twitter:

Some ideas/papers/tools on monitoring models in production? A use case would be, say, a classification task over large inputs. I want to visualise how the predicted values or even confidence scores vary over time. (paraphrased)

Quick Hacks

pandas-profiling

If you are logging confidence scores, you can begin there. The quickest hack is to visualize with pandas-profiling: https://github.com/pandas-profiling/pandas-profiling/

Rolling means

Calculate rolling aggregates (e.g. mean, variance) of your confidence scores. This is built into pandas and quick to set up. Add them to your set of monitoring and alerting product metrics.

A better version of this would be to do it at the cohort level. In fact, all of the analysis that follows makes sense at the cohort level.
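Here is a minimal sketch of rolling aggregates computed per cohort. pandas offers the same thing through `Series.rolling`; this version sticks to the standard library so it runs anywhere. The log format (cohort, confidence) and the window size are assumptions made for illustration.

```python
from collections import defaultdict, deque
from statistics import mean, pvariance


def rolling_stats(records, window=3):
    """Yield (cohort, rolling_mean, rolling_variance) for each logged score.

    `records` is an iterable of (cohort, confidence) pairs in time order.
    """
    # One fixed-size buffer per cohort; old scores fall out automatically.
    buffers = defaultdict(lambda: deque(maxlen=window))
    for cohort, score in records:
        buf = buffers[cohort]
        buf.append(score)
        yield cohort, mean(buf), pvariance(buf)


# Hypothetical log: (cohort, confidence score), oldest first.
log = [("web", 0.9), ("app", 0.6), ("web", 0.8), ("web", 0.4), ("web", 0.3)]
stats = list(rolling_stats(log, window=3))
for cohort, avg, var in stats:
    print(f"{cohort:<4} mean={avg:.2f} var={var:.3f}")
```

A falling rolling mean for one cohort while the others stay flat is the kind of signal worth alerting on.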

Confidence Scores and Thresholds

One of the most common mistakes is to use a static threshold on a confidence score.

If you hear someone say that they do not use thresholds for a classification problem, stop and think. They are using a threshold, usually the 0.5 default inside their ML library.

This is sub-optimal. The better option is to use a holdout validation set and determine the threshold from that.
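One way to pick a threshold from a holdout validation set is sketched below. The choice of F1 as the metric to maximise is an assumption; use whichever metric matters for your product. The scores and labels are made-up validation data.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the positive class when predicting 1 for score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(scores, labels):
    """Try every observed score as a cut-off and keep the one with the best F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Hypothetical holdout set: model confidence and true label.
val_scores = [0.15, 0.35, 0.40, 0.62, 0.70, 0.91]
val_labels = [0, 0, 1, 0, 1, 1]

best = pick_threshold(val_scores, val_labels)
print(f"best threshold: {best}")  # 0.4 on this data, not the default 0.5
```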

Tagging Data

It is obvious that you will tag the predictions for which the model is least confident -- so that the model can learn.

What you should also do is this:

  • Find samples which have high confidence and tag them first. This is a form of negative sample mining

  • For multi-class classification: Figure out samples which did not clear your threshold, and the prediction is correct. Add these back to your new training+validation set

  • Tag samples which are too close to the threshold. This will help you understand your model and dataset's margin of separation better
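The selection rules above can be sketched as a small helper that builds a tagging queue from logged confidences. The function name, the `margin`, and the sample ids are hypothetical. The multi-class bullet is left out because it needs the true labels.

```python
def tagging_queue(predictions, threshold, margin=0.05, k=2):
    """Pick sample ids to send for tagging.

    `predictions` maps sample id -> confidence of the predicted class.
    """
    ranked = sorted(predictions, key=predictions.get)  # least confident first
    return {
        "least_confident": ranked[:k],
        "most_confident": ranked[-k:][::-1],
        "near_threshold": [
            sid for sid in ranked
            if abs(predictions[sid] - threshold) <= margin
        ],
    }


# Hypothetical logged confidences.
preds = {"a": 0.99, "b": 0.52, "c": 0.10, "d": 0.47, "e": 0.80}
queue = tagging_queue(preds, threshold=0.5)
print(queue)
```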

Training-Serving

The most common cause of trouble in production ML models is training-serving skew: differences between what the model saw in training and what it sees when serving.

The differences can be on 3 levels: Data, Features, Predictions

Data Differences

Data differences can be of several types. The most frequent are these:

  • Schema change - someone dropped a column!

  • Class distribution change - when did this 10% training class get 20% of the predictions?

  • Data input drift - users have started typing instead of copy-pasting!

Schema skew (from Google's ML Guide)

Training and serving input data do not conform to the same schema. The format of the serving data changes while your model continues to train on old data.

Solution? Use the same schema to validate training and serving data. Separately check the statistics your schema does not cover, such as the fraction of missing values.
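A minimal sketch of that idea: one schema, applied to both training and serving rows, plus a missing-value fraction per column. The schema, column names, and rows are hypothetical, and a real pipeline would likely use a dedicated validation library instead.

```python
# Hypothetical schema shared by training and serving.
EXPECTED_SCHEMA = {"text": str, "source": str, "length": int}


def validate(rows, schema=EXPECTED_SCHEMA):
    """Return (schema_errors, missing_fraction_per_column) for a list of dict rows."""
    errors = []
    missing = {col: 0 for col in schema}
    for i, row in enumerate(rows):
        extra = set(row) - set(schema)
        if extra:
            errors.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col, expected_type in schema.items():
            value = row.get(col)
            if value is None:  # absent key or explicit null both count as missing
                missing[col] += 1
            elif not isinstance(value, expected_type):
                errors.append(
                    f"row {i}: {col} is {type(value).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    n = max(len(rows), 1)
    return errors, {col: count / n for col, count in missing.items()}


training_rows = [
    {"text": "hi", "source": "web", "length": 2},
    {"text": "yo", "source": None, "length": 2},
]
# Someone dropped the `source` column and started sending length as a string.
serving_rows = [
    {"text": "hi", "length": "2"},
    {"text": "hey", "length": 3},
]

train_errors, train_missing = validate(training_rows)
serve_errors, serve_missing = validate(serving_rows)
print(serve_errors)
print(serve_missing)
```

Comparing `train_missing` with `serve_missing` column by column is what surfaces the dropped column here: `source` goes from 50% missing to 100% missing.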

Class Distribution check with Great Expectations

Training and serving input data should conform to the same class frequency distribution. Confirm this. If not, update the model by training with updated class frequency distribution.

For monitoring these first two, check out: https://github.com/great-expectations/great_expectations
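The class distribution check itself is small enough to sketch in plain Python. This is not the Great Expectations API, just an illustration of the comparison; the 5% tolerance and the labels are assumptions.

```python
from collections import Counter


def class_frequencies(labels):
    """Map each class to its share of the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def drifted_classes(train_labels, served_predictions, tolerance=0.05):
    """Return {class: serving share minus training share} for gaps above tolerance."""
    train = class_frequencies(train_labels)
    served = class_frequencies(served_predictions)
    report = {}
    for cls in sorted(set(train) | set(served)):
        gap = served.get(cls, 0.0) - train.get(cls, 0.0)
        if abs(gap) > tolerance:
            report[cls] = round(gap, 3)
    return report


# A 10% training class that now gets 20% of the predictions.
train_labels = ["spam"] * 10 + ["ham"] * 90
served_predictions = ["spam"] * 20 + ["ham"] * 80

report = drifted_classes(train_labels, served_predictions)
print(report)  # {'ham': -0.1, 'spam': 0.1}
```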

To understand data drift, you need to visualize the data itself. How to do that depends heavily on the data domain (e.g. text, audio, image). More often than not, visualizing features or vectors works just as well.

Feature Viz for Monitoring

Almost all models for high dimensional data (images or text) vectorize data. I am using features and vectorized embedding as loosely synonymous here.

Let's take text as an example:

Class Level with umap

Use any dimensionality reduction technique, such as PCA or umap (https://github.com/lmcinnes/umap), on your feature space. Note that these plots are at the class level.

[Image: umap plots of tweet embeddings, one per class]

Draw the same plots for both training and test data, and check whether the distributions look alike.

Prediction Viz for Monitoring

Here you can get lazy, but I'd still recommend that you build data-domain specific explainers.

Sample Level with LIME

Consider this for text:

[Image: LIME explanation for a single text sample]

Check out other black box ML explainers: https://lilianweng.github.io/lil-log/2017/08/01/how-to-explain-the-prediction-of-a-machine-learning-model.html by the amazing @lilianweng

Class Level

You can aggregate your predictions across multiple samples on a class level:

[Image: LIME explanations aggregated at the class level]

Training Data Checks

Expanding on @aerinykim's tweet

Robustness

Adding in-domain noise or perturbations should change neither model training nor inference.
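A rough way to test the inference half of that claim is to perturb inputs and count how often the prediction stays the same. Everything below is a stand-in: the two toy classifiers, the noise function, and the sample texts are hypothetical, and you would plug in your own model and perturbations.

```python
import random


def toy_predict(text):
    """Stand-in classifier: flags text that mentions 'refund', ignoring case."""
    return "complaint" if "refund" in text.lower() else "other"


def brittle_predict(text):
    """Same rule but case-sensitive, so casing noise can flip it."""
    return "complaint" if "refund" in text else "other"


def add_noise(text, rng):
    """In-domain perturbation: random upper-casing and doubled spaces."""
    flipped = "".join(c.upper() if rng.random() < 0.3 else c for c in text)
    return flipped.replace(" ", "  ")


def stability(predict, samples, perturb, trials=20, seed=0):
    """Fraction of perturbed inputs whose prediction matches the clean one."""
    rng = random.Random(seed)
    same = total = 0
    for text in samples:
        clean = predict(text)
        for _ in range(trials):
            same += predict(perturb(text, rng)) == clean
            total += 1
    return same / total


samples = ["I want a refund now", "great service, thanks"]
robust_score = stability(toy_predict, samples, add_noise)
brittle_score = stability(brittle_predict, samples, add_noise)
print(f"robust: {robust_score:.2f}, brittle: {brittle_score:.2f}")
```

A score well below 1.0 on noise your users actually produce is a sign the model has latched onto surface details.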

Citations and Resources

[1] Machine Learning Testing in Production: https://developers.google.com/machine-learning/testing-debugging/pipeline/production

[2] Recommended by DJ Patil as "Spot On, Excellent": http://www.unofficialgoogledatascience.com/2016/10/practical-advice-for-analysis-of-large.html

[3] Practical NLP by Ameisen: https://bit.ly/nlp-insight. The images for umap, LIME, and aggregated LIME are all from nlp-insight

[4] Machine Learning: The High-Interest Credit Card of Technical Debt: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43146.pdf

Don't Bask in Reflected Glory

Irshad Kamil

ये जो लोग-बाग हैं, जंगल की आग हैं
Ye jo log baag hain, Jungle ki aag hain
These people are a jungle fire

क्यूँ आग में जलूँ...
Kyun aag mein jalun
Why should I burn in the fire?

ये नाकाम प्यार में, खुश हैं ये हार में
Ye nakaam pyaar mein, Khush hai yeh haar mein
Defeated in love, they're happy in defeat

इन जैसा क्यूँ बनूँ...
Inn jaisa kyun banun
Why should I be like them?

Identity

I am here to tell you how to keep your identity small. I'll share what I think is the bare minimum context you need to have:

Why should I keep my Identity small?

Paul Graham explains how keeping your identity small can be a competitive advantage:

The most intriguing thing about this theory, if it's right, is that it explains not merely which kinds of discussions to avoid, but how to have better ideas. If people can't think clearly about anything that has become part of their identity, then all other things being equal, the best plan is to let as few things into your identity as possible.

  1. If you know what is part of someone's identity, you can avoid inflammatory discussions. This can be as simple as avoiding discussions about which shoe brand is the best.

  2. The bigger advantage is this: better ideas. The ability to think with lucidity, without biases, is an unfair advantage. If you act on rational decisions without attacking someone's identity, it's an even bigger advantage which few can compete with.

What do you mean by Identity?

Identity, in the crudest sense, is the set of words (and ideas) that you use to describe yourself - in your head. For instance, while you might be someone who builds rocket ships, your identity might be "I am a family man". Or "I am a craftsman engineer making ships for space travel".

Here is a short mental exercise. Fill this blank with at least 3 small phrases:

Your Identity

I am ___

Done?

Awesome. Let's carry on:

Here is what I came up with a few years ago:

My Identity

I am {Indian, guy, techie}

The labels in {} are (in the very crude sense) - my identity. Notice that these labels do not always have to be professional, personal or even Truth in the strictest sense.

They don't have to align with reality in any meaningful way. They simply describe how you perceive yourself.

Want another example? Here is what a friend filled:

My Identity

I am {smart, get things done, make no excuses}

If you have not filled in your three, please pause - take a quick second and write them down somewhere. I'll wait.


Cool, I hope you have your three written or typed somewhere accessible. Let's carry on. We'll come back to it, I promise.

Don't Bask in Reflected Glory

That's it. That is the only thing I want you to do, and you will be on your way to learning how to avoid these labels.

Reflected Glory is the invisible demon which will lead you to the butcher, chop you up into pieces, then slow barbecue you in open air. All this while you and the Demon, both laugh and make merry.

Reflected glory is how Satan wins over honest, smart and hard working people. It is insidious. Sinister. Subtle.

In Professional Circles

Even in certain careers, labeling yourself can be a career-limiting move:

Don't Call Yourself a Programmer

If you call yourself a programmer, someone is already working on a way to get you fired - from Don't Call Yourself a Programmer by @patio11

And why are otherwise honest people so deeply tempted to label themselves professionally?

The best people in most professions are incredibly talented and (often enough) make good money. The professional label is aspirational.

The worse your professional skills are, the more you need to riff on the shared professional identity of the amazing folk in your profession.

Here is an important nuance which I constantly remind myself of:

Stop calling yourself a programmer

Don't bask in Reflected Glory

Keep your identity small

In Politics

Politician's Trick

"People like us ... " - wait, go deeper and think. Your mental dialog will begin: "What does he mean by us? I am nobody. Why does a nobody matter? Is he using my emotions as a stepping stone to get where he wants to go?"

The politician is tricking you into filling in the gaps he left. What will you fill them with? Whatever makes the most sense to you, from your own identity.

He is hacking your identity to make the most profit for himself. Don't waste your energy. When making rich people richer, get something in return.

But why does this trope work in the first place?

Well, because somewhere in your identity - you allowed words like Indian, American, Patriot or Nationalist to creep in.

Why did you allow them in the first place? Because being a part of the world's largest democracy, or of its most powerful one, carries real positive connotations.

Stop

You walked into my trap.

You basked in the reflected glory of your ancestors.

Both America and India are countries born of blood, sweat and tears. Not enough of which is yours. You haven't done anything meaningfully large to contribute to democracy.

You are basking in the reflected glory of your dead ancestors

Be grateful. Use them as your ideals.

For the sake of the good that is oft interred with their bones, don't bask in the glory of your dead ancestors

Do the work such that when you meet your ideals, they can be proud

Do the work such that you can be proud

Don't bask in reflected glory

Keep your identity small

In Casual Conversation

Bragging

"Look at Michael, an Olympic athlete, I knew him in college" - wait, and think. You might end up going this way: "Why is that relevant in this conversation? Is this an offer to make an intro to the Olympian, so that I improve my game? Or is this just banter-bragging? And why brag?"

Brag about achievements of someone you know. Or worked with. Or went to school with.

Everyone around you looks at you with a brief moment of envy, or pride if they love you.

They could have been proud of something you did. And instead they're proud that you know people who do things. You are basking in the reflected glory of others.

The first time I internalized this perspective, I was deeply embarrassed.

Given that you've read this far, I have a hope for you.

I hope that when you go to sleep tonight - you will sleep with the resolution to not bask in the reflected glory of your friends and family.

No matter how much you love them. You will be proud of them. And you will not bask in their glory. Don't bask in their reflected glory.

It's their spotlight and you will not even light a candle to steal it

Your Three

Remember the three blanks we filled? Those are your borrowed labels. No matter how innocent, they cloud your judgment.

Those are the ideas - and people - whose reflected glory you are stealing unknowingly

Even the "get things done" label is basking in the reflected glory of people who actually get things done. It clutters your thinking by making you swing for the extremes. By not letting you slow down.

It might be quite some time before my friend sees that it's about getting the right things done.

That we are human beings, not human doings.

The moon waxes and wanes because it basks in reflected glory - but even the darkest clouds cannot hide the glory of the sun

Borrowed Glory

When your contributions are meaningful enough, you don't need borrowed glory. The sun shines its own light; it's the moon that steals light

Best of Python 3 f-strings

This piece is primarily meant for those new to Python. These include mathematicians, economists, and so on who want to use Python within a Jupyter environment. Here is a quick guide on how to make the best of f-strings there.

Quick Primer

If you are familiar with earlier Python versions, here is how the f-string compares with .format() and the older styles:

{{< highlight python >}}
one, two = 1, 2

_fstring = f'Total: {one + two}'  # Go f-string!
_format = 'Total: {}'.format(one + two)
_percent = 'Total: %s' % (one + two)
_concatenation = 'Total: ' + str(one + two)
assert _fstring == _format == _percent == _concatenation
{{< /highlight >}}

f-string Magic

f-strings are how you should build the strings you print in Python. The inline notation is fairly reminiscent of LaTeX:

{{< highlight python >}}
# inline variables, similar to LaTeX
name = "Fred"
print(f"He said his name is {name}.")
# He said his name is Fred.
{{< /highlight >}}

Notice how the variable name can now be used inline. The syntax is simple: wrap the variable in {} and mark the string as an f-string with an 'f' before the opening quote.

Note to the advanced programmer:

'f' may be combined with 'r' to produce a raw f-string, which is handy inside regex or similar functions. 'f' may not be combined with 'u'. That prefix is redundant because all Python 3 strings are Unicode by default. This means you can write f-strings in Hindi, Chinese, French, Korean and any other language covered by Unicode.
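A short sketch of both points: a raw f-string used to build a regex, and an f-string with non-ASCII text. The word list and sentences are made up for illustration.

```python
import re

word = "cat"
# rf"..." keeps backslashes literal while still filling in the {} field.
pattern = rf"\b{re.escape(word)}\b"
matches = re.findall(pattern, "cat concat cat.")
print(matches)  # ['cat', 'cat'] - 'concat' is skipped by the word boundary

# Python 3 strings are Unicode, so no 'u' prefix is needed.
naam = "फ़्रेड"
sentence = f"उसने कहा कि उसका नाम {naam} है।"
print(sentence)
```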

But why are these called formatted strings in the first place? Because you can use them with some cool formatting hacks.

Simplified Alignment and Spacing

Have you ever tried creating a table for logging or visualization? Arranging the elements becomes a nightmare with several \t tab characters flying around.

This is much easier with Python f-strings: put a colon ':' after the expression, followed by an alignment character and a field width.

There are at least 3 alignment characters: < for left aligned, > for right aligned, and ^ for center aligned. Refer to the code example:

{{< highlight python >}}
correct = 'correct'
phonetic_correct = 'phonetic_correct'
typo = 'typo'
phonetic_typo = 'phonetic_typo'
phonetic_distance = 'phonetic_distance'
{{< /highlight >}}

{{< highlight python >}}
print('No Spacing:')
print(f'{correct}|{phonetic_correct}|{typo}|{phonetic_typo}|{phonetic_distance}|\n')
# No Spacing:
# correct|phonetic_correct|typo|phonetic_typo|phonetic_distance|
{{< /highlight >}}

{{< highlight python >}}
print('Right Aligned:')
print(f'{correct:>10}|{phonetic_correct:>20}|{typo:>10}|{phonetic_typo:>20}|{phonetic_distance:>20}|\n')
# Right Aligned:
#    correct|    phonetic_correct|      typo|       phonetic_typo|   phonetic_distance|
{{< /highlight >}}

{{< highlight python >}}
print('Left Aligned:')
print(f'{correct:<10}|{phonetic_correct:<20}|{typo:<10}|{phonetic_typo:<20}|{phonetic_distance:<20}|\n')
# Left Aligned:
# correct   |phonetic_correct    |typo      |phonetic_typo       |phonetic_distance   |
{{< /highlight >}}

{{< highlight python >}}
print('Centre Aligned:')
print(f'{correct:^10}|{phonetic_correct:^20}|{typo:^10}|{phonetic_typo:^20}|{phonetic_distance:^20}|')
# Centre Aligned:
#  correct  |  phonetic_correct  |   typo   |   phonetic_typo    | phonetic_distance  |
{{< /highlight >}}

You also get rounding to a given precision and the other standard formatting utilities:

{{< highlight python >}}
import decimal

# auto-resolve variable scope when nested
width = 10
precision = 4
value = decimal.Decimal("12.34567")
print(f"result: {value:{width}.{precision}}")  # nested fields
# result:      12.35
{{< /highlight >}}

You might notice something interesting here: width and precision are automatically picked up from the enclosing scope. This means you can calculate width and precision from the screen width or other system inputs and use them directly.
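As a small sketch of that idea, the column width below is computed from the data instead of being hard-coded. The `scores` dictionary and its values are hypothetical.

```python
scores = {"correct": 0.91, "typo": 0.5, "phonetic_distance": 0.25}

width = max(len(name) for name in scores)  # length of the longest label
precision = 2

# Nested fields: {width} and {precision} are filled in before formatting.
rows = [f"{name:<{width}}|{score:>8.{precision}f}|" for name, score in scores.items()]
print("\n".join(rows))
```

Adding a longer label later widens every row without touching the format string.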

Full Python Expressions Support

The above is only possible because the expression inside {} is actually being evaluated, or in programming terms: being executed.

This implies that you can make any function call from within those {}.

You should avoid doing this often in practice, though, because it can make debugging very difficult. Instead, store the value returned by the function in a variable and then use that variable in the f-string.
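To make the trade-off concrete, here are both styles side by side. The `accuracy` helper is a hypothetical stand-in for any function you might be tempted to call inline.

```python
def accuracy(correct, total):
    """Hypothetical helper: share of correct predictions."""
    return correct / total


# Works, but an error inside accuracy() now surfaces from the middle of a print line.
print(f"accuracy: {accuracy(45, 50):.1%}")

# Easier to debug: evaluate first, inspect if needed, then format.
acc = accuracy(45, 50)
message = f"accuracy: {acc:.1%}"
print(message)  # accuracy: 90.0%
```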

Those coming from functional programming might miss their lambda functions. Don’t worry, Python has you covered:

Lambda Functions in f-strings

{{< highlight python >}}
# If you feel you must use lambdas, they may be used inside of parentheses:
print(f'{(lambda x: x*3)(3)}')
# 9
# note that the f-string evaluates to the str '9' and not the int 9
{{< /highlight >}}

Summary

  • f-strings let you include variables and function calls inside the strings you print
  • Inline variables: these are easier to read and debug for the developer
  • Use f-strings when you can!