

Beyond First 90 Days

This one's gonna be brief and echoes 2 Less Obvious Ideas to the younger me.

I am assuming that you already know the hygiene factors: Make few promises. Keep most of them, and exceed at least a few of them. Get to roughly the top 5% in effort estimation for your own work, at the very least. And so on.

Contribute to Developer Ecosystem

Improving any part of the developer ecosystem is useful and visible at the same time. For instance, let's say you add tests for a code path on which 10 developers are working. You've made the lives of 10 developers easier. They'll remember this when you come to them for help.

For some projects/teams, even the build is slow and error-prone. Any improvement there also saves a lot of contributor and developer time.

As Joel Spolsky (the person behind Stack Overflow) wrote, there is more than one way to help:

  • Maintain an issue tracker
  • Write a decent functional specification

You get the gist. Get creative and figure out points of leverage: low effort, high return on your time.

Engineering Brand Efforts

You already know the 1-2 things your team does best, e.g. speed, scale, cadence, or software quality.

Take those 2 topics and write down 5 reasons or points of evidence for why you think your team is best at them. For instance, if I were writing about "speed", one of my points might be: we ship 20 releases a week to almost 500K users. Or: we have fewer than 20 bugs per release, thanks to our amazing testing and QA friends.

Now expand these 5 points into a short, bullet-point essay like this one. Ask your manager and other senior engineers for feedback. Say something like, "Hey, I wrote down what our team does best. Do you think I captured the essence and reasoning?"

Done?

Great, now turn this into an internal and an external blog post. Submit it as a talk to a technical conference that cares about the dimension on which your team is best. Bringing accolades to the team, with their blessing, pays off far more than reading 10 Medium posts.

Dos and Don'ts for ML Hiring

This is primarily for my future self. These are observations from my own 2 years at Verloop.io and from helping a few companies hire for similar roles.

Do

  • Senior-most hire first: start with the most senior person you plan to hire, e.g. the ML Lead (assuming you already have a CTO)
  • Have a way to tell whether your investment in data science is working out
  • Closest to the user first: first hire the person who will consume the data to build for the user
  • Sourcing: Begin sourcing early and over-emphasise two channels: Referrals and Portfolios
    • Typically, in India, expect:
      • ~2 months to close a full-time role at early career (0-3 years)
      • ~3 months to close a mid-career role (3-7 years)
      • 6+ months to close a senior hire
  • If a developer has open source contributions in the last 2-3 years, consider waiving the coding or algorithmic challenge to speed up the interview process
  • Pay above-market cash salaries
    • 12-18 months from now, once your ML Engineer has internalised the requirements and company culture and built a bunch of important tooling, she is likely to get offers at 2-3x today's pay. If you're already paying above market, a 20-30% raise is quite often enough to retain many folks
  • Have at least 3 versions of your shipping timeline
  • Hire full-stack Data Science people/teams. If you're hiring early members of your team, this is a practical necessity. An example of T-shaped skills could look like

Don't

  • Rely on HR or your usual backend engineering hiring channels to work well for you
  • Hire the person who builds the means to move data (i.e. data engineering) before hiring at least 1-2 stakeholders in ML
    • Why? It's cheaper (and often faster) to change the ML modeling approach than to change data engineering pipelines
  • Start by hiring an intern to implement papers or take things to production before you've done that yourself
  • Expect data science to deliver or ship at the same "user value" pace as Product Engineering
    • Why? Data Science suffers from the twin problems of being new and experiment-driven
  • Assume that because you have a lot of data and all of it is queryable, it's all usable
  • Hire ultra-specialists (e.g. post-docs and PhDs) too early, unless your product requires invention rather than application

Why I Quit Data Science

Question from a friend: "I'm interested in knowing how you came to the decision to move from DS/MLE to SWE." Since I've been asked variants of this question quite a few times, I thought it would be good to share my answer.

What kind of research did you do to get to this decision?

I spoke to a lot of people at both big companies and startups. I also spoke to folks across multiple markets: Singapore, India, US & Europe. I primarily spoke to people with more than 10-12 years of experience, which shaped my perspective a lot.

What were your considerations while making this decision?

Skills

This is how I understand the world today: There are 3 primary functions around data: data engineering, modeling (e.g. predictive) and data analytics.

I could keep going deeper into modeling, e.g. learning more about CNNs and Transformers. Between writing the NLP book and doing Machine Learning professionally, I'd guess that I'm in the top 20-30% of people in the world doing this. Getting from here to the very top is hard, and I'm not sure I'd be able to do it.

The field also suffers a bit from the Red Queen effect on the applied side: you have to keep running just to stay in the same place. I'm not sure I want to keep doing that 5, 10, or 20 years from now. I started doing Machine Learning because I was interested in the field and curious about how it worked.

It's no longer about the thrill of solving a puzzle or problem. The roles I have access to involve the drudgery of making the same pipelines work in similar ways and then applying them to different problems.

I'd much rather add another skill, get to the top 25% in it, and then quickly rise to the top at the intersection of the two. This will also be easier, since there's tons of novelty and new ideas for me to learn.

Since analytics roles are neither well respected nor well paid, process of elimination leaves engineering. I'd rather be a platform engineer than a data/product analyst.

Competition

Within this, let's talk separately about pre-Series D startups and big companies (e.g. FAANG/MAGA) for modeling roles. Startups are usually open to hiring folks without an MS/PhD, while big companies lean much more towards folks with one.

For modeling roles at big companies, you will be competing with folks who have a PhD. As for startups, I often see them hire better-trained folks as they scale up, and relegate older, less 'specialised' folks to roles closer to engineering (e.g. API design, uptime) and away from modeling.

It's also much harder for me, personally, to find truly exceptional Machine Learning mentors, but relatively easier to find proven, battle-tested, quite senior engineers. However much you might underestimate the role of coaching, I believe that in our craft it can save you 4-5 years of learning time.

Title/Impact

I have the least confidence in this one holding true over a longer duration. But I'm mentioning it here since a lot of senior people do think about it.

Growing within modeling-related roles is hard, and you hit the ceiling at Head of Machine Learning. Notice that in most cases you are not even Head of Data; you're Head of Research, or ML, or some other function within Data. The Head of Data in turn reports to the senior-most Engineering Leader, e.g. the CTO.

This means your influence over the things that shape your day (tooling, infrastructure, product direction, org structure, promotions etc.) is limited. You don't even get the chance to learn these things.

I'd like to keep open the option of becoming an Engineering Manager/VP Engineering/CTO in a few years. I'd much prefer that to Head of Data Science or Analytics. This option is so much more valuable to me that I'm happy to pay a price to "buy" it.

What was/is the goal of this particular switch? What were you trying to optimize?

I'm optimizing for being great (but not the best) at the intersection of 3 things, instead of being great at 1 narrowly, clearly defined role. I'm trying to get to the top of that intersection.

I was also bored by the mundanity of the problems you encounter in typical early-stage startups. Having to trade off my personal notion of quality for speed was sometimes a bit of a problem, though I usually enjoyed that challenge.

Why not Machine Learning Engineering at a Big Company?

I fear that this role combines the worst of two worlds. You have the skills of a backend engineer: you can design microservices, implement them, scale them, deploy them, and manage the infrastructure. But you also have the skills of a Data Scientist: you can build models, train them, deploy them, and manage the experimentation infrastructure.

You don't get paid or recognized for either of them. The backend developer thinks of you as an "ML guy" and the Data Scientist thinks of you as a "Backend guy". This is made worse at a Big Company because they tend to reward specialists via promotions. You're likely to be underpaid relative to both roles.

Not to mention that a large fraction of your knowledge goes out of date faster than I, at least, can learn. Of course, you might be a 10x faster, better learner than me, in which case this blog post is not meant for you.

Why not take a 1-year, research-focused sabbatical?

Well, companies that ask for skills acquired via an MS/PhD are often not willing to give the same credit for a 1-year research stint. And it wouldn't be much better than what I already have: being endorsed by Dr. Andrew Ng, writing a book, mentoring folks on ACL papers and speaking at PyCon India.

What information did you find for and against this switch?

Stepping away from Machine Learning Lead roles can be a massive cash and title/designation downgrade. It definitely turned out that way for me.

My alternate job offer was a Machine Learning Lead role at a Series B/C startup, instead of a Platform Engineer role. I would not have been happy with the role, but I'd have been very happy with the salary: 3-4x in cash, and 4-5x in total compensation terms. Put another way, I took a 75% cut on my cash compensation.

I'm betting that I'll have a lot more fun doing this, and also that I'll be a little more successful, which will compensate for the cut over a 4-10 year chapter of my career.

In addition to the cash and title hit, there's a bit of social shaming: people might assume that you weren't good enough as a Data Scientist and that's why you moved to SWE. I don't care about that enough for it to influence my decision, but I do care about it somewhat, so it's worth mentioning.

My Machine Learning skills will also atrophy with time. Still, I expect to get back to similar productivity quickly in a few years, because the half-life of knowledge in the field is really short and tooling improvements make it easier to ship well.