Question from a friend: "I am interested in knowing how you came to the decision to move from DS/MLE to SWE." Since I’ve been asked a variant of this question quite a few times, I thought it would be good to share my answer.
What kind of research did you do to get to this decision?
I spoke to a lot of people who work at both big companies and startups. I also spoke to folks across multiple markets: Singapore, India, the US and Europe. I primarily spoke to people with more than 10-12 years of experience. This is a big part of why my perspective may differ from what you usually hear.
What were your considerations while making this decision?
This is how I understand the world today. There are 3 primary functions around data: data engineering, modeling (e.g. predictive) and data analytics.
I could keep going deeper into modeling, e.g. learning more about CNNs and Transformers. Between writing the NLP Book and working professionally in Machine Learning, I’d guess that I’m in the top 20-30% of people in the world doing this. The journey from here to being the best is hard, and I’m not sure I’m going to be able to make it.
The field also suffers a bit from the Red Queen effect on the applied side of things: you have to keep running just to stay in the same place. I’m not sure I want to keep doing that 5, 10, or 20 years from now. I started doing Machine Learning because I was interested in the field and curious about how it worked.
It’s no longer about the thrill of solving a puzzle or problem. The roles I have access to involve the drudgery of making the same pipelines work in similar ways and then applying them to different problems.
I’d much rather add another skill, get to the top 25% in it, and then quickly rise to the top at its intersection with what I already know. This will also be easier because there’s plenty of novelty and many new ideas for me to learn.
Since analytics roles are neither well respected nor well paid, the process of elimination points to engineering. I’d rather be a platform engineer than a data/product analyst.
Within modeling roles, let’s talk separately about pre-Series D startups and big companies (e.g. FAANG/MAGA). Startups are usually open to hiring folks without an MS/PhD, while big companies tend to prefer folks who have one.
For modeling roles at big companies, you will be competing with folks who have a PhD. At startups, I often see that as they scale up, they hire better-trained folks and relegate older, less ‘specialised’ folks to roles closer to engineering (e.g. API design, uptime) and away from modeling.
It’s also much harder for me, personally, to find truly exceptional Machine Learning mentors, but relatively easy to find proven, battle-tested, senior engineers. You might underestimate the role of coaching, but I believe that in our craft it can save you 4-5 years of learning time.
I have the least confidence in this point holding true over the long term. But I’m mentioning it here since a lot of senior people do think about it.
Growing within modeling-related roles is hard: you hit the ceiling at Head of Machine Learning. Notice that in most cases you are not even Head of Data; you’re Head of Research or ML or some other function within Data. The Head of Data in turn reports to the most senior Engineering leader, e.g. the CTO.
This means you have limited influence over the things that shape your day: tooling, infrastructure, product direction, org structure, promotions etc. You don’t even get to learn these things.
I’d like to keep open the option of becoming an Engineering Manager/VP Engineering/CTO in a few years. I’d much prefer that to Head of Data Science or Analytics. This option is so much more valuable to me that I’m happy to pay a price to “buy” it.
What was/is the goal of this particular switch? What were you trying to optimize?
I’m optimizing for being great (but not the best) at the intersection of 3 things instead of in a single, narrowly and clearly defined role. I’m trying to get to the top of that intersection.
I was also bored by the mundanity of the problems you encounter at typical early-stage startups. The need to trade my personal notion of quality for speed was sometimes a bit of a problem, but I usually enjoyed the challenge.
Why not Machine Learning Engineering at a Big Company?
I fear that this role combines the worst of both worlds. You have the skills of a backend engineer: you can design microservices, implement, scale and deploy them, and manage the infrastructure. You also have the skills of a Data Scientist: you can build models, train and deploy them, and manage the experimentation infrastructure.
You don’t get paid or recognized for either skill set. The backend developer thinks of you as an “ML guy” and the Data Scientist thinks of you as a “backend guy”. This is worse at a Big Company because big companies tend to reward specialists with promotions. You’re going to be underpaid for both roles.
Not to mention that a large fraction of my knowledge goes out of date faster than I can learn new things. Of course, you might be a 10x faster, better learner than me, in which case this blog post is not meant for you.
Why not take a 1 year Research focussed Sabbatical?
Well, because companies that ask for skills acquired via an MS/PhD are often not willing to pay extra for a single year of research. A year like that would not signal much more than what I already have: being endorsed by Dr. Andrew Ng, writing a book, mentoring folks on ACL papers and speaking at PyCon India.
What information did you find for and against this switch?
Stepping away from Machine Learning Lead roles can be a massive downgrade in cash and title/designation. That definitely turned out to be true for me.
My alternative job offer was a Machine Learning Lead role at a Series B/C startup, instead of a Platform Engineer role. I would not have been happy with the role, but I would have been very happy with the salary: 3-4x in cash, and 4-5x in total compensation. Put another way, I took a 75% cut on my cash compensation.
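A quick check of how the multiple maps to the cut, assuming the cut is measured against the alternative offer’s cash. The symbols $c$ (cash in the Platform Engineer role) and $k$ (the alternative offer’s cash multiple) are introduced here only for illustration:

```latex
% c: cash compensation in the Platform Engineer role (illustrative symbol)
% k: cash multiple of the alternative offer, with 3 \le k \le 4
\text{cut} = \frac{kc - c}{kc} = 1 - \frac{1}{k}
\qquad
k = 3:\; 1 - \tfrac{1}{3} \approx 67\%,
\qquad
k = 4:\; 1 - \tfrac{1}{4} = 75\%
```

So the 75% figure corresponds to the 4x end of the range; at 3x the cut would be about 67%.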
I’m betting that I’ll have a lot more fun doing this, and also that I’ll be a little more successful, which will make up for the pay cut over a 4-10 year chapter of my career.
In addition to the cash and title hit, there’s a bit of social shaming: people might assume that you were not quite good enough as a Data Scientist and that is why you moved to SWE. I don’t care about that enough for it to influence my decision, but I do care about it somewhat, so it’s worth mentioning.
My Machine Learning skills will also atrophy with time. That said, I expect to get back to similar productivity quickly in a few years, because the half-life of knowledge in the field is really short and tooling improvements make it easier to ship well.