careers

Why I Quit Data Science

Question from a friend: I am interested in knowing how you came to this decision of moving to SWE from DS/MLE. Since I've been asked a variant of this question quite a few times, I thought it would be good to share my answer.

What kind of research did you do to get to this decision?

I spoke to a lot of people at both big companies and startups. I also spoke to folks across multiple markets: Singapore, India, US & Europe. I primarily spoke to people with more than 10-12 years of experience, and that shaped my perspective a lot.

What were your considerations while making this decision?

Skills

This is how I understand the world today: There are 3 primary functions around data: data engineering, modeling (e.g. predictive) and data analytics.

I could keep going deeper into modeling e.g. learning more about CNNs and Transformers. Between writing the NLP Book and doing Machine Learning professionally, I'd guess that I'm in the top 20-30% of the world doing this. The journey to get from here to being the best is hard and I'm not sure if I'm going to be able to do it.

The field also suffers a bit from the Red Queen effect on the applied side of things. I'm not sure if I want to keep doing it 5, 10, or 20 years from now. I started doing Machine Learning because I was interested in the field and I was curious about how it would work.

It's no longer about the thrill of solving a puzzle/problem. The roles I've access to come with the drudgery of making the same pipelines work in similar ways and then applying them to different problems.

I'd much rather add another skill and get to the top 25% in it -- and then quickly rise to the top at its intersection with what I already know. This will also be easier as I've tons of novelty and new ideas to learn.

Since analytics roles are neither well respected nor well paid, the process of elimination works. I'd rather be a platform engineer than a data/product analyst.

Competition

Within this, let's talk separately about pre-series D startups and big companies (e.g. FAANG/MAGA) for modeling roles. Startups are usually open to hiring folks without an MS/PhD degree, while big companies prefer hiring folks with an MS/PhD.

For modeling roles at big companies you will be competing with folks with a PhD. For startups, I often see they end up hiring better trained folks as they scale up and relegate older, less 'specialised' folk to roles closer to engineering (e.g. API Design, uptime) and away from modeling.

It's also much harder for me, personally, to find truly exceptional Machine Learning mentors, but relatively easier to find proven, battle-tested, quite senior engineers. And as much as you might underestimate the role of coaching, I believe that in our craft it can save you 4-5 years of learning time.

Title/Impact

I've the least confidence on this being true over a longer duration. But I'm mentioning it here since a lot of senior people do think about this.

Growing within modeling-related roles is hard and you hit the ceiling as Head of Machine Learning. Notice that in most cases, you are not even Head of Data, you're Head of Research or ML or some function within Data. The Head of Data in turn reports to the senior-most Engineering Leader e.g. the CTO.

This means your influence over things which shape your day: tooling, infrastructure, product direction, org structure, promotions etc. is limited. You can't even learn these things.

I'd like to keep open my option of becoming an Engineering Manager/VP Engineering/CTO in a few years. I'd much prefer that to Head of Data Science or Analytics. This option is so much more valuable to me that I'm happy to pay a price to "buy" it.

What was/is the goal of this particular switch? What were you trying to optimize?

I'm optimizing for being great (but not best) at the intersection of 3 things instead of 1 narrowly, clearly defined role. I'm trying to get to the top of the intersection.

I was also bored by the mundanity of problems you encounter in typical early-stage startups. The need to trade off personal-notion-of-quality for speed is sometimes a bit of a problem, but I usually enjoyed the challenge.

Why not Machine Learning Engineering at a Big Company?

I fear that this role combines the worst of two worlds. You've the skills of a backend engineer: you can design microservices, implement them, scale them, deploy them, and manage the infrastructure. But you also have the skills of a Data Scientist: you can build models, train them, deploy them, and manage the experimentation infrastructure.

You don't get paid or recognized for either of them. The backend developer thinks of you as an "ML guy" and the Data Scientist thinks of you as a "Backend guy". This is made worse at a Big Company because they tend to reward specialists via promotions. You're going to get underpaid for both roles.

Not to mention that a large fraction of this knowledge is getting outdated faster than I can learn it. Of course, you might be a 10x faster, better learner than me - in which case this blog post is not meant for you.

Why not take a 1 year Research focussed Sabbatical?

Well, because companies which ask for skills acquired via an MS/PhD are often not willing to give credit for a 1 year research sabbatical. It'd not be that much better than being endorsed by Dr. Andrew Ng, writing a book, mentoring folks for ACL papers and speaking at PyCon India.

What information did you find for and against this switch?

Stepping away from Machine Learning Lead roles can be a massive cash and title/designation downgrade. It definitely turned out to be true for me.

My alternate job offer was a Series B/C Machine Learning Lead, instead of a Platform Engineer. I would not be happy with the role, but I'd be very happy with the salary. It'd be 3-4x in cash, and 4-5x in total compensation terms. Another way to look at this: I took a cut of up to 75% on my cash compensation.
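To make the arithmetic behind those multiples explicit, here is a minimal sketch. It assumes the multiple is simply the alternate offer divided by the pay actually taken; the function name is made up for illustration.

```python
def cut_from_multiple(multiple: float) -> float:
    """Fraction of pay given up when the alternate offer is `multiple` times the pay taken."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # If the other offer is k times what you took, you took 1/k of it.
    return 1 - 1 / multiple


for m in (3, 4):
    print(f"{m}x offer -> {cut_from_multiple(m):.0%} cut")
```

Under that assumption, a 4x gap corresponds to a 75% cut and a 3x gap to roughly 67%.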

I'm betting that I'll have a lot more fun doing this, but I'm also betting that I'll be a little more successful - which will compensate for this over a 4-10 year chapter of the career.

In addition to the cash and title hit, there's a bit of social shaming: people might be inclined to assume that you were not quite good enough as a Data Scientist and that is why you moved to SWE. I don't care about that enough for it to influence my decision, but I do care about it a little, hence it's worth mentioning.

My Machine Learning skills will also atrophy with time. That said, if I return in a few years, I expect to get back to similar productivity quickly, because the half-life of knowledge in the field is really short and tooling improvements make it easier to ship well.

Anti Skills

You learn a well-paying skill and years later - it comes back to hurt you in unexpected ways. That's an Anti Skill.

Consider this hypothetical: You start your software engineering career and build a reputation as someone who is good at iOS development. Each year, the money you make keeps improving as you keep getting better at it.

The downside? You'll find it hard to get job offers outside of iOS development. [1]

Note that this increased pay might still be less than even starting pay in some other fields, say Data Science -- but you've Golden Handcuffs on you now, don't you?

Congratulations! You've hit a local maximum!

But what if you're someone who enjoys doing iOS development? I'm super happy for you! You'll most likely not just be good at it, but great at it, and enjoy it.

What is an Anti Skill?

The discussion gets more interesting when you consider that you're not just a software engineer anymore, but a mobile app developer. When you say "I'm good at iOS", the hiring market hears it as "I know only iOS development".

Without you noticing, market forces have limited you to being a mobile developer.

{{< tweet user="ponnappa" id="1415323073444597761" >}}

But when you started, did you know that this would happen? That learning these skills would reduce your optionality in the future?

An "Anti Skill" is a skill which, when advertised, takes away your future choices.

Anti Skills prevent you from adapting quickly to the changing environment around you. Anti Skills are skills that you don't want to tell people about. This has nothing to do with whether it's a good skill to learn or not. Some of these are actually good skills to learn e.g. React, SQL, Android/iOS, off the top of my head.

What can we do?

To prevent this, keep your identity small. That includes not self-identifying by a technology, platform or worse, a JS Framework.

This will make it easier to see when/if you're stuck. Put another way: if you think of yourself as a mobile developer, you might be better off thinking of yourself as a software developer specialising in iOS development.

This way, when something is not working, you can see what's going on.

If you're in a position where you're stuck at a local maximum -- resist the temptation to go up the ladder. Think of taking titles into your identity as taking on debt. Job titles are not your identity. They're prestige handcuffs, and you're not going to be able to get them off easily.

It might be harder to get out of these impressions at your existing job, because they tend to be sticky. In such scenarios, moving to a different team within the same company or to a different company altogether might be preferable. I know of at least one case where someone had to change jobs to get a 'soft' career reset.

This reset, speaking from personal experience, is quite uncomfortable and requires you to fall down a skill-cliff and climb up again. If you're considering something like this, feel free to hit me up with your thoughts. I'm here to help.

The worst of these scenarios is that in a few years from now, you are great at something you don't enjoy anymore. And now, it's too late to do something else because you're stuck with it. Act now!


[1] Changing to a different role and/or adding skill sets might still be possible within the organisation you already work at. Lateral shifts are usually harder.

Breaking into NLP

The bulk of this is borrowed from notes made by my teammate and friend at Verloop.io's NLP/ML team on our conversations. I've taken the liberty of removing our internal slang and some boring stuff.

I want to build a community around me on NLP. How can I get discovered by others?

Broadly speaking, the aim in forming connections can be split into long-term and short-term. A short-term aim is one where you receive something immediate out of your connections or one particular connection. This could be a collaboration, correspondence, recommendation/advice or anything else.

A long-term, strategic aim is a well-defined goal that requires multiple tactical steps to achieve. This is also what we like to call friendship in some polite-speak areas of the world.

I have no immediate goals or projects, just need some basic ideas on how to be a part of the ML community.

Find out what people are interested in and do something for them. Many people simply ask questions on Twitter, or you can infer what they are interested in by looking at their LinkedIn/work and their personal blogs.

What would be a good starting point for this?

A very easy thing to start with is a literature review, especially for new topics being researched by influential people in the field. A good literature review shows your interest and willingness to help. It opens the door to communication.

A good place to find what topics are missing a decent literature review: Go through NLP reddit r/LanguageTechnology or subreddits for Deep Learning, Machine Learning and so on.

Or go through Twitter and help people out there. Answer their questions with depth. Do not rush to be the first; aim to be the best. When it comes to technology, almost all platforms behave a bit like StackOverflow: the right answer might not get accepted, but it'll get noticed. Btw, a lot of the Hugging Face contributors happen to be active on both GitHub and Twitter. Hanging around on their Slack can't hurt either.

But the important thing: try and stick to one medium, the place where you are most at home and which gels with your personality. This could actually even be YouTube if you're an English-fluent, attractive looking person.

The other reason you need to stick to one medium is that your audience will spend most of their time on 1 or 2 social media channels.

If they see that your content is not that popular on the other channel, they will do the cross posting for you. For instance, we've both seen Twitter content even within ML, such as the Gary Marcus debate and the attack on Yann LeCun, spill over onto Reddit. And of course, people are still posting Tweets on TikTok!

Word of mouth will be your biggest friend.

Find problems that many people face. Usually a simple problem faced by many is a great problem statement. The Python requests library comes to my mind as an excellent example of such a challenge. The work by gensim around shallow vectorization methods like word2vec and GloVe was also in a similar vein for quite a long time. Of course, the rise of Deep Learning and better tooling makes their work less important - but they stuck in my mind, didn't they?

Why is that a great problem statement?

It's maximising the area under the curve. Solve a trivial problem faced by many or a huge problem faced by some. It has the same impact.

What's something that has worked for you in finding interesting problems?

Find intersections with domains that have little to do with each other. For us, there are domains that have little to do with tech/code and can see great benefits from our involvement.

Marketing yourself has nothing to do with marketing but everything to do with the problems you solve and the solutions you come up with. Make sure the solution is accessible to the wider audience. It should not be that only a certain section of the population can use it. If you plan to market yourself, spend 95% of the time on a quality problem and a quality solution and 5% of time talking about it. This is usually enough if the first 95% is done well.

What medium to talk about these in?

The usual options are blog posts, social media posts etc. But there is an open secret within the community. Writing papers is probably the best way to talk about stuff you’ve done.

Why so?

Papers have the halo effect. A paper improves your reputation and makes it sticky. People might forget a blog post quickly but you can get recognition/perks for around 2 years or so after writing a paper. There are other secondary gains too from doing this. Once you write a paper, you start reading papers differently. You have a better intuition for reading between the lines to understand the author’s intent/pov. Another obvious benefit is that you get better at writing papers. Your thought process will start coming across much more clearly.