Writing¶

August 2, 2021
in machine-learning, tech
8 min read

Data Science Org Design for Startups

While there is plenty of good advice on making ML work and making a career as a Data Scientist - I think very little discussion happens on the organization design for Data Science itself.

This blog will hopefully help folks not just build their team, but also understand the ecosystem from which they are hiring.

Organization Design is determined by these 3 broad categories:

Software Engineer vs Research: To what extent is the Machine Learning team responsible for building or integrating with software? How important are Software Engineering skills on the team?

Data Ownership: How much control does the Machine Learning team have over data collection, warehousing, labeling, and pipelining?

Model Ownership: Is the Machine Learning team responsible for deploying models into production? Who maintains the deployed models?

--- Josh Tobin at Full Stack Deep Learning

It's harder for ML/DS teams to succeed than your typical product and platform engineering functions in startups. This is because:

Good folks are hard to retain as their skills are highly transferable across multiple role (i.e., possibly high team attrition)

Management is unclear on what does "success" look like for ML as a function

These were the two key pitfalls which I wanted to solve for when designing the ML team and how it sits in the larger org. The most well known ways in which companies organise Machine Learning Teams are these:

1. Research & Development Labs

Of these, a R&D Lab is ideal for most well capitalized businesses because it enables them to attract talent, which can in turn work on long term business/tech challenges.

Uber AI Labs, Google AI Research, DeepMind, and FAIR are some examples from the top of my head. Microsoft Research - with contributions across almost all of Computer Science, should probably be the gold standard for this. I have personally spent some time working in such an org design¹

The limitation with this org design is that R&D teams don't own the pipeline to their work i.e. inputs (data) and outputs (model performance in production). To clarify, R&D teams in some cases do own data inputs in some places which does make the process more end to end - but often, the deployment in production is still not under them. This makes this org design all but useless for a pre/near Product Market Fit startups.

2. Embedded within Business & Product Teams

By far the most popular org design in Indian startups, is a Data Scientist is embedded into an engineering team along with an analyst which is then assigned to a business or product team. From the top of my head, this is how some of the best Data Science teams like AirBnb, Flipkart, and Facebook organize their ML teams.

I strongly considered this org design, but ultimately opted against this because it would not play to my strengths at all.

I expected these challenges:

Hard to maintain uniform data and engineering standards across the org

In this org structure, the primary, and sometimes only stakeholder for each Data person is their Product manager. There is a lot of work which is repeated e.g. data pipelines, cleaning, pre-processing. The larger organisations enforce some degree of uniformity via their Data Platforms or equivalent. In the early stages, this effort is not worth the decoupling speed.

Management Complexity, in terms of the ever increasing breadth of problems across different features

In the embedded space, each team could itself be working on a wide variety of “small” ML problems e.g. demand forecasting, text embedding, and sentiment analysis could be all worked on by a single Data Scientist.

Since the Product Manager doesn’t have the technical skill to evaluate whether the solution approach was apt or not, it falls on the Data Science Manager to have a lot of breadth and assist several IC Data Scientists across multiple problems at the same time.

3. Data Science Consultant

Small businesses which themselves had a services arm or revenue love this.

Business or product teams bring specific problems to a data science lead, who then scopes out a plan, defines a success criteria and hands it off to a Data Scientist or Machine Learning Engineer within the team.

There are so many understated but commonly known limitations of this:

Less Engaged Team: Since the problem solving and implementation are separated - the engineer feels less creative and empowered to make changes and is less invested in getting the small details right. There is no single owner of the data or models, and thereby no single person responsible for the technical outcomes of the project.
Communication overhead in terms of energy and time both, which happens 2x: first, when the consultant understands the problem from the person on the team and second when the consultant transfers/shares the proposed solution. This is not just slow, it’s error-prone and expensive.

This makes completing feedback loops even harder, since no one person has all the necessary context which can be carried forward to the next project. This was dropped fairly early as a candidate for that specific reason.

4. [Near Future] Productized Data Science Team

When I studied more modern teams, especially in B2B SaaS or eCommerce from outside, I felt they made a small but important change in this model: Instead of a matrix where the Data Scientist was ultimately responsible to their own product/pod and nothing else, they had a central Data Science function to which all Data Scientists reported.

Some teams created a new, "Chief Data Scientist" or "VP, Machine Learning" designations to reflect this increased autonomy and status within the org. Notice that this is quite similar to how some Design teams are organized.

While I had not worked under this org design, I had interned at a place which was small (10-50 employees) and I could understand the limitations of this org design when I was told the same.

The most common warning was the amount of context which any lead/manager Data Science had to keep beyond a certain project count within the company. I expect that the Verloop.io ML Team will evolve into this over the next 12-24 months. I'm estimating this on the basis of the problem complexity and the headcount needed for engineering and data science teams both. If we can have ICs reporting to both the Product Manager and a Data Science org, the added management complexity would be worth it in the faster shipping speed via shared tooling and context.

5. [Verloop Today] Full Stack Data Science

This is the org design at Verloop.io ML today. The defining characteristic of this org design is that every ML person does things end to end - there is no division of labour.

There is a brilliant explanation on this from StitchFix: https://multithreaded.stitchfix.com/blog/2019/03/11/FullStackDS-Generalists/

The goal of data science is not to execute. Rather, the goal is to learn and develop new business capabilities. … There are no blueprints; these are new capabilities with inherent uncertainty. … All the elements you’ll need must be learned through experimentation, trial and error, and iteration. – Eric Colson

As Eric calls out, Data Science, unlike say Platform Engineering functions - is not a pure "execution" function. It's an iteration and discovery function of the organisation. In business terms, you might call this Market Research, but where technology is applied to develop new capabilities.

This full cycle development seems to be endorsed by Netflix Tech officially and Data Science folks at Lazada as well.

Case Study: ML Org at Verloop.io

I hope the above gives you a sense of common data science team organisations. If you’ve seen them in the past, now we both have a shared vocabulary to talk about it.

As a case study, let me share some of the operational things we had at Verloop.io. This was mostly as part of our 0 to 1 journey as a B2B SaaS product.

These are not recommendations, but just how things shaped up in the early days. I hope this gives you a case study to concretely think about what we just discussed.

ML function reported directly to the CEO for the longest time. The CEO directly brings the business context and drives quick wins. The ML Lead needs to negotiate continuously on organisational goals, constantly query for added context, and make long term bets.

Part of the ML Product Manager role also got absorbed into what I'd been doing as a Machine Learning Lead/Manager because we did not have a full time Product Manager in the company for more than 6 months.

Attracting young talent was easier by giving them quite high autonomy. The team also owns model performance and deployment.

The deployment ownership is made possible by a ML System Design decision as well. The strong adherence to multi-tenant models instead of client specific models.

Talent pool is smaller + retention is hard

We had a smaller talent pool for at least 2 reasons: A few data science candidates refused to join the team because they were not interested in engineering, and wanted to focus on modeling tasks exclusively.

In some other cases, the conversation broke down because we couldn’t match their pay expectations.

We managed to make our retention hard because of good intentions, but with bad outcomes:

The engineering org in our early stages did a lot of ad hoc development in smaller, demo-driven sprint cycles. We assumed that separating ML from the rest of the engineering org would allow us to ship faster. It would also allow us to focus longer on one project, without being distracted by ad hoc tasks. This did work to a certain extent.

In hindsight, this was a mistake. It definitely empowered us to ship faster, but teammates felt isolated, and it was hard to complete the feedback loop with our end users via the Product Manager alone. Additionally, if we needed engineering’s help to ship something, they’d pick “their” work over integrating our shipped work. This slowed down our shipping pace itself over a longer duration. This in turn, hurt the morale of the team, and made retention much harder.

I’d do this differently the next time around.

There are 3 things I’d do differently:

Remove the middleman (i.e me): PM and the Data Scientist should work directly with each other. Instead of the information flowing/gathered with me as the nodal person.
Better Retrospectives: We did a few reviews i.e. what went well or wrong, but not enough of “How does this inform our future?”
Add Front End, DevOps Skills: Lot of our releases would reach the end user because the interface was designed, but not implemented. Engineering teams would quite obviously pick their own OKRs above ours. The short term fix is to add Front End and DevOps skills.

Even something as simple as being able to build+deploy Gradio or Streamlit demos would go a long way in convincing the org to prioritise the shipped work.

Ending Note

The terms are borrowed from the amazing blog by Pardis Noorzad: Models for integrating data science teams within companies | by Pardis Noorzad | Medium

Thanks to Eugene Yan and Maneesh Mishra for taking the time to review this piece. A lot of the improvements are thanks to their comments.

Photo by Rahul Chakraborty on Unsplash

Notes

Advanced Technologies Lab at Samsung Research Institute, Bengaluru ↩↩↩

July 21, 2021
in niranting
6 min read

Belief Arbitrage

This is a 3 part essay.

Part 1: Talent

In my teenage years, I first understood that the kind of family you’re born into gives you access to certain kinds of wealth e.g. money, network, habits/knowledge and social capital to do things you want.

Most people in my peer group and even among the adults I could speak to, used these resources to make their own life materially more comfortable.

This is precisely what I wanted: material comforts, but unfortunately, I couldn’t die and pick a family to be born again in. So I kept looking for other means and ways in which I could make it work for me.

So I went hunting. I read a few biographies, centered across genres but mostly dead people viewed from specific lenses of entrepreneurship, learning, politics and the like. I also actively sought out and spoke to people in their 30s and 40s in careers similar to mine.

I realised that sheer raw talent and finding a way to apply those talents is also a unique form of arbitrage. It makes people blissful in what we’d now popularly call flow state. The happiest people I met were craftsmen and had high mastery. They also had a reasonable degree of autonomy over their own shape of the day.

I was also in my early 20s by this and realized that I wanted to optimize for more control of my day and mental energy, and was more than happy to trade off some x% wealth over this.

“Eureka!”, I finally thought to myself. I’d at last found a way, a strategy - if not a playbook which might work for me.

The plan was arguably incredibly simple and had only 2 parts:

be good at a specific craft, which is in demand (Machine Learning for me)
- be good
  - implies that you need to be at least in the top 2000 people in your age group for that craft
- in demand - implies that there should be at least 10K people in that craft already across the globe and different skill levels and a mechanism for them to move out i.e. retirement, outdated, or similar
  - people who matter should know that you’re a skilled craftsman
  - my work is recommended by the likes of Andrew Ng, Paul Romer and a NLP book which has sold over 1K copies.

I think it’d be fair that I executed this strategy of sorts to the best of my then-ability.

In hindsight, I could’ve definitely done 5-10x more but we’ll leave regrets/lessons for a separate essay to retain the cheery narrative nature of this first person essay.

If there is something you want to take away from this essay, it is this: Talent is a valid form of advantage, but not everyone can monetize/utilize this well.

Part 2: Genius

To recap, talent is one’s ability to do something extremely well. I also argued that talent is the only form or advantage which I thought I could build to get material wealth in life.

To me, the idea that Genius can be translated to material wealth is an axiomatic belief. So we’ll not dwell on whether Genius is useful or not.

In the late 2000s, a young man would show up every week with the same idea for approval from the management of this rapidly growing Internet startup. He had previously quit his job at an prestigious consulting company and closed those doors behind him by going to a smaller startup in a completely different role.

For the idea which he had a lot of faith in, he was willing to bet this career and if not, at least his future prospects at that company. His bet? A new Internet explorer toolbar, which would ensure that users could find Google Search more easily and quickly. The man was Sundar Pichai.

To many, that is and was a move of genius. This was a genius of not just invention, but also business, technology and user psychology. There is more than one flavour of genius for sure.

Let’s begin by understanding what we mean by genius first.

Our cultural understanding of intellectual excellence, or more popularly, genius is extremely limited – and dare I say, harmful. We limit these to inventors, mathematical prodigies and their ilk.

Alexey Guzey talks about this cultural misunderstanding here: https://guzey.com/intelligence-killed-genius/

I know a few people who I believe to be geniuses. What happens when I tell them that I believe they’re a genius? They all tell me that there are people smarter than them and that they’re “only pretty good at one or two things”

bitch this is exactly what genius is.

Genius isn’t limited to the ability to solve puzzles, or do rapid arithmetic, or Sheldon’s eccentric quirks.

Genius is having the ability, no – the advantage, to see things which others can’t see.

To me, as someone of not-superhuman intelligence, the question is this then:

“How do I make the leap from Talent to Genius?”

The answer is in the Alexey Guzey essay itself: genius is having a vision.

In sports, the line between talent and genius is often blurred. Is Michael Jordan a really talented player, or a genius for exploiting the blind spots of referees to his advantage?

It’s being able to make bets and act on them with unparalleled insight. It’s executing. It’s doing something which escapes the imagination of most people in the know. It’s a taste for what could work, without being bound by tradition.

As Arthur Schopenhauer summarized: Talent hits a target no one else can hit; Genius hits a target no one else can see.

Part 3: Courage

We touched upon a question of interest to us:

“How do I make the leap from Talent to Genius?”

I don’t have a complete answer yet. So let me offer the part of the puzzle which I am very confident about: Courage.

It’s a key ingredient, a necessary catalyst for one to make the leap from one to another.

A natural question follows, what do we mean by courage? How do you recognize courage?

Of course, there is the Die Hard version of running into a gun nest. That is a subset of courage.

Courage also has some other flavors. Many of us recognize them almost instantly in the form of honor and duty.

The other smell for spotting courage is the acting on something despite fear and/or knowledge of harsh consequences. Many outsized entrepreneurial and military ventures are built with this courage.

There is also a lot of courage which stems from not caring about what people have to say. To neither be bound by shackles of history or tradition, but also respect Chesterton’s Fence.

It frees you to learn about reality from other people, but not be bound to the reality of today. An ability to envision a future state which others don’t imagine.

Pressfield’s Resistance is the most dangerous element to one's life and dreams since its sole mission is to sabotage aspirations. The first step to fighting Resistance is knowing that you can, and you can win a battle here and another there.

That is empowering you to action. That is a form of courage.

It frees you to make bets that’d make others shiver in fear of uncertainty. It frees you to act in ways congruent to your values and aspirations. To seek truth, beauty and craft above all else.

For a craftsman mindset, to do their best work, the ideal state is where they can act freely. For that specific form of high agency, courage is a key ingredient, a catalyst. Without courage, no talent can truly do their best work. The leap is impossible from Talent to Genius.

Courage in any flavor: pride, duty, honor, self-confidence is a foundational belief.

This is a belief that not only is something achievable - but there might be a means and methods to do so. It is a specific form of Arbitrage against Reality. An advantage which you, my dear talented friend, now have over others that don’t have the conviction, or the courage.

July 21, 2021
in machine-learning, tech
2 min read

MLOps for Startups

Start your development by writing the overall impact and feature overview in the Press Release doc and README

If your time to ship is more than 2 weeks, write a functional spec

In case of bug fixes, add bug details or link to Asana/Github Issues

Always. Do trunk-based development. Don’t restrict a deployment trigger to specific people. As soon as you are done, go ahead, deploy and let others deploy.

SERVICE DETAILS

DOCS

Please provide API documentation for your service (e.g. via API definition)
Add auto-generated engineering docs in HTML/Markdown
Who is the maintaining team/person at this moment for the service? All maintainers and committers should be listed Repo README should include instructions to set up this repo for development

Service Component — DATABASE

List down all the database changes, if you added any columns, removed any columns, added or removed any tables.
What kind of indexes do you have? If you added a new column does it require an index? If yes, why? If no, why not?
Do you make changes to records? Do you do frequent deletes or updates?

Service Component — MONITORING

List all the services you own and list down each server monitoring parameters
Alerts for service uptime, service performance degradation e.g. latency, throughput
Alerts for service machine disk/CPU/memory — what’s the threshold and how are they triggered

Please include today’s screenshots for each of them e.g. StackDriver. We need to make sure that you have proper monitoring in place.

Service Component — DEPLOYMENT

TODO Add a step by step documentation on how to write your first service and deploy it in the dev/stage/production environment for your org

CODE STYLING

PYTHON

Add a pre-commit with black, sort and flake8 to your code. Follow the happy path convention. Add type hints
Start writing code by writing APIs, then tests and then implement code. We use tests as API docs
Add proper liveness and/or readiness check for your Kubernetes deployment:

Liveliness check is for Kubernetes to know whether your application is running.

Readiness check is for you to specify when traffic should be sent to the pod. For example, if your pod needs some operations to be done before it can take traffic (such as downloading a dataset), your readiness check should send back 200 only once it's completely ready for taking traffic.

Docs:

Kubernetes Manual
Blog on Best Practices

Aim for reproducible experiments by re-using actively maintained APIs
There is no specific tooling to do reproducible experiments, but consider using something simple like Sacred or Hydra