For a couple of years now, MLOps has probably been the most (over)used term in the ML industry. The more models people want to deploy to production, the more they think about how to organize the Ops part of this process.
Naturally, the way to do MLOps has been shaped by the big players on the market – companies like Google, Netflix, and Uber. What they did for the community was (and is) great, but they were solving their MLOps problems.
And most companies don’t have the same problems. The majority of ML teams operate on a smaller scale and have different challenges. Yet they are the biggest part of the ML industry, and they want to know the best way to do MLOps at their scale, with their resources and limitations.
Reasonable scale MLOps addresses this need. “Reasonable scale” is a term coined last year by Jacopo Tagliabue, and it refers to companies that:
- have ML models that generate hundreds of thousands to tens of millions of USD per year (rather than hundreds of millions or billions)
- have dozens of engineers (rather than hundreds or thousands)
- deal with terabytes (rather than petabytes or exabytes)
- have a finite amount of computing budget
In this guide, you’ll learn more about MLOps at a reasonable scale, and you’ll get to know the best practices, templates, and examples that will help you implement it in your work.
Before that, let’s take a few steps back and see why we even talk about reasonable scale.
MLOps vs MLOps at a reasonable scale
Solving the right problem and creating a working model, while still crucial, is no longer enough. At more and more companies, ML needs to be deployed to production to show “real value for the business”.
Otherwise, your managers or managers of your managers will start asking questions about the “ROI of our AI investment”. And that means trouble.
The good thing is that many teams, large and small, are past that point, and their models are doing something valuable for the business. The question is:
How do you actually deploy, maintain and operate those models in production?
The answer seems to be MLOps.
In 2021, so many teams looked for tools and best practices around ML operations that MLOps became the real deal. Dozens of tools and startups were created. 2021 was even called “a year of MLOps”. Cool.
But what does it mean to have MLOps set up?
If you read through online resources, it would be:
- reproducible and orchestrated pipelines,
- alerts and monitoring,
- versioned and traceable models,
- auto-scalable model serving endpoints,
- data versioning and data lineage,
- feature stores,
- and so much more.
But does it have to be all of it?
Do you really need all those things, or is it just a “standard industry best practice”? Where do those “standard industry best practices” come from anyway?
Most of the good blog posts, whitepapers, conference talks, and tools are created by people from super-advanced, hyperscale companies. Companies like Google, Uber, and Airbnb. They have hundreds of people working on ML problems that serve trillions of requests a month.
That means most of the best practices you find are naturally biased toward hyperscale. But 99% of companies are not doing production ML at hyperscale.
Most companies are either not doing any production ML yet or do it at a reasonable scale. Reasonable scale as in five ML people, ten models, millions of requests. Reasonable, demanding, but nothing crazy and hyperscale.
Ok, so the best practices are biased toward hyperscale, but what is wrong with that?
The problem arises when a reasonable scale team goes with “standard industry best practice” and tries to build or buy a full-blown, hyperscale MLOps system.
Building hyperscale MLOps with the resources of a reasonable scale ML team just cannot work.
Hyperscale companies need everything. Reasonable scale companies need to solve the most important current challenges. They need to be smart and pragmatic about what they need right now.
The tricky part is telling your actual needs apart from potential, nice-to-have, future needs. With so many blog articles and conference talks out there, it is hard. Once you are clear about your reality, you are halfway there.
But there are examples of pragmatic companies achieving great results by embracing reasonable scale MLOps limitations:
- Lemonade generates $100M+ in annual recurring revenue from ML models with just 2 ML engineers serving 20 data scientists.
- Coveo leverages tools to deliver recommendation systems to thousands of companies with (almost) no ML infrastructure people.
- Hypefactors runs NLP/CV data enrichment pipelines on the entire social media landscape with a team of just a few people.
You have probably never heard of them, but their problems and solutions are a lot closer to your use case than that Netflix blog post or Google whitepaper you have opened in the other tab.
Check more stories from reasonable scale companies about how they solved different parts of their ML workflow.
The pillars of MLOps
Ok, so say you want to do MLOps right. What do you do? Even though MLOps is still developing, some things are clear(ish), e.g. the pillars of MLOps, which can serve as guidance on how to even start thinking about this topic.
The pillars of MLOps – stack components
The first approach is based on the four or five main pillars of MLOps that you need to implement somehow:
- Data ingestion (and optionally feature store)
- Pipeline and orchestration
- Model registry and experiment tracking
- Model deployment and serving
- Model monitoring
I say four or five because the data ingestion part is not always mentioned as one of the pillars. But I believe that it’s a crucial element and shouldn’t be skipped.
Each of those can be solved with a simple script or a full-blown solution depending on your needs.
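To make the “simple script” end of that spectrum concrete, here is a minimal sketch of experiment tracking done with nothing but a JSON Lines file. The function names (`log_run`, `load_runs`, `best_run`) and the file layout are made up for illustration and don’t come from any particular tool. Treat it as a starting point under those assumptions, not a recommended implementation.

```python
import json
import time
import uuid
from pathlib import Path


def log_run(log_path, params, metrics):
    """Append one experiment run as a JSON line and return its generated id."""
    record = {
        "run_id": uuid.uuid4().hex,   # hypothetical id scheme, any unique key works
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
    }
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record["run_id"]


def load_runs(log_path):
    """Read every logged run back; an absent file simply means no runs yet."""
    log_path = Path(log_path)
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(log_path, metric, higher_is_better=True):
    """Return the run with the best value of `metric`, or None if nobody logged it."""
    runs = [r for r in load_runs(log_path) if metric in r["metrics"]]
    if not runs:
        return None
    pick = max if higher_is_better else min
    return pick(runs, key=lambda r: r["metrics"][metric])
```

Calling `log_run("runs/runs.jsonl", {"lr": 0.01}, {"accuracy": 0.87})` after each training job and `best_run("runs/runs.jsonl", "accuracy")` when picking a model may be enough for a small team. Once several people need to compare runs side by side, a dedicated tool is probably the better deal.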
End-to-end vs a canonical stack of best-in-class tools
The decision boils down to whether you want:
- an end-to-end platform vs a stack of best-in-class point solutions
- to buy vs build vs maintain open-source tools (or buy and build and maintain oss).
The answer, as always, is “it depends”.
Some teams have a fairly standard ML use case and decide to buy an end-to-end ML platform.
By doing so, they get everything-MLOps out of the box, and they can focus on ML.
The problem is that the further away you go from the standard use case, the harder it gets to adjust the platform to your workflow. And everything looks simple and standard at the beginning. Then business needs change, requirements change, and it is not so simple anymore.
And then there is the pricing discussion. Can you justify spending “this much” on an end-to-end enterprise solution when all you really need is just 3 out of 10 components? Sometimes you can, and sometimes you cannot.
The pillars of reasonable scale MLOps – components
Because of all that, many teams stay away from end-to-end and decide to build a canonical MLOps stack from point solutions that solve just some parts very well.
Some of those solutions are in-house tools, some are open-source, and some are third-party SaaS or on-prem tools.
Depending on their use case, they may have something as basic as bash scripts for most of their ML operations and get something more advanced for the one area where they need it. For example:
- You port your models to native mobile apps. You probably don’t need model monitoring but may need advanced model packaging and deployment.
- You have complex pipelines with many models working together. Then you probably need some advanced pipelining and orchestration.
- You need to experiment heavily with various model architectures and parameters. You probably need a solid experiment tracking tool.
By pragmatically focusing on the problems you actually have right now, you don’t overengineer solutions for the future. You deploy those limited resources that you (as a team doing ML at a reasonable scale) have into things that make a difference for your team/business.
The pillars of reasonable scale MLOps – principles
There’s also another approach to MLOps pillars that’s worth mentioning. It was brought up by Ciro Greco, Andrea Polonioli, and Jacopo Tagliabue in the article Hagakure for MLOps: The Four Pillars of ML at Reasonable Scale. The principles they write about are:
- Data is superior to modeling: you can often gain more by iterating on data, not models (Andrew Ng talks about it a lot with “data-centric AI”)
- Log then transform: you should separate data ingestion (getting raw data) from data processing to get reproducibility and replayability. You can get that, for example, with Snowflake + dbt.
- PaaS & FaaS is preferable to IaaS: You have limited resources. Focus them where you are making a difference. Instead of building and maintaining every component of the stack, use fully-managed services where you can. Your team’s time is the real cost here, not the subscription.
- Vertical cuts deeper than distributed: in most cases, you don’t really need distributed computing architecture. You can use containerized, cloud-native scaling.
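The “log then transform” principle above is easier to see in code. The sketch below is a toy illustration that uses a local JSON Lines file as a stand-in for the warehouse (the article’s example is Snowflake + dbt, which this does not use). All function and field names, such as `ingest`, `transform`, and `amount_cents`, are assumptions made up for the example.

```python
import json
from pathlib import Path


def ingest(raw_log, event):
    """Ingestion only appends the raw event as received; no cleaning happens here."""
    with Path(raw_log).open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def transform(raw_log, transform_fn):
    """Replay the whole raw log through transform_fn, dropping events it maps to None."""
    rows = []
    with Path(raw_log).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            row = transform_fn(json.loads(line))
            if row is not None:
                rows.append(row)
    return rows


def purchases_v1(event):
    """First version of the processing logic: keep purchases, convert cents to units."""
    if event.get("type") != "purchase":
        return None
    return {"user": event["user"], "amount": event["amount_cents"] / 100}


def purchases_v2(event):
    """Later version: same rows plus a derived flag, computed from the same raw log."""
    row = purchases_v1(event)
    if row is None:
        return None
    row["is_large"] = row["amount"] >= 10
    return row
```

Because `ingest` never alters events, running `transform(raw_log, purchases_v1)` twice gives the same table (reproducibility), and switching to `purchases_v2` rebuilds the table from history without collecting any data again (replayability).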
Best practices and tips for setting up MLOps at a reasonable scale
Okay, we’ve talked about the pillars of MLOps and the principles of how to approach them. Now it’s time for the more practical part. You’re probably wondering:
How do reasonable scale companies actually set it up (and how should you do it)?
Here are the resources that will help you build a pragmatic MLOps stack for your use case.
Let’s start with some tips.
Lots of good stuff in there, but there was this one thought I just had to share with you:
“My number 1 tip is that MLOps is not a tool. It is not a product. It describes attempts to automate and simplify the process of building AI-related products and services.
Therefore, spend time defining your process, then find tools and techniques that fit that process.
For example, the process in a bank is wildly different from that of a tech startup. So the resulting MLOps practices and stacks end up being very different too.” – Phil Winder, CEO at Winder Research
So before everything, be pragmatic and think about your use case, your workflow, your needs. Not “industry best practices”.
I keep coming back to Jacopo Tagliabue, Head of AI at Coveo, but the fact is that no reasonable scale ML discussion is complete without him (after all, he’s the one who coined the term, right?). In his pivotal blog post, Jacopo suggests a mindset shift that we think is crucial (especially early in your MLOps journey):
“to be ML productive at a reasonable scale, you should invest your time in your core problems (whatever that might be) and buy everything else.”
You can watch him go deep into the subject in this Stanford Sys seminar video.
The third tip I want you to remember comes from Orr Shilon, ML engineering team lead at Lemonade.
In this episode of mlops.community podcast, he talks about platform thinking.
He suggests that their focus on automation and pragmatically leveraging tools wherever possible were key to doing things efficiently in MLOps.
With this approach, at one point, his team of two ML engineers managed to support the entire data science team of 20+ people. That is some infrastructure leverage.
One more place with great insights about setting up your MLOps is one of the MLOps community meetups with Andy McMahon, titled “Just Build It! Tips for Making ML Engineering and MLOps Real”. Andy talks about:
- Where to start when you want to operationalize your ML models?
- What comes first – process or tooling?
- How to build and organize an ML team?
- …and much more
It’s a great overview of what he learned when doing all these things in real life. Many valuable lessons there.
Now, let’s look at example MLOps stacks!
MLOps tool stacks
There are many tools that play in many MLOps categories, though it is sometimes hard to understand who does what.
From our research into how reasonable scale teams set up their stacks, we found out that:
Pragmatic teams don’t do everything. They focus on what they actually need.
For example, the team over at Continuum Industries needed to get a lot of visibility into testing and evaluation suites of their optimization algorithms.
So they connected Neptune with GitHub Actions CI/CD to visualize and compare various test runs.
GreenSteam needed something that would work in a hybrid monolith-microservice environment.
Because of their custom deployment needs, they decided to go with Argo pipelines for workflow orchestration and deploy things with FastAPI.
Those teams didn’t solve everything deeply but pinpointed what they needed and did that very well.
There are more reasonable scale teams among our customers. Here are some case studies that are worth checking:
- Zoined talks about scalable ML workflow with only a few Data Scientists & ML Engineers
- Hypefactors talks about how to manage the process with a variable number of ML experiments
- Deepsense.ai talks about finding a way to keep track of over 100k models
- Brainly talks about managing their experiments when working with SageMaker Pipelines
- InstaDeep talks about building a research-friendly and team-friendly process & stack
If you’d like to see more examples of how reasonable scale teams set up their MLOps, check these articles:
- Monzo’s machine learning stack by Neil Lathia
- Laying the foundation of our open source ML platform with a modern CI/CD pipeline by Theodore Meynard
- The Road to a Serverless ML Pipeline in Production by Gal Shen
- How These 8 Companies Implement MLOps: In-Depth Guide by our Developer Advocate, Stephen Oladele, who did a great job researching and writing down the setups of 8 more companies (some are reasonable scale, and some are hyperscale)
Also, if you want to go deeper, there is a Slack channel where people share and discuss their MLOps stacks.
Here’s how you can join it:
- Join the mlops.community Slack
- Find the #pancake-stacks channel
- While at it, come say hi in the #neptune-ai channel and ask us about this article, MLOps, or whatever else
Okay, stacks are great, but you probably want some templates, too.
The best reasonable scale MLOps template comes from, you guessed it, Jacopo Tagliabue and collaborators.
In this open-source GitHub repository, they put together an end-to-end (Metaflow-based) implementation of intent prediction and session recommendation.
It shows how to connect the main pillars of MLOps and have an end-to-end working MLOps system you can build on. It is an excellent starting point that lets you use the default or pick and choose tools for each component.
Another great resource that is worth mentioning is the MLOps Infrastructure Stack article.
In that article, they explain how:
“MLOps must be language-, framework-, platform-, and infrastructure-agnostic practice. MLOps should follow a “convention over configuration” implementation.”
It comes with a nice graphical template from folks over at Valohai.
They explain general considerations, tool categories, and example tool choices for each component. Overall a really good read.
MyMLOps gives you a browser-based tool stack builder that talks briefly about what tools do and in which categories they play. You can also share your stack with others.
You may also look into some of our resources for choosing tools for a particular component of the stack:
- The Best MLOps Tools and How to Evaluate Them
- Best Tools For Data Labeling
- The Best Tools for Machine Learning Model Visualization
- Best Tools for Model Tuning and Hyperparameter Optimization
- 15 Best Tools for ML Experiment Tracking and Management
- Best Tools to Do ML Model Serving
- Best 8 Machine Learning Model Deployment Tools
- Best Workflow and Pipeline Orchestration Tools
- Best 7 Data Version Control Tools
- Top Model Versioning Tools for Your ML Workflow
- Best Tools to Do ML Model Monitoring
- CI/CD Tools for Machine Learning
- … and more articles about tools.
What should you do next?
Okay, now use this knowledge and go build your MLOps stack!
We’ve gathered here quite a lot of resources that should help you. But if you have specific questions on the way or just want to dig deeper into the topic, here’s even more useful stuff.
- MLOps Community – I may be repeating myself, but that’s definitely the best MLOps community out there. Almost 10k practitioners in one place, asking questions, sharing knowledge, and just talking to each other about all things MLOps.
- Apart from the very active Slack channel, MLOps Community also runs a podcast, organizes meetups and reading groups, and sends newsletters. Make sure to check all these resources.
- MLOps Live – It’s a biweekly event organized by us, Neptune.ai, where ML practitioners answer questions from other ML practitioners about one chosen subject related to MLOps. You can watch previous episodes on YouTube or listen to them as a podcast.
- Personal blogs of ML folks – Many ML practitioners have their own blogs, which we highly recommend as well. Make sure to follow e.g. Chip Huyen, Eugene Yan, Jeremy Jordan, Shreya Shankar, or Laszlo Sragner. You can also check the Outerbounds blog.
- MLOps Blog – Our own blog is also full of MLOps-related articles written by Data Scientists and ML Engineers who work in the industry. You’ll find pieces covering best practices, tools, real-life MLOps pipelines, and much more. Here are a few articles I think you should start with:
- Towards Data Science – Probably an obvious resource, but you can find a lot of gold there when it comes to reasonable scale ML teams sharing their solutions and practices.
- apply(conf) – Although there are speakers from hyperscale companies as well, this conference gives a lot of space in its agenda to reasonable scale teams. It’s one of the favorite events of the ML community, so there must be a reason for that.
- Awesome MLOps GitHub repos – There are actually two repos with this name – here and here. They list everything from articles, books, and papers, to tools, newsletters, podcasts, and events.
- If you’d like to take a step back, or you’re just starting to learn about MLOps, no worries. There’s something for everyone. You can check one of the courses: MLOps Fundamentals on Coursera, Zoomcamp organized by DataTalks Club or Made with ML.