What is this MLOps thing?
This question had been on my mind for a while. Until recently (I’m writing this in late 2020), I had only heard about MLOps a few times at big AI conferences and seen it mentioned in papers I read over the years, but I didn’t know anything specific.
Interestingly enough, around the same time, I had a conversation with a friend who works as a Data Mining Specialist in Mozambique, Africa. Recently they started to create their in-house ML pipeline, and coincidentally I was starting to write this article while doing my own research into the mysterious area of MLOps to put everything in one place.
In this conversation, I learned more about the many pain points that legacy companies (and many tech companies doing commercial ML) have regarding:
- Moving to the cloud;
- Creating and managing ML pipelines;
- Dealing with sensitive data at scale;
- And about a million other problems.
So I made it my duty to dive deep, research the topic extensively, and learn as much as I could while writing down my own notes and ideas.
The result is this article.
But why research this topic now?
According to TechJury, every person created at least 1.7 MB of data per second in 2020. For data scientists like you and me, that’s like an early Christmas: there are so many theories and ideas to explore and experiment with, discoveries to be made, and models to be developed.
But if we want to be serious and actually have those models touch real-life business problems and real people, we have to deal with the essentials like:
- acquiring & cleaning large amounts of data;
- setting up tracking and versioning for experiments and model training runs;
- setting up the deployment and monitoring pipelines for the models that do get to production.
And we need to find a way to scale our ML operations to the needs of the business and/or users of our ML models.
There were similar issues in the past when we needed to scale conventional software systems so that more people could use them. DevOps’ solution was a set of practices for developing, testing, deploying, and operating large-scale software systems. With DevOps, development cycles became shorter, deployment velocity increased, and system releases became auditable and dependable.
That brings us to MLOps. It was born at the intersection of DevOps, Data Engineering, and Machine Learning, and it’s a similar concept to DevOps, but the execution is different. ML systems are experimental in nature and have more components that are significantly more complex to build and operate.
Let’s dig in!
What is MLOps?
MLOps (Machine Learning Operations) is a set of practices for collaboration and communication between data scientists and operations professionals. Applying these practices increases the quality, simplifies the management process, and automates the deployment of Machine Learning and Deep Learning models in large-scale production environments. It also becomes easier to align models with business needs and regulatory requirements.
MLOps is slowly evolving into an independent approach to ML lifecycle management. It applies to the entire lifecycle – data gathering, model creation (software development lifecycle, continuous integration/continuous delivery), orchestration, deployment, health, diagnostics, governance, and business metrics.
The key phases of MLOps are:
- Data gathering
- Data analysis
- Data transformation/preparation
- Model training & development
- Model validation
- Model serving
- Model monitoring
- Model re-training.
DevOps vs MLOps
DevOps and MLOps have fundamental similarities because MLOps was derived from DevOps principles. But they’re quite different in execution:
- Unlike DevOps, MLOps is much more experimental in nature. Data Scientists and ML/DL engineers have to tweak various features – hyperparameters, parameters, and models – while also keeping track of and managing the data and the code base for reproducible results. Despite all the effort and tooling, the ML/DL industry still struggles with the reproducibility of experiments. This topic is out of the scope of this article, so for more information check the reproducibility subsection in the references at the end.
- Hybrid team composition: the team needed to build and deploy models in production won’t be composed of software engineers only. In an ML project, the team usually includes data scientists or ML researchers, who focus on exploratory data analysis, model development, and experimentation. They might not be experienced software engineers who can build production-class services.
- Testing: testing an ML system involves model validation, model training, and so on – in addition to the conventional code tests, such as unit testing and integration testing.
- Automated Deployment: you can’t just deploy an offline-trained ML model as a prediction service. You’ll need a multi-step pipeline to automatically retrain and deploy a model. This pipeline adds complexity because you need to automate the steps that data scientists do manually before deployment to train and validate new models.
- Production performance degradation of the system due to evolving data profiles or simply Training-Serving Skew: ML models in production can have reduced performance not only due to suboptimal coding but also due to constantly evolving data profiles. Models can decay in more ways than conventional software systems, and you need to plan for it. This can be caused by:
- A discrepancy between how you handle data in the training and serving pipelines.
- A change in the data between when you train and when you serve.
- Feedback loop – when you choose the wrong hypothesis (i.e. objective) to optimize, which makes you collect biased data for training your model. Then, without knowing it, you collect newer data points using this flawed hypothesis and feed them back in to retrain or fine-tune future versions of the model. This makes the model even more biased, and the snowball keeps growing. For more information read Fastbook’s section on Limitations Inherent To Machine Learning.
- Monitoring: models in production need to be monitored. Similarly, the summary statistics of the data that built the model need to be monitored so that you can refresh the model when needed. These statistics can and will change over time, so you need notifications or a roll-back process for when values deviate from your expectations.
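To make the monitoring point concrete, here is a minimal sketch of comparing summary statistics between training data and live data. The function names, the feature values, and the threshold of 2 standard deviations are all assumptions chosen for illustration; a real setup would likely use richer statistics and a dedicated monitoring tool.

```python
from statistics import mean, stdev


def drift_score(train_values, live_values):
    """Shift of the live mean from the training mean, in training standard deviations."""
    train_mean = mean(train_values)
    train_std = stdev(train_values)
    if train_std == 0:
        # Constant training feature: any change in the live mean counts as drift.
        return 0.0 if mean(live_values) == train_mean else float("inf")
    return abs(mean(live_values) - train_mean) / train_std


def check_feature_drift(train, live, threshold=2.0):
    """Return the names of features whose drift score exceeds the threshold."""
    return [
        name
        for name in train
        if name in live and drift_score(train[name], live[name]) > threshold
    ]


# Hypothetical feature values: "age" stays stable, "income" roughly doubles.
train = {"age": [25, 30, 35, 40, 45], "income": [30.0, 32.0, 31.0, 29.0, 33.0]}
live = {"age": [27, 31, 36, 41, 44], "income": [60.0, 62.0, 61.0, 59.0, 63.0]}

drifted = check_feature_drift(train, live)
print(drifted)  # features that should trigger a notification or a model refresh
```

A check like this only catches shifts in the mean. It is a starting point for alerts, not a complete answer to training-serving skew.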
MLOps and DevOps are similar when it comes to continuous integration of source control, unit testing, integration testing, and continuous delivery of the software module or the package.
However, in ML there are a few notable differences:
- Continuous Integration (CI) is no longer only about testing and validating code and components, but also testing and validating data, data schemas, and models.
- Continuous Deployment (CD) is no longer about a single software package or service, but a system (an ML training pipeline) that should automatically deploy another service (model prediction service) or roll back changes from a model.
- Continuous Training (CT) is a new property, unique to ML systems, that’s concerned with automatically retraining and serving the models.
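The first difference, that CI in ML also validates data and data schemas, can be sketched as a small check that a CI job runs against a sample of incoming data. The schema, column names, and `validate_rows` helper below are hypothetical; in practice you might reach for a dedicated data validation library instead.

```python
# Hypothetical schema for an incoming data source: column name -> expected type.
EXPECTED_SCHEMA = {"user_id": int, "age": int, "country": str, "spend": float}


def validate_rows(rows, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable schema violations (empty if the data is valid)."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for column, expected_type in schema.items():
            if column in row and not isinstance(row[column], expected_type):
                errors.append(
                    f"row {i}: {column} should be {expected_type.__name__}, "
                    f"got {type(row[column]).__name__}"
                )
    return errors


good_rows = [{"user_id": 1, "age": 30, "country": "MZ", "spend": 12.5}]
bad_rows = [{"user_id": 2, "age": "thirty", "country": "MZ"}]

print(validate_rows(good_rows))
for problem in validate_rows(bad_rows):
    print(problem)
```

In a CI job, a non-empty error list would fail the build in the same way a failing unit test does.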
MLOps vs experiment tracking vs ML model management
We’ve defined what MLOps is, but what about experiment tracking and ML model management?
Experiment tracking is a part (or process) of MLOps focused on collecting, organizing, and tracking model training information across multiple runs with different configurations (hyperparameters, model size, data splits, parameters, and so on).
As mentioned earlier, because ML/DL is so experimental in nature, we use experiment tracking tools to benchmark different models, whether they were created by different companies, teams, or team members.
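To show what “collecting, organizing, and tracking” means at its simplest, here is a sketch of a tracker that appends one JSON record per training run to a file. The `ExperimentTracker` class and its methods are made up for this example and are not the API of any real tracking tool.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


class ExperimentTracker:
    """Toy tracker (hypothetical): one JSON line per training run."""

    def __init__(self, log_path):
        self.log_path = Path(log_path)

    def log_run(self, params, metrics):
        """Append a run record and return its generated id."""
        record = {
            "run_id": uuid.uuid4().hex,
            "timestamp": time.time(),
            "params": params,
            "metrics": metrics,
        }
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return record["run_id"]

    def runs(self):
        """Load every logged run."""
        if not self.log_path.exists():
            return []
        with self.log_path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def best_run(self, metric, higher_is_better=True):
        """Return the run with the best value for the given metric, or None."""
        candidates = [run for run in self.runs() if metric in run["metrics"]]
        if not candidates:
            return None
        choose = max if higher_is_better else min
        return choose(candidates, key=lambda run: run["metrics"][metric])


with tempfile.TemporaryDirectory() as tmp:
    tracker = ExperimentTracker(Path(tmp) / "runs.jsonl")
    tracker.log_run({"lr": 0.1, "depth": 3}, {"accuracy": 0.81})
    tracker.log_run({"lr": 0.01, "depth": 5}, {"accuracy": 0.86})
    best = tracker.best_run("accuracy")
    print(best["params"])
```

Real experiment tracking tools add a UI, artifact storage, and collaboration features on top of this basic idea.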
To ensure that ML models are consistent and all business requirements are met at scale, a logical, easy-to-follow policy for model management is essential.
MLOps methodology includes a process for streamlining model training, packaging, validation, deployment, and monitoring. This way you can run ML projects consistently from end-to-end.
By setting a clear, consistent methodology for Model Management, organizations can:
- Proactively address common business concerns (such as regulatory compliance);
- Enable reproducible models by tracking data, models, code, and model versioning;
- Package and deliver models in repeatable configurations to support reusability.
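One simple way to support the reproducibility point above is to derive a single version identifier from the exact data, code, and configuration that produced a model. The `fingerprint` function below is a hypothetical sketch, assuming the training data fits in memory as bytes and the configuration is JSON-serializable.

```python
import hashlib
import json


def fingerprint(data_bytes, code_text, config):
    """Short identifier that changes whenever the data, code, or config changes."""
    digest = hashlib.sha256()
    digest.update(hashlib.sha256(data_bytes).digest())
    digest.update(hashlib.sha256(code_text.encode("utf-8")).digest())
    # sort_keys makes the result independent of dictionary key order.
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:12]


# Hypothetical inputs standing in for a dataset file and a training script.
data = b"user_id,age\n1,30\n2,41\n"
code = "def train(df): ..."

v1 = fingerprint(data, code, {"lr": 0.1, "depth": 3})
v2 = fingerprint(data, code, {"depth": 3, "lr": 0.1})  # same config, different key order
v3 = fingerprint(data, code, {"lr": 0.01, "depth": 3})  # changed hyperparameter

print(v1 == v2)  # True: identical inputs give the same version
print(v1 == v3)  # False: any change gives a new version
```

Storing this identifier next to each trained model makes it possible to tell later which inputs a given model came from.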
Why does MLOps matter?
MLOps is fundamental. Machine learning helps individuals and businesses deploy solutions that unlock previously untapped sources of revenue, save time, and reduce cost by creating more efficient workflows, leveraging data analytics for decision-making, and improving customer experience.
These goals are hard to accomplish without a solid framework to follow. Automating model development and deployment with MLOps means faster go-to-market times and lower operational costs. It helps managers and developers be more agile and strategic in their decisions.
MLOps serves as the map to guide individuals, small teams, and even businesses to achieve their goals no matter their constraints, be it sensitive data, fewer resources, small budget, and so on.
You decide how big you want your map to be because MLOps are practices that are not written in stone. You can experiment with different settings and only keep what works for you.
MLOps best practices
At first, I wanted to just list 10 best practices, but after some research, I came to the conclusion that it would be best to cover the best practices for different components of an ML pipeline, namely: Team, Data, Objective, Model, Code, and Deployment.
The following list is distilled from various sources mentioned in the references:
- Use A Collaborative Development Platform
- Work Against a Shared Backlog
- Communicate, Align, and Collaborate With Others
- Use Sanity Checks for All External Data Sources
- Track, identify, and account for changes in data sources.
- Write Reusable Scripts for Data Cleaning and Merging
- Combine and modify existing features to create new features in human-understandable ways
- Ensure Data Labelling is Performed in a Strictly Controlled Process
- Make Data Sets Available on Shared Infrastructure (private or public)
Objective (Metrics & KPIs)
- Don’t overthink which objective you choose to directly optimize, track multiple metrics at first.
- Choose a simple, observable and attributable metric for your first objective
- Set Governance Objectives
- Enforce Fairness and Privacy
- Keep the first model simple and get the infrastructure right
- Starting with an interpretable model makes debugging easier.
- Capture the Training Objective in a Metric that is Easy to Measure and Understand
- Actively Remove or Archive Features That are Not Used
- Peer Review Training Scripts
- Enable Parallel Training Experiments
- Automate Hyper-Parameter Optimisation
- Continuously Measure Model Quality and Performance
- Use Versioning for Data, Model, Configurations and Training Scripts
- Plan to launch and iterate.
- Automate Model Deployment
- Continuously Monitor the Behaviour of Deployed Models
- Enable Automatic Rollbacks for Production Models
- When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals.
- Enable Shadow Deployment
- Keep ensembles simple
- Log Production Predictions with the Model’s Version, Code Version and Input Data
- Human Analysis of the System & Training-Serving Skew
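As an illustration of one practice from the list, logging production predictions together with the model version, code version, and input data, here is a minimal sketch. The version strings, the placeholder `predict` rule, and the list used as a log sink are all assumptions; a production service would write to a proper log stream or database.

```python
import json
from datetime import datetime, timezone

MODEL_VERSION = "2020-11-01"  # hypothetical model version label
CODE_VERSION = "abc1234"      # hypothetical git commit of the serving code


def predict(features):
    """Placeholder for a real model: flags high spenders."""
    return 1 if features["spend"] > 100 else 0


def predict_and_log(features, sink):
    """Make a prediction and append an auditable record of it to the sink."""
    prediction = predict(features)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": MODEL_VERSION,
        "code_version": CODE_VERSION,
        "input": features,
        "prediction": prediction,
    }
    sink.append(json.dumps(record, sort_keys=True))
    return prediction


prediction_log = []
result = predict_and_log({"spend": 150.0, "country": "MZ"}, prediction_log)
print(result)
print(prediction_log[0])
```

With records like these, you can replay the inputs against a new model version or trace a bad prediction back to the exact model and code that produced it.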
These best practices will serve as the foundation on which you build your MLOps solutions. With that said, we can now dive into the implementation details.
How to implement MLOps
According to Google, there are three levels at which you can implement MLOps:
- MLOps level 0 (Manual process)
- MLOps level 1 (ML pipeline automation)
- MLOps level 2 (CI/CD pipeline automation)
MLOps level 0
This is typical for companies that are just starting out with ML. An entirely manual, data-scientist-driven ML workflow might be enough if your models are rarely changed or retrained.
- Manual, script-driven, and interactive process: every step is manual, including data analysis, data preparation, model training, and validation. It requires manual execution of each step and manual transition from one step to another.
- Disconnect between ML and operations: the process separates data scientists who create the model, and engineers who serve the model as a prediction service. The data scientists hand over a trained model as an artifact for the engineering team to deploy on their API infrastructure.
- Infrequent release iterations: the assumption is that your data science team manages a few models that don’t change frequently, either through changes to the model implementation or through retraining on new data. A new model version is deployed only a couple of times per year.
- No Continuous Integration (CI): because few implementation changes are assumed, you ignore CI. Usually, testing the code is part of the notebooks or script execution.
- No Continuous Deployment (CD): because there aren’t frequent model version deployments, CD isn’t considered.
- Deployment refers to the prediction service: the process only deploys the trained model as a prediction service (e.g. a microservice with a REST API), not the entire ML system.
- Lack of active performance monitoring: the process doesn’t track or log model predictions and actions.
The engineering team might have their own complex setup for API configuration, testing, and deployment, including security, regression, and load + canary testing.
In practice, models often break when they’re deployed in the real world. Models fail to adapt to changes in the dynamics of the environment or changes in the data that describes the environment. Forbes has a great article on this: Why Machine Learning Models Crash and Burn in Production.
To address the challenges of this manual process, it’s good to use MLOps practices for CI/CD and CT. By deploying an ML training pipeline, you can enable CT, and you can set up a CI/CD system to rapidly test, build, and deploy new implementations of the ML pipeline.
MLOps level 1
The goal of MLOps level 1 is to perform continuous training (CT) of the model by automating the ML pipeline. This way, you achieve continuous delivery of model prediction service.
This scenario may be helpful for solutions that operate in a constantly changing environment and need to proactively address shifts in customer behavior, price rates, and other indicators.
- Rapid experimentation: ML experiment steps are orchestrated and done automatically.
- CT of the model in production: the model is automatically trained in production, using fresh data based on live pipeline triggers.
- Experimental-operational symmetry: the pipeline implementation that’s used in the development or experiment environment is used in the preproduction and production environment, which is a key aspect of MLOps practice for unifying DevOps.
- Modularized code for components and pipelines: to construct ML pipelines, components need to be reusable, composable, and potentially shareable across ML pipelines (e.g. by using containers).
- Continuous delivery of models: the model deployment step, which serves the trained and validated model as a prediction service for online predictions, is automated.
- Pipeline deployment: in level 0, you deploy a trained model as a prediction service to production. For level 1, you deploy a whole training pipeline, which automatically and recurrently runs to serve the trained model as the prediction service.
- Data and model validation: the pipeline expects new, live data to produce a new model version that’s trained on the new data. Therefore, automated data validation and model validation steps are required in the production pipeline.
- Feature store: a feature store is a centralized repository where you standardize the definition, storage, and access of features for training and serving.
- Metadata management: information about each execution of the ML pipeline is recorded in order to help with data and artifact lineage, reproducibility, and comparisons. It also helps you debug errors and anomalies.
- ML pipeline triggers: you can automate ML production pipelines to retrain models with new data, depending on your use case:
- On a schedule
- On availability of new training data
- On model performance degradation
- On significant changes in the data distribution (evolving data profiles).
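The four pipeline triggers listed above can be sketched as a single decision function that a scheduler calls periodically. The `PipelineState` fields and every threshold here are assumptions for illustration; the right values depend entirely on your use case.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    """Hypothetical snapshot of what the monitoring system knows right now."""
    days_since_last_training: int
    new_training_rows: int
    live_accuracy: float
    drift_score: float


def retraining_reasons(
    state,
    *,
    max_age_days=7,
    min_new_rows=10_000,
    min_accuracy=0.80,
    max_drift=2.0,
):
    """Return every trigger that currently fires (an empty list means no retraining)."""
    reasons = []
    if state.days_since_last_training >= max_age_days:
        reasons.append("schedule")
    if state.new_training_rows >= min_new_rows:
        reasons.append("new data")
    if state.live_accuracy < min_accuracy:
        reasons.append("performance degradation")
    if state.drift_score > max_drift:
        reasons.append("data drift")
    return reasons


healthy = PipelineState(2, 500, 0.91, 0.3)
degraded = PipelineState(2, 500, 0.74, 0.3)

print(retraining_reasons(healthy))
print(retraining_reasons(degraded))
```

Returning the reasons rather than a bare boolean makes it easier to record in the pipeline metadata why a given retraining run was started.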
This setup is suitable when you deploy new models based on new data, rather than based on new ML ideas.
However, you need to try new ML ideas and rapidly deploy new implementations of the ML components. If you manage many ML pipelines in production, you need a CI/CD setup to automate the build, test, and deployment of ML pipelines.
MLOps level 2
For a rapid and reliable update of pipelines in production, you need a robust automated CI/CD system. With this automated CI/CD system, your data scientists can rapidly explore new ideas around feature engineering, model architecture, and hyperparameters.
This level fits tech-driven companies that have to retrain their models daily, if not hourly, update them in minutes, and redeploy on thousands of servers simultaneously. Without an end-to-end MLOps cycle, such organizations just won’t survive.
This MLOps setup includes the following components:
- Source control
- Test and build services
- Deployment services
- Model registry
- Feature store
- ML metadata store
- ML pipeline orchestrator.
The pipeline itself consists of the following stages:
- Development and experimentation: you iteratively try out new ML algorithms and new modeling where the experiment steps are orchestrated. The output of this stage is the source code of the ML pipeline steps, which are then pushed to a source repository.
- Pipeline continuous integration: you build source code and run various tests. The outputs of this stage are pipeline components (packages, executables, and artifacts) to be deployed in a later stage.
- Pipeline continuous delivery: you deploy the artifacts produced by the CI stage to the target environment. The output of this stage is a deployed pipeline with the new implementation of the model.
- Automated triggering: the pipeline is automatically executed in production based on a schedule or in response to a trigger. The output of this stage is a newly trained model that is pushed to the model registry.
- Model continuous delivery: you serve the trained model as a prediction service for the predictions. The output of this stage is a deployed model prediction service.
- Monitoring: you collect statistics on model performance based on live data. The output of this stage is a trigger to execute the pipeline or to execute a new experiment cycle.
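The triggering and model delivery stages above can be sketched with an in-memory stand-in for a model registry. Everything here is hypothetical: the `ModelRegistry` class, the `run_pipeline` function, and the rule of promoting a model only when it scores higher than the one in production. The promotion rule is an added validation gate for illustration, not something the stage descriptions prescribe.

```python
class ModelRegistry:
    """In-memory stand-in for a model registry (for illustration only)."""

    def __init__(self):
        self.versions = []      # every registered model, in order
        self.production = None  # the entry currently being served

    def register(self, model, score):
        """Store a newly trained model with its validation score."""
        entry = {"version": len(self.versions) + 1, "model": model, "score": score}
        self.versions.append(entry)
        return entry

    def promote_if_better(self, entry):
        """Serve the new model only if it beats the current production model."""
        if self.production is None or entry["score"] > self.production["score"]:
            self.production = entry
            return True
        return False


def run_pipeline(train, evaluate, registry):
    """One triggered pipeline run: train, validate, register, maybe deploy."""
    model = train()
    score = evaluate(model)
    entry = registry.register(model, score)
    deployed = registry.promote_if_better(entry)
    return entry["version"], deployed


registry = ModelRegistry()
# Placeholder train/evaluate callables standing in for real pipeline steps.
first = run_pipeline(lambda: "model-a", lambda m: 0.80, registry)
second = run_pipeline(lambda: "model-b", lambda m: 0.78, registry)
third = run_pipeline(lambda: "model-c", lambda m: 0.85, registry)

print(first, second, third)
print(registry.production["model"])
```

Because every version stays in the registry, rolling back is a matter of pointing `production` at an earlier entry.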
The data analysis step is still a manual process for data scientists before the pipeline starts a new iteration of the experiment. The model analysis step is also a manual process.
Building vs buying vs hybrid MLOps infrastructure
Cloud computing companies have invested hundreds of billions of dollars in infrastructure and management.
To give you a bit of context, a Canalys report states that public cloud infrastructure spending reached $77.8 billion in 2018, and it grew to $107 billion in 2019. According to another study by IDC, with a five-year compound annual growth rate (CAGR) of 22.3%, cloud infrastructure spending is estimated to grow to nearly $500 billion by 2023.
Spending on cloud infrastructure services reached a record $30 billion in the second quarter of 2020, with Amazon Web Services (AWS), Microsoft, and Google Cloud accounting for half of customer spend.
From a vendor perspective, AWS market share remained at a “long-standing mark” of around 33% during the second quarter of 2020, followed by Microsoft at 18%, and Google Cloud at 9%. Meanwhile, Chinese cloud providers now account for over 12% of the worldwide market, led by Alibaba, Tencent and Baidu.
These companies invest in research & development of specialized hardware, software, and SaaS applications, but also MLOps software. Two great examples come to mind:
- AWS with SageMaker, a fully managed end-to-end cloud ML platform that enables developers to create, train, and deploy machine learning models in the cloud, on embedded systems, and on edge devices.
- Google with its recently announced AI Platform Pipelines for building and managing ML pipelines, leveraging TensorFlow Extended (TFX) pre-built components and templates that do a lot of model deployment work for you.
Now, should you build or buy your infrastructure? Maybe you should go hybrid?
Tech companies that want to survive long-term usually have in-house teams and build custom solutions. If they have the skills, knowledge, and tools to tackle complex problems, there’s nothing wrong with that approach. But there are other factors that are worth taking into account, like:
- time and effort
- human resources
- time to profit
- opportunity cost.
Time and effort
According to a survey by cnvrg.io, data scientists often spend their time building solutions to add to their existing infrastructure in order to complete projects. 65% of their time was spent on engineering-heavy, non-data-science tasks such as tracking, monitoring, configuration, compute resource management, serving infrastructure, feature extraction, and model deployment.
This wasted time is often referred to as ‘hidden technical debt’, and is a common bottleneck for machine learning teams. Building an in-house solution, or maintaining an underperforming solution can take from 6 months to 1 year. Even once you’ve built a functioning infrastructure, just to maintain the infrastructure and keep it up-to-date with the latest technology requires lifecycle management and a dedicated team.
Operationalizing machine learning requires a lot of engineering. For a smooth machine learning workflow, each data science team must have an operations team that understands the unique requirements of deploying machine learning models.
By investing in an end-to-end MLOps platform, you can completely automate these processes, making it easier for operations teams to focus on optimizing their infrastructure.
Having a dedicated operations team to manage models can be expensive on its own. If you want to scale your experiments and deployments, you’d need to hire more engineers to manage this process. It’s a major investment, and a slow process to find the right team.
An out-of-the-box MLOps solution is built with scalability in mind, at a fraction of the cost. After calculating all the different costs associated with hiring and onboarding an entire team of engineers, your return on investment drops, which brings us to our next factor.
Time to profit
It can take over a year to build a functioning machine learning infrastructure. It can take even longer to build a data pipeline that can produce value for your organization.
Companies like Uber, Netflix, and Facebook have dedicated years and massive engineering efforts to scale and maintain their machine learning platforms to stay competitive.
For most companies, an investment like this is not possible, and also not necessary. The machine learning landscape has matured since Uber, Netflix and Facebook originally built their in-house solutions.
There are more pre-built solutions that offer all you need out-of-the-box, at a fraction of the cost. For example, cnvrg.io customers can deliver profitable models in less than 1 month. Instead of building all the infrastructure necessary to make their models operational, data scientists can focus on research and experimentation to deliver the best model for their business problem.
As mentioned above, one survey shows that 65% of a data scientist’s time is spent on non-data science tasks. Using an MLOps platform automates technical tasks and reduces DevOps bottlenecks.
Data scientists can spend their time doing more of what they were hired to do – deliver high-impact models – while the cloud provider takes care of the rest.
Adopting an end-to-end MLOps platform gives you a considerable competitive advantage and allows your machine learning development to scale massively.
What about Hybrid MLOps infrastructure?
Some companies have been entrusted with private and sensitive data. It can’t leave their servers, because even a small vulnerability could have a catastrophic ripple effect. This is where hybrid cloud infrastructure for MLOps comes in.
At the moment, cloud infrastructure exists side-by-side with on-premise systems in most cases.
Hybrid cloud management is complex, but often necessary. According to the 2020 Cloud Infrastructure Report by CloudCheckr, today’s infrastructure is a mix of cloud and on-prem.
Cloud infrastructure is increasingly popular, but it’s still rare to find a large company that has completely abandoned on-premise infrastructure (most of them for obvious reasons, like sensitive data).
Another study by RightScale shows that Hybrid cloud adoption grew to 58% in 2019 from 51% in 2018. It’s understandable because there’s a wide range of reasons for continuing to keep infrastructure on-prem.
[Chart: survey responses to “Why does your company keep maintaining on-prem infrastructure?”]
Managing hybrid infrastructure is challenging
It’s not a walk in the park to manage any type of enterprise technology infrastructure. There are always issues related to security, performance, availability, cost, and much more.
Hybrid cloud environments add an additional layer of complexity that makes managing IT even more challenging.
The vast majority of cloud stakeholders (96%) face challenges managing both on-prem and cloud infrastructure.
[Chart: survey responses to “What challenges does your company face in managing both on-prem and cloud infrastructure?”]
“Other” issues reported included the need for a completely different skill set and the lack of access to specialized compute and storage. Respondents also mentioned having to shift existing employees’ roles to manage the on-prem systems, and dealing with ongoing reliability issues in those systems (e.g. timeouts, missing data or computing resources, and software, database, hardware, and network failures).
Building your own platform and infrastructure will take more and more of your focus and attention as demand increases. The time that could be spent on model R&D and data collection will be taken by infrastructure management. This isn’t great unless it’s part of your core business (if you’re a cloud service provider, PaaS or IaaS).
Buying a fully managed platform gives you great flexibility and scalability, but then you’re faced with compliance, regulations, and security issues.
Hybrid cloud infrastructure for MLOps is the best of both worlds, but it poses unique challenges, so it’s up to you to decide if it fits your business model.
Note: I have a few ideas on possible future directions on securing, streaming, allowing statistical studies on sensitive data, but that’s a different topic for a future article perhaps.
Now that you have identified which level your company is at, you can go with one of two MLOps solutions:
- End-to-end MLOps solution
- Custom-built MLOps solution (the ecosystem of tools)
End-to-end MLOps solution
These are fully managed services that provide developers and data scientists with the ability to build, train, and deploy ML models quickly. The top commercial solutions are:
- Amazon SageMaker, a suite of tools to build, train, deploy, and monitor machine learning models
- Microsoft Azure MLOps suite:
- Google Cloud MLOps suite:
Custom-built MLOps solution (the ecosystem of tools)
End-to-end solutions are great, but you can also build your own with your favorite tools, by dividing your MLOps pipeline into multiple microservices.
This approach can help you avoid a single point of failure (SPOF) and makes your pipeline more robust, easier to audit and debug, and more customizable. If a microservice provider is having problems, you can easily plug in a new one.
The most recent example of a SPOF was the AWS outage. It’s very rare, but it can happen. Even Goliath can fall.
Microservices ensure that each service is interconnected instead of embedded together. For example, you can have separate tools for model management and experiment tracking.
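The “plug in a new one” idea depends on your pipeline code relying on a small interface rather than on one vendor’s client. Here is a sketch using Python’s `typing.Protocol`. The `Tracker` interface and both implementations are hypothetical; in a real setup each implementation would wrap the client of the tool you actually use.

```python
from typing import Protocol


class Tracker(Protocol):
    """The only thing the training code is allowed to know about tracking."""

    def log_metric(self, name: str, value: float) -> None: ...


class InMemoryTracker:
    """Keeps metrics in a dict, e.g. for tests or local runs."""

    def __init__(self):
        self.metrics = {}

    def log_metric(self, name, value):
        self.metrics.setdefault(name, []).append(value)


class PrintTracker:
    """Fallback that just prints, usable if the main provider is down."""

    def log_metric(self, name, value):
        print(f"{name}={value}")


def train(tracker: Tracker, epochs: int = 3) -> None:
    """Placeholder training loop that reports a made-up loss per epoch."""
    for epoch in range(1, epochs + 1):
        loss = 1.0 / epoch
        tracker.log_metric("loss", loss)


memory_tracker = InMemoryTracker()
train(memory_tracker, epochs=2)
print(memory_tracker.metrics)

train(PrintTracker(), epochs=2)  # same training code, different provider
```

Swapping providers then means writing one new adapter class, while the training code stays untouched.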
Finally, there are many MLOps tools available. I’m just going to mention my top 7 picks, with one honorable mention:
- Project Jupyter
- Honorable mention: neptune.ai (for its easy and fast experiment tracking and compatibility with a lot of tools like SageMaker and MLflow; if there isn’t an integration guide or pre-built solution, you can use their Python client API to build a custom integration)
By leveraging these and many other tools, you can build an end-to-end solution by joining various microservices together.
For more detailed information on the best MLOps tools available, see Best MLOps Tools by Jakub Czakon.
MLOps is a fresh area that’s rapidly developing, with new tools and processes coming out all the time. If you get on the MLOps train now, you’re gaining a huge competitive advantage.
In order to help you do so, below is a ton of references for you to check out and devour. Have fun!
Special thanks to my dear friend Richaldo Elias whom I mentioned in the introduction. He always brings up topics or problems that inspire my creativity, and this article wouldn’t have been the same without him sharing some of the issues that he has had while building ML Projects at Scale.
- Notes from the AI Frontier: Modeling the Impact of AI on the World Economy
MLOps – methods and tools
- https://towardsdatascience.com/a-simple-mlops-pipeline-on-your-local-machine-db9326addf31 (Recommended for DIY die-hards)
MLOps best practices
- Software for ML
- Google’s Rules of ML
- Governance Objectives:
Build vs Buy vs Hybrid