Every machine learning model has a life cycle behind the scenes: it starts with data versioning, then data validation, often some pre-processing (to map real-world data to something the model understands), followed by model training, architecture choices, some tuning, validation and analysis, and finally deployment.
If you’re a data scientist, you will be repeating this cycle over and over again, so it makes sense to automate it. Another big problem for data scientists is that most ML projects never go beyond the experimental phase.
So, as a solution to these problems, a new trend has emerged: MLOps.
In this article, we will explore MLOps, and compare popular MLOps platforms, both managed and open-source.
What is MLOps?
The name MLOps is a fusion of “machine learning” and “operations”. MLOps is the intersection of machine learning, DevOps and data engineering.
It’s a set of methods for automating the lifecycle of ML algorithms in production. This way you have automation and monitoring at all steps of ML system construction, from initial model training to deployment and retraining against new data.
With MLOps, data scientists and IT teams collaborate and combine skills, techniques, and tools used in data engineering, machine learning, and DevOps. It promotes rapid innovation through robust machine learning lifecycle management.
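To make "automation and monitoring at all steps" more concrete, here is a minimal sketch of a pipeline runner in plain Python. It is an illustration only, not how any platform in this article works internally: the stage functions (`validate`, `preprocess`, `train`), the `run_pipeline` helper, and the toy data are all hypothetical names made up for this example.

```python
import time

# Hypothetical stages; in a real project each would call your own code.
def validate(rows):
    # Data validation: drop records that contain missing values.
    return [r for r in rows if None not in r]

def preprocess(rows):
    # Pre-processing: scale the feature by the largest value seen.
    peak = max(x for x, _ in rows)
    return [(x / peak, y) for x, y in rows]

def train(rows):
    # "Training": least-squares slope of a line through the origin.
    num = sum(x * y for x, y in rows)
    den = sum(x * x for x, _ in rows)
    return {"slope": num / den}

def run_pipeline(data, stages):
    # Run every stage in order and record basic monitoring info for each.
    log = []
    result = data
    for name, stage in stages:
        start = time.perf_counter()
        result = stage(result)
        log.append({"stage": name, "seconds": time.perf_counter() - start})
    return result, log

STAGES = [("validate", validate), ("preprocess", preprocess), ("train", train)]
raw = [(1.0, 2.0), (2.0, 4.0), (None, 1.0), (4.0, 8.0)]
model, log = run_pipeline(raw, STAGES)
```

Because the stages live in a single list, retraining against new data means calling `run_pipeline` again with a different input, and the log gives one place to watch how long each step took.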
The big question is: do any tools manage all parts of the machine learning life cycle? Well, yes and no.
In this article, we’ll be looking at end-to-end MLOps platforms. Most of them offer powerful tools for managing ML pipelines from model training to deployment and beyond, but data collection and labeling are left to tools built specifically for those tasks.
Enough introduction. Let’s look at some awesome end-to-end systems to manage machine learning pipelines.
MLOps platforms overview
There are several MLOps frameworks for managing the life cycle of machine learning. Here are the top 11 end-to-end MLOps platforms:
- Algorithmia: Securely govern your machine learning operations with a healthy ML lifecycle.
- Allegro.ai: An end-to-end enterprise-grade platform for data scientists, data engineers, DevOps, and managers to manage the entire machine learning & deep learning product life cycle.
- cnvrg.io: An end-to-end machine learning platform to build and deploy AI models at scale.
- Dataiku: Platform democratizing access to data and enabling enterprises to build their own path to AI.
- DataRobot: AI platform that democratizes data science and automates end-to-end ML at scale.
- H2O.ai: An open source leader in AI with a mission to democratize AI for everyone.
- Iguazio: Automates MLOps with end-to-end machine learning pipelines, transforming AI projects into real-world business outcomes.
- Kubeflow: Dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable.
- Pachyderm: Combines data lineage with end-to-end pipelines on Kubernetes, engineered for the enterprise.
- Polyaxon: A platform for reproducible and scalable machine learning and deep learning on Kubernetes.
- Valohai: Takes you from POC to production while managing the whole model lifecycle.
Comparison based on MLOps tasks
To compare the platforms from our list, let’s group MLOps tasks into the following four categories:
- Data and Pipeline Versioning: managing data with version control for datasets, features, and their transformations
- Model and Experiment Versioning: managing the model lifecycle (tracking experiments, parameters, metrics, and the performance of model training runs)
- Hyperparameter Tuning: systematically optimizing hyperparameter values for a given model
- Model Deployment and Monitoring: managing which model is deployed in production and tracking its performance
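The first three categories can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the mechanism of any platform discussed here: `dataset_version`, `evaluate`, and `grid_search` are hypothetical helpers, the "model" is a one-variable linear function, and deployment and monitoring are left out.

```python
import hashlib
import itertools
import json

def dataset_version(rows):
    # Data versioning: a content hash pins down exactly which data was used.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

def evaluate(rows, slope, bias):
    # Mean squared error of the toy model y = slope * x + bias.
    return sum((slope * x + bias - y) ** 2 for x, y in rows) / len(rows)

def grid_search(rows, grid):
    # Hyperparameter tuning: try every combination of values in the grid.
    version = dataset_version(rows)
    names = sorted(grid)
    runs = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # Experiment tracking: keep data version, parameters, and metric together.
        runs.append({
            "data_version": version,
            "params": params,
            "mse": evaluate(rows, **params),
        })
    best = min(runs, key=lambda run: run["mse"])
    return best, runs

rows = [[1.0, 3.0], [2.0, 5.0], [3.0, 7.0]]
best, runs = grid_search(rows, {"slope": [1.0, 2.0, 3.0], "bias": [0.0, 1.0]})
```

Each entry in `runs` ties a metric to the parameters and the data version that produced it, which is the record an experiment tracker keeps so that a result can be reproduced later.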
Which tasks do our platforms take care of?
This comparison poses a tricky question: is it better to choose a platform that does all the tasks, or stitch multiple specialized platforms together?
It doesn’t help that different platforms address the same task at different levels of depth.
Comparison based on supported libraries
Data scientists use different programming languages, libraries, and frameworks for developing ML models, so you need an MLOps platform that supports the libraries in your project. Let’s compare our platforms based on a list of popular libraries and frameworks.
In my opinion, data scientists care most about the level of support for their programming language, in this case especially Python and/or R.
The table tells us that Pachyderm and Algorithmia cover more libraries than the other platforms. TensorFlow is clearly the most widely supported library.
Comparison of MLOps platforms based on CLI and GUI
In this section, the focus of our comparison will shift to the expertise of data scientists.
Some MLOps platforms target users with less engineering expertise who still need to build and deploy ML models. They focus on the GUI (graphical user interface), a visual tool accessed through a web client.
Other platforms target expert data scientists with engineering expertise. These platforms tend to rely on a command-line interface (CLI) or API for integration with existing tools, and a web UI might not be important for expert users.
The table below contains an approximate and personal comparison of whether the usage of each platform is designed fundamentally around CLI or GUI.
Platforms like cnvrg.io and Valohai are CLI-focused with additional GUI support. Others, such as DataRobot, are GUI-focused. However, most managed platforms fall somewhere in between, e.g., Allegro.ai and Iguazio.
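As a rough illustration of what "CLI or API" access means in practice, the sketch below exposes one toy training routine both as a Python function and as a command-line entry point built with the standard `argparse` module. The `train` function, the `main` wrapper, and the flag names are hypothetical and are not taken from any platform in this comparison.

```python
import argparse

def train(learning_rate, epochs):
    # API entry point: toy gradient descent minimizing (w - 3)^2.
    w = 0.0
    for _ in range(epochs):
        w -= learning_rate * 2 * (w - 3.0)
    return {"weight": w, "epochs": epochs}

def main(argv=None):
    # CLI entry point: parse flags, then delegate to the same function.
    parser = argparse.ArgumentParser(description="Hypothetical training job")
    parser.add_argument("--learning-rate", type=float, default=0.1)
    parser.add_argument("--epochs", type=int, default=10)
    args = parser.parse_args(argv)
    result = train(args.learning_rate, args.epochs)
    print(result)
    return result

# Same job, two ways in: a direct call and a CLI-style argument list.
api_result = train(0.5, 1)
cli_result = main(["--learning-rate", "0.5", "--epochs", "1"])
```

Keeping the logic in one function and treating the CLI as a thin wrapper is what makes this style easy to script and to plug into existing tooling, which is the trade-off against a GUI-first workflow.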
List of MLOps platforms
Algorithmia manages all stages of the ML lifecycle within existing operational processes. It puts models into production quickly, securely, and cost-effectively. The platform automates ML deployment, provides maximum tooling flexibility, optimizes collaboration between operations and development, leverages existing SDLC and CI/CD practices, and includes advanced security and governance features.
Pros:
- Easy, hassle-free deployment
- Version management: useful for testing any version
- GPU support
Cons:
- Currently, Algorithmia does not support SAS
- High cost for startups
Allegro is a pioneering end-to-end enterprise-grade platform for data scientists, data engineers, DevOps, and managers to manage experiments, orchestrate workloads, and manage data, all in a simple tool that integrates with whatever toolchain a team is using already. The company’s platform supports on-prem, private cloud, multi-cloud tenants, and custom configurations. It also offers continuous learning and model personalization for an indefinite number of devices.
Pros:
- Fully differentiable data management & version control solution on top of object storage (S3/GS/Azure/NAS)
- Automagical experiment tracking, environments and results
- Automation, pipelines & orchestration solution for ML/DL jobs
Cons:
- Lacking a bit in terms of customizability
- Does not support the R language
cnvrg.io is an end-to-end platform that manages, builds, and automates the entire ML life cycle from research to production. It’s designed by data scientists and built to organize every stage of a data science project, including research, information collection, code writing, and model optimization.
Pros:
- Allows users to build compact AI models in just a few clicks
- Adaptable to most libraries and frameworks
Cons:
- Some features are missing, such as customizable templates, predictive analytics, and problem management
Dataiku democratizes access to data and enables enterprises to build their own path to AI in a human-centric way. It lets you create, share, and reuse applications that leverage data and machine learning to extend and automate decision-making. The platform provides a common ground for data experts and explorers, a repository of best practices, shortcuts to machine learning and AI deployment/management, and a centralized, controlled environment.
Pros:
- The best tool for data cleaning and transformation according to different business requirements
- The user interface is intuitive and allows you to upload data into a project with a few clicks
Cons:
- Does not scale well as the number of users grows
- Could have better support for platform installation and maintenance
DataRobot is the leading end-to-end enterprise AI platform that automates and accelerates every step of your path from data to value. It’s a central hub to deploy, monitor, manage, and govern machine learning models in production to maximize the investments in data science teams and to manage risk and regulatory compliance.
Pros:
- Ease of use for IT organizations, with good company support
- The ability to easily build machine learning models, ranging from simple regressions to complex gradient boosted trees
Cons:
- Loading a large dataset may take a lot of time
- Lacks connectors to RDBMS-type databases like MySQL or PostgreSQL as data sources
H2O.ai is the open source leader in AI and automatic machine learning with a mission to democratize AI for everyone. It offers a platform with data manipulation, various algorithms, cross-validation, grid search for hyperparameter tuning, feature ranking, and model serialization. Furthermore, it helps data scientists across the world in every industry to be more productive and to deploy models in a quicker, simpler, and cheaper way.
Pros:
- Top-quality open source tools, including the H2O-3 and AutoML families
- The interfaces with R and Python enable a smooth transition of pre-existing workflows into the H2O framework
- The combination of proprietary and open-source tools, Driverless AI and H2O, provides tools across a full range of use cases
Cons:
- H2O Frames have very limited data processing options compared to pandas or PySpark dataframes
- H2O bugs do not return human-readable debugging statements
Iguazio is a data science platform for automating machine learning pipelines. It accelerates the development, deployment, and management of AI applications at scale with MLOps and end-to-end automation of machine learning pipelines, which lets data scientists focus on delivering better, more powerful solutions instead of spending their time on infrastructure. It uses Kubeflow for workflow orchestration.
Pros:
- The capability to deploy in seconds from a notebook or IDE
- Integrated with most popular frameworks and ML libraries
Cons:
- Does not cover the CI/CD pipeline scenario
Kubeflow is a platform for data scientists who want to build and experiment with ML pipelines. It is also for ML engineers and operational teams who want to deploy ML systems to various environments for development, testing, and production-level serving. Kubeflow is an open-source Kubernetes-native platform to facilitate the scaling of ML models. Plus, it’s a cloud-native platform based on Google’s internal ML pipelines. The project is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable. It can be used with other MLOps platforms as a complementary tool.
Pros:
- Multi-framework integration
- Perfect for Kubernetes users
Cons:
- Hard to set up and configure manually
- High availability is not automatic and needs to be manually configured
Pachyderm is a robust MLOps tool that lets users control an end-to-end machine learning cycle, from data lineage, through building and tracking experiments, to scalability options. It’s a simple choice for data scientists and teams because of its prompt, accurate tracking and its reproducibility. It helps develop scalable ML/AI pipelines and, as we saw in our comparison based on supported libraries, it’s highly flexible, supporting most languages, frameworks, and libraries.
Pros:
- Integrated with most popular frameworks and ML libraries
- It can keep branches of your data sets when you are testing new transformation pipelines
- Based on containers, which makes your data environments portable and easy to migrate to different cloud providers
Cons:
- A steeper learning curve due to the many moving parts, such as the Kubernetes server required to manage Pachyderm’s free version
Polyaxon is a platform for automating and reproducing deep learning and machine learning applications on Kubernetes. It lets users iterate faster on their research and model creation. The platform includes a wide range of features, from tracking and optimization of experiments to model management and regulatory compliance. It allows workload scheduling with smart container and node management, and turns GPU servers into shared, self-service resources for your team or organization.
Pros:
- Possibility to adapt the software version to your own needs
- End-to-end process support
- Makes it easy to schedule training on a Kubernetes cluster
Cons:
- Missing some features
Valohai is a deep learning management platform that helps enterprises automate deep learning infrastructure. The platform enables data scientists to manage machine orchestration, version control, and data pipelines. It makes DL development auditable, reducing compliance risk and cutting labor & infrastructure costs.
Valohai offers a host of features including parallel hyperparameter sweeps, custom scripts, training session visualization, data exploration, a Jupyter Notebook extension, deployment, and production monitoring. The platform allows users to build models with multiple central processing units (CPUs) or graphics processing units (GPUs) in cloud or on-premise environments. Plus, it’s compatible with any language or framework, along with many different tools and apps. Valohai is also teamwork-oriented software, which helps team leaders manage collaboration, share projects, assign members, track experiment progress, and view real-time data models.
Pros:
- Allows easy management for deep learning
- Full and automatic version control for the models
- Helpful customer service and monthly checkup
Cons:
- High cost for startups
There are several MLOps platforms for managing the machine learning life cycle, so make sure you take the relevant factors into consideration when selecting one.
Throughout this article, I’ve explored different factors to consider so that you can pick the platform that best matches your needs. I hope this helps you make a decision.
Now that you have the list of the best end-to-end platforms, it all boils down to your specific use case.