You don’t need to spend a lot on MLOps tools to bring the magic of DevOps to your machine learning projects. There are plenty of open-source tools to choose from. Open source is a good option when you’re trying to solve unique problems and need a community to rely on. But open-source tools have some cons too.
- First, be careful: open-source tools aren’t always 100% free. For example, Kubeflow has client and server components, and both are open. However, some tools open-source only one of these components: the client is open, but the vendor controls everything server-side.
- Free open-source tools can cost you in other ways too. If you consider that you have to host and maintain the tool long-term, you’ll find that open-source can be quite costly after all.
- Finally, if something goes awry, you probably won’t have 24/7/365 vendor support to rely on. The community can help you, but it doesn’t bear any responsibility for the result you’re left with.
Ultimately, open-source tools can be tricky. Before you choose the tool for your project, you need to carefully study its pros and cons. Moreover, you need to make sure that the tools work well with the rest of your stack. This is why I prepared a list of popular and community-approved MLOps tools for different stages of the model development process.
Top 18 MLOps open-source tools
In this list, you’ll find full-fledged platforms for machine learning, as well as specialized tools that will help you with data exploration, deployment, and testing.
Full-fledged platforms contain tools for all stages of the machine learning workflow. Ideally, once you get a full-fledged tool, you won’t have to set up any other tools. In practice, it depends on the needs of your project and personal preferences.
Almost immediately after Kubernetes established itself as the standard for working with a cluster of containers, Google created Kubeflow—an open-source project that simplifies working with ML in Kubernetes. It has all the advantages of this orchestration tool, from the ability to deploy on any infrastructure to managing loosely-coupled microservices, and on-demand scaling.
This project is for developers who want to deploy portable and scalable machine learning projects. Google didn’t want to recreate other services. They wanted to create a state-of-the-art open-source system that can be applied alongside various infrastructures—from supercomputers to laptops.
With Kubeflow, you can benefit from the following features:
- Jupyter notebooks
Create and customize Jupyter notebooks, immediately see the results of running your code, create interactive analytics reports.
- Custom TensorFlow job operator
This functionality helps you train your model and use a TensorFlow or Seldon Core serving container to export the model to Kubernetes.
- Simplified containerization
Kubeflow eliminates the complexity involved in containerizing the code. Data scientists can perform data preparation, training, and deployment in less time.
All in all, Kubeflow is a full-fledged solution for the development and deployment of end-to-end ML workflows.
MLflow is an open-source platform for machine learning engineers to manage the ML lifecycle through experimentation, deployment, and testing. MLflow comes in handy when you want to track the performance of your models. It’s like a dashboard, one place where you can:
- monitor the ML pipeline,
- store model metadata, and
- pick the best-performing model.
Right now, there are four components provided by MLflow:
The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when you run your code, and for visualizing the recorded results afterwards. You can log and query experiments using the Python, REST, R, and Java APIs.
MLflow Projects is a format for ML teams to package data science code in a reusable and reproducible way. It comes with an API and command-line tools to connect projects into workflows. It helps you run projects on any platform.
MLflow Models makes it easy to package machine learning models to be used by various downstream tools, like Apache Spark. With this, deploying machine learning models in diverse serving environments is much more manageable.
MLflow Model Registry, the fourth component, is a central model store for managing model versions and their lifecycle stages.
Overall, users love MLflow because it’s easy to use locally without a dedicated server, and has a fantastic UI where you can explore your experiments.
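To make the tracking idea concrete, here is a minimal, library-free sketch of what an experiment tracker records and how picking the best-performing model can work. It deliberately does not use MLflow's API: `RunTracker`, `log_run`, and `best_run` are hypothetical names invented for this illustration, the metric values are made up, and runs are simply stored as JSON lines in a local file.

```python
import json
import tempfile
from pathlib import Path


class RunTracker:
    """Toy experiment tracker (hypothetical, not MLflow's API)."""

    def __init__(self, log_path):
        self.log_path = Path(log_path)

    def log_run(self, params, metrics):
        # Append one run as a JSON line so earlier runs are never overwritten.
        record = {"params": params, "metrics": metrics}
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def runs(self):
        # Read back every logged run; an absent file means no runs yet.
        if not self.log_path.exists():
            return []
        with self.log_path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def best_run(self, metric, higher_is_better=True):
        # Pick the run with the best value of the chosen metric.
        candidates = [r for r in self.runs() if metric in r["metrics"]]
        if not candidates:
            return None
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r["metrics"][metric])


# Example usage with made-up numbers.
log_file = Path(tempfile.mkdtemp()) / "runs.jsonl"
tracker = RunTracker(log_file)
tracker.log_run({"learning_rate": 0.1}, {"accuracy": 0.81})
tracker.log_run({"learning_rate": 0.01}, {"accuracy": 0.88})
best = tracker.best_run("accuracy")
print(best["params"])
```

A real tracking tool adds a UI, code versioning, and artifact storage on top of this, but the core loop is the same: log parameters and metrics for every run, then query them.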
Netflix created Metaflow as an open-source MLOps platform for building and managing large-scale, enterprise-level data science projects. Data scientists can use this platform for end-to-end development and deployment of their machine learning models.
- Great library support
Metaflow supports all popular data science libraries, like TensorFlow and scikit-learn, so you can keep using your favorite tool. Metaflow supports Python and R, making it even more flexible in terms of library and package choice.
- Powerful version control toolkit
What is excellent about Metaflow is that it versions and keeps track of all your experiments automatically. You won’t lose anything important, and you can even inspect the results of all the experiments in notebooks.
As mentioned above, Metaflow was specifically created for large-scale machine learning development. The AWS cloud powers the solution, so there are built-in integrations to storage, compute, and machine learning services from AWS if you need to scale. You don’t have to rewrite or change the code to use any of it.
MLReef is an MLOps platform for teams to collaborate and share the results of their machine learning experiments. Projects are built on reusable ML modules realized either by you or by the community. This boosts the speed of development and makes the workflow more efficient by promoting concurrency.
MLReef provides tools in four directions:
- Data management
You have a fully-versioned data hosting and processing infrastructure for setting up and managing your models.
- Script repositories
Every developer has access to containerized and versioned script repositories that can be used in machine learning pipelines.
- Experiment management
You can use MLReef for experiment tracking across different iterations of your project.
This solution helps you optimize pipeline management and orchestration, automating routine tasks.
Moreover, MLReef feels welcoming to projects of any size. Newcomers can use it for small-scale projects, while experienced developers can use it for small, medium-sized, and enterprise projects.
If you don’t have much experience developing ML models, you’ll find a user-friendly interface and community support for whatever problem you may face.
MLReef lets you build your project on Git while taking care of all the DevOps mess for you. You can easily monitor progress and outcomes in an automated environment.
MLReef for enterprise is easy to scale and control on the cloud or on-premises.
All in all, MLReef is a convenient framework for your ML project. With just a couple of easy setups, you’ll be able to develop, test, and optimize your ML solution brick-by-brick.
Kedro is a Python framework for machine learning engineers and data scientists to create reproducible and maintainable code.
This framework is your best friend if you want to organize your data pipeline and make ML project development much more efficient. You won’t have to waste time on code rewrites and will have more opportunities for focusing on robust pipelines. Moreover, Kedro helps teams establish collaboration standards to limit delays and build scalable, deployable projects.
Kedro has many good features:
- Project templates
Usually, you have to spend a lot of time understanding how to set up your analytics project. Kedro provides a standard template that will save you time.
- Data management
Kedro helps you load and store data, so you can stop worrying about the reproducibility and scalability of your code.
- Configuration management
This is a necessary tool when you’re working with complex software systems. If you don’t pay enough attention to configuration management, you might encounter serious reliability and scalability problems.
Kedro promotes a data-driven approach to ML development and maintains industry-level standards while decreasing operational risks for business.
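The idea behind an organized data pipeline can be sketched without any framework: each step is a plain function, and a catalog of named datasets connects the steps. The code below is an illustration of that pattern only, not Kedro's API. `run_pipeline`, the node tuples, and the dataset names are all assumptions made up for this example.

```python
def run_pipeline(nodes, catalog):
    """Run nodes in order; each node reads and writes named datasets.

    `nodes` is a list of (function, input_names, output_name) tuples.
    This mirrors the node-plus-catalog idea only; it is not Kedro's API.
    """
    data = dict(catalog)  # copy so the caller's catalog is left untouched
    for func, input_names, output_name in nodes:
        inputs = [data[name] for name in input_names]
        data[output_name] = func(*inputs)
    return data


def drop_missing(rows):
    # Remove rows that contain a missing (None) value.
    return [row for row in rows if None not in row.values()]


def average_price(rows):
    # Compute the mean of the "price" column.
    return sum(row["price"] for row in rows) / len(rows)


catalog = {
    "raw_rows": [
        {"item": "a", "price": 10.0},
        {"item": "b", "price": None},
        {"item": "c", "price": 20.0},
    ]
}
nodes = [
    (drop_missing, ["raw_rows"], "clean_rows"),
    (average_price, ["clean_rows"], "mean_price"),
]
result = run_pipeline(nodes, catalog)
print(result["mean_price"])  # 15.0
```

Because every step declares its inputs and outputs by name, steps can be tested in isolation and rearranged without rewriting them.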
May be useful
If you want all the benefits of a nicely organized Kedro pipeline together with a powerful Neptune UI for organizing and comparing the ML metadata generated in pipelines and nodes, check out the Neptune-Kedro plugin.
Tools for development and deployment
Tools for development and deployment in MLOps automate routine tasks of manual deployment across multiple environments. You can deploy via tools that are more convenient for you, depending on the platform stack that you use.
ZenML is an MLOps framework for orchestrating your ML experiment pipeline. It provides you with tools to:
- Preprocess data
ZenML helps you convert raw data into analysis-ready data.
- Train your models
Among other tools for convenient training, the platform uses declarative pipeline configs, so you can switch between on-premise and cloud environments easily.
- Conduct split testing
ZenML creators claim that the platform’s key benefits are automated tracking of the experiments and guaranteed comparability between experiments.
- Evaluate the results
ZenML focuses on making ML development reproducible and straightforward for both individual developers and large teams.
This framework frees you from all the troubles of delivering machine learning models with traditional tools. If you struggle to provide enough experiment data to prove that your results are reproducible, or want to reduce waste and make code reuse simpler, ZenML will help.
MLRun is a tool for ML model development and deployment. If you’re looking for a tool that conveniently runs in a wide variety of environments and supports multiple technology stacks, it’s definitely worth a try. MLRun offers a comprehensive approach to managing data pipelines.
MLRun has a layered architecture that offers the following powerful functionality:
- Feature and artifact store
This layer helps you to handle the preparation and processing of data and store it across different repositories.
- Elastic serverless runtimes layer
Convert simple code into microservices that are easy to scale and maintain. It’s compatible with standard runtime engines like Kubernetes jobs, Dask, and Apache Spark.
- Automation layer
For you to concentrate on training the model and fine-tuning the hyperparameters, the pipeline automation tool helps you with data preparation, testing, and real-time deployment. You’ll only need to provide your supervision to create a state-of-the-art ML solution.
- Central management layer
Here, you get access to a unified dashboard to manage your whole workflow. MLRun has a convenient user interface, a CLI, and an SDK that you can access anywhere.
With MLRun, you can write code once and then use automated solutions to run it on different platforms. The tool manages the build process, execution, data movement, scaling, versioning, parameterization, output tracking, and more.
CML (Continuous Machine Learning) is a library for continuous integration and delivery (CI/CD) of machine learning projects. The library was developed by the creators of DVC, an open-source library for versioning ML models and experiments. Together with DVC, TensorBoard, and cloud services, CML should facilitate the process of developing and implementing ML models into products.
- Automate pipeline building
CML was designed to automate some of the work of ML engineers, including running training experiments, evaluating models, and handling datasets and their updates.
- Integrate APIs
The tool is positioned as a library that supports GitFlow for data science projects, generates reports automatically, and hides the complex details of using external services. Examples of external services include cloud platforms: AWS, Azure, GCP, and others. For infrastructure tasks, DVC, Docker, and Terraform are also used. The infrastructure side of ML projects has been attracting more attention recently.
The library is flexible and provides a wide range of functionality, from sending reports and publishing data to distributing cloud resources for a project.
9. Cortex Lab
Cortex Labs is an early-stage startup developed by Berkeley scientists. They’re working on a convenient tool for neuroscientists, to help them understand how the brain works. The uses of this application turn out to be much broader.
The Cortex Lab project is interesting because it allows you to deploy, manage, and scale containers without worrying about Kubernetes. It provides containers as a service on AWS.
If you decide to include Cortex in your tech stack, you will benefit from the following features:
- Serverless workloads
Cortex can process requests in real-time and autoscale based on in-flight request volumes.
- Automated cluster management
Cortex makes it easy to scale with cluster autoscaling. It’s easy to create clusters with different parameters.
- CI/CD and observability integrations
Cortex allows you to provision clusters with declarative configuration.
Cortex Lab is built to integrate smoothly with AWS. It runs on top of EKS to scale workloads reliably and cheaply. You can use this tool to deal with data-intensive models for image and video processing. Microservice architecture makes it easy to scale without any resource limits.
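Autoscaling on in-flight request volume boils down to a small calculation. The sketch below shows one plausible version of it. It is not Cortex's actual algorithm or configuration: the function name `desired_replicas` and the parameters `target_per_replica`, `min_replicas`, and `max_replicas` are hypothetical and chosen only for illustration.

```python
import math


def desired_replicas(in_flight, target_per_replica, min_replicas=1, max_replicas=10):
    """Return how many replicas are needed for the current in-flight requests."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    needed = math.ceil(in_flight / target_per_replica)
    # Clamp to the allowed range so the service never scales to zero or without bound.
    return max(min_replicas, min(max_replicas, needed))


for load in (0, 7, 45, 500):
    print(load, desired_replicas(load, target_per_replica=8))
```

The clamp is the important design choice: the lower bound keeps the service responsive when traffic is idle, and the upper bound caps cost during spikes.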
10. Seldon Core
Seldon Core is a platform for ML model deployment on Kubernetes. This tool helps developers build models in a robust Kubernetes environment, with features like custom resource definitions to manage model graphs. You can also merge this tool with your continuous integration and deployment tools.
- Build scalable models
Seldon Core can convert your model built on TensorFlow, PyTorch, H2O, and other frameworks into a scalable microservice architecture based on REST/gRPC.
- Monitor model performance
It will handle scaling for you, and give you advanced solutions for measuring model performance, detecting outliers, and conducting A/B testing out-of-the-box.
- Robust and reliable
Seldon Core can boast the robustness and reliability of a system supported through continuous maintenance and security policy updates.
Optimized servers provided by Seldon Core allow you to build large-scale deep learning systems without having to containerize them or worry about their security.
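A/B testing between two deployed models needs a rule for deciding which model serves each request. Here is a minimal sketch of one common approach, hash-based traffic splitting. It is not Seldon Core's implementation: `assign_variant`, `b_share`, and the request IDs are assumptions invented for this example.

```python
import hashlib


def assign_variant(request_id, b_share=0.2):
    """Deterministically route a request to model "A" or "B".

    Hashing the ID means the same request ID always gets the same variant.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "B" if bucket < b_share else "A"


counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"request-{i}")] += 1
print(counts)
```

With `b_share=0.2`, roughly a fifth of the traffic should reach model B, and repeating a request ID never flips its assignment.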
AutoKeras is an open-source library for Automated Machine Learning (AutoML). With AutoML frameworks, you can automate the processing of raw data, choose a machine learning model, and optimize the hyperparameters of the learning algorithm.
- Streamline ML model development
AutoML reduces the biases and variances that happen when humans develop machine learning models, and streamlines the development of a machine learning model.
- Enjoy automated hyperparameter tuning
AutoKeras is the tool that provides functionality to match the architecture and hyperparameters of deep learning models automatically.
- Build flexible solutions
AutoKeras is most famous for its flexibility. In this case, the code you write will be executed regardless of the backend. It supports Theano, TensorFlow, and other frameworks.
AutoKeras ships with several training datasets. They’re already in a form that’s convenient to work with, but they don’t show you the full power of AutoKeras. In fact, it contains tools for preprocessing texts, pictures, and time series (in other words, the most common data types), which makes the data preparation process much more manageable. The tool also has built-in visualization for models.
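Automated hyperparameter tuning can be illustrated with the simplest search strategy, random search. This sketch is a stand-in and does not reproduce how AutoKeras searches: `random_search`, `toy_objective`, and the search space are made up for this example, and the objective is a formula rather than a real training run.

```python
import random


def random_search(objective, search_space, n_trials=50, seed=0):
    """Try random hyperparameter combinations and keep the best-scoring one."""
    rng = random.Random(seed)  # fixed seed so the search is repeatable
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in search_space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Stand-in for "train a model and return validation accuracy".
    # It peaks at learning_rate=0.01 and num_layers=3 by construction.
    return 1.0 - abs(params["learning_rate"] - 0.01) * 10 - abs(params["num_layers"] - 3) * 0.1


space = {"learning_rate": [0.1, 0.03, 0.01, 0.003], "num_layers": [1, 2, 3, 4, 5]}
best_params, best_score = random_search(toy_objective, space)
print(best_params, round(best_score, 3))
```

In practice, each call to the objective would train and validate a model, which is why AutoML tools invest in smarter search strategies that need fewer trials.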
12. H2O AutoML
H2O.ai is a software platform that optimizes the machine learning process using AutoML. H2O claims that the platform can train models faster than popular machine learning libraries such as scikit-learn.
H2O is a machine learning, predictive data analytics platform for building machine learning models and generating production code for them in Java and Python, all at the click of a button.
- Implement ML models out-of-the-box
It has implementations of supervised and unsupervised algorithms such as GLM and K-Means, and an easy-to-use web interface called Flow.
- Tailor H2O to your needs
The tool is helpful for both beginner and seasoned developers. It equips the coder with a simple wrapper function that manages modeling-related tasks in a few lines of code. Experienced ML engineers appreciate this function, since it allows them to focus on other, more thought-intensive processes of building models (like data exploration and feature engineering).
Overall, H2O is a powerful tool for solving machine learning and data science problems. Even beginners can extract value from data and build robust models. H2O continues to grow and release new products while maintaining high quality across the board.
Data validation is the process of checking data quality. During this stage, you make sure that there are no inconsistencies or missing data in your sets. Data validation tools automate this routine process and improve the quality of data cleansing.
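As a small illustration of what such checks look like, the sketch below scans rows for missing values and out-of-range values. The function `validate_rows`, the field names, and the allowed range are all hypothetical. Real validation tools offer far richer rule sets.

```python
def validate_rows(rows, required_fields, allowed_ranges):
    """Return a list of (row_index, problem) tuples found in the data."""
    problems = []
    for index, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) is None:
                problems.append((index, f"missing value for '{field}'"))
        for field, (low, high) in allowed_ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((index, f"'{field}'={value} outside [{low}, {high}]"))
    return problems


rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},   # missing value
    {"age": 212, "income": 61000},    # inconsistent: impossible age
]
issues = validate_rows(rows, required_fields=["age", "income"], allowed_ranges={"age": (0, 120)})
for index, problem in issues:
    print(index, problem)
```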
Hadoop is a freely redistributable set of utilities, libraries, and frameworks for developing and executing programs running on clusters. This fundamental technology for storing and processing Big Data is a top-level project of the Apache Software Foundation.
The project consists of 4 main modules:
- Hadoop Common
Hadoop Common is a set of infrastructure software libraries and utilities that are used in other solutions and related projects, in particular, for managing distributed files and creating the necessary infrastructure.
- HDFS is a distributed file system
Hadoop Distributed File System is a technology for storing files on various data servers with addresses located on a special name server. HDFS provides reliable storage of large files, block-by-block distributed between the nodes of the computing cluster.
- YARN is a task scheduling and cluster management system
YARN is a set of system programs that provide sharing, scalability, and reliability of distributed applications.
- Hadoop MapReduce
This is a platform for programming and performing distributed MapReduce calculations using many computers that form a cluster.
Today, there’s a whole ecosystem of related projects and technologies in Hadoop used for data mining and machine learning.
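The MapReduce model itself is easy to show on a single machine. The classic word-count example below runs the map, shuffle, and reduce phases in plain Python. It does not use Hadoop's API, and the function names and sample documents are invented for illustration. On a real cluster, the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    # Emit a (word, 1) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}


documents = ["big data big clusters", "data on clusters"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```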
Apache Spark helps you to process semi-structured in-memory data. The main advantages of Spark are performance and a user-friendly programming interface.
The framework has five components: a core and four libraries, each solving a specific problem.
- Spark Core
This is the core of the framework. You can use it for scheduling and core I/O functionality.
- Spark SQL
Spark SQL is one of the four framework libraries, and it comes in handy when processing structured data. To run faster, this tool uses DataFrames and can act as a distributed SQL query engine.
- Spark Streaming
This is an easy-to-use streaming data processing tool. It processes incoming data in micro-batches. The creators of Spark claim that performance does not suffer much from this.
- MLlib
This is a high-speed distributed machine learning system. It’s nine times faster than its competitor, the Apache Mahout library, when benchmarked against the alternating least squares (ALS) algorithm. MLlib includes popular algorithms for classification, regression, and recommender systems.
- GraphX
GraphX is a library for scalable graph processing. GraphX is not suitable for graphs that change in a transactional manner, for example, databases.
Spark is entirely autonomous but also compatible with other standard ML instruments, like Hadoop, if needed.
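The micro-batch idea used for stream processing can be sketched in a few lines. This is a plain-Python illustration, not Spark's API. Spark groups incoming records by time interval, while this sketch groups them by count so that the result is deterministic. `micro_batches` and the sample event stream are made up for the example.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield lists of up to `batch_size` items from any iterable stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


events = range(1, 8)  # stand-in for an incoming event stream
batch_sums = [sum(batch) for batch in micro_batches(events, batch_size=3)]
print(batch_sums)  # [6, 15, 7]
```

Processing each small batch with ordinary batch logic is what lets a streaming job reuse the same code as an offline job.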
Data exploration software is created for automated data analysis that provides streamlined pattern recognition and easy insights visualization. Data exploration is a cognitively intense process, so you need powerful tools that will help you track and execute code as you go.
15. Jupyter Notebook
Jupyter Notebook is a development environment where you can immediately see the result of executing code and its fragments. The difference from a traditional IDE is that the code can be broken into chunks and performed in any order. You can load a file into memory, check its contents separately, and also process the contents separately.
- Multi-language support
Often when we talk about Jupyter Notebook, we mean working with Python. But, in fact, you can work with other languages, such as Ruby, Perl, or R.
- Integration with cloud
The easiest way to start working with a Jupyter Notebook in the cloud is using Google Colab. This means that you just need to launch your browser and open the desired page. After that, the cloud system will allocate resources for you and allow you to execute any code.
The plus is that you don’t need to install anything on your computer. The cloud takes care of everything, and you just write and run code.
Data version control systems
There will be multiple ML model versions before you finish up. To make sure nothing gets lost, use a robust and trustworthy data version control system where every change is trackable.
16. Data Version Control (DVC)
DVC is a tool designed for managing software versions in ML projects. It’s useful both for experimentation and for deploying models to production. DVC runs on top of Git, uses its infrastructure, and has a similar syntax.
- Fully-automated version control
DVC creates metafiles to describe pipelines and versioned files that need to be saved in the Git history of your project. If you transfer some data under the control of DVC, it will start tracking all changes.
- Git-based modification tracking
You can work with data the same way as with Git: save a version, send to a remote repository, get the required version of the data, change and switch between versions. The DVC interface is intuitively clear.
Overall, DVC is an excellent tool for data and model versioning. If you don’t need pipelines and remote repositories, you can version data for a specific project working on a local machine. DVC allows you to work very quickly with tens of gigabytes of data.
However, it also allows you to exchange data and models between teams. For data storage, you can use cloud solutions.
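The metafile approach to data versioning can be sketched with the standard library alone. In this illustration, the large data file goes into a hash-addressed cache, and only a tiny metafile recording the hash would be committed to Git. It is a conceptual sketch, not DVC's actual file format or commands: `track`, `checkout`, the `.meta.json` suffix, and the use of SHA-256 are assumptions made for this example.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track(data_file, cache_dir):
    """Copy a file into a hash-addressed cache and write a small metafile next to it."""
    data_file, cache_dir = Path(data_file), Path(cache_dir)
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(data_file, cache_dir / digest)
    # The metafile is tiny, so it can be committed to Git instead of the data.
    meta_file = data_file.parent / (data_file.name + ".meta.json")
    meta_file.write_text(json.dumps({"path": data_file.name, "sha256": digest}))
    return meta_file


def checkout(meta_file, cache_dir):
    """Restore the data version recorded in a metafile."""
    meta = json.loads(Path(meta_file).read_text())
    target = Path(meta_file).parent / meta["path"]
    shutil.copyfile(Path(cache_dir) / meta["sha256"], target)
    return target


workdir = Path(tempfile.mkdtemp())
data = workdir / "train.csv"
cache = workdir / "cache"

data.write_text("x,y\n1,2\n")
meta = track(data, cache)
version_1 = meta.read_text()          # what you would commit to Git

data.write_text("x,y\n1,2\n3,4\n")    # the dataset changes
track(data, cache)

meta.write_text(version_1)            # like checking out the older commit
checkout(meta, cache)
print(data.read_text())
```

Switching between data versions then becomes a matter of switching metafiles, which is exactly the kind of change Git already tracks well.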
Pachyderm is a Git-like tool for tracking transformations in your data. It keeps track of data lineage and ensures that data is kept relevant.
Pachyderm is useful because it provides:
You want your data to be fully traceable from the moment it’s raw to the final prediction. With its version control for data, Pachyderm gives you a fully transparent view of your data pipelines. It can be a challenge; for example, when multiple transformers use the same dataset, it can be hard to say why you get this or that result.
Pachyderm is a step forward to the reproducibility of your data science models. You will always be assured that your clients can get the same results after the model is handed down to them.
Pachyderm stores all your data in one central location and updates all the changes. No transformation will pass unnoticed.
Testing and maintenance
The final step of ML development is testing and maintenance after the main jobs are done. Special tools allow you to make sure that the results are reproducible in the long run.
If you’re looking for a tool that will take care of tracking and maintenance for your machine learning project, have a look at Flyte. This is a platform for the maintenance of machine learning projects released by Lyft.
- Large-scale project support
Flyte has helped Lyft execute large-scale computing that’s crucial to its business. It’s not a secret that scaling and monitoring all pipeline changes can be pretty challenging, especially if the workflows have complex data dependencies. Flyte successfully deals with tasks of higher complexity, so developers can focus on business logic rather than machines.
- Improved reproducibility
This tool can also help you be sure of the reproducibility of the machine learning models you build. Flyte tracks changes, does version control, and containerizes the model alongside its dependencies.
- Multi-language support
Flyte was created to support complex ML projects in Python, Java, or Scala.
Flyte was tested internally at Lyft before it was released to the public. It has a proven record of managing more than 7,000 unique workflows totaling 100,000 executions every month.
Open-source MLOps tools are necessary. They help you automate a large amount of routine work without costing a fortune. Fully-fledged platforms offer a wide selection of tools for different purposes, for whatever technological stack you might desire. In practice, however, it often turns out that you still need to integrate them with specialized tools that are more intuitive to use. Luckily, most open-source tools make the integration as seamless as possible.
However, an important thing to understand about open-source tools is that you shouldn’t expect them to be completely free of charge: the costs of infrastructure, support, and maintenance of your projects will still be on you.