As a Data Scientist, ML/DL Researcher, or Engineer, you might have come across or heard about MLflow, Kubeflow, and Neptune. With the wide adoption of ML and DL, many questions have arisen around deployment, scalability, and reproducibility. Thus MLOps was born: a hybrid of data engineering, DevOps, and machine learning.
We had to come up with this new way of working because ML development is complex.
The natural question is: why?
Naturally, you might think it’s because of the math, the algorithms, the resources needed (GPUs, TPUs, CPUs…), the data, APIs, libraries, and frameworks. Some of that is true, but not entirely, because nowadays most of it is abstracted away for us. Take Hugging Face or fast.ai, for example: you just call an instance of a particular class and boom, the framework/library does all the heavy lifting for you. Furthermore, with the development of transfer learning, we no longer need vast amounts of data to train a model.
Then where does the complexity come from?
The complexity comes from a few things:
- ML is experimental in nature
- It has more parts to account for, such as: data (gathering, labelling, versioning), model (training, eval, versioning, and deployment), and configuration (hyperparameters and so on).
- The paradigm of how we do traditional software development (DevOps) is different from how we do ML (MLOps).
As MLOps matures, many tools have been and are being created to address different parts of the workflow. Among them, these 3 tools play key roles in an MLOps workflow, reducing the complexity and solving the problems we are going to talk about in later sections.
Now, what exactly do they do and how do they compare against each other?
In this article, we are going to answer those questions and more. The following are the points we are addressing:
- Which one should you use and when?
- High-level feature comparison table
Let’s dive right in!
MLflow is an open-source MLOps platform born from the standards of Big Tech, with a focus on creating transferable knowledge, ease of use, modularity, and compatibility with popular ML libraries and frameworks. It was designed to serve anything from a 1-person to a 1,000+ person organisation.
MLflow allows you to develop, track (and compare) experiments, and package and deploy models, either locally or remotely. It handles everything from data versioning, model management, and experiment tracking through to deployment, except for data sourcing, labeling, and pipelining.
It is pretty much the jack of all trades, or the Swiss Army knife, of the MLOps workflow.
This platform is made up of 4 components:
- MLflow Tracking
- MLflow Projects
- MLflow Models
- MLflow Model Registry
Let’s go deeper and see the importance of every single one of these components and how they work.
The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code, and for later visualizing and comparing the results. MLflow Tracking lets you log and query experiments using the Python, REST, R, and Java APIs.
As mentioned before, MLflow allows for local or remote development, so both the entity store and the artifact store are customisable, meaning you can save locally or in the cloud (AWS S3, GCP, and so on).
Key concepts in Tracking
- Parameters: key-value inputs to your code
- Metrics: numeric values (can be updated over time)
- Tags & Notes: information about the run
- Artifacts: Files, Data & Models
- Source: what code ran?
- Version: what version of the code ran?
- Run: an instance of code run by MLflow, where metrics and parameters are logged
There are two levels of API for tracking:
- Fluent MLflow API (high-level)
- MLflow Client API (low-level)
An MLflow Project is a self-contained unit of execution that bundles your code, its environment and dependencies, and its entry points with their parameters, so you can deploy it either locally or on a remote server.
This format helps with reproducibility and allows for the creation of a multi-step workflow with separate projects (or entry points in the same project) as the individual steps.
In other words, MLflow Projects are just a convention for organizing and describing your code to let other data scientists (or automated tools) run it. Each project is simply a directory of files, or a Git repository, containing your code. MLflow can run some projects based on a convention for placing files in this directory (for example, a conda.yaml file is treated as a Conda environment), but you can describe your project in more detail by adding an MLproject file, which is basically a YAML-formatted text file.
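For illustration, a minimal MLproject file might look like this (the project name, the train.py script, and the alpha parameter are all hypothetical):

```yaml
name: my_project            # hypothetical project name

conda_env: conda.yaml       # environment definition, for reproducibility

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha}"
```

Each entry point can become one step of a multi-step workflow, which is how separate projects (or entry points) are chained together.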
An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools.
Flavors are the key concept that makes MLflow Models powerful: they are a convention that deployment tools can use to understand the model. Basically, we abstract the model by creating an intermediate format that packages the model you want to deploy into a variety of environments, much like a Dockerfile for models, or a lambda function that you can deploy to a desired environment and then just invoke its scoring function, predict.
The MLflow Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example from staging to production), and annotations.
Kubeflow is an open-source project that leverages Kubernetes to build scalable MLOps pipelines and orchestrate complicated workflows. You can view it as a machine learning (ML) toolkit for Kubernetes.
Note: Kubernetes (or K8s for short) is a container orchestration tool.
Now, two questions arise:
- Why containerize your ML applications?
- Why ML on K8s?
Why containerize your ML applications?
Usually, environments differ from person to person in a team setting, and these differences can go as far as:
- Dependencies (Libraries, Frameworks and versions)
- Code (helper functions, Training and evaluation)
- Configurations (data transformations, network architecture, batch size and so on)
- Software and Hardware
This can cause various problems if two or more members are to collaborate, or take over someone’s work and make improvements.
But through containers, one can simply share a Docker image, and as long as the other person has Docker installed locally or in their cloud environment, they can easily recreate the same environment, experiments, and results.
Benefits of containers
- Help create ML environments that are reproducible and portable
Why ML on K8s?
As I mentioned before, K8s is a container orchestration tool: it automates the deployment, scaling, and management of containerized applications. But the trouble is in managing K8s itself, which can be hectic. Nowadays, however, there are providers of managed K8s as a service, such as AWS EKS, Google GKE, and Azure AKS.
Using a managed K8s service allows ML practitioners to take full advantage of the benefits that K8s brings, such as:
- Or it’s already part of the company or team workflow
Now that we got that out of the way, let’s take a more detailed look at Kubeflow.
Kubeflow is composed of various projects/tools but here we are going to focus on the 4 major ones:
Kubeflow includes services to create and manage interactive Jupyter notebooks. You can customize your notebook deployment and your compute resources to suit your data science needs. Experiment with your workflows locally, then deploy them to a cloud when you’re ready.
This is perhaps the most famous project, and the reason many teams opt for Kubeflow. In a nutshell, Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. It is available as a Kubeflow component or as a standalone installation.
At the heart of this project lie two components:
- Pipeline – is a description of an ML workflow, including all of the components in the workflow and how they combine in the form of a graph. The pipeline includes the definition of the inputs (parameters) required to run the pipeline and the inputs and outputs of each pipeline component.
- Pipeline component – is a self-contained set of user code, packaged as a Docker image, that performs one step in the pipeline. For example, a component can be responsible for data preprocessing, data transformation, model training, and so on.
The platform also includes:
- A user interface (UI) for managing and tracking experiments, jobs, and runs.
- An engine for scheduling multi-step ML workflows.
- An SDK for defining and manipulating pipelines and components.
- Notebooks for interacting with the system using the SDK.
- Reusability: enabling you to re-use components and pipelines without having to rebuild each time.
This project offers you different frameworks for training ML models such as:
- Chainer Training
- MPI Training
- MXNet Training
- PyTorch Training
- Job Scheduling
- TensorFlow Training (TFJob)
Here you can execute training jobs, monitor the training, and much more. One of the cool features is being able to easily define and take advantage of Kubernetes replicas, which allow you to spin up multiple identical versions of a container image. Therefore, if one or more replicas fail during a training job, your progress is not completely lost, because you have another version running in parallel.
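For illustration, this is how replicas are declared for a generic Kubernetes Deployment (the name and image are hypothetical; Kubeflow's training operators expose similar per-job replica settings):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trainer                 # hypothetical workload name
spec:
  replicas: 3                   # run three identical copies of the container
  selector:
    matchLabels:
      app: trainer
  template:
    metadata:
      labels:
        app: trainer
    spec:
      containers:
      - name: trainer
        image: my-registry/trainer:latest   # hypothetical image
```

If one pod fails, Kubernetes reschedules it while the remaining replicas keep running.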
When it comes to serving models kubeflow offers great support.
Kubeflow has a component called KFServing that enables serverless inferencing on Kubernetes and provides performant, high abstraction interfaces for common machine learning (ML) frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX to solve production model serving use cases.
KFServing can be used to do the following:
- Provide a Kubernetes Custom Resource Definition for serving ML models on arbitrary frameworks.
- Encapsulate the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU autoscaling, scale to zero, and canary rollouts to your ML deployments.
- Enable a simple, pluggable, and complete story for your production ML inference server by providing prediction, pre-processing, post-processing and explainability out of the box.
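As an illustration, an InferenceService is declared as a Kubernetes custom resource; a minimal hypothetical manifest might look like this (the service name and storage URI are made up, and under the later KServe rename the API group became serving.kserve.io):

```yaml
apiVersion: serving.kubeflow.org/v1beta1   # KFServing-era API group
kind: InferenceService
metadata:
  name: sklearn-iris                       # hypothetical service name
spec:
  predictor:
    sklearn:
      storageUri: gs://my-bucket/models/iris   # hypothetical model location
```

Applying this manifest is all it takes: autoscaling, networking, and health checking are handled by the controller behind the scenes.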
Furthermore, besides KFServing, Kubeflow supports TensorFlow Serving containers to export trained TensorFlow models to Kubernetes. It is also integrated with Seldon Core, an open-source platform for deploying machine learning models on Kubernetes, and with NVIDIA Triton Inference Server for maximized GPU utilization when deploying ML/DL models at scale. Finally, it also supports BentoML, an open-source platform for high-performance ML model serving, which makes building production API endpoints for your ML model easy and supports all major machine learning training frameworks, including TensorFlow, Keras, PyTorch, XGBoost, scikit-learn, and more.
But it doesn’t end there: on top of everything, you can run Kubeflow on managed Kubernetes engines on AWS, GCP, or Azure. Take AWS, for example: Kubeflow has an integration with AWS SageMaker that allows you to take full advantage of the scale that comes with such a managed service.
In my opinion, end-to-end ML platforms are not the way to go. For more details, you can later read this article, where I explain this in detail, once you finish this one.
I believe microservices give you more flexibility to plug any new service into your pipeline, or to replace a broken service, component, or tool, but integrations like those between Kubeflow and the different cloud providers let you build more robust solutions.
Neptune is a metadata store for MLOps, built for research and production teams that run a lot of experiments.
It gives you a central place to log, store, display, organize, compare, and query all metadata generated during the machine learning lifecycle.
Thousands of ML engineers and researchers use Neptune for experiment tracking and model registry both as individuals and inside teams at large organizations.
Now, a question might arise: why a metadata store?
Why a metadata store?
Unlike notes, organization protocols, or open-source tools, a metadata store is, as I mentioned before, a centralized place, but one that is also lightweight, automatic, and maintained by the organization (in this case Neptune) or the community, so that people can focus on actually doing ML rather than on metadata bookkeeping.
Furthermore, a metadata store is a tool that serves as a connector between different parts/phases/tools of the MLOps workflow.
Benefits of a metadata store
- Log and display all metadata types including Parameters, Images, HTML, Audio, Video
- Organize and compare experiments in a dashboard
- See model training live
- Have it (metadata store) maintained and backed up by someone (not you)
- Debug and compare experiments and models with no extra effort
- Both database and dashboard scale with thousands of experiments
- Help ease the transition from research to production
- Easy to build custom libs/tools on top of it
Now that we got that out of the way, let’s take a more detailed look at Neptune.
Neptune is made of 3 major components:
- Data versioning
- Experiment tracking
- Model registry
Version control systems help developers manage changes to source code, while data version control adapts the version control process to the data world, managing changes to models in relation to datasets and vice versa. In other words, this feature helps track which dataset, or subset of a dataset, was used to train a particular version of the model, enabling and facilitating experiment reproducibility.
With the data versioning functionality in Neptune, you can:
- Keep track of a dataset version in your model training runs with artifacts
- Query the dataset version from previous runs to make sure you are training on the same dataset version
- Group your Neptune Runs by the dataset version they were trained on
This feature of Neptune helps you to organize your ML experimentation in a single place by:
- Logging and displaying metrics, parameters, images, and other ML metadata
- Searching, grouping, and comparing experiments with no extra effort
- Visualizing and debugging experiments live as they are running
- Sharing results by sending a persistent link
- Querying experiment metadata programmatically
This feature allows you to have your model development under control by organizing your models in a central model registry, making them repeatable and traceable.
Meaning you can version, store, organize, and query models from model development through deployment. The metadata saved includes:
- Dataset, code, env config versions
- Parameters and evaluation metrics
- Model binaries, descriptions, and other details
- Testset prediction previews and model explanations
Furthermore, it also enables teams, whether geographically close or far apart, to collaborate on experiments, because everything your team logs to Neptune is automatically accessible to every team member. So reproducibility is no longer a problem.
You can access model training run information like the code, parameters, model binary, or other objects via an API.
With Neptune, you can replace folder structures, spreadsheets, and naming conventions with a single source of truth where all your model building metadata is organized, easy to find, share, and query.
This tool gives you control over models and experiments by keeping a record of everything that happens during model development.
This equals less time spent looking for configs and files, context switching, unproductive meetings and more time for quality ML work. With Neptune, you don’t have to implement loggers, maintain databases or dashboards, or teach people how to use them.
You can get the most out of your computational resources by keeping track of all ideas you have already tried and how much resources you used. Monitor your ML runs live and react quickly when runs fail, or models stop converging.
Finally, Neptune allows you to build reproducible, compliant, and traceable models by versioning all your model training runs, and also allows you to know who built the production model, which dataset and parameters were used, and how it performed at any time.
Now, just tell me which one and when to use it
If you want an MLOps platform that is powered by the open-source community and that allows you to:
- Track, visualize and compare experiment metadata
- UI that allows you to visualize and compare experiment results
- Develop (package and deploy) models
- A platform that allows you to create a multi-step workflow (much like Kubeflow pipelines but without using containers)
And a way to abstract the model, allowing you to easily deploy it into a variety of environments, then MLflow is the way to go.
If you want an end-to-end open-source platform that allows you to:
- Manage and set resource quotas across different teams, as well as code, run, and track experiment metadata either locally or in the cloud
- Build reproducible pipelines with components that span the entire ML lifecycle (from data gathering all the way to model building and deployment)
- Use a UI that lets you visualize your pipelines and experiment metadata, as well as compare experiment results
- Use a built-in Notebook server service
then Kubeflow is the way to go.
Finally, your K8s environment might have limited resources, but both K8s and Kubeflow have an integration with AWS SageMaker that enables the use of fully managed SageMaker ML tools across the ML workflow, natively from Kubernetes or Kubeflow. This means you can take advantage of its capability to scale resources (e.g. GPU instances) and of its services (e.g. SageMaker Ground Truth, Model Monitor, etc.).
This eliminates the need for you to manually manage and optimize your Kubernetes-based ML infrastructure while still preserving control over orchestration and flexibility.
If you want a centralized place:
- To store all your metadata (data versioning, experiment tracking and model registry)
- That has an intuitive and customizable UI that allows you to visualize and compare experiment results, as well as arrange the displayed data as you wish
- Has a project wiki that facilitates sharing reports, insights, and remarks about the project’s progress, runs and data exploration Notebooks
- Notebook checkpointing (for Jupyter)
- That has easy and seamless integrations with most of the best tools and MLOps platforms in the industry
- For example, Neptune has an integration with MLflow and many other libraries, tools and ML/DL Frameworks.
- If an integration is not available, you can add it to your notebook, .py project, or containerized ML project (in case you are using Kubernetes or Kubeflow), powered by your favorite libraries, tools, and frameworks such as PyTorch, using the Python client.
Finally, if you want a fully managed service, or more control with the self-hosted server version, then Neptune is the way to go.
High-level feature comparison table
- Free plan limitations: free for individuals, non-profit, and educational research; paid for teams
- Ease of use: easy to use vs. a learning curve
- Managed service version
In the end, the choice is in your hands; it depends on your requirements and needs. But I want you to know that this is not an either-or situation: these tools are not mutually exclusive, and you can mix and match them as per your requirements and wishes.
It could be Kubeflow with MLflow or Kubeflow with Neptune as well as MLflow with Neptune.
Let me elaborate: in the case of Kubeflow and MLflow, or Kubeflow and Neptune, Kubeflow might not have a direct integration, but you can add MLflow or Neptune to a pipeline component (aka a containerized app).
Now when it comes to MLflow and Neptune it is much easier because Neptune has an integration with MLflow.
Thus, you are not stuck using only one tool.
With that, we have come full circle. Below is a ton of references for you to check out and devour. Have fun!