Getting good results from a single model-training run in a machine learning project is one thing. Keeping all of your machine learning experiments well organized, with a process that lets you draw valid conclusions from them, is quite another.
The answer to these needs is experiment tracking. In machine learning, experiment tracking is the process of saving all experiment-related information that you care about for every experiment you run.
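In its simplest form, that information is just a structured record per run. The sketch below (hypothetical, plain Python, no tracking library) shows the kind of record dedicated tools create, store, and make searchable for you:

```python
import json
import time
from pathlib import Path

def save_run(run_dir: Path, params: dict, metrics: dict, tags: list) -> Path:
    """Persist one experiment run as a JSON record -- what tracking tools automate."""
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "params": params,    # e.g. learning rate, batch size
        "metrics": metrics,  # e.g. final loss, validation accuracy
        "tags": tags,        # free-form labels for searching later
    }
    path = run_dir / f"run_{time.time_ns()}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

# One call at the end of each training run:
save_run(Path("experiments"), {"lr": 0.01, "batch_size": 32},
         {"val_accuracy": 0.93}, ["baseline"])
```

Everything a dedicated tool adds on top – UI, comparisons, sharing, access control – builds on records like this.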
ML teams implement experiment tracking in different ways – with spreadsheets, GitHub, or self-built platforms. Yet the most effective option is to use tools designed specifically for tracking and managing ML experiments.
In this article, we review and compare the 15 best tools for tracking and managing your ML experiments. You'll get to know their main features and see how they differ from each other. Hopefully, this will help you evaluate them and choose the right one for your needs.
How do you evaluate an experiment tracking tool?
There's no single answer to the question "what is the best experiment tracking tool?". Your motivations and needs may be completely different when you work individually versus in a team. And, depending on your role, you may be looking for different functionalities.
If you're a Data Scientist or a Researcher, you should consider:
- Whether the tool comes with a web UI or is console-based;
- Whether you can integrate the tool with your preferred model-training frameworks;
- What metadata you can log, display, and compare (code, text, audio, video, etc.);
- Whether you can easily compare multiple runs, and in what format – only tables, or also charts;
- Whether organizing and searching through experiments is user-friendly;
- Whether you can customize the metadata structure and dashboards;
- Whether the tool lets you track hardware consumption;
- How easy it is to collaborate with other team members – can you just share a link to an experiment, or do you have to use screenshots as a workaround?
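To make the comparison requirement concrete: at minimum, a tool should produce the kind of table-format diff sketched below (a hypothetical helper, not taken from any particular tool):

```python
def diff_runs(run_a: dict, run_b: dict) -> dict:
    """Map each differing field to its (run_a, run_b) value pair."""
    keys = sorted(set(run_a) | set(run_b))
    return {k: (run_a.get(k), run_b.get(k))
            for k in keys if run_a.get(k) != run_b.get(k)}

baseline = {"lr": 0.01, "batch_size": 32, "val_accuracy": 0.91}
candidate = {"lr": 0.001, "batch_size": 32, "val_accuracy": 0.93}

# Print only the fields that changed between the two runs.
for key, (a, b) in diff_runs(baseline, candidate).items():
    print(f"{key:>14}: {a} -> {b}")
```

Chart-based comparison adds overlaid learning curves on top of exactly this kind of field-by-field diff.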
As an ML Engineer, you should check if the tool lets you:
- Easily reproduce and re-run experiments;
- Track and search through experiment lineage (data/models/experiments used downstream);
- Save, fetch, and cache datasets for experiments;
- Integrate it with your CI/CD pipeline;
- Easily collaborate and share work with your colleagues.
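A common building block behind the reproducibility and lineage items above is content hashing: a run whose config and data hash to the same values should be re-runnable with comparable results. A hypothetical sketch:

```python
import hashlib
import json

def run_fingerprint(config: dict, dataset_bytes: bytes) -> str:
    """Derive a stable ID from a run's inputs; identical inputs give an identical ID."""
    h = hashlib.sha256()
    # Serialize the config deterministically so dict key order can't change the hash.
    h.update(json.dumps(config, sort_keys=True).encode())
    h.update(dataset_bytes)
    return h.hexdigest()[:12]

a = run_fingerprint({"lr": 0.01, "seed": 42}, b"training data v1")
b = run_fingerprint({"seed": 42, "lr": 0.01}, b"training data v1")
assert a == b  # same inputs, same ID, regardless of key order
```

Tracking tools typically compute such fingerprints for you and use them to link experiments to the exact data and code versions used downstream.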
Finally, as an ML team lead, you'll be interested in:
- General business considerations like the pricing model, security, and support;
- How much infrastructure the tool requires and how easy it is to integrate into your current workflow;
- Whether the product is delivered as commercial software, open-source software, or a managed cloud service;
- What collaboration, sharing, and review features it has.
I made sure to keep these motivations in mind when reviewing the tools that are on the market. So let's take a closer look at them.
The best tools for ML experiment tracking and management
Before we dig into each tool, here's a high-level comparison of features and integrations of the 15 best experiment tracking and management tools.
Note: This table was last updated on 20 December 2021. Some information may be outdated today. See some incorrect info? Let us know, and we'll update it.
[Comparison table not recoverable in this copy: the per-tool cells (mostly yes/no marks) were lost in conversion. The criteria it compared were: focus; price; whether the tool is a standalone component or part of a broader ML platform; whether it's commercial software, open-source software, or a managed cloud service; hosted vs. on-premises deployment; how much you have to change in your training process; web UI or console; logging and display of metadata (datasets, code versions, parameters, metrics and losses, images, audio, video, hardware consumption); comparing experiments (table-format diffs, overlaid learning curves, code); organizing and searching experiments (table customization, custom dashboards, nested metadata structures); reproducibility and traceability (one-command re-runs, experiment lineage, environment versioning, saving/fetching/caching datasets); collaboration and knowledge sharing (user groups and ACL, sharing UI links with project members and external people, commenting); and integrations (R, TensorBoard, MLflow, Sacred, Amazon SageMaker, Google Colab, Kubeflow, Keras, TensorFlow, PyTorch, scikit-learn, LightGBM, XGBoost, fastai, skorch, PyTorch Lightning, PyTorch Ignite, Catalyst, Optuna, Scikit-Optimize, Ray Tune, Hugging Face, Prophet).]

The focus row survives intact:
- Neptune – metadata storage, experiment tracking, and model registry;
- Weights & Biases, Comet, Sacred + Omniboard, TensorBoard, Guild AI, Polyaxon, ClearML – experiment management;
- MLflow, Valohai, Pachyderm, Verta.ai, SageMaker Studio – the entire ML lifecycle;
- Kubeflow – run orchestration;
- DVC Studio – data versioning.
1. Neptune
Neptune is a metadata store for any MLOps workflow. It was built for both research and production teams that run a lot of experiments. It lets you monitor, visualize, and compare thousands of ML models in one place.
Neptune supports experiment tracking, model registry, and model monitoring, and it's designed in a way that enables easy collaboration.
Users can create projects within the app, work on them together, and share UI links with each other (or even with external stakeholders). All this functionality makes Neptune the link between all members of the ML team.
Neptune is available as a cloud version and can also be deployed on-premises. It's integrated with 25+ other tools and libraries, including multiple model-training and hyperparameter-optimization tools.
Main advantages:
- Possibility to log and display all metadata types including parameters, model weights, images, HTML, audio, video, etc.;
- Flexible metadata structure that allows you to organize training and production metadata the way you want to;
- Easy to navigate web UI that allows you to compare experiments and create customized dashboards.
2. Weights & Biases
Weights & Biases is a machine learning platform built for experiment tracking, dataset versioning, and model management. On the experiment tracking side, its main focus is helping data scientists track every part of the model-training process, visualize models, and compare experiments.
W&B is also available both in the cloud and as an on-premises tool. In terms of integrations, Weights & Biases supports multiple other frameworks and libraries, including Keras, PyTorch, TensorFlow, fastai, scikit-learn, and more.
Main advantages:
- A user-friendly and interactive dashboard that serves as the central place for all experiments in the app. It allows users to organize and visualize the results of their model-training process;
- Hyperparameter search and model optimization with W&B Sweeps;
- Diffing and deduplication of logged datasets.
3. Comet
Comet is an ML platform that helps data scientists track, compare, explain, and optimize experiments and models across the model's entire lifecycle, i.e., from training to production. For experiment tracking, data scientists can register datasets, code changes, experimentation history, and models.
Comet is available to teams, individuals, academics, organizations, and anyone who wants to easily run and visualize experiments and streamline their work. It can be used as a hosted platform or deployed on-premises.
Main advantages:
- Fully customizable experiment table within the web-based user interface;
- Extensive comparison features: code, hyperparameters, metrics, predictions, dependencies, system metrics, and more;
- Dedicated modules for vision, audio, text, and tabular data that allow for easy identification of issues with the dataset.
4. Sacred + Omniboard
Sacred is open-source software that lets machine learning researchers configure, organize, log, and reproduce experiments. Sacred doesn't come with its own UI, but there are a few dashboarding tools you can connect to it, such as Omniboard (you can also use others, such as Sacredboard, or Neptune via an integration).
Sacred doesn't have the scalability of the previous tools and hasn't been adapted for team collaboration (unless integrated with another tool); however, it has great potential for individual research.
Main advantages:
- Possibility to connect it to the preferred UI;
- Possibility to track any model training developed with any Python library;
- Extensive experiment parameters customization options.
5. MLflow
MLflow is an open-source platform that helps manage the whole machine learning lifecycle. This includes experimentation, but also model storage, reproducibility, and deployment. Each of these four elements is represented by one MLflow component: Tracking, Model Registry, Projects, and Models.
The MLflow Tracking component consists of an API and UI that support logging various metadata (including parameters, code versions, metrics, and output files) and later visualizing the results.
Main advantages:
- Focus on the whole lifecycle of the machine learning process;
- A strong, active community of users that provides support;
- Open interface that can be integrated with any ML library or language.
6. TensorBoard
TensorBoard is the visualization toolkit for TensorFlow, so it's often the first choice of TensorFlow users. It offers a suite of features for visualizing and debugging machine learning models. Users can track experiment metrics like loss and accuracy, visualize the model graph, project embeddings to a lower-dimensional space, and much more.
There's also TensorBoard.dev, which lets you upload and share your ML experiment results with anyone (collaboration features are missing from TensorBoard itself). TensorBoard is open source and hosted locally, while TensorBoard.dev is available on a managed server as a free service.
Main advantages:
- Well-developed features for working with images, e.g., TensorBoard's Projector, which lets you visualize any vector representation, such as word embeddings or images;
- The What-If Tool (WIT), an easy-to-use interface for expanding understanding of black-box classification and regression ML models;
- A strong, active community of users that provides support.
7. Guild AI
Guild AI is an experiment tracking system for machine learning, available under the Apache 2.0 open-source license. It's equipped with features for analysis, visualization, and diffing, pipeline automation, hyperparameter tuning with AutoML, scheduling, parallel processing, and remote training.
Guild AI also comes with multiple integrated tools for comparing experiments, such as:
- Guild Compare – a curses-based application that lets you view runs in a spreadsheet format, including flags and scalar values;
- Guild View – a web-based application that lets you view runs and compare results;
- Guild Diff – a command that lets you compare two runs.
Main advantages:
- No need to change your code – it runs scripts written in any language or framework;
- Doesn't require additional software or systems like databases or containers;
- A strong, active community of users that provides support.
8. Polyaxon
Polyaxon is a platform for reproducible and scalable machine learning and deep learning applications. It includes a wide range of features, from experiment tracking and optimization to model management, run orchestration, and regulatory compliance. The main goal of its developers is to maximize results and productivity while saving costs.
In terms of experiment tracking, Polyaxon lets you automatically record key model metrics, hyperparameters, visualizations, artifacts, and resources, as well as version code and data. To display the logged metadata later, you can use the Polyaxon UI or integrate it with another board, e.g., TensorBoard.
Polyaxon can be deployed on-premises or on a cloud provider of your choice. It also supports major ML and DL libraries, such as TensorFlow, Keras, and Scikit-learn.
Main advantages:
- The Polyaxon UI, centered on a runs dashboard with visualization capabilities, collaboration features, and an extendable interface;
- Collaboration features and project-management tools;
- A scalable solution – it offers different plans, from open source to cloud and enterprise.
9. ClearML
ClearML is an open-source suite of tools that streamlines your ML workflow, supported by the team behind Allegro AI. The suite covers model-training logging and tracking, ML pipeline management, data processing and management, orchestration, and deployment. These features are reflected in five ClearML modules:
- ClearML Python Package, for integrating ClearML into your existing code base;
- ClearML Server, which stores experiment, model, and workflow data and supports the Web UI experiment manager;
- ClearML Agent, the MLOps orchestration agent that enables experiment and workflow reproducibility and scalability;
- ClearML Data, which provides data management and versioning on top of file systems and object storage;
- ClearML Session, which lets you launch remote instances of Jupyter Notebooks and VSCode.
ClearML is integrated with many frameworks and libraries, including model training, hyperparameter optimization, and plotting tools, as well as storage solutions.
Main advantages:
- ClearML Web UI that lets you track and visualize experiments;
- An option to work with tasks in Offline Mode, in which all information is saved in a local folder;
- Multi-user collaboration enabled by the ClearML Server.
10. Valohai
Valohai is an MLOps platform that automates everything from data extraction to model deployment. The team behind this tool says that Valohai "offers Kubeflow-like machine orchestration and MLflow-like experiment tracking without any setup". Although experiment tracking is not the main focus of this platform, it provides some functionality such as experiment comparison, version control, model lineage, and traceability.
Valohai is compatible with any language or framework, as well as many different tools and apps. It can be set up with any cloud vendor or on-premises. The software is also teamwork-oriented, with many features that facilitate collaboration.
Main advantages:
- Significant acceleration of the model-building process;
- Focus on the entire machine learning lifecycle;
- Since it's a platform built mainly for enterprises, privacy and security are its driving principles.
11. Pachyderm
Pachyderm is an enterprise-grade, open-source data science platform that lets its users control the end-to-end machine learning cycle – from data lineage, through building and tracking experiments, to scaling out.
The software is available in three different versions:
- Community – a free, source-available version of Pachyderm built and backed by a community of experts;
- Enterprise Edition – a complete version-controlled platform that can be deployed on the Kubernetes infrastructure of the user's choice;
- Hub Edition – a hosted and managed version of Pachyderm.
Main advantages:
- Possibility to choose the edition that fits your needs;
- End-to-end process support;
- Established and backed by a strong community of experts.
12. Kubeflow
Kubeflow is the machine learning toolkit for Kubernetes. Its goal is to leverage the potential of Kubernetes to facilitate scaling machine learning models. The platform has some tracking capabilities, but they're not the main focus of the project. It consists of several components, including:
- Kubeflow Pipelines – a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. It's probably the most commonly used functionality of Kubeflow;
- Central Dashboard – the central user interface (UI) in Kubeflow;
- Notebook Servers – a service for creating and managing interactive Jupyter notebooks;
- KFServing – Kubeflow's model deployment and serving toolkit;
- Training Operators – for training ML models in Kubeflow through operators (e.g., PyTorch, TensorFlow).
Main advantages:
- A user interface (UI) for managing and tracking experiments, jobs, and runs;
- An end-to-end open-source platform;
- Built-in Notebook server service.
13. Verta.ai
Verta is an enterprise MLOps platform. Its main features can be summarized in four words: track, collaborate, deploy, and monitor. These functionalities are reflected in Verta's main products: Experiment Management, Model Registry, Model Deployment, and Model Monitoring. The software was created to facilitate management of the entire machine learning lifecycle.
The Experiment Management component allows you to track and visualize ML experiments, log various metadata, search through and compare experiments, ensure model reproducibility, collaborate on ML projects within a team, and much more.
Verta supports many popular ML frameworks, including TensorFlow, PyTorch, XGBoost, ONNX, and more. It's available as open-source software, SaaS, and an enterprise offering.
Main advantages:
- Possibility to build customizable dashboards and visualize modeling results;
- Collaboration features and user management;
- Scalable solution that covers multiple steps of the MLOps pipeline.
14. SageMaker Studio
SageMaker Studio is part of the AWS platform. It lets data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models, and it claims to be the first integrated development environment (IDE) for ML. It has four components: prepare, build, train & tune, and deploy & manage. Experiment tracking is covered by the third one, train & tune: users can log, organize, and compare experiments, automate hyperparameter tuning, and debug training runs.
Main advantages:
- A built-in debugger and profiler that let you identify and reduce training errors and performance bottlenecks;
- Possibility to track thousands of experiments;
- Integration with a wide range of Amazon tools for ML-related tasks.
15. DVC Studio
DVC Studio is part of the DVC family of tools from iterative.ai. Originally, DVC was an open-source version control system created specifically for machine learning projects. That component still exists, aiming to let data scientists share ML models and make them reproducible. Now, however, there's also DVC Studio, a visual interface for ML projects created to help users track experiments, visualize them, and collaborate on them with their team.
The DVC Studio application can be accessed online or hosted on-premises.
Main advantages:
- DVC Studio is a visual interface that can be connected to GitHub, GitLab, or Bitbucket;
- It extracts metadata (model metrics and hyperparameters) from JSON files and presents them in a clean UI;
- It lets ML teams apply their existing software engineering stack.
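Because DVC Studio reads metrics and hyperparameters from files committed to the repository, instrumenting a project is mostly a matter of writing those files out. A hypothetical sketch (the file names here are conventions you configure, not fixed requirements):

```python
import json

def export_run_metadata(params: dict, metrics: dict) -> None:
    """Write run metadata as JSON files that a Git-based UI can parse and plot."""
    with open("params.json", "w") as f:
        json.dump(params, f, indent=2)
    with open("metrics.json", "w") as f:
        json.dump(metrics, f, indent=2)

export_run_metadata({"lr": 0.01, "epochs": 10}, {"val_accuracy": 0.93})
# Commit these files with Git; the UI then diffs them across commits and branches.
```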
Final thoughts
Tracking machine learning experiments has always been an important part of the ML development process; until recently, though, it was very manual, time-consuming, and error-prone.
Over the last few years, the market for experiment tracking and experiment management tools for machine learning has grown and matured. The range of available options is now broad and diverse. Whether you're looking for an open-source or enterprise solution, and whether you prefer a standalone experiment tracking framework or an end-to-end platform – you'll certainly find the right tool.