When you work on a machine learning project, getting good results from a single model-training run is one thing. Keeping all of your machine learning experiments well organized, with a process that lets you draw valid conclusions from them, is quite another.
The answer to these needs is ML experiment tracking. In machine learning, experiment tracking is the process of saving all experiment-related information that you care about for every experiment you run.
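To make "saving all experiment-related information" concrete, here is a minimal sketch in plain Python that records a run's parameters, metrics, and tags as a JSON file. It is not any particular tool's API; the function name `log_experiment` and the file layout are made up for illustration.

```python
import json
import time
from pathlib import Path

def log_experiment(run_dir, params, metrics, tags=()):
    """Save everything you care about for a single run -- parameters,
    final metrics, tags, and a timestamp -- as one JSON record."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.time(),
        "params": params,    # e.g. learning rate, batch size
        "metrics": metrics,  # e.g. accuracy, loss
        "tags": list(tags),  # e.g. "baseline", "resnet50"
    }
    path = run_dir / "run.json"
    path.write_text(json.dumps(record, indent=2))
    return path

# Example: record one training run
log_experiment(
    "runs/exp-001",
    params={"lr": 0.01, "batch_size": 32},
    metrics={"val_accuracy": 0.87},
    tags=["baseline"],
)
```

Dedicated tracking tools do essentially this at scale, plus versioning, visualization, and search on top of the stored records.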
ML teams implement experiment tracking in different ways, whether with spreadsheets, GitHub, or self-built platforms. Yet the most effective option is to use a tool designed specifically for tracking and managing ML experiments.
In this article, we review and compare the 15 best tools for tracking and managing your ML experiments. You’ll get to know their main features and see how they differ from each other. Hopefully, this will help you evaluate them and choose the right one for your needs.
How to evaluate an experiment tracking tool?
There’s no single answer to the question “What is the best experiment tracking tool?”. Your motivation and needs may be completely different depending on whether you work individually or in a team. And, depending on your role, you may be looking for different functionalities.
If you’re a Data Scientist or a Researcher, you should consider:
- Whether the tool comes with a web UI or is console-based;
- Whether you can integrate the tool with your preferred model training frameworks;
- What metadata you can log, display, and compare (code, text, audio, video, etc.);
- Whether you can easily compare multiple runs, and in what format (only a table, or also charts);
- Whether organizing and searching through experiments is user-friendly;
- Whether you can customize the metadata structure and dashboards;
- Whether the tool lets you track hardware consumption;
- How easy it is to collaborate with other team members: can you simply share a link to an experiment, or do you have to use screenshots as a workaround?
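The run-comparison point above is worth making concrete. At its core, a tabular comparison view loads the stored run records and sorts them by a chosen metric; a minimal sketch, with hypothetical run records and a made-up `compare_runs` helper:

```python
# Hypothetical run records, as a tracking tool might store them.
runs = [
    {"id": "exp-001", "lr": 0.01,  "val_accuracy": 0.87},
    {"id": "exp-002", "lr": 0.001, "val_accuracy": 0.91},
    {"id": "exp-003", "lr": 0.1,   "val_accuracy": 0.79},
]

def compare_runs(runs, metric):
    """Return runs sorted by a metric, best first -- the core of a
    tabular run-comparison view."""
    return sorted(runs, key=lambda r: r[metric], reverse=True)

best = compare_runs(runs, "val_accuracy")[0]
print(f"best run: {best['id']} (val_accuracy={best['val_accuracy']})")
```

A good tool adds charts and diffing on top of this, but the underlying question is always the same: which runs did best, and why?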
As an ML Engineer, you should check if the tool lets you:
- Easily reproduce and re-run experiments;
- Track and search through experiment lineage (data/models/experiments used downstream);
- Save, fetch, and cache datasets for experiments;
- Integrate it with your CI/CD pipeline;
- Easily collaborate and share work with your colleagues.
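Reproducibility and lineage, the first two points above, come down to capturing enough information that a run can be identified and repeated exactly. A minimal sketch, assuming a config dictionary per experiment; the helpers `experiment_fingerprint` and `reproducible_run` are made up for illustration:

```python
import hashlib
import json
import random

def experiment_fingerprint(config):
    """Hash the full config so identical setups map to the same ID --
    a simple way to detect and look up a specific experiment."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

def reproducible_run(config):
    """Seed the RNG from the config so a re-run yields the same numbers."""
    random.seed(config["seed"])
    return [random.random() for _ in range(3)]  # stand-in for real training

config = {"seed": 42, "lr": 0.01, "model": "resnet18"}
fingerprint = experiment_fingerprint(config)
assert reproducible_run(config) == reproducible_run(config)  # re-run matches
```

Real tools extend the same idea to dataset versions, code commits, and environment details, so that the full lineage of a model can be traced and re-run.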
Finally, as an ML team lead, you’ll be interested in:
- General business concerns such as the pricing model, security, and support;
- How much infrastructure the tool requires and how easy it is to integrate it into your current workflow;
- Whether the product is delivered as commercial software, open-source software, or a managed cloud service;
- What collaboration, sharing, and review features it has.
I made sure to keep these motivations in mind when reviewing the tools on the market. So let’s take a closer look at them.
The best tools for ML experiment tracking and management
Before we dig into each tool, here’s a high-level comparison of features and integrations of the 15 best experiment tracking and management tools.
Note: This table was last updated on 20 December 2021. Some information may be outdated today. See some incorrect info? Let us know, and we’ll update it.
[Comparison table not fully preserved. Recoverable pricing notes: free Individual and Academia tiers with a paid Team tier; free or paid depending on the plan; open-source free with paid hosted versions; DVC free, with DVC Studio free or paid depending on the plan.]
Neptune is a metadata store for any MLOps workflow. It was built for both research and production teams that run a lot of experiments. It lets you monitor, visualize, and compare thousands of ML models in one place.
Neptune supports experiment tracking, model registry, and model monitoring, and it’s designed in a way that enables easy collaboration.
Users can create projects within the app, work on them together, and share UI links with each other (or even with external stakeholders). All this functionality makes Neptune the link between all members of the ML team.
Neptune is available as a cloud-hosted service and can also be deployed on-premises. It integrates with 25+ other tools and libraries, including multiple model training and hyperparameter optimization tools.
- The ability to log and display all metadata types, including parameters, model weights, images, HTML, audio, video, etc.;
- A flexible metadata structure that allows you to organize training and production metadata the way you want to;
- An easy-to-navigate web UI that allows you to compare experiments and create customized dashboards.
Weights & Biases is a machine learning platform built for experiment tracking, dataset versioning, and model management. On the experiment tracking side, its main focus is to help Data Scientists track every part of the model training process, visualize models, and compare experiments.
W&B is also available in the cloud and as an on-premises tool. In terms of integrations, Weights & Biases supports many other frameworks and libraries, including Keras, PyTorch, TensorFlow, Fastai, scikit-learn, and more.
- A user-friendly, interactive dashboard that serves as the central place for all experiments in the app, letting users organize and visualize the results of their model training process;
- Hyperparameter search and model optimization with W&B Sweeps;
- Diffing and deduplication of logged datasets.
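To make the idea of a hyperparameter sweep concrete, here is a minimal grid-search sketch in plain Python. It is not the W&B Sweeps API; the search space and the `score` function are stand-ins for a real training run.

```python
import itertools

# A sweep explores combinations of hyperparameters and records the
# score of each run; hypothetical two-parameter search space.
space = {"lr": [0.1, 0.01], "batch_size": [16, 32]}

def score(lr, batch_size):
    # Stand-in for a real training run; favors small lr, large batch.
    return 1.0 - lr + batch_size / 1000

results = []
for lr, bs in itertools.product(space["lr"], space["batch_size"]):
    results.append({"lr": lr, "batch_size": bs, "score": score(lr, bs)})

best = max(results, key=lambda r: r["score"])
print(best)
```

Tools like W&B Sweeps automate this loop (including smarter strategies than grid search, such as random search or Bayesian optimization) and log every trial for later comparison.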