15 Best Tools for Tracking Machine Learning Experiments

While working on a machine learning project, getting good results from a single model-training run is one thing, but keeping all of your machine learning experiments organized and having a process that lets you draw valid conclusions from them is quite another. That’s what machine learning experiment management helps with. 

In this article, I explain why data scientists and machine learning engineers need a tool for tracking machine learning experiments, and which software is best suited for the job.

Tools for tracking machine learning experiments – who needs them and why?

  • Data Scientists: Many organizations today either add ML to their products as extra value or are AI-first companies. They need MLOps processes and tools such as experiment tracking to support collaboration between individuals and teams and to keep ML projects on track. Without them, the need for a tracking system becomes (painfully) apparent as soon as a data scientist or data science team wants to come back to an idea, re-run a model from a couple of months ago, or simply compare and visualize the differences between runs.
  • Machine Learning Engineers: Once the data scientist or data science team finalizes model development and is ready to launch the model, the MLEs take the research/dev code and the model, turn them into a production-ready version, and deploy it. This handoff from data scientists to MLEs requires more information about the model than just the weights if the deployed solution is to be debuggable, maintainable, reproducible, and comparable. That is where experiment metadata matters: the dataset, code, and model versions, plus the hyperparameters and other configuration used to train the model.
  • Managers/Business people: Tracking software creates an opportunity to involve other team members, such as managers or business stakeholders, in your machine learning projects. Because you can prepare visualizations, add comments, and share the work, managers and co-workers can easily track progress and cooperate with the machine learning team.
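
To make the handoff point more concrete, here is a minimal sketch of the kind of metadata record described above, together with a simple run-to-run comparison. It is an illustration only: the `RunMetadata` class, its field names, and the `diff_hyperparameters` helper are hypothetical and are not taken from any tool on this list.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RunMetadata:
    """Hypothetical record of what a handoff needs besides the weights."""
    run_id: str
    dataset_version: str   # e.g. a tag or hash of the training data
    code_version: str      # e.g. a git commit SHA
    model_version: str
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # sort_keys keeps the output stable, so two records diff cleanly
        return json.dumps(asdict(self), sort_keys=True, indent=2)


def diff_hyperparameters(a: RunMetadata, b: RunMetadata) -> dict:
    """Return {name: (value_in_a, value_in_b)} for each differing hyperparameter."""
    keys = set(a.hyperparameters) | set(b.hyperparameters)
    return {
        k: (a.hyperparameters.get(k), b.hyperparameters.get(k))
        for k in sorted(keys)
        if a.hyperparameters.get(k) != b.hyperparameters.get(k)
    }


# Two made-up runs, used only to demonstrate the comparison
baseline = RunMetadata("run-001", "data-v1", "abc1234", "model-v1",
                       {"lr": 0.01, "batch_size": 32}, {"val_acc": 0.91})
candidate = RunMetadata("run-002", "data-v1", "def5678", "model-v2",
                        {"lr": 0.001, "batch_size": 32, "dropout": 0.2},
                        {"val_acc": 0.93})

print(diff_hyperparameters(baseline, candidate))
# {'dropout': (None, 0.2), 'lr': (0.01, 0.001)}
```

The tools described below store records like this for you, and add the querying, visualization, and sharing that a hand-rolled version lacks.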

Here is an in-depth article about experiment tracking for those of you who want to learn more.


The best tools for tracking machine learning experiments (also deep learning experiments)

Here’s a comparison of the features and integrations of the 15 best experiment management tools.

This comparison table was last updated on 29 April 2020, so some information may be outdated today.

Tools compared, with their focus, pricing, and free plan limits where stated:

  • Neptune: Experiment Management. Free; Team Research: $0; Team: from $49 (Team trial available); Enterprise: from $499. Free plan limitation: individual users (Free), no limits (Team Research).
  • Weights & Biases: Experiment Management. Personal; Startup: $35 per user; Team: $100 per user; Enterprise: NA. Free plan limitation: individual and open source users (Free), academic teams (Team Free).
  • Comet: Experiment Management. Free; Startup: $39 per user; Teams: $179 per user; Enterprise: NA. Free plan limitation: individual users, no private links (sharing).
  • Sacred + Omniboard: Experiment Management. Free.
  • MLflow: Entire Lifecycle. Free.
  • TensorBoard: Experiment Management. Free.
  • Guild AI: Experiment Management. Free.
  • Polyaxon: Experiment Management. Enterprise: NA.
  • Trains: Experiment Management. Free.
  • Valohai: Entire Lifecycle. Free; Pro: $649; Enterprise: NA.
  • Pachyderm: Entire Lifecycle. Community Edition; Enterprise Edition: NA.
  • Kubeflow: Run orchestration. Free.
  • Verta.ai: Entire Lifecycle. Enterprise: NA.
  • SageMaker Studio: Entire Lifecycle. You pay extra on top of compute.
  • DVC: Data Versioning. Free.

[Feature comparison table: the per-tool marks are not recoverable. Rows compared:]

  • Overview: focus, price, free plan limitation (one further entry reads "only 3 days of data available"), open source, easy integration
  • Experiment Tracking Features: data versioning, notebook versioning, model versioning, code versioning, environment versioning, resource monitoring, logging images and charts, logging audio, logging video, logging artifacts, update finished experiment
  • UI/App Features: user management, customizable dashboard, experiment organization, saving views of experiment dashboard, view sharing, run comparison, run grouping, code diffs, notebook diffs, comments, reports
  • Product Features: fetching experiments via API, can be deployed on-premise, hosted version available (with the notes "on top of Databricks platform" and "on GCP"), scales to millions of runs, dedicated user support
  • Integrations: R, TensorBoard, MLflow, Sacred, Amazon SageMaker, Google Colab, Kubeflow, Keras, TensorFlow, PyTorch, Scikit-Learn, LightGBM, XGBoost, fastai, skorch, PyTorch Lightning, PyTorch Ignite, Catalyst, Optuna, Scikit-Optimize, Ray Tune, HiPlot

1. Neptune

Neptune is a metadata store for any MLOps workflow, built for both research and production teams that run a lot of experiments.

Individuals and organizations use Neptune for data versioning, experiment tracking, and model registry to keep control over their experimentation and model development. These core features let you replace folder structures, spreadsheets, and naming conventions with a single source of truth where all metadata generated during the machine learning lifecycle is organized and easy to find, query, analyze, and share with the team and managers. Neptune is also flexible, works with many other frameworks, and has a stable user interface that scales to millions of runs.

Main advantages:

  • Log and display all metadata types including parameters, model weights, images, HTML, audio, video, etc.
  • Tools for efficient team collaboration and project supervision
  • Jupyter notebook tracking included
  • Live monitoring of model training
  • Both database and dashboard scale with thousands of runs  

2. Weights & Biases

Weights & Biases targets the most advanced deep learning teams. It lets them record experiments and visualize every part of their research. Weights & Biases was created to facilitate collaboration between data scientists and offers many useful features for that purpose, all with a well-designed user experience.

Main advantages:

  • Created for deep learning experiment tracking
  • Easy integration process
  • Customizable visualization and reporting tools

➡️ See the comparison between Weights & Biases and Neptune.

3. Comet

Like the previously described tools, Comet was built to enable tracking of machine learning projects. The team behind this software has a mission to help data scientists better organize and manage their experiments. Comet makes it easy to compare experiments, keep a record of the collected data, and collaborate with other team members.

Main advantages: 

  • Quick and easy to set up on any machine
  • Works well with existing ML libraries
  • Safeguards your intellectual property (IP)

➡️ See the comparison between Comet and Neptune.

4. Sacred + Omniboard

“Every experiment is sacred…” as they say in the Sacred tool description. Sacred is open-source software that allows machine learning engineers to configure, organize, log, and reproduce experiments. Sacred doesn’t come with its own UI, but you can connect a few dashboarding tools to it, such as Omniboard, Sacredboard, or Neptune. It doesn’t have the scalability of the previous tools and has not been adapted to team collaboration, but it has great potential for individual research.

Main advantages: 

  • Open-source tool 
  • Extensive experiment parameters customization options
  • Easy integration

➡️ See the comparison between Sacred + Omniboard and Neptune.

➡️ See also: The Best Sacred + Omniboard Alternatives

5. MLflow

MLflow is an open-source platform that helps manage the whole machine learning lifecycle. This includes experimentation, but also reproducibility and deployment. Each of these three elements is represented by one MLflow component: Tracking, Projects, and Models. That means a data scientist who works with MLflow can track an experiment, organize it, describe it for other ML engineers, and package it as a machine learning model. It was designed to scale from one person to a big organization, but it works best for an individual user.

Main advantages: 

  • Focus on the whole lifecycle of the machine learning process
  • Compatible with many additional tools and platforms
  • Open interface integrated with any ML library or language

➡️ See the comparison between MLflow and Neptune.

6. TensorBoard

TensorBoard is another experiment tracking tool. It’s open-source and offers a suite of tools for visualizing and debugging machine learning models. TensorBoard is the most popular solution on the market, so it’s widely integrated with many other tools and applications. What’s more, a large network of engineers uses this software and shares experience and ideas, which makes for a powerful community ready to solve any problem. The software itself, however, is best suited for an individual user.

Main advantages: 

  • Large library of pre-built tracking tools
  • Integration with many other tools and applications
  • Well-prepared problem-solving materials and community

➡️ See the comparison between TensorBoard and Neptune.

➡️ See also: The Best TensorBoard Alternatives (2020 Update).

7. Guild AI

The team behind Guild AI states that “The faster and more effective you can apply experiments, the sooner you’ll complete your work.” To keep this process well organized, they created this open-source experiment tracking software, which is best suited for individual projects. It’s lightweight and equipped with many useful features that make it easier to run, analyze, optimize, and recreate machine learning experiments. What’s more, Guild AI includes a variety of analytics tools that make comparing experiments much easier.

Main advantages: 

  • Automated machine learning process
  • Integrated with any language and library
  • Remote training and backup possibility

➡️ See the comparison between Guild AI and Neptune.

8. Polyaxon

Polyaxon is a platform that focuses on both the whole lifecycle management of machine learning projects and the facilitation of ML team collaboration. It includes a wide range of features, from tracking and optimization of experiments to model management and regulatory compliance. The main goal of its developers is to maximize results and productivity while saving costs. It’s worth mentioning, however, that Polyaxon needs to be integrated into your infrastructure/cloud before it’s ready to use.

Main advantages: 

  • Integrated with most popular deep learning frameworks and ML libraries
  • Designed to serve different groups of interests including data scientists, team leads and architects
  • Team collaboration possibilities

➡️ See the comparison between Polyaxon and Neptune.

9. Trains

Trains was built to track the “glorious but messy process of training production-grade deep learning models”, as stated by its creators. The main focus of the software is to help keep track of machine learning and deep learning experiments in an effortless yet effective way. Trains is an open-source platform that is still in the beta stage, but it is being constantly developed and upgraded.

Main advantages: 

  • Quick and easy implementation process
  • Possibility to boost team collaboration
  • Useful features designed to track the experiment process and save data to one centralized server

10. Valohai

Valohai has been designed with data scientists in mind, and its main benefit is that it makes the model building process faster. It does so through large-scale automation, but it needs to be integrated with your infrastructure/private cloud first. Valohai is compatible with any language or framework, as well as many different tools and apps. The software is also teamwork-oriented and has many features that facilitate collaboration.

Main advantages: 

  • Significant acceleration of the model building process
  • Helpful customer service and monthly checkup
  • Focused on the entire lifecycle of machine learning

11. Pachyderm

Pachyderm is a tool that lets its users control an end-to-end machine learning cycle. From data lineage, through building and tracking experiments, to scalability options, Pachyderm covers it all. The software is available in three versions: Community Edition (open-source, usable anywhere), Enterprise Edition (complete version-controlled platform), and Hub Edition (still in beta, combining characteristics of the two previous versions). It needs to be integrated with your infrastructure/private cloud, so it’s not as lightweight as some of the other tools mentioned before.

Main advantages: 

  • Possibility to adapt the software version to your own needs
  • End-to-end process support 
  • Established and backed by a strong community of experts

➡️ See the comparison between Pachyderm and Neptune.

12. Kubeflow

Kubeflow is software whose main goal is run orchestration and making deployments of machine learning workflows easier. It’s known as the machine learning toolkit for Kubernetes and aims to use Kubernetes to facilitate the scaling of machine learning models. The team behind Kubeflow is constantly developing its features and does its best to make data scientists’ lives easier. It has some tracking capabilities, but tracking is not the main focus of the project. It can easily be used alongside other tools on this list as a complementary tool.

Main advantages: 

  • Multi-framework integration
  • Perfect for Kubernetes users
  • Open-source character

➡️ See the comparison between Kubeflow and Neptune.

13. Verta.ai

Verta’s main features can be summarized in four words: track, collaborate, deploy, and monitor. As these suggest, the software was created to facilitate the management of the entire machine learning lifecycle, and it’s equipped with the necessary tools to assist ML teams at every stage of the process. The variety of features, however, makes the platform more complex and thus not as lightweight as other options we mention.

Main advantages: 

  • Compatibility with other ML frameworks
  • Assistance in the end-to-end machine learning process 
  • User-friendly design

14. SageMaker Studio

SageMaker Studio is an Amazon tool that allows data scientists to manage the entire machine learning lifecycle, from building and training to deploying ML models. The idea behind this software is to make developing high-quality experiments easier and less time-consuming. It’s a web-based tool and comes with a whole toolset designed to help data scientists improve their performance.

Main advantages: 

  • Possibility to track thousands of experiments 
  • Integration with a wide range of Amazon tools for ML related tasks
  • Fully managed

➡️ See the comparison between SageMaker Studio and Neptune.

15. DVC

The last project is an open-source version control system created specifically for machine learning projects. Its aim is to enable data scientists to share ML models and make them reproducible. DVC can version and organize large amounts of data and store them in a well-organized, accessible way. It focuses on data and pipeline versioning and management but has some (limited) experiment tracking functionality. It can easily be used alongside other tools on this list as a complementary tool.

Main advantages: 

  • Adaptable to any language and framework
  • Possibility to version large amounts of data
  • Open-source character

➡️ See the comparison between DVC and Neptune.
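
To illustrate what data versioning means in practice, here is a minimal sketch of content-based versioning: each file is identified by a hash of its contents, so two snapshots of a dataset can be compared cheaply. This is not DVC's implementation or API. The function names (`file_digest`, `snapshot`, `changed_files`) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_dir: Path) -> dict:
    """Map each file's relative path to the hash of its contents."""
    return {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict, new: dict) -> list:
    """List paths that were added, removed, or modified between snapshots."""
    return sorted(k for k in set(old) | set(new) if old.get(k) != new.get(k))


# Demo on a throwaway directory with two small, made-up CSV files
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train.csv").write_text("x,y\n1,2\n")
    (root / "test.csv").write_text("x,y\n3,4\n")
    v1 = snapshot(root)

    (root / "train.csv").write_text("x,y\n1,2\n5,6\n")  # modify one file
    v2 = snapshot(root)

    changed = changed_files(v1, v2)
    print(changed)  # ['train.csv']
```

Storing a snapshot like this next to each experiment's metadata is one way to tell exactly which version of the data a given run was trained on.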

Final thoughts

Tracking machine learning experiments has always been an important element of the ML development process. In the past, however, it required a lot of effort from data scientists: the tracking tools were limited, so the process was manual and time-consuming.

For this reason, data scientists and engineers often neglected this part of the machine learning lifecycle or created home-grown solutions. It shouldn’t be the case anymore.

Over the last few years, tools for tracking machine learning experiments have matured a lot and have become accessible and easy to use. The apps and platforms listed here are the best examples. Hopefully, every data scientist will find software here that makes their life easier!

