
13 Best Tools for ML Experiment Tracking and Management in 2024

22nd April, 2024

When you work on a machine learning project, getting good results from a single model-training run is one thing. Keeping your machine learning experiments well organized and having a process that lets you draw valid conclusions is quite another.

The answer to these needs is ML experiment tracking. In machine learning, experiment tracking is the process of saving all experiment-related information that you care about for every experiment you run.

ML teams implement experiment tracking in different ways, for example, through spreadsheets, GitHub, or self-built platforms. Yet, the most effective option is using tools designed specifically for tracking and managing ML experiments.

This article reviews and compares the 13 best tools for tracking and managing your ML experiments. You’ll learn their main features and see how they differ. Before diving into the individual tools, we’ll also discuss what to consider when evaluating ML experiment trackers and how to choose the right one for your team.

What should you expect from ML experiment tracking tools?

Machine-learning experiment tracking is the process of saving experiment-related information for every experiment you run. This allows you to analyze experiments, compare models, and ensure reproducible training.

Information you might want to log using an ML experiment tracker includes:

  • Anything that you need to replicate an experiment run: Training and data preparation scripts, environment configuration files, the data used for training and evaluation, model and training parameter configurations, as well as any code or notebooks used for evaluation.
  • Anything that you need to use the experiment’s outcome in production: Model artifacts, such as model weights, definitions, or binaries, as well as preprocessing components, such as fitted tokenizers or feature scalers.
  • Analysis results that enable you to assess and compare experiments: Values for evaluation metrics, plots of training progress and other performance visualizations, example predictions, and data about hardware consumption.

This is a lot of different information in a broad range of formats. ML experiment tracking tools aim to make collecting, cataloging, searching, and retrieving this data easy and convenient.

An experiment-tracking tool consists of three main components:

  • A way to store and catalog the metadata and artifacts. Typically, a database and an artifact registry. Some tools rely on an external solution to store large files, just keeping track of a link.
  • A client library you can integrate into your model training and evaluation scripts to log metrics and upload files to the experiment tracker.
  • A user interface to view the data, often including dashboards, means to find past experiment runs, and ways to collaborate with team members. Many experiment trackers also provide an API to fetch data programmatically, e.g., to re-run an experiment on new data. Some ship with a CLI tool.

ML experiment tracking tools available in the market differ widely, as do the needs of the machine-learning teams looking to adopt them. Thus, before we turn our attention to specific tools, we’ll discuss what you should consider when choosing an ML experiment tracker.

How should you evaluate ML experiment-tracking tools?

There is no one correct answer to the question, “What is the best experiment-tracking tool?”

If you’re a solo data scientist without a budget, an open-source solution you can host on your laptop might be a good fit. If you’re an ML researcher exploring novel models and algorithms, you’ll require the flexibility to integrate with exotic frameworks and track custom metrics. And if you’re managing a department at a Fortune 500 company, cross-team collaboration and compliance will be top of mind.

Therefore, before you look into individual tools, it’s paramount that you understand your essential requirements.

An experiment tracker has to fit your team’s workflow

Data science teams come in all shapes and sizes.

Some teams are business-oriented, working on analyses and forecasts and producing reports and results. Members often have a business, economics, or statistics background but no software engineering experience. They might be comfortable using Jupyter or R notebooks for data wrangling and model training but prefer a rich UI for most other tasks.

Other teams are staffed with computer scientists who transitioned into ML engineering. Typically, this group likes to interact with tools through CLIs and configure them through code, using UIs only for monitoring purposes. Often, their work’s outcomes are distributed workflows for automatically training models and deploying them as part of a larger application.

Even from these brief and stereotypical descriptions, it’s clear that the teams’ needs differ widely. Some questions to consider:

  • Interfaces: Does your team want to interact through a UI? Are your team members comfortable using a CLI? Do you want to retrieve and analyze the collected data through code?
  • Customizability: Are the built-in views in the UI sufficient? Does the tool allow you to build the custom dashboards you need now or in the future?
  • Integrations: Does the experiment tracker mesh well with your databases and training environments? If you’re organizing your training and experiments through Git and a CI/CD solution, does the tool support this type of workflow?
  • Support for projects and experiments: How easy is it to keep track of a large number of experiments or experiment runs? Can you easily re-run an experiment? How much effort is it to create a new project or experiment?
  • Access to metadata and artifacts: Does the experiment tracker enable you to find the data you’re looking for? Can you retrieve metadata, models, or analysis results in a way that suits your needs?

An experiment tracker has to work with your machine-learning frameworks and models

Most experiment-tracking tools boast compatibility with the most relevant machine-learning and deep-learning libraries. However, since virtually all experiment trackers allow users to log arbitrary data points, that’s an easy claim to make.

You should investigate whether a tool ships with access to ready-made callbacks or integrations for your ML framework of choice. Some frameworks, such as HuggingFace’s transformers, include third-party integrations themselves.
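
For example, Hugging Face’s Trainer can report metrics to several trackers through a single argument. Here is a minimal sketch (assuming the wandb package is installed and configured; values such as "mlflow" or "tensorboard" work the same way):

```python
from transformers import TrainingArguments

# Built-in integration: with the (assumed installed) wandb package, the Hugging Face
# Trainer forwards training metrics to Weights & Biases via this single argument.
training_args = TrainingArguments(
    output_dir="outputs",   # where checkpoints and logs are written
    report_to="wandb",      # other supported values include "mlflow" or "tensorboard"
    logging_steps=50,       # log metrics every 50 optimization steps
)
```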

Aside from the framework, you should also determine what types of models you’ll train and experiment with. A tool that excels for computer vision models might have little to offer for NLP tasks, and time-series data requires a different set of metrics and visualizations than graph learning.

Here are some questions to discuss with your team:

  • Frameworks and libraries: What frameworks are you using to create and train your models? Do you have any plans to adopt additional frameworks soon?
  • Types of models and tasks: What algorithms and model architectures do you work with? Do you have a predominant machine-learning task, such as classification or text generation? What does a typical model output look like?
  • Metadata to log: What metrics do you log? What plots do you typically look at? Do you need to log and analyze images, text, or tables? What about binary, CSV, or JSON files?
  • Resource monitoring: Is tracking hardware utilization and resource consumption a concern for you? Do you monitor CPU or GPU load and memory usage during training?

An experiment tracker has to meet your collaboration needs

Few machine-learning practitioners work on their own. Most of us are part of a team where we regularly discuss our experiments, develop new ideas to try, or review what our predecessors have tried in the past.

Teams sharing an office can do that on a whiteboard or sit together in front of the same screen. However, many teams work remotely, have collaborators on the other side of the planet, or cooperate with departments in different locations.

Experiment trackers can enable and facilitate efficient collaboration and communication. Consider the following questions to understand your team’s requirements:

  • Shared project workspaces: Do several people contribute to the same experiment? Would you like to add comments to artifacts or use an internal chat functionality?
  • User and access management: Should everyone see all projects or experiments? Should everyone be allowed to add runs, modify dashboards, or create analyses? How often do you need to onboard and offboard team members or collaborators?
  • Sharing results across teams: Do you need to set up teams and assign them read access to certain projects of other teams? Would you like to have shared dashboards or experiment catalogs?
  • Sharing results with third parties: Can you easily generate links and configure access to share individual run analyses or dashboards with people outside your organization?

An experiment tracker has to meet requirements imposed by the business

So far, we have discussed possible features an experiment tracking tool or platform should offer from the perspective of ML practitioners and their managers.

However, there are also more general questions you need to consider before looking for a software platform or tool:

  • Open-source or proprietary software: Many of today’s machine-learning libraries and MLOps tools are open-source, and many data scientists freely share their knowledge. Naturally, this leads teams to gravitate to open-source tools.

    At the same time, many of us have grown accustomed to working on cloud platforms that primarily consist of proprietary services – and offer SLAs and support. (We’ll return to this and other potential benefits below.)

    When it comes to ML experiment trackers, the main advantage of open-source software is the ability to audit and adapt the code to your needs. When looking for an experiment tracking tool, consider how likely it is that you’ll want to modify the software’s code rather than use the plugin interfaces that many proprietary tools provide.
  • Self-hosting or managed platform: Independent of their license and source availability, many experiment trackers are available for self-hosting as well as managed platforms.

    Self-hosting typically gives you complete control, and you can keep your data and models within your organization’s boundaries. A managed platform, however, gives you peace of mind regarding setup, maintenance, and updates. Typically, operational costs are much more transparent and predictable. Some vendors offer hybrid models where they host their software within your cloud or on-premises infrastructure.

    Many data science teams underestimate the ongoing maintenance effort and overestimate their DevOps skills. So ask yourself: How much time and money can we spend? Could we task a dedicated infrastructure team to host the experiment tracker?
  • Security and compliance requirements: Depending on the type of organization and the field you work in, security and compliance are top of mind. Your IT department and management will likely be more interested in these details than in a tool’s experiment tracking capabilities.

    Before starting your market research, you should get answers to questions like the following: Do you need role-based access control (RBAC for short)? Does your data need to be encrypted? Do you need to keep an access log? Does your experiment tracker have to integrate with your single sign-on (SSO) solution?
  • Costs: Compared to training deep learning models on GPU clusters, hosting an experiment tracker is undoubtedly on the cheap side. Nevertheless, costs between different options can differ widely and are often difficult to estimate for self-hosted solutions because cost items are not necessarily obvious. For example, remember to factor in personnel expenses for setup, ongoing administration, and maintenance.
  • Pricing: Does a managed product charge per seat, per model, per experiment, per project, or a flat license fee? What does that mean for your team and its growth trajectory? It’s not uncommon that a pricing scheme is attractive for small teams but leads to prohibitively high costs once new members join.
  • Support: As with any new software, you’ll likely get stuck somewhere or wonder whether there are best practices for your usage scenario. When evaluating ML experiment trackers, ask yourself: Is the documentation detailed and up-to-date? Does it include how-tos for your team’s typical workflow steps? Is there a dedicated community support forum, like a Discord or Slack channel, or will you have to rely on StackOverflow and blog posts by fellow ML engineers? Do you have the means to pay for commercial support? Would you like help with the initial setup and have someone come in for introductory workshops?

Comparison of ML experiment tracking tools

Before we dig into each tool, here’s a high-level comparison of features and integrations of the 13 best experiment tracking and management tools:

The comparison covers the following tools: neptune.ai, Weights & Biases, Comet, Aim, DagsHub, ClearML, MLflow, DVC Experiments & DVC Studio, Sacred & Omniboard, Azure Machine Learning, Amazon SageMaker, Google Vertex AI, and TensorBoard.

For each tool, it looks at:

  • Overview: the tool’s focus (experiment management, the entire model lifecycle, or an end-to-end platform), pricing, whether the experiment tracker is a standalone component or part of a broader ML platform, whether it is commercial software, open-source software, or a managed cloud service, on-premise availability, and whether it offers a web UI, a console-based interface, or both.
  • Experiment tracking features: logging and displaying metadata (datasets, code versions, parameters, metrics and losses, images, audio, video, hardware consumption), comparing experiments (table-format diffs, overlayed learning curves, code diffs), organizing and searching experiments and metadata (experiment table customization, custom dashboards, nested metadata structures in the UI), and reproducibility and traceability (one-command experiment re-runs, experiment lineage, environment versioning, and saving/fetching/caching datasets for experiments).
  • Collaboration and knowledge sharing: user groups and ACLs, sharing UI links with project members and with external people, and commenting.
Note: This comparison was last updated in March 2024. Some information may be outdated today. If you spot incorrect info, let us know, and we’ll update it.

neptune.ai

neptune.ai is an experiment tracker offering model versioning and real-time model performance monitoring. Neptune is available in cloud and on-premise versions and integrates with over 25 tools and libraries, including various model training and hyperparameter optimization tools.

Neptune offers a highly customizable UI to analyze experiment metadata. | Explore a live Neptune project

With a strong focus on collaboration, Neptune allows users to create projects within the app, work on them together, and share UI links with each other or with external stakeholders. The platform provides fine-grained user management and a highly customizable user interface with advanced visualization features.
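
As an illustration, logging with Neptune’s Python client looks roughly like this minimal sketch (assuming the neptune package is installed, an API token is configured, and the project name and uploaded file are replaced with your own):

```python
import neptune

# Minimal sketch, assuming NEPTUNE_API_TOKEN is set in the environment and
# "my-workspace/my-project" is a project you can write to.
run = neptune.init_run(project="my-workspace/my-project")

run["parameters"] = {"lr": 0.001, "batch_size": 64}  # log hyperparameters

for epoch in range(10):
    loss = 1.0 / (epoch + 1)          # placeholder for a real training step
    run["train/loss"].append(loss)    # append to a metric series

run["model/weights"].upload("model.pt")  # upload a model artifact (assumes the file exists)
run.stop()
```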

Advantages

  • Neptune can log a wide range of experiment metadata, including source code, Jupyter notebook snapshots, and Git information. Users can flexibly adapt the metadata structure to their needs.
  • Neptune is highly scalable and easily handles hundreds of thousands of experiments.
  • Neptune’s UI is versatile and customizable while remaining easy to navigate for non-technical collaborators.

Limitations

  • Neptune is focused on experiment tracking and model management. Users are expected to stand up and manage their own MLOps infrastructure.
  • The free tier of Neptune’s SaaS offering is limited to a single active project.

Weights & Biases

Weights & Biases (also known as WandB or W&B) is a platform for experiment tracking, dataset versioning, and model management. WandB’s components, including a model registry and an artifact store, let you manage the model lifecycle and version datasets and models.

Weights & Biases’ UI enables data scientists to interactively compare experiment runs. | Source

The experiment tracking feature is a central component of WandB. It supports the leading machine-learning and deep-learning frameworks, such as TensorFlow, PyTorch, Keras, and scikit-learn, out of the box but can log arbitrary metrics as well. WandB also supports CPU and GPU usage tracking.

The platform is available as a managed platform and as an on-premise tool in different configurations.
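
A minimal logging sketch with the wandb client (assuming you have run wandb login and replaced the project name with your own; the loss value stands in for a real training loop):

```python
import wandb

# Minimal sketch, assuming you are logged in and can create runs in "demo-project".
run = wandb.init(project="demo-project", config={"lr": 0.001, "epochs": 10})

for epoch in range(run.config.epochs):
    loss = 1.0 / (epoch + 1)  # placeholder for a real training step
    wandb.log({"train/loss": loss, "epoch": epoch})

wandb.finish()
```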

Advantages

  • Weights & Biases provides a unified platform for the entire ML development process, including dataset and model management. 
  • WandB allows users to create interactive dashboards and reports collaboratively.
  • Built-in support for hyperparameter search and model optimization with WandB Sweeps.

Limitations

  • The “Teams” plan is limited to ten team members accessing the platform and dashboards. Beyond that, you’ll need to negotiate an “Enterprise” contract.
  • User management and administration can be cumbersome.

Comet ML

Comet is a cloud-based MLOps platform that helps data scientists track experiments, manage the model lifecycle, and collaborate on ML projects. It also provides interactive dashboards for analyzing experiment metadata and comparing experiments. Comet’s UI presents code, graphics, audio, and text data, as well as common analysis tools like confusion matrices and histograms.

CometML provides a clean UI for visualization and analysis of experiment metadata. | Source

In addition to experiment tracking capabilities, Comet ML includes a model registry and production monitoring features. The SaaS offering is free for personal use and academics, and on-premise deployment is available for enterprise customers.
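
Logging to Comet from a training script looks roughly like this minimal sketch (assuming the comet_ml package is installed, COMET_API_KEY is set, and the project name is replaced with your own):

```python
from comet_ml import Experiment

# Minimal sketch, assuming COMET_API_KEY is set and "demo-project" exists
# in your Comet workspace.
experiment = Experiment(project_name="demo-project")
experiment.log_parameters({"lr": 0.001, "batch_size": 64})

for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder for a real training step
    experiment.log_metric("train_loss", loss, step=step)

experiment.end()
```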

Advantages

  • Users can manage the entire model lifecycle through a single UI that provides customizable and versatile visualization features.
  • The tracking client captures a wide range of information about the experiment run’s environment.
  • Includes a dedicated component for working with LLMs.

Limitations

  • Team collaboration is only available in the paid plans.
  • Comet’s UI tends to become slow for large numbers of experiments.
  • Due to its deep integration with the platform, Comet ML’s experiment tracking functionality is difficult to use standalone.

Aim

Aim is an open-source experiment tracker created by AimStack. Started in 2019, it offers extensive dashboards and plots, as well as the ability to compare multiple runs.

The Aim experiment tracker can be run locally or hosted on Kubernetes for multiple users. The platform consists of two components: a UI and a tracking server. Its integration with MLflow allows you to use Aim’s versatile UI to explore MLflow experiment data. AimStack, the company behind Aim, offers enterprise support for Aim with multi-user deployments and custom plugins.

Aim’s UI allows users to explore and query past experiment runs. | Source
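
Tracking with Aim’s Python SDK looks roughly like this minimal sketch (assuming the aim package is installed and an Aim repository has been initialized with aim init):

```python
from aim import Run

# Minimal sketch, assuming an Aim repository exists in the working directory.
run = Run(experiment="demo")
run["hparams"] = {"lr": 0.001, "batch_size": 64}

for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder for a real training step
    run.track(loss, name="loss", step=step, context={"subset": "train"})
```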

In October 2023, AimStack released AimOS, initially positioning it as a direct replacement for Aim. The company later stated it would continue developing Aim, but it remains to be seen how actively the original project will be maintained.

Advantages

  • Open-source and self-hostable: Aim can run locally or be deployed on Kubernetes for multiple users.
  • Extensive dashboards and plots, with the ability to explore, query, and compare multiple runs.
  • The MLflow integration lets you use Aim’s UI to explore MLflow experiment data.

Limitations

  • Unclear future of the original Aim after the company behind it announced AimOS as the successor.
  • Aim does not support scikit-learn, a widely used ML framework.
  • No managed offer is available, and self-hosting requires significant effort.

DagsHub

DagsHub is a fully managed end-to-end AI platform built primarily on proven open-source tools. In that spirit, DagsHub relies on MLflow’s backend for experiment tracking but provides a custom UI. Paired with DVC and Git integrations, this enables fully reproducible ML experiments.

The DagsHub UI is organized similarly to that of version control systems like GitHub or GitLab. | Source

DagsHub is available as SaaS. The free “Community” tier allows for up to two collaborators in private projects and unlimited experiment tracking. In the “Enterprise” tier, getting an isolated or on-prem installation of DagsHub is possible.
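
Because DagsHub exposes an MLflow-compatible tracking endpoint, logging can be done with the standard MLflow client. A minimal sketch (assuming the dagshub and mlflow packages are installed and the repository placeholders are replaced with your own):

```python
import dagshub
import mlflow

# Minimal sketch, assuming "<user>/<repo>" is a DagsHub repository you can write to.
# dagshub.init points the MLflow client at the repository's tracking endpoint.
dagshub.init(repo_owner="<user>", repo_name="<repo>", mlflow=True)

with mlflow.start_run():
    mlflow.log_params({"lr": 0.001, "batch_size": 64})
    mlflow.log_metric("train_loss", 0.42)
```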

Advantages

  • Built primarily on proven open-source tools: MLflow for experiment tracking, paired with DVC and Git integrations for fully reproducible experiments.
  • The UI follows conventions familiar from version control platforms like GitHub or GitLab.
  • The free “Community” tier includes unlimited experiment tracking and up to two collaborators on private projects.

Limitations

  • Teams have to adopt the entire platform to benefit from DagsHub over using MLflow as a standalone tool.
  • Advanced authentication capabilities are only available in the “Enterprise” tier.

ClearML Experiment

ClearML (formerly known as Allegro Trains) is an end-to-end machine-learning platform with experiment-tracking capabilities. It is available as an open-source solution that can be self-hosted in the cloud or on Kubernetes and as a managed offering.

ClearML Experiment goes beyond tracking model metrics. It can also log code, notebooks, configuration files, and containers. It features a user-friendly web UI for tracking and visualizing experiments. Users can share results with others through ClearML Reports.

ClearML Experiment can track and visualize GPU metrics and other hardware parameters. | Source

One of ClearML’s strengths is the tight integration with the other ClearML platform components for data management, model lifecycle management, and model serving. ClearML is framework-agnostic and extensible.
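
A minimal tracking sketch with the ClearML Python client (assuming the clearml package is installed and credentials have been set up with clearml-init):

```python
from clearml import Task

# Minimal sketch, assuming ClearML credentials are configured.
task = Task.init(project_name="demo-project", task_name="baseline-run")
task.connect({"lr": 0.001, "batch_size": 64})  # log hyperparameters

logger = task.get_logger()
for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder for a real training step
    logger.report_scalar(title="loss", series="train", value=loss, iteration=step)

task.close()
```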

Advantages

  • Integrated with a complete end-to-end ML platform that provides a unified user experience to data scientists, ML engineers, and DevOps specialists.
  • ClearML Experiment provides an offline mode in which all information is saved in a local folder to be synced when an internet connection becomes available again.
  • The ClearML Enterprise managed platform provides enterprise-grade security and governance.

Limitations

  • ClearML’s UI offers few customization options, limiting the ways in which experiment metadata can be visualized.
  • Due to its deep integration with the rest of the ClearML platform, ClearML Experiment is difficult to use as a standalone experiment tracker.

MLflow

MLflow is an open-source platform that helps manage the whole machine learning lifecycle. This includes experimentation but also model storage, reproducibility, and deployment. Each of these four elements is represented by one MLflow component: Tracking, Model Registry, Projects, and Models.

The MLflow Tracking component consists of an API and UI that support logging various metadata (including parameters, code versions, metrics, and output files) and later visualizing the results. Recently, the MLflow developers have added a dedicated LLM experiment tracker and experimental support for prompt engineering.

MLflow’s UI enables users to plot experiment metadata. | Source

MLflow excels at streamlining ML lifecycle management and simplifying experiment tracking. However, it lacks many features that data science teams seek, such as dataset versioning or user access management. Further, you need to deploy MLflow on your own infrastructure and rely on its community for support. The managed MLflow by Databricks comes with user and access management.
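
A minimal MLflow Tracking sketch (without a remote tracking server configured, MLflow writes to a local ./mlruns directory that the MLflow UI can browse):

```python
import mlflow

# Minimal sketch: logs parameters and a metric series to the default local store.
mlflow.set_experiment("demo-experiment")

with mlflow.start_run():
    mlflow.log_params({"lr": 0.001, "batch_size": 64})
    for step in range(100):
        loss = 1.0 / (step + 1)  # placeholder for a real training step
        mlflow.log_metric("train_loss", loss, step=step)
```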

Advantages

  • Focus on the whole lifecycle of the machine-learning process.
  • A large and active community of users that provides support.
  • Open interface that can be integrated with any ML library or language.

Limitations

  • Has to be self-hosted (although a managed offer by Databricks exists), which involves a significant overhead.
  • Security and compliance measures have to be implemented.
  • Lack of user and group management and collaborative features.

DVC Experiments and DVC Studio

DVC Experiments is the experiment-tracking component of the open-source Data Version Control (DVC) family of tools. DVC started as an open-source version control system built on top of Git and created specifically for machine learning projects. Accordingly, the main focus of DVC Experiments is on reproducibility through code and data versioning. It also includes features to run experiments.

DVC Studio comes with standard analyses like precision-recall plots, ROC curves, and confusion matrices. | Source

DVC Studio is the web interface of the DVC tools. It is available as a managed service and for self-hosting on AWS or Kubernetes. DVC enables data science teams to collaborate on dataset curation, model management, and ML experiments. It integrates a versatile plotting functionality and supports live visualization of metrics.
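
A minimal logging sketch with DVCLive, the Python logging library of the DVC family (assuming the dvclive package is installed and you are inside a Git/DVC-initialized repository):

```python
from dvclive import Live

# Minimal sketch: logs a parameter and a per-epoch metric that DVC and DVC Studio can read.
with Live() as live:
    live.log_param("lr", 0.001)
    for epoch in range(10):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        live.log_metric("train/loss", loss)
        live.next_step()
```

Runs logged this way can then be listed and compared with dvc exp show.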

Advantages

  • Strong focus on reproducibility: experiments are versioned together with code and data through Git and DVC.
  • DVC Studio provides live visualization of metrics and standard analyses such as precision-recall plots, ROC curves, and confusion matrices.
  • Available as a managed service and for self-hosting on AWS or Kubernetes.

Limitations

  • For users not familiar with Git or version control, navigating experiments and tracked metadata – which are organized through Git branches – can be a challenge.
  • Compared to dedicated experiment trackers, the visualization and experiment comparison features are limited.

Sacred and Omniboard

Sacred is open-source software developed at the Swiss AI Lab IDSIA that allows machine learning researchers to configure, organize, log, and reproduce experiments. It is a highly flexible experiment-tracking library that integrates with various storage backends.

Omniboard is a dedicated web UI for Sacred, which itself does not come with a UI built-in. It allows users to list and compare experiments, plot metrics, and access logged artifacts and source files.

Omniboard provides a UI tailored to the Sacred experiment tracker. | Source
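
A minimal Sacred experiment with a MongoDB observer, which is the database Omniboard reads from (assuming the sacred package is installed and MongoDB is reachable locally):

```python
from sacred import Experiment
from sacred.observers import MongoObserver

# Minimal sketch, assuming a MongoDB instance is reachable at localhost:27017.
ex = Experiment("demo")
ex.observers.append(MongoObserver(url="localhost:27017", db_name="sacred"))

@ex.config
def config():
    lr = 0.001      # Sacred captures these local variables as the experiment config
    epochs = 10

@ex.automain
def main(lr, epochs, _run):
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        _run.log_scalar("train.loss", loss, epoch)
    return loss
```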

Advantages

  • Flexible metadata structure that can be adapted to a wide range of models and data types.
  • Sacred provides a powerful command line interface and a versatile Python client.
  • Sacred can be connected to different UIs, such as Omniboard or Neptune. You can also directly query the MongoDB database it uses to store metadata.

Limitations

  • Since it does not provide a UI, Sacred requires users to be comfortable with the command line and have an advanced knowledge of Python.
  • As a research-focused tool, Sacred does not provide capabilities for model lifecycle management, such as a model registry.

Azure Machine Learning

Azure Machine Learning is Microsoft’s cloud-based MLOps platform. It lets you manage and automate the whole ML lifecycle, including model management, deployment, and monitoring. It is tightly integrated with the Azure platform, offering seamless integration with other Azure services that many organizations already use.

Azure Machine Learning Studio provides a range of plotting features for ML experiment metadata. | Source 

Azure Machine Learning provides a built-in machine-learning studio that allows users to organize, manage, and analyze experiments. They can visualize and compare different experiment runs and monitor resource utilization.
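
Because Azure Machine Learning exposes an MLflow-compatible tracking endpoint, experiments can be logged with the MLflow client. A minimal sketch (assuming the azure-ai-ml and azureml-mlflow packages are installed and the placeholders are replaced with your own Azure resources):

```python
import mlflow
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Minimal sketch, assuming the placeholders below point to an existing workspace.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Point the MLflow client at the workspace's MLflow-compatible tracking endpoint.
mlflow.set_tracking_uri(ml_client.workspaces.get("<workspace-name>").mlflow_tracking_uri)
mlflow.set_experiment("demo-experiment")

with mlflow.start_run():
    mlflow.log_params({"lr": 0.001, "batch_size": 64})
    mlflow.log_metric("train_loss", 0.42)
```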

Advantages

  • Tight integration with the Azure platform and seamless access to other Azure services that many organizations already use.
  • Azure Machine Learning Studio lets users organize, manage, and analyze experiments, compare runs, and monitor resource utilization.
  • The experiment tracking solution is MLflow-compatible.

Limitations

  • Adopting Azure Machine Learning as your MLOps solution locks you into the Azure ecosystem. While the MLflow compatibility of the experiment tracking solution makes migration to MLflow easy, moving data and your training scripts will require significantly more work.
  • Azure Machine Learning cannot be self-hosted but is only available as a SaaS solution.
  • Due to its tight integration with the Azure platform, the Azure Machine Learning tracking feature is challenging to use as a stand-alone tool.

Amazon SageMaker

Amazon SageMaker is the end-to-end machine learning solution on the AWS cloud platform. It allows data scientists and ML engineers to prepare, build, train, and deploy machine learning models. Amazon SageMaker is one of the oldest offerings on the market and is particularly interesting for those already invested in the AWS ecosystem.

SageMaker’s Experiments component is well-integrated with Amazon’s suite of cloud services and with SageMaker’s AutoML and hyperparameter optimization features. It offers capabilities such as logging machine learning experiments, tracking model performance, and storing relevant metadata and artifacts.

Amazon SageMaker’s experiment tracking component is tightly integrated with the other platform features. | Source

In addition, Amazon SageMaker includes SageMaker Studio, a unified development environment for ML solutions. It comprises components such as a feature store, notebooks, and parameter tuning, all integrated into a user-friendly web interface. (Note that SageMaker Studio was revamped at the end of 2023. The previous version is now referred to as “SageMaker Studio Classic.”)
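
A minimal logging sketch with the SageMaker Experiments SDK (assuming the sagemaker package is installed and AWS credentials are configured; the experiment and run names are placeholders):

```python
from sagemaker.experiments.run import Run

# Minimal sketch, assuming a default SageMaker session can be created
# from the credentials in your environment.
with Run(experiment_name="demo-experiment", run_name="baseline") as run:
    run.log_parameters({"lr": 0.001, "batch_size": 64})
    for step in range(100):
        loss = 1.0 / (step + 1)  # placeholder for a real training step
        run.log_metric(name="train_loss", value=loss, step=step)
```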

Advantages

  • Well-integrated with Amazon’s suite of cloud services, making it attractive for teams already invested in the AWS ecosystem.
  • Experiment tracking works hand in hand with SageMaker’s AutoML and hyperparameter optimization features.
  • SageMaker Studio provides a unified, user-friendly web interface that also includes a feature store, notebooks, and parameter tuning.

Limitations

  • Compared to dedicated experiment trackers, the experiment tracking capabilities of Amazon SageMaker are quite basic.
  • Opting for Amazon SageMaker as your machine learning platform binds you to AWS permanently. Migrating from Amazon’s cloud platform to a different MLOps stack will most likely involve significant work.
  • Amazon SageMaker cannot be self-hosted. All your data, models, and metadata will be stored within AWS.

Case Study: How Brainly Avoids Workflow Bottlenecks With Automated Tracking

Brainly is the leading learning platform worldwide, with the most extensive Knowledge Base for all school subjects and grades. Each month, over 350 million students, parents, and educators rely on Brainly as the proven platform to accelerate understanding and learning.

One of their core features and key entry points is Snap to Solve. It’s a machine learning-powered feature that lets users take and upload a photo. Snap to Solve then detects the question or problem in that photo and provides solutions.

The team uses Amazon SageMaker to run its computing workloads and serve its models. When the number of training runs on the team’s large compute architectures increased, they realized that their logs from Amazon SageMaker needed to be trackable and manageable, or they would cause bottlenecks in their workflow.

Read how the Brainly team adopted Neptune as their experiment tracker and explore Neptune’s SageMaker integration.

Google Vertex AI

Vertex AI is the fully managed machine learning solution of Google’s Cloud Platform. It provides a unified platform for the entire ML lifecycle, from data preparation and model training to model deployment, model experiment tracking, and monitoring.

Google has firmly established itself as a leading machine learning and AI player, with many advances coming from the tech giant’s research branch and infrastructure teams. Much of this experience has found its way into Vertex. The platform is especially interesting for teams working on Google Cloud or looking to leverage leading data management solutions like BigQuery.

Vertex AI includes capabilities to plot and compare data from different experiment runs. | Source

Vertex AI includes the Vertex ML Metadata experiment tracker. This component is based on the ML Metadata (MLMD) library developed as part of the TensorFlow Extended (TFX) ecosystem but can be used to track training metrics and artifact lineage for any machine-learning framework. Vertex AI also integrates the open-source TensorBoard metadata visualization tool.
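
A minimal experiment logging sketch with the Vertex AI SDK (assuming the google-cloud-aiplatform package is installed and the project, region, and experiment names are replaced with your own):

```python
from google.cloud import aiplatform

# Minimal sketch: initialize the SDK against an experiment, then log one run.
aiplatform.init(
    project="my-gcp-project",
    location="us-central1",
    experiment="demo-experiment",
)

aiplatform.start_run("baseline-run")
aiplatform.log_params({"lr": 0.001, "batch_size": 64})
aiplatform.log_metrics({"train_loss": 0.42, "val_accuracy": 0.91})
aiplatform.end_run()
```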

Advantages

  • Vertex ML Metadata is fully integrated with other Vertex AI components and Google Cloud Platform services.
  • Fine-grained access control through Google Cloud Platform’s Identity and Access Management capabilities.

Limitations

  • Choosing Vertex AI as your MLOps stack locks you to the Google Cloud Platform long-term. Migrating to a different cloud provider or a selection of SaaS and self-hosted tools will likely require re-engineering large parts of your infrastructure.
  • The Python SDK and REST API of Vertex ML Metadata are relatively low-level, which gives users a lot of flexibility but requires them to become familiar with the metadata schemas or develop their own.

TensorBoard

TensorBoard is the visualization tool integrated with TensorFlow, so it’s often the first choice of TensorFlow users. TensorBoard offers a suite of features for visualizing and debugging machine learning models. Users can track model metrics like loss and accuracy, visualize the model graph and changes to weights and biases, and project embeddings to a lower-dimensional space. TensorBoard also includes a profiler.

TensorBoard includes advanced features like an embedding projector. | Source

Like its parent framework, TensorBoard is open source. It runs locally and can be integrated with Jupyter and Colab notebooks.
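
A minimal sketch using the Keras TensorBoard callback (assuming TensorFlow is installed; the logged metrics can then be explored by running tensorboard --logdir logs):

```python
import tensorflow as tf

# Minimal sketch: train a tiny classifier on MNIST and write TensorBoard logs.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs/demo-run")
model.fit(x_train, y_train, epochs=3, callbacks=[tb_callback])
```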

Advantages

  • Well-developed features for working with images, text, and embeddings.
  • Deep integration with the latest versions of TensorFlow.
  • TensorBoard runs in any Python environment with TensorFlow installed. It does not require the setup of a database or additional libraries.

Limitations

  • TensorBoard has to be self-hosted. (A managed version previously available at tensorboard.dev was discontinued at the end of 2023).
  • No collaboration features, access control, user management, or centralized data store.
  • Primarily designed for TensorFlow; other frameworks such as PyTorch can write TensorBoard logs, but the integration is less seamless.

Conclusion

Tracking machine learning experiments has always been an important element of the ML development process. However, for a long time, this process was manual, time-consuming, and error-prone.

Over the last few years, the market for experiment tracking and experiment management tools for machine learning has grown and matured. The range of available options is now broad and diverse. Whether you’re looking for an open-source or enterprise solution, and whether you prefer a standalone experiment tracking framework or an end-to-end platform, you’ll certainly find the right tool.

FAQ

  • What are ML experiment tracking tools?

    Machine learning experiment tracking tools help data scientists and machine-learning practitioners manage and keep a record of their machine-learning experiments. ML experiment trackers are crucial for ensuring the reproducibility of training, gaining insights into model performance, and fostering collaboration.

    An ML experiment tracking tool captures and organizes metadata associated with each experiment, including model hyperparameters, any code that was used, dataset versions, and evaluation metrics. This allows users to trace the evolution of models, compare different experiments, and identify the conditions that led to the best-performing models.

  • Why do you need an ML experiment tracker?

    Experiment trackers contribute to the efficiency and transparency of the ML development process by providing a systematic way to log and analyze experiment-related information.

    The main reasons for using an ML experiment tracker are:

    • All of your ML experiments and models are organized in a single place. This allows you to search and filter experiments quickly, irrespective of where you ran them.
    • Compare ML experiments, analyze results, and debug model training. An experiment tracker helps you consistently log all relevant information in a well-defined format.
    • Collaborate on ML experiments and share results with stakeholders. Experiment trackers allow teams to work together and create dashboards and reports.
    • Monitor experiments and model training progress. This is particularly important when training jobs run on remote servers.
  • How do you choose an ML experiment tracking tool?

    Which experiment tracker works best for your situation and your team depends on a number of factors:

    • It has to fit your team’s workflow and preferred way of working. For example, some teams love CLI-based tools, while others prefer to work with a rich UI.

    • It has to be compatible with the machine-learning frameworks you’re using and the machine-learning tasks you’re solving. Some experiment trackers might offer excellent support for time-series forecasting but lack the features necessary for NLP tasks.

    • The experiment tracker has to meet your collaboration needs. In particular, if you’re working remotely or with outside collaborators, integrated communication capabilities and the possibility of sharing links to dashboards come in handy.

    • The experiment tracker must meet your business requirements. Typically, this is not about costs (experiment trackers tend to be relatively cheap compared to the computational resources ML model training requires) but compliance and (data) security.
  • How does ML experiment tracking work?

    A typical experiment tracking setup consists of two components: a tracking server that stores the experiment metadata and provides a UI to analyze it, and a client library that you integrate into your training script.

    Throughout a model training run, the experiment tracking client sends metadata such as intermediate metrics, sample predictions, and model artifacts to the tracking server. The server receives this information and writes it to a database or artifact store. Later, you can access this data through a web interface, CLI, or programmatically through a script.
