Experiment Tracking in Machine Learning

Everything you need to know about experiment tracking in Machine Learning

What is experiment tracking in Machine Learning?

The goal of every model training process is to arrive at the best-performing configuration of training data, metrics, hyperparameters, code, etc.

The road to finding this configuration goes through running a lot of experiments, analyzing their results, and trying new ideas. You can think about experiment tracking as a bridge between this road and the goal.

Machine learning experiment tracking is the process of saving all experiment-related information (metadata) that you care about for every experiment you run. 

“What you care about” will strongly depend on your project, but it may include:

  • Scripts used for running the experiment
  • Environment configuration files 
  • Versions of the data used for training and evaluation
  • Parameter configurations
  • Evaluation metrics 
  • Model weights
  • Performance visualizations (confusion matrix, ROC curve)  
  • Example predictions on the validation set (common in computer vision)
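At its core, saving this metadata can be very simple. As a minimal sketch (the helper name and record fields here are illustrative, not any particular tool's API), you could persist one JSON record per run:

```python
import json
import time
from pathlib import Path

def save_run_metadata(run_dir: Path, params: dict, metrics: dict, data_version: str) -> Path:
    """Persist one experiment's metadata as a timestamped JSON record."""
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "params": params,              # e.g. learning rate, batch size
        "metrics": metrics,            # e.g. accuracy, loss
        "data_version": data_version,  # e.g. a dataset hash or tag
    }
    path = run_dir / f"run_{int(time.time())}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```

Even a bare-bones scheme like this captures the essentials: what was run, with which parameters, on which data, and with what result. The approaches discussed below are different ways of doing this at scale.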

Looking at an even broader picture, experiment tracking is a part of MLOps, a larger ecosystem of tools and methodologies that deals with the operationalization of Machine Learning. Experiment tracking connects the research part of the model development process with the deployment and monitoring part.

Why should you care about ML experiment tracking?

Experiment tracking focuses on the iterative model development phase when you try many things to get your model performance to the level you need. Sometimes, just a small change in training metadata has a huge impact on that performance. 

Without tracking what exactly you did, you won’t be able to compare or reproduce the results. Not to mention, you’ll lose a lot of time and have trouble meeting business objectives.

Experiment tracking gives you the confidence that you aren’t missing any data or insights.

You don’t have to spend hours or days thinking about what experiments to run next, how your previous (or even currently running) experiments are doing, which experiment performed best, or how to reproduce and deploy it. You simply look at your experiment tracking database, and you know it.

Or at least, you’re able to figure it out using just this one source of truth.


Now, let’s dive deeper into different use cases and the benefits of experiment tracking.

How to implement experiment tracking in your workflow?

If you don’t track your experiments yet, there are a few options to consider. The most popular are:

1. Spreadsheets + naming conventions

A common approach to experiment tracking is to simply create a big spreadsheet where you record everything you can (metrics, parameters, etc.), plus a directory structure where artifacts are named according to a convention.

It’s an easy-to-implement solution, and it’s definitely something to start with when you don’t run too many experiments. 

But as straightforward as it is, experiment tracking in spreadsheets also requires a lot of discipline and time. Whenever you run an experiment, you have to look at the results and copy them into the spreadsheet. It’s a very manual process, and it doesn’t scale well. Spreadsheets are flexible only up to a point, because they weren’t created specifically for tracking runs.
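This workflow is easy to automate partially, which also exposes its main weakness. A hedged sketch (the helper and column names are hypothetical) of appending each run to a shared CSV "spreadsheet":

```python
import csv
from pathlib import Path

def log_to_spreadsheet(csv_path: Path, run_name: str, params: dict, metrics: dict) -> None:
    """Append one experiment as a row to a shared CSV file."""
    row = {"run_name": run_name, **params, **metrics}
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()  # header is written only for the first run
        writer.writerow(row)
```

Note the scaling problem baked in: the columns are fixed by whatever the first run logged, so adding a new hyperparameter or metric later means restructuring the file by hand, and concurrent runs can corrupt it.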

“You can keep track of your work in spreadsheets, but it’s super error-prone. And every experiment that I don’t use and don’t look at afterward is wasted compute: it’s bad for the environment, and it’s bad for me because I wasted my time.”
– Thore Bürgel, PhD Student @AILS Labs

Read more:

Switching from Spreadsheets to neptune.ai and How It Pushed My Model Building Process to the Next Level

2. Versioning configuration files with GitHub

Another option is to version all of your experiment metadata in GitHub.

The way you can go about it is to commit metrics, parameters, charts, and whatever else you want to keep track of to GitHub when running your experiment. This can be done with post-commit hooks that create or update files (configs, charts, etc.) automatically after your experiment finishes.
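One way to sketch that post-experiment step is a small helper that writes the run’s metadata file into the repository and assembles the git commands to stage and commit it (the helper, paths, and commit message are illustrative assumptions, not a prescribed workflow):

```python
import json
from pathlib import Path

def prepare_experiment_commit(repo_dir: Path, run_name: str, metrics: dict) -> list:
    """Write the run's metrics file into the repo and return the git
    commands that would stage and commit it (e.g. from a hook script)."""
    metrics_file = repo_dir / "experiments" / f"{run_name}.json"
    metrics_file.parent.mkdir(parents=True, exist_ok=True)
    metrics_file.write_text(json.dumps(metrics, indent=2))
    rel = metrics_file.relative_to(repo_dir)
    return [
        ["git", "add", str(rel)],
        ["git", "commit", "-m", f"experiment: {run_name}"],
    ]
```

The returned commands could then be executed with `subprocess.run` inside an actual git repository. You can already see the friction: every experiment becomes a commit, and comparing metrics across runs means digging through git history rather than querying a table.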

Then again, GitHub wasn’t built for Machine Learning experiment tracking. It can serve the purpose to some extent, but it’s not the most effective or scalable option.

3. Building your own experiment tracking platform

Many companies choose to build an in-house experiment tracking platform. 

It can be the most tailored solution, with exactly the features your project needs. But it’s also a demanding option in terms of time and resources, and it requires continuous maintenance. In many cases, such a platform also lacks a proper UI, which makes it difficult to interact with for people who don’t code much.

“We’re still a fairly small team (10 devs or so), so we’d rather avoid having to manage this system ourselves, so we can focus on building our product and improving the AI. We had to do that with our previous system, and it was a huge time sink.”
– Andreas Malekos, Chief Scientist @Continuum Industries

Read more:

How to Build an Experiment Tracking Tool [Learnings From Engineers Behind Neptune]

4. Using modern experiment tracking tools 

Finally, there are modern experiment-tracking tools. They were created as a response to the ML community’s need for an automated way of tracking experiments. These are solutions built specifically for tracking, organizing, and comparing experiments.
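The API of such tools typically boils down to three operations: create a run, log values against it, and query or compare runs later. A toy in-memory version of that shape (deliberately not any real tool’s API) might look like:

```python
class ExperimentTracker:
    """Toy in-memory tracker mimicking the create-run / log / compare flow."""

    def __init__(self):
        self.runs = {}

    def create_run(self, name: str, params: dict) -> str:
        """Register a new run with its parameter configuration."""
        self.runs[name] = {"params": params, "metrics": {}}
        return name

    def log_metric(self, run: str, key: str, value: float) -> None:
        # keep the full metric history per run, as real tools do
        self.runs[run]["metrics"].setdefault(key, []).append(value)

    def best_run(self, metric: str) -> str:
        # compare runs by the last logged value of a metric
        return max(self.runs, key=lambda r: self.runs[r]["metrics"][metric][-1])
```

Real tools add persistence, a web UI, artifact storage, and collaboration on top, but the core mental model is this: every run is a queryable record, so "which experiment performed best?" becomes a lookup instead of detective work.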

“Within the first few tens of runs, I realized how complete the tracking was – not just one or two numbers, but also the exact state of the code, the best-quality model snapshot stored to the cloud, the ability to quickly add notes on a particular experiment. My old methods were such a mess by comparison.”
– Edward Dixon, Data Scientist and Founder @Rigr AI

Experiment tracking tools

When choosing an ML experiment tracking tool, you should review a few points. First and foremost, remember that there’s no one platform that fits all use cases. 

Depending on your existing workflow, you may consider the following features of the tool:


  • What use cases does it offer? 
  • Is it a standalone component or a part of a broader ML platform? 
  • Is it delivered as commercial software, open-source software, or a managed cloud service?
  • Which types of metadata can it log and display?
  • How flexible and stable is the API?
  • What are the comparison & visualization features?
  • How user-friendly is the UI? Can you easily organize and search experiments and metadata?
  • Does it allow for collaborative work?
  • Is it easy to try out and later integrate with your workflow? What other frameworks does it integrate with? 
  • What’s the price?

With all that in mind, review the available experiment tracking tools and select the one that checks all your boxes.

Here are a few articles that can help you.

15 Best Tools for Tracking Machine Learning Experiments
The Best Tools for Machine Learning Model Visualization
The Best Tools to Monitor Machine Learning Experiment Runs
Best Tools to Log and Manage ML Model Building Metadata
Best Metadata Store Solutions
Best 7 Data Version Control Tools

Examples of experiment tracking implementation

If you want to read about how different ML practitioners and industry teams implemented experiment tracking in their workflows, check these case studies and examples. 

Setting Up a Scalable ML Workflow With Only a Few Data Scientists & ML Engineers [Case Study]
How to Manage Experiment Tracking When the Number of Experiments Changes [Case Study]
Setting Up Experiment Tracking for a Team of ML Researchers [Case Study]
How to Manage Experiments When Working with SageMaker Pipelines [Case Study]
How to Keep Track of Over 100k Models [Case Study]
Setting up a Scalable Research Workflow for Medical ML at AILS Labs [Case Study]
Setting up CI/CD for the Infrastructure Design Optimization Engine [Case Study]
MLOps at GreenSteam: Shipping Machine Learning [Case Study]
How to Track and Organize ML Experiments That You Run in Google Colab
How to Keep Track of Deep Learning Experiments in Notebooks
How to Keep Track of Experiments in PyTorch Using Neptune
How to Keep Track of PyTorch Lightning Experiments with Neptune

Want to start playing with experiment tracking right now?
