The goal of every model training process is to arrive at the best-performing configuration of training data, hyperparameters, code, evaluation metrics, and so on.
The road to finding this configuration goes through running a lot of experiments, analyzing their results, and trying new ideas. You can think about experiment tracking as a bridge between this road and the goal.
Machine learning experiment tracking is the process of saving all experiment-related information (metadata) that you care about for every experiment you run.
“What you care about” will strongly depend on your project, but it may include (a minimal logging sketch follows this list):
Scripts used for running the experiment
Environment configuration files
Versions of the data used for training and evaluation
Example predictions on the validation set (common in computer vision)
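To make this concrete, here is a minimal sketch of what saving such metadata can look like: a small Python helper that writes one JSON record per run. The log_run helper, the runs/ directory layout, and the field names are illustrative assumptions, not a standard.

```python
import json
import subprocess
import time
from pathlib import Path

# Minimal sketch: persist one record of experiment metadata per run.
# The runs/<name>/metadata.json layout and the field names are illustrative.
def log_run(run_dir: Path, params: dict, metrics: dict, data_version: str) -> None:
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        # The commit hash ties the run to the exact training code
        # (assumes the script is executed inside a git repository).
        "git_commit": subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip(),
        "data_version": data_version,
        "params": params,
        "metrics": metrics,
    }
    (run_dir / "metadata.json").write_text(json.dumps(record, indent=2))

log_run(Path("runs/exp_001"), {"lr": 0.001, "batch_size": 64}, {"val_acc": 0.87}, "dataset-v3")
```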
Looking at the broader picture, experiment tracking is part of MLOps, a larger ecosystem of tools and methodologies that deals with the operationalization of machine learning. Experiment tracking connects the research part of the model development process with the deployment and monitoring part.
Experiment tracking focuses on the iterative model development phase when you try many things to get your model performance to the level you need. Sometimes, just a small change in training metadata has a huge impact on that performance.
Without tracking what exactly you did, you won’t be able to compare or reproduce the results. Not to mention, you’ll lose a lot of time and have trouble meeting business objectives.
Experiment tracking gives you confidence that you aren’t missing any data or insights.
You don’t have to spend hours or days thinking about what experiments to run next, how your previous (or currently running) experiments are doing, which experiment performed best, or how to reproduce and deploy it. You simply look at your experiment tracking database, and you know it.
Or at least, you’re able to figure it out using just this one source of truth.
Now, let’s dive deeper into different use cases and the benefits of experiment tracking.
During the course of a project (especially when multiple people are working on it), you can end up with experiment results scattered across many machines. In such cases, it’s difficult to manage the experimentation process, and some information is likely to get lost.
With the experiment tracking system in place, all of your experiment results are logged to one repository by design. Your work (and the work of your team) is simplified and, most importantly, standardized.
The possibility to easily compare model training runs is the most frequently mentioned benefit of experiment tracking.
“Tracking and comparing different approaches has notably boosted our productivity, allowing us to focus more on the experiments, develop new, good practices within our team and make better data-driven decisions.” – Tomasz Grygiel, Data Scientist @idenTT
When you follow the same protocol for logging runs, those comparisons can go really deep, and you don’t have to do much extra work.
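As a rough illustration of how a shared logging protocol enables comparisons, the sketch below loads the per-run metadata records from the earlier (illustrative) runs/ layout into a pandas DataFrame and sorts them by a validation metric. The file layout and the val_acc metric name are assumptions carried over from that sketch.

```python
import json
from pathlib import Path

import pandas as pd

# Minimal sketch: compare runs that were logged with a consistent metadata format.
# Assumes one runs/<run_id>/metadata.json file per experiment (illustrative layout).
records = []
for meta_file in Path("runs").glob("*/metadata.json"):
    meta = json.loads(meta_file.read_text())
    records.append({"run": meta_file.parent.name, **meta["params"], **meta["metrics"]})

runs = pd.DataFrame(records)
# Sort by the validation metric to surface the best configurations.
print(runs.sort_values("val_acc", ascending=False).head(10))
```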
Model training takes time. It can take days or even weeks, especially for long-running deep learning jobs. Thanks to experiment tracking, you can monitor performance live and see (almost) right away when a run has no chance of producing better results. Instead of letting such experiments run to completion, you’re better off stopping them and trying something different.
“Without the information I have in the Monitoring section, I wouldn’t know that my experiments are running 10 times slower than they could.” – Michał Kardas, Machine Learning Researcher @TensorCell
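As a sketch of the idea (not a feature of any particular tool), the snippet below watches a validation metric and aborts a run that has stopped improving; the fake_validation_accuracy stub stands in for a real evaluation step.

```python
import random

# Minimal sketch: stop a run early when the tracked metric stalls, instead of
# letting it burn compute. fake_validation_accuracy() is a stand-in for a real
# evaluation step inside your training loop.
def fake_validation_accuracy(epoch: int) -> float:
    return min(0.9, 0.5 + 0.05 * epoch) + random.uniform(-0.01, 0.01)

best, stale, patience = 0.0, 0, 3
for epoch in range(50):
    val_acc = fake_validation_accuracy(epoch)
    print(f"epoch={epoch} val_acc={val_acc:.3f}")
    if val_acc > best:
        best, stale = val_acc, 0
    else:
        stale += 1
    if stale >= patience:
        print(f"No improvement for {patience} epochs - stopping early.")
        break
```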
Reproducing model training runs is a crucial part of every researcher’s work.
Imagine you run a bunch of experiments (each of them with slightly different parameters), you get satisfying results, and you want to repeat the best performing run.
If you haven’t tracked any metadata, this scenario ends here, and you have to do everything again (this time, hopefully, with proper tracking in place!).
ML experiment tracking allows you to have all experiments recorded in a centralized way, hence fully reproducible. You can go back to old results whenever you want.
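Assuming the metadata was logged in the illustrative format sketched earlier, reproducing a run can be as simple as reloading its record and feeding the parameters back into your training entry point:

```python
import json
import random
from pathlib import Path

# Minimal sketch: re-create the best run from its logged metadata
# (illustrative runs/<run_id>/metadata.json layout from the earlier sketch).
meta = json.loads(Path("runs/exp_001/metadata.json").read_text())
params = meta["params"]
random.seed(params.get("seed", 42))  # re-use the recorded seed if one was logged
print(f"Re-running with data version {meta['data_version']} and params {params}")
# train_model(**params)  # your own training entry point would go here
```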
The advantage that connects all previous benefits is the ability to share experiments between team members.
Example of a projects dashboard in an experiment tracking platform
Experiment tracking lets you not only organize and compare your past experiments but also see what everyone else tried and how it worked out.
“Being able to see my team’s work results any time I need makes it effortless to track progress and enables easier coordination.” – Michael Ulin, VP, Machine Learning @Zesty.ai
When you are part of a team, and many people are running experiments, having one source of truth for your entire team is really important and makes it so much easier to finalize projects.
How to implement experiment tracking in your workflow?
If you don’t track your experiments yet, there are a few options to consider. The most popular ones are:
1. Spreadsheets + naming conventions
A common approach to experiment tracking is to simply create a big spreadsheet where you put all of the information that you can (metrics, parameters, etc.) and a directory structure where things are named in a certain way.
It’s an easy-to-implement solution, and it’s definitely something to start with when you don’t run too many experiments.
But as straightforward as it is, experiment tracking in spreadsheets also requires a lot of discipline and time. Whenever you run an experiment, you have to look at the results and copy them to the spreadsheet. It’s a very manual process, and it doesn’t scale well. Spreadsheets are flexible only up to a certain point because they weren’t created specifically for tracking runs.
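A minimal sketch of this approach in Python, assuming one CSV file per project and a naming convention that encodes the key parameters; the column names and file path are illustrative:

```python
import csv
from pathlib import Path

# Minimal sketch of the spreadsheet approach: one row per run, appended either by
# hand or from a small helper like this one. Columns and the file path are illustrative.
def append_run_row(run_name: str, lr: float, batch_size: int, val_acc: float,
                   sheet: Path = Path("experiments.csv")) -> None:
    write_header = not sheet.exists()
    with open(sheet, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["run_name", "lr", "batch_size", "val_acc"])
        writer.writerow([run_name, lr, batch_size, val_acc])

# The run name encodes the key parameters, e.g. "resnet50_lr0.001_bs64".
append_run_row("resnet50_lr0.001_bs64", 0.001, 64, 0.87)
```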
“You can keep track of your work in spreadsheets, but it’s super error-prone. And every experiment that I don’t use and don’t look at afterward is wasted compute: it’s bad for the environment, and it’s bad for me because I wasted my time.” – Thore Bürgel, PhD Student @AILS Labs
2. Versioning metadata in GitHub
Another option is to version all of your experiment metadata in GitHub.
The way you can go about it is to commit metrics, parameters, charts, and whatever else you want to keep track of to GitHub when running your experiment. It can be done with post-commit hooks that create or update files (configs, charts, etc.) automatically after your experiment finishes.
Yet, again, GitHub wasn’t built for Machine Learning experiment tracking. So, it can serve the purpose to some extent, but it’s not the most effective and scalable option.
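For illustration, here is a minimal sketch of committing a run’s artifacts after training finishes; the file names and commit message format are assumptions, and you would typically call something like this from your training script or a hook:

```python
import subprocess

# Minimal sketch: commit the files an experiment produced (configs, metric
# summaries, charts) so the repository history doubles as an experiment log.
# File names and the commit message format are illustrative.
def commit_experiment_artifacts(run_name: str, paths: list) -> None:
    subprocess.run(["git", "add", *paths], check=True)
    # check=True raises if the commit fails (e.g. when there is nothing to commit).
    subprocess.run(["git", "commit", "-m", f"experiment: {run_name} results"], check=True)

commit_experiment_artifacts(
    "resnet50_lr0.001_bs64", ["config.yaml", "metrics.json", "loss_curve.png"]
)
```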
3. Building your own experiment tracking platform
Many companies choose to build an in-house experiment tracking platform.
It can be the most tailored solution, with exactly the features your project needs. But it’s also a demanding option in terms of time and resources, and it requires continuous maintenance. In many cases, it also lacks a UI, which makes such a platform difficult to use for people who don’t code much.
“We’re still a fairly small team (10 devs or so), so we’d rather avoid having to manage this system ourselves, so we can focus on building our product and improving the AI. We had to do that with our previous system, and it was a huge time sink.” – Andreas Malekos, Chief Scientist @Continuum Industries
4. Using modern experiment tracking tools
Finally, there are modern experiment tracking tools. They were created in response to the ML community’s need for an automated way of tracking experiments. These are solutions built specifically for tracking, organizing, and comparing experiments.
“Within the first few tens of runs, I realized how complete the tracking was – not just one or two numbers, but also the exact state of the code, the best-quality model snapshot stored to the cloud, the ability to quickly add notes on a particular experiment. My old methods were such a mess by comparison.” – Edward Dixon, Data Scientist and Founder @Rigr AI
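To give a flavor of what logging with a dedicated tool looks like, here is a minimal sketch using the open-source MLflow client purely as an example, not a recommendation of any specific product; the parameter and metric values are dummies.

```python
import mlflow

# Minimal sketch of logging a run with MLflow's tracking API.
mlflow.set_experiment("image-classification")
with mlflow.start_run(run_name="resnet50_lr0.001_bs64"):
    mlflow.log_param("lr", 0.001)
    mlflow.log_param("batch_size", 64)
    for epoch in range(3):
        mlflow.log_metric("val_acc", 0.80 + 0.02 * epoch, step=epoch)
    # mlflow.log_artifact("config.yaml")  # attach files such as configs or charts
```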
Experiment tracking tools
When choosing an ML experiment tracking tool, you should review a few points. First and foremost, remember that there’s no one platform that fits all use cases.
Depending on your existing workflow, you may consider the following features of the tool:
What use cases does it offer?
Is it a standalone component or a part of a broader ML platform?
Is it delivered as commercial software, open-source software, or a managed cloud service?
What types of metadata can it log and display?
How flexible and stable is the API?
What are the comparison & visualization features?
How user-friendly is the UI? Can you easily organize and search experiments and metadata?
Does it allow for collaborative work?
Is it easy to try out and later integrate with your workflow? What other frameworks does it integrate with?
What’s the price?
With all that in mind, review the available experiment tracking tools and select the one that checks all your boxes.
Here are a few articles that can help you.
15 Best Tools for Tracking Machine Learning Experiments
If you want to read about how different ML practitioners and industry teams implemented experiment tracking in their workflows, check these case studies and examples.
Setting Up a Scalable ML Workflow With Only a Few Data Scientists & ML Engineers [Case Study]