Experiment Tracking in Machine Learning

    Everything you need to know about experiment tracking in Machine Learning.


    What is experiment tracking in Machine Learning?

    The goal of every model training process is to arrive at the best-performing configuration of training data, hyperparameters, code, evaluation metrics, and so on.

    The road to finding this configuration goes through running a lot of experiments, analyzing their results, and trying new ideas. You can think of experiment tracking as the bridge between this road and the goal.

    Machine learning experiment tracking is the process of saving all experiment-related information (metadata) that you care about for every experiment you run. 

    “What you care about” will strongly depend on your project, but it may include:

    • Scripts used for running the experiment
    • Environment configuration files 
    • Versions of the data used for training and evaluation
    • Parameter configurations
    • Evaluation metrics 
    • Model weights
    • Performance visualizations (confusion matrix, ROC curve)  
    • Example predictions on the validation set (common in computer vision)
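
    To make this concrete, here is a minimal sketch of what logging such metadata can look like in code. It uses MLflow's logging API purely as an example (any comparable tracking client would work), and all parameter values, file names, and metric numbers below are made up:

```python
import mlflow

# Hypothetical configuration for one experiment run.
params = {"learning_rate": 1e-3, "batch_size": 64, "optimizer": "adam"}

with mlflow.start_run(run_name="baseline"):
    mlflow.log_params(params)                  # parameter configuration
    mlflow.set_tag("dataset_version", "v2.1")  # version of the training data

    mlflow.log_metric("val_accuracy", 0.873)   # evaluation metrics
    mlflow.log_metric("val_loss", 0.41)

    # Artifacts: environment config, visualizations, model weights
    # (these files are assumed to already exist in the working directory).
    mlflow.log_artifact("environment.yml")
    mlflow.log_artifact("confusion_matrix.png")
    mlflow.log_artifact("model_weights.pt")
```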

    Looking at an even broader picture, experiment tracking is a part of MLOps, a larger ecosystem of tools and methodologies that deals with the operationalization of Machine Learning. For example, in the visualization below, experiment tracking connects the research part of the model development process with the deployment and monitoring part. 

    MLOps cycle

    Why should you care about ML experiment tracking?

    Experiment tracking focuses on the iterative model development phase when you try many things to get your model performance to the level you need. Sometimes, just a small change in training metadata has a huge impact on that performance. 

    Without tracking what exactly you did, you won’t be able to compare or reproduce the results. Not to mention, you’ll lose a lot of time and have trouble meeting business objectives.


    Experiment tracking gives you the confidence that you aren’t missing any data or insights.

    You don’t have to spend hours or days figuring out which experiments to run next, how your previous (or currently running) experiments are doing, which experiment performed best, or how to reproduce and deploy it. You simply look at your experiment tracking database, and you know. Or at least, you can figure it out from this single source of truth.

    Now, let’s dive deeper into different use cases and the benefits of experiment tracking.

    See all model training metadata in one place

    During the course of a project (especially when multiple people are working on it), your experiment results can end up scattered across many machines. In such cases, it’s difficult to manage the experimentation process, and some information is likely to get lost.

    With an experiment tracking system in place, all of your experiment results are logged to one repository by design. Your work (and the work of your team) is simplified and, most importantly, standardized.


    Example of experiment tracking dashboard with various metadata in one place – parameters, metrics, visualizations, and more.


    Compare model training runs

    The ability to easily compare model training runs is the most frequently mentioned benefit of experiment tracking.

    “Tracking and comparing different approaches has notably boosted our productivity, allowing us to focus more on the experiments, develop new, good practices within our team and make better data-driven decisions.” – Tomasz Grygiel, Data Scientist @idenTT

    When you follow the same protocol for logging runs, those comparisons can go really deep, and they barely take any extra effort.
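
    As a rough illustration of what these comparisons look like in practice, here is a sketch that pulls logged runs into a table and sorts them by a metric. It assumes a recent MLflow version (with pandas installed), and the experiment name and column set are made up:

```python
import mlflow

# Load every run of an experiment into a pandas DataFrame
# ("my-experiment" is a placeholder name).
runs = mlflow.search_runs(experiment_names=["my-experiment"])

# search_runs flattens logged values into "params.*" and "metrics.*"
# columns, so comparing configurations is a one-liner.
best_first = runs[
    ["run_id", "params.learning_rate", "params.batch_size", "metrics.val_accuracy"]
].sort_values("metrics.val_accuracy", ascending=False)

print(best_first.head())
```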


    Example of runs comparison table


    See model training runs live

    Model training takes time. Sometimes it takes days or weeks, especially for long-running deep learning jobs. Thanks to experiment tracking, you can monitor their performance live and see (almost) in real time when a run has no chance of producing better results. Instead of letting such experiments run to completion, you’re better off stopping them and trying something different.
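
    Here is a sketch of what this looks like from the training-loop side, again with MLflow as a stand-in for any tracking client. The training function is a fake stub that just produces a decreasing loss:

```python
import math
import random

import mlflow

def train_one_epoch(epoch: int) -> float:
    # Stub standing in for a real training step.
    return math.exp(-epoch / 30) + random.uniform(0.0, 0.05)

with mlflow.start_run(run_name="long-training-job"):
    for epoch in range(100):
        loss = train_one_epoch(epoch)
        # Logging with a step index builds a live chart in the dashboard,
        # so a plateauing or diverging run can be spotted and stopped early.
        mlflow.log_metric("train_loss", loss, step=epoch)
```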

    “Without the information I have in the Monitoring section I wouldn’t know that my experiments are running 10 times slower than they could.” – Michał Kardas, Machine Learning Researcher @TensorCell


    Example of model monitoring dashboard


    Reproduce model training runs

    Reproducing model training runs is a crucial part of every researcher’s work.

    Imagine you run a bunch of experiments (each with slightly different parameters), you get satisfying results, and you want to repeat the best-performing run.

    If you haven’t tracked any metadata, this scenario ends here, and you have to do everything again (this time, hopefully, with proper tracking in place!).

    ML experiment tracking records all of your experiments in a centralized way, making them fully reproducible. You can go back to old results whenever you want.
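
    For example, with run metadata stored centrally, recovering the exact configuration of a past run can be a couple of API calls. A sketch, again assuming MLflow, with a placeholder run ID:

```python
import mlflow

# "<run-id>" is a placeholder for the ID of the best past run.
past_run = mlflow.get_run("<run-id>")

params = past_run.data.params                                # exact parameter configuration
dataset_version = past_run.data.tags.get("dataset_version")  # logged data version

# These values can now seed a new run with an identical setup.
print(f"Reproducing run with params={params} on dataset {dataset_version}")
```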


    Example dashboard with source code, dataset sample, and other artifacts


    Collaborate and share knowledge

    The advantage that ties all of the previous benefits together is the ability to share experiments with your team members.


    Example of projects dashboard in an experiment tracking platform

    Experiment tracking lets you not only organize and compare your past experiments, but also see what everyone else tried and how it worked out.

    “Being able to see my team’s work results any time I need makes it effortless to track progress and enables easier coordination.” – Michael Ulin, VP, Machine Learning @Zesty.ai

    When you are part of a team, and many people are running experiments, having one source of truth for your entire team is really important and makes it so much easier to finalize projects.


    Example view of team experiments


    How to implement experiment tracking in your workflow?

    If you don’t track your experiments yet, there are a few options to consider. The most popular are:

    1. Spreadsheets + naming conventions

    A common approach to experiment tracking is to simply create a big spreadsheet where you put all of the information you can (metrics, parameters, etc.) and a directory structure where things are named according to a convention.

    It’s an easy-to-implement solution, and it’s definitely something to start with when you don’t run too many experiments. 

    But as straightforward as it is, experiment tracking in spreadsheets also requires a lot of discipline and time. Whenever you run an experiment, you have to look at the results and copy them into the spreadsheet by hand. It’s a very manual process, and it doesn’t scale well. Spreadsheets are flexible only up to a point, because they weren’t created specifically for tracking runs.
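
    In practice, the “spreadsheet” is often just a CSV file appended by hand-written glue code like the sketch below (the file name, naming convention, and values are all made up):

```python
import csv
from datetime import datetime, timezone

# Append one row per finished experiment; every field is copied in
# manually or by a small script like this one.
with open("experiments.csv", "a", newline="") as f:
    csv.writer(f).writerow([
        datetime.now(timezone.utc).isoformat(),  # when the run finished
        "run_042_lr0.001_bs64",                  # ID following a naming convention
        0.001,                                   # learning rate
        64,                                      # batch size
        0.873,                                   # validation accuracy
    ])
```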

    “You can keep track of your work in spreadsheets, but it’s super error-prone. And every experiment that I don’t use and don’t look at afterward is wasted compute: it’s bad for the environment, and it’s bad for me because I wasted my time.”

    Thore Bürgel

    PhD Student @AILS Labs

    2. Versioning configuration files with GitHub

    Another option is to version all of your experiment metadata in GitHub.

    The way to go about it is to commit metrics, parameters, charts, and whatever else you want to keep track of to GitHub when running your experiment. It can be done with post-commit hooks that create or update some files (configs, charts, etc.) automatically after your experiment finishes.
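
    A minimal sketch of such a hook’s logic: dump the results to a file inside the repository and commit it (the file path, values, and commit message are made up):

```python
import json
import subprocess

# Hypothetical results of a finished experiment.
results = {"learning_rate": 0.001, "batch_size": 64, "val_accuracy": 0.873}

# Write the metadata to a tracked file...
with open("experiments/run_042.json", "w") as f:
    json.dump(results, f, indent=2)

# ...and commit it, so the repository history doubles as an experiment log.
subprocess.run(["git", "add", "experiments/run_042.json"], check=True)
subprocess.run(["git", "commit", "-m", "Log experiment run_042"], check=True)
```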

    Yet, again, GitHub wasn’t built for Machine Learning experiment tracking. So, it can serve the purpose to some extent, but it’s not the most effective and scalable option.

    3. Building your own experiment tracking platform

    Many companies choose to build an in-house experiment tracking platform. 

    It can be the most tailored solution, with exactly the features your project needs. But it’s also a demanding option in terms of time and resources, and it requires continuous maintenance. In many cases, it also lacks a UI, which makes such a platform difficult to interact with for people who don’t code much.
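
    To give a sense of the scope, even the storage layer alone means designing and maintaining your own schema. A bare-bones sketch with SQLite (table and column names are arbitrary choices), before any UI, API, or access control is built on top:

```python
import sqlite3

# Minimal storage layer for an in-house tracker -- everything else
# (UI, querying, permissions, backups) would still need to be built.
conn = sqlite3.connect("experiments.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS runs (
        run_id       TEXT PRIMARY KEY,
        params_json  TEXT,
        metric_name  TEXT,
        metric_value REAL,
        created_at   TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO runs (run_id, params_json, metric_name, metric_value) "
    "VALUES (?, ?, ?, ?)",
    ("run_042", '{"learning_rate": 0.001}', "val_accuracy", 0.873),
)
conn.commit()
```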

    “We’re still a fairly small team (10 devs or so), so we’d rather avoid having to manage this system ourselves, so we can focus on building our product and improving the AI. We had to do that with our previous system, and it was a huge time sink.”

    Andreas Malekos

    Chief Scientist @Continuum Industries

    4. Using modern experiment tracking tools 

    Finally, there are modern experiment tracking tools. They were created as a response to the ML community’s need for an automated way of tracking experiments. These are solutions built specifically for tracking, organizing, and comparing experiments.

    “Within the first few tens of runs, I realized how complete the tracking was – not just one or two numbers, but also the exact state of the code, the best-quality model snapshot stored to the cloud, the ability to quickly add notes on a particular experiment. My old methods were such a mess by comparison.”

    Edward Dixon

    Data Scientist and Founder @Rigr AI

    Experiment tracking tools were designed to treat machine learning experiments as first-class citizens, and they will always:

    • be easier to use for a machine learning person than general tools
    • have better integrations with the ML ecosystem
    • have more experiment-focused features than the general solutions

    Example dashboard in Neptune, a metadata store built for experiment tracking
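
    For contrast with the manual approaches above, logging to a dedicated tool usually takes a handful of lines. The sketch below uses the Neptune client, since that’s the tool shown in the dashboard above; the project name is a placeholder, and the exact method names may vary between client versions:

```python
import neptune

# Placeholder project; credentials are read from the NEPTUNE_API_TOKEN
# environment variable.
run = neptune.init_run(project="my-workspace/my-project")

run["parameters"] = {"learning_rate": 0.001, "batch_size": 64}  # configs
run["data/version"] = "v2.1"                                    # data version

for epoch in range(100):
    # append() builds a metric series that the dashboard charts live.
    run["train/loss"].append(1.0 / (epoch + 1))

run.stop()
```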

    Experiment tracking tools

    When choosing an ML experiment tracking tool, you should review a few points. First and foremost, remember that there’s no one platform that fits all use cases. 

    Depending on your existing workflow, you may consider the following features of the tool: 

    • What use cases does it offer? 
    • Is it a standalone component or a part of a broader ML platform? 
    • Is it delivered as commercial software, open-source software, or a managed cloud service?
    • What metadata can it log and display?
    • How flexible and stable is the API?
    • What are the comparison & visualization features?
    • How user-friendly is the UI? Can you easily organize and search experiments and metadata?
    • Does it allow for collaborative work?
    • Is it easy to try out and later integrate with your workflow? What other frameworks does it integrate with? 
    • What’s the price?

    Having all that in mind, review available experiment tracking tools and select the one that checks all your boxes. 



    Stop losing the results of your work, start tracking your experiments
