When you work on a machine learning project, getting good results from a single model-training run is one thing. Keeping all of your machine learning experiments organized, and having a process that lets you draw valid conclusions from them, is quite another. That's what machine learning experiment management helps with.
In this article, I will explain why you, as a data scientist or machine learning engineer, need a tool for tracking machine learning experiments, and which software is the best fit for the job.
Tools for tracking machine learning experiments – who needs them and why?
- Data Scientists: Many organizations today either include ML in their products as added value or are AI-first companies. These organizations adopt MLOps processes and tools, such as experiment tracking, to improve collaboration between individuals and teams and to increase the chances of the ML project succeeding. Without them, the moment a data scientist or data science team wants to come back to an idea, re-run a model from a couple of months ago, or simply compare and visualize the differences between runs, the need for a system for tracking ML experiments becomes (painfully) apparent.
- Machine Learning Engineers: Once the data scientist or data science team finalizes model development and the model is ready to launch, MLEs are the ones who take the research/dev code and the model, turn them into a production-ready version, and deploy it. But this handoff from DS to MLE involves more information about the model than just the weights, if the deployed solution is to be debuggable, maintainable, reproducible, and comparable. That's where experiment metadata becomes very important: the dataset, code, and model versions, along with the hyperparameters and other configurations used to train the model.
- Managers/Business people: Tracking software creates an opportunity to involve other team members, like managers or business stakeholders, in your machine learning projects. Thanks to the ability to prepare visualizations, add comments, and share work, managers and co-workers can easily follow the progress and cooperate with the machine learning team.
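To make the handoff metadata above concrete, here is a minimal, hand-rolled sketch of what a single experiment record might capture. All field names and values are illustrative, not taken from any particular tool:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass
class ExperimentRecord:
    """Minimal metadata needed to make a run reproducible and comparable."""
    run_id: str
    code_version: str        # e.g. a git commit hash
    dataset_version: str     # e.g. a hash or tag of the training data
    model_version: str
    hyperparameters: dict    # learning rate, batch size, ...
    metrics: dict            # final evaluation metrics

    def fingerprint(self) -> str:
        """Deterministic hash of everything that defines the run."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

record = ExperimentRecord(
    run_id="run-042",
    code_version="3f2a91c",
    dataset_version="v1.3",
    model_version="resnet50-2020-04",
    hyperparameters={"lr": 0.001, "batch_size": 64},
    metrics={"auc": 0.91},
)
print(record.fingerprint()[:12])
```

Dedicated experiment tracking tools replace exactly this kind of home-grown record with a searchable, shareable store.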
Here is an in-depth article about experiment tracking for those of you who want to learn more.
The best tools for tracking machine learning experiments (also deep learning experiments)
Here’s a comparison of the features and integrations of the 15 best experiment management tools.
This comparison table was last updated on 29 April 2020, so some information may be outdated today. Spotted incorrect info? Tell us, and we'll update it.
1. Neptune

Neptune is a metadata store for any MLOps workflow, built for both research and production teams that run a lot of experiments.
Individuals and organizations use Neptune for data versioning, experiment tracking, and model registry to keep control over their experimentation and model development. These core features let you replace folder structures, spreadsheets, and naming conventions with a single source of truth, where all metadata generated during the machine learning lifecycle is organized and easy to find, query, analyze, and share with your team and managers. Additionally, Neptune is very flexible, works with many other frameworks, and, thanks to its stable user interface, scales well (to millions of runs).
- Log and display all metadata types including parameters, model weights, images, HTML, audio, video, etc.
- Tools for efficient team collaboration and project supervision
- Jupyter notebook tracking included
- Live monitoring of model training
- Both database and dashboard scale with thousands of runs
2. Weights & Biases

Weights & Biases targets the most advanced deep learning teams. It allows them to record experiments and visualize every part of the research. Weights & Biases was created to facilitate collaboration between data scientists and offers many useful features in this area, all wrapped in a well-designed user experience.
- Created for deep learning experiment tracking
- Easy integration process
- Customizable visualization and reporting tools
➡️ See the comparison between Weights & Biases and Neptune.
3. Comet

Similar to the previously described tools, Comet was built to enable tracking of machine learning projects. The team behind this software is on a mission to help data scientists better organize and manage their experiments. Comet makes it easy to compare experiments, keep a record of the collected data, and collaborate with other team members.
- Quick and easy setup on any machine
- Works well with existing ML libraries
- Safeguards your IP
➡️ See the comparison between Comet and Neptune.
4. Sacred

“Every experiment is sacred…”, as the Sacred tool description says. Sacred is open-source software that allows machine learning engineers to configure, organize, log, and reproduce experiments. Sacred doesn't come with its own UI, but there are a few dashboarding tools you can connect to it, such as Omniboard, Sacredboard, or Neptune. It doesn't have the scalability of the previous tools and isn't designed for team collaboration, but it has great potential for individual research.
- Open-source tool
- Extensive experiment parameters customization options
- Easy integration
➡️ See the comparison between Sacred + Omniboard and Neptune.
➡️ See also: The Best Sacred + Omniboard Alternatives
5. MLflow

MLflow is an open-source platform that helps manage the whole machine learning lifecycle: not just experimentation, but also reproducibility and deployment. Each of these three elements is represented by one MLflow component: Tracking, Projects, and Models. A data scientist working with MLflow can track an experiment, organize it, describe it for other ML engineers, and pack it into a machine learning model. It's been designed to scale from a single person to a big organization, but it works best for an individual user.
- Focus on the whole lifecycle of the machine learning process
- Compatible with many additional tools and platforms
- Open interface integrated with any ML library or language
➡️ See the comparison between MLflow and Neptune.
6. TensorBoard

TensorBoard is another experiment tracking tool. It's open-source and offers a suite of tools for visualizing and debugging machine learning models. TensorBoard is the most popular solution on the market, and thus it's widely integrated with many other tools and applications. What's more, it has an extensive network of engineers using the software and sharing their experience and ideas. This makes for a powerful community ready to solve any problem. The software itself, however, is best suited for an individual user.
- Large library of pre-built tracking tools
- Integration with many other tools and applications
- Well prepared problem-solving materials and community
➡️ See the comparison between TensorBoard and Neptune.
➡️ See also: The Best TensorBoard Alternatives (2020 Update).
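One quick way to feed TensorBoard is PyTorch's built-in `SummaryWriter` (TensorFlow has an equivalent `tf.summary` API); the log directory name here is arbitrary:

```python
from torch.utils.tensorboard import SummaryWriter

# Scalars are written as event files under ./runs/demo;
# view them with: tensorboard --logdir runs
writer = SummaryWriter("runs/demo")
for step in range(3):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
```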
7. Guild AI
The team behind Guild AI states that “The faster and more effective you can apply experiments, the sooner you'll complete your work.” To keep this process well organized, they created this open-source experiment tracking software, which is best suited for individual projects. It's lightweight and equipped with many useful features that make it easier to run, analyze, optimize, and recreate machine learning experiments. What's more, Guild AI includes a variety of analytics tools that make comparing experiments much easier.
- Automates the machine learning process
- Integrates with any language and library
- Remote training and backup possibility
➡️ See the comparison between Guild AI and Neptune.
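Guild needs no SDK calls in your code: by convention it treats module-level globals as tunable flags and captures scalars printed in `key: value` form. A script like this (values are illustrative) could be run with `guild run train.py lr=0.01` and compared across runs with `guild compare`:

```python
# train.py -- Guild AI exposes these module-level globals as flags
lr = 0.001
epochs = 3

# Stand-in for a real training loop
loss = 1.0
for _ in range(epochs):
    loss *= (1.0 - lr)

# Guild captures scalars from output lines printed as "key: value"
print(f"loss: {loss:.6f}")
```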
8. Polyaxon

Polyaxon is a platform that focuses both on managing the whole lifecycle of machine learning projects and on facilitating ML team collaboration. It includes a wide range of features, from experiment tracking and optimization to model management and regulatory compliance. The main goal of its developers is to maximize results and productivity while saving costs. It's worth mentioning, however, that Polyaxon needs to be integrated into your infrastructure/cloud before it's ready to use.
- Integrated with most popular deep learning frameworks and ML libraries
- Designed to serve different user groups, including data scientists, team leads, and architects
- Team collaboration possibilities
➡️ See the comparison between Polyaxon and Neptune.
9. Trains

Trains was built to track the “glorious but messy process of training production-grade deep learning models”, as its creators put it. The main focus of the software is to help keep track of machine learning and deep learning experiments in an effortless yet effective way. Trains is an open-source platform that is still in the beta stage, but it is constantly being developed and upgraded.
- Quick and easy implementation process
- Possibility to boost team collaboration
- Useful features designed to track the experiment process and save data to one centralized server
10. Valohai

Valohai has been designed with data scientists in mind, and its main benefit is that it makes the model-building process faster. It does this through large-scale automation, but it needs to be integrated with your infrastructure/private cloud first. Valohai is compatible with any language or framework, as well as many different tools and apps. The software is also teamwork-oriented and has many features that facilitate collaboration.
- Significant acceleration of the model building process
- Helpful customer service and monthly checkup
- Focused on the entire lifecycle of machine learning
11. Pachyderm

Pachyderm is a tool that lets its users control the end-to-end machine learning cycle. From data lineage, through building and tracking experiments, to scalability options – with Pachyderm, it's all covered. The software is available in three different versions: Community Edition (open-source, usable anywhere), Enterprise Edition (a complete version-controlled platform), and Hub Edition (still in beta, combining characteristics of the two previous versions). It needs to be integrated with your infrastructure/private cloud, so it's not as lightweight as some of the other tools mentioned before.
- Possibility to adapt the software version to your own needs
- End-to-end process support
- Established and backed by a strong community of experts
➡️ See the comparison between Pachyderm and Neptune.
12. Kubeflow

Kubeflow's main goals are run orchestration and making deployments of machine learning workflows easier. Known as the machine learning toolkit for Kubernetes, it aims to use the Kubernetes potential to facilitate the scaling of machine learning models. The team behind Kubeflow is constantly developing its features and does its best to make data scientists' lives easier. Kubeflow has some tracking capabilities, but they aren't the main focus of the project; it can easily be used alongside other tools on this list as a complementary tool.
- Multi-framework integration
- Perfect for Kubernetes users
- Open-source character
➡️ See the comparison between Kubeflow and Neptune.
13. Verta

Verta's main features can be summarized in four words: track, collaborate, deploy, and monitor. As you can see, the software has been created to facilitate the management of the entire machine learning lifecycle, and it's equipped with the necessary tools to assist ML teams at every stage of the process. The variety of features, however, makes the platform more complex and thus not as lightweight as other options we mention.
- Compatibility with other ML frameworks
- Assistance in the end-to-end machine learning process
- User-friendly design
14. SageMaker Studio
SageMaker Studio is an Amazon tool that allows data scientists to manage the entire machine learning lifecycle, from building and training to deploying ML models. The idea behind this software is to make it easier and less time-consuming to develop high-quality experiments. It's a web-based tool and comes with a whole toolset designed to help data scientists improve their performance.
- Possibility to track thousands of experiments
- Integration with a wide range of Amazon tools for ML-related tasks
- Fully managed
➡️ See the comparison between SageMaker Studio and Neptune.
15. DVC

The last tool on our list is an open-source version control system created specifically for machine learning projects. DVC aims to enable data scientists to share ML models and make them reproducible. It can handle the versioning and organization of large amounts of data and store them in a well-organized, accessible way. DVC focuses on data and pipeline versioning and management, but it also has some (limited) experiment tracking functionality. It can easily be used alongside other tools on this list as a complementary tool.
- Adaptable to any language and framework
- Possibility to version large amounts of data
- Open-source character
➡️ See the comparison between DVC and Neptune.
Tracking machine learning experiments has always been an important element of the ML development process; in the past, however, it required a lot of effort from data scientists. The tracking tools were limited, so the process was manual and time-consuming.
For this reason, data scientists and engineers often neglected this part of the machine learning lifecycle or created home-grown solutions. It shouldn’t be the case anymore.
Over the last few years, tools for tracking machine learning experiments have matured a lot and are now extremely accessible and easy to use. The apps and platforms we listed today are the best examples. Hopefully, every data scientist will find here the software that will make their life easier!
Setting up a Scalable Research Workflow for Medical ML at AILS Labs [Case Study]
8 mins read | Ahmed Gad | Posted June 22, 2021
AILS Labs is a biomedical informatics research group on a mission to make humanity healthier. That mission is to build models which might someday save your heart from illness. It boils down to applying machine learning to predict cardiovascular disease development based on clinical, imaging, and genetics data.
Four full-time and over five part-time team members. Bioinformaticians, physicians, computer scientists, many on track to get PhDs. Serious business.
Although business is probably the wrong term to use because user-facing applications are not on the roadmap yet, research is the primary focus. Research so intense that it required a custom infrastructure (which took about a year to build) to extract features from different types of data:
- Electronic health records (EHR),
- Diagnosis and treatment information (time-to-event regression methods),
- Images (convolutional neural networks),
- Structured data and ECG.
With a fusion of these features, precise machine learning models can solve complex issues. In this case, it’s risk stratification for primary cardiovascular prevention. Essentially, it’s about predicting which patients are most likely to get cardiovascular disease.
AILS Labs has a thorough research process. For every objective, there are seven stages:
- Define the task to be solved (e.g., build a risk model of cardiovascular disease).
- Define the task objective (e.g., define expected experiment results).
- Prepare the dataset.
- Work on the dataset in interactive mode with Jupyter notebooks; quick experimenting, figuring out the best features for both the task and the dataset, coding in R or Python.
- Once the project scales up, use a workflow management system like Snakemake or Prefect to transform the work into a manageable pipeline and make it reproducible. Without that, it would be costly to reproduce the workflow or compare different models.
- Create machine learning models using PyTorch Lightning integrated with Neptune, where some initial evaluations are applied. Log experiment data.
- Finally, evaluate model performance and inspect the effect of using different sets of features and hyperparameters.
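The staged process above can be sketched as a plain, deterministic pipeline; once each stage is a pure function of its inputs, moving it into a workflow manager like Snakemake or Prefect is mostly mechanical. All names and computations here are illustrative stand-ins, not AILS Labs code:

```python
def prepare_dataset(raw):
    # Stage 3: deterministic preprocessing
    return sorted(raw)

def extract_features(dataset):
    # Stage 4: feature engineering, first explored in notebooks
    return [x * 2 for x in dataset]

def train_model(features, lr=0.001):
    # Stage 6: stand-in for a real (e.g. PyTorch Lightning) training run
    return {"weights": sum(features), "lr": lr}

def evaluate(model, features):
    # Stage 7: final evaluation on the prepared features
    return {"score": model["weights"] / max(len(features), 1)}

# Chaining the stages makes the whole workflow reproducible end to end
dataset = prepare_dataset([3, 1, 2])
features = extract_features(dataset)
model = train_model(features)
report = evaluate(model, features)
print(report)
```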
5 problems of scaling up Machine Learning research
AILS Labs started as a small group of developers and researchers. One person wrote code, and another reviewed it. Not a lot of experimenting. But collaboration became more challenging, and new problems started to appear along with the inflow of new team members:
- Data privacy,
- Workflow standardization,
- Feature and model selection,
- Experiment management,
- Information logging.