The phrase “Every model is wrong but some are useful” rings especially true in machine learning. When developing machine learning models, you should always understand where a model works as expected and where it fails miserably.
There are many methods that you can use to get that understanding:
- Look at evaluation metrics (you should also know how to choose an evaluation metric for your problem)
- Look at performance charts like ROC, Lift Curve, Confusion Matrix, and others
- Look at learning curves to estimate overfitting
- Look at model predictions on best/worst cases
- Look at how resource-intensive model training and inference are (they translate into serious costs and are crucial to the business side of things)
Once you have a decent understanding of one model, you are good, right? Wrong 🙂
Typically, you need to experiment with many model-improvement ideas, and visualizing the differences between experiments becomes crucial.
You can build all (or most) of this yourself, but today there are tools that do it for you. If you’re looking for the best tools to help you visualize, organize, and gather experiment data, you’re in the right place.
Neptune is a metadata store for MLOps, built for research and production teams that run a lot of experiments. It offers an open-source library that lets users log metadata generated during the model development process, whether from scripts (Python, R, and others) or notebooks (local, Google Colab, AWS SageMaker).
Projects in Neptune can have multiple members with different roles (viewer, contributor, admin), so all machine learning experiments that land in Neptune can be viewed, shared, and discussed by every team member.
Neptune is meant to provide an easy way to store, organize, display, and compare all metadata generated during the model development process.
Neptune – summary:
- Log model predictions
- Log losses and metrics
- Log artifacts (data versions, model binaries)
- Log git information, code, or notebook checkpoints
- Log hardware utilization
- Log error analysis in notebooks after the training has been completed
- Log performance visualizations like the ROC curve or confusion matrix (during or after training), or anything else; logged charts become interactive in the UI
- Log interactive visualizations from Altair, Bokeh, Plotly, or other HTML objects
- Compare hyperparameters and metrics across many runs with an intelligent compare table that highlights what was different.
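As a rough sketch of what logging to Neptune can look like (assuming the neptune-client library is installed, a `NEPTUNE_API_TOKEN` is set in the environment, and `my-workspace/my-project` is a placeholder project name — the loss values below are fabricated for illustration):

```python
def simulated_metrics(epochs=5):
    # Fake decaying loss values, purely for illustration
    return [round(1.0 / (epoch + 1), 4) for epoch in range(epochs)]

def track_with_neptune(project="my-workspace/my-project"):  # placeholder project
    import neptune.new as neptune  # pip install neptune-client

    run = neptune.init(project=project)  # reads NEPTUNE_API_TOKEN from the environment
    run["parameters"] = {"lr": 0.01, "batch_size": 32}  # logged as run metadata
    for epoch, loss in enumerate(simulated_metrics()):
        run["train/loss"].log(loss)  # each .log() appends a point to the chart
    run.stop()
```

In a real project, the call to `run["train/loss"].log(...)` would sit inside your actual training loop.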
Weights & Biases, a.k.a. WandB, is focused on deep learning. Users can track experiments with its Python library and, as a team, see each other’s experiments.
WandB is a hosted service that lets you back up all experiments in a single place and work on a project with your team; work-sharing features are there to use.
In WandB, users can log and analyze multiple data types.
Weights & Biases – summary:
- Monitor training runs information like loss, accuracy (learning curves)
- View histograms of weights and biases (no pun intended), or gradients
- Log rich objects like charts, video, audio, or interactive charts during training
- Use various comparison tools like tables showing auto-diffs, parallel coordinates plots, and others
- Interactive prediction bounding box visualization for object detection models
- Interactive prediction masks visualization for semantic segmentation models
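A minimal sketch of tracking a run with WandB (assuming the wandb package is installed and you have authenticated with `wandb login`; the project name and the metric values are placeholders):

```python
def simulated_history(steps=3):
    # Fake (loss, accuracy) pairs, purely for illustration
    return [(1.0 / (s + 1), 0.5 + 0.25 * s) for s in range(steps)]

def track_with_wandb(project="demo-project"):  # placeholder project name
    import wandb  # pip install wandb; run `wandb login` first

    run = wandb.init(project=project, config={"lr": 0.01})  # config stores hyperparameters
    for step, (loss, acc) in enumerate(simulated_history()):
        wandb.log({"loss": loss, "accuracy": acc}, step=step)  # one chart point per call
    run.finish()
```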
Comet is a meta machine learning platform for tracking, comparing, explaining, and optimizing experiments and models.
Like many other tools, such as Neptune (neptune-client specifically) or WandB, Comet provides an open-source Python library that lets data scientists integrate their code with Comet and start tracking work in the application.
As it’s offered both cloud-hosted and self-hosted, users can have team projects and save a backup of experimentation history.
Comet is converging towards more automated approaches to ML, with predictive early stopping (not available in the free version of the software) and, in the future, neural architecture search.
Comet.ml – summary:
- Visualize samples with dedicated modules for vision, audio, text and tabular data to detect overfitting and easily identify issues with your dataset
- You can customize and combine your visualizations
- You can monitor your learning curves
- Comet’s flexible experiments and visualization suite allow you to record, compare, and visualize many artifact types
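A hedged sketch of how recording metrics to Comet can look (assuming the comet_ml package is installed and a `COMET_API_KEY` is set in the environment; the project name and loss values are placeholders):

```python
def simulated_losses(epochs=4):
    # Exact halving each epoch, purely for illustration
    return [1.0 / 2**epoch for epoch in range(epochs)]

def track_with_comet(project_name="demo"):  # placeholder project name
    from comet_ml import Experiment  # pip install comet_ml

    experiment = Experiment(project_name=project_name)  # reads COMET_API_KEY from env
    experiment.log_parameters({"lr": 0.01, "optimizer": "adam"})
    for epoch, loss in enumerate(simulated_losses()):
        experiment.log_metric("train_loss", loss, step=epoch)
    experiment.end()
```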
TensorBoard provides the visualization and tooling needed for machine learning experimentation. It’s open-source and offers a suite of tools for visualization and debugging of machine learning models. TensorBoard is the most popular solution on the market and thus it’s widely integrated with many other tools and applications.
What’s more, it has an extensive network of engineers using this software and sharing their experience and ideas. This makes for a powerful community ready to solve any problem. The software itself, however, is best suited to an individual user.
TensorBoard – summary:
- Tracking and visualizing metrics such as loss and accuracy
- Visualizing the model graph (ops and layers)
- Viewing histograms of weights, biases, or other tensors as they change over time
- Projecting embeddings to a lower-dimensional space
- Displaying images, text, and audio data
- Profiling TensorFlow programs
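Metric tracking with TensorBoard boils down to writing event files and pointing the UI at them. A sketch using PyTorch’s `SummaryWriter` (one of several writers that produce TensorBoard logs; the log directory and loss values are placeholders):

```python
def fake_loss_curve(steps=3):
    # Fabricated loss values, purely for illustration
    return [1.0 / 2**s for s in range(steps)]

def write_tensorboard_logs(logdir="runs/demo"):  # placeholder log directory
    from torch.utils.tensorboard import SummaryWriter  # pip install torch tensorboard

    writer = SummaryWriter(log_dir=logdir)  # creates event files under logdir
    for step, loss in enumerate(fake_loss_curve()):
        writer.add_scalar("train/loss", loss, global_step=step)
    writer.close()
    # Then view with: tensorboard --logdir runs/demo
```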
Visdom is a tool for flexibly creating, organizing, and sharing visualizations of live, rich data. It supports Torch and NumPy.
Visdom facilitates visualization of remote data with an emphasis on supporting scientific experimentation and has a simple set of features that can be composed for various use-cases.
Visdom lets you share the results of statistical calculations with other people and conveniently test, view, and experiment, since all your results are presented in an interactive form.
A slight disadvantage is that there is no easy way to access the underlying data or to compare consecutive runs.
Visdom – summary:
- It helps to interactively visualize any data (including remote machine model training)
- It contains a ton of visualization primitives. In the context of machine learning models, the most useful are line plots, histograms, scatter plots, images, matplotlib figures, audio, video, and HTML objects, but there are many more to choose from
- Various visualization elements can be combined into a dashboard of visualizations
- It can be easily shared with your team or collaborators
- Since you have full customizability, you can create your own favourite deep learning dashboard
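A sketch of pushing a line plot to Visdom (assuming the visdom package is installed and a server is running, started with `python -m visdom.server`; the loss values are fabricated):

```python
def loss_curve(steps=5):
    # Fabricated loss values, purely for illustration
    return [1.0 / 2**s for s in range(steps)]

def plot_with_visdom():
    import numpy as np
    from visdom import Visdom  # pip install visdom

    viz = Visdom()  # connects to http://localhost:8097 by default
    losses = loss_curve()
    viz.line(
        Y=np.array(losses),
        X=np.arange(len(losses)),
        opts={"title": "train loss"},  # the plot appears live in the Visdom UI
    )
```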
HiPlot is a straightforward interactive visualization tool that helps AI researchers discover correlations and patterns in high-dimensional data. It uses parallel plots and other graphical ways to represent information more clearly.
HiPlot can be run quickly from a Jupyter notebook with no setup required. The tool enables machine learning (ML) researchers to more easily evaluate the influence of their hyperparameters, such as learning rate, regularizations, and architecture. It can also be used by researchers in other fields, so they can observe and analyze correlations in data relevant to their work.
HiPlot – summary:
- Creates an interactive parallel plot visualization to easily explore various hyperparameter-metric interactions
- Based on selection on the parallel plot the experiment table is updated automatically
- It’s super lightweight and can be used inside notebooks or as a standalone web server
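A sketch of building a parallel plot from a list of runs (assuming the hiplot package is installed; the hyperparameter-sweep records below are made up for illustration):

```python
def experiment_records():
    # Hypothetical hyperparameter-sweep results
    return [
        {"lr": 0.1, "dropout": 0.1, "accuracy": 0.85},
        {"lr": 0.01, "dropout": 0.3, "accuracy": 0.91},
        {"lr": 0.001, "dropout": 0.5, "accuracy": 0.88},
    ]

def show_parallel_plot():
    import hiplot as hip  # pip install hiplot

    exp = hip.Experiment.from_iterable(experiment_records())
    exp.display()  # renders inline in a notebook
    # alternatively: exp.to_html("sweep.html") for a standalone file
```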
Machine learning model visualization tools are important because a visual summary of your ML or deep learning models makes it easier to identify trends and patterns, understand connections, and interact with your data.
I hope you found what you were looking for and can now improve your experiments.
15 Best Tools for ML Experiment Tracking and Management
10 mins read | Author Patrycja Jenkner | Updated August 25th, 2021
While working on a machine learning project, getting good results from a single model-training run is one thing. But keeping all of your machine learning experiments well organized and having a process that lets you draw valid conclusions from them is quite another.
The answer to these needs is experiment tracking. In machine learning, experiment tracking is the process of saving all experiment-related information that you care about for every experiment you run.
ML teams implement experiment tracking in different ways, be it with spreadsheets, GitHub, or self-built platforms. Yet, the most effective option is to use tools designed specifically for tracking and managing ML experiments.
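To make the "spreadsheet" baseline concrete, here is a bare-bones, stdlib-only version of it: append one CSV row per run, with a timestamp, the hyperparameters, and the resulting metrics. Dedicated tracking tools replace exactly this kind of script (the file name, parameters, and metric values are placeholders):

```python
import csv
import datetime
import pathlib

def log_run(path, params, metrics):
    """Append one experiment record to a CSV 'spreadsheet'."""
    path = pathlib.Path(path)
    row = {"timestamp": datetime.datetime.now().isoformat(), **params, **metrics}
    write_header = not path.exists()  # write the header only on first use
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row

log_run("experiments.csv", {"lr": 0.01}, {"accuracy": 0.91})
```

This works for a handful of runs, but breaks down quickly once you need artifacts, charts, or collaboration, which is where the tools above come in.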
In this article, we overview and compare the 15 best tools that will allow you to track and manage your ML experiments. You’ll get to know their main features and see how they are different from each other. Hopefully, this will help you evaluate them and choose the right one for your needs.
How to evaluate an experiment tracking tool?
There’s no one answer to the question “what is the best experiment tracking tool?”. Your motivation and needs may be completely different when you work individually or in a team. And, depending on your role, you may be looking for various functionalities.
If you’re a Data Scientist or a Researcher, you should consider:
- If the tool comes with a web UI or it’s console-based;
- If you can integrate the tool with your preferred model training frameworks;
- What metadata you can log, display, and compare (code, text, audio, video, etc.);
- Whether you can easily compare multiple runs, and in what format (only tables, or also charts);
- If organizing and searching through experiments is user-friendly;
- If you can customize metadata structure and dashboards;
- If the tool lets you track hardware consumption;
- How easy it is to collaborate with other team members – can you just share a link to the experiment, or do you have to use screenshots as a workaround?
As an ML Engineer, you should check if the tool lets you:
- Easily reproduce and re-run experiments;
- Track and search through experiment lineage (data/models/experiments used downstream);
- Save, fetch, and cache datasets for experiments;
- Integrate it with your CI/CD pipeline;
- Easily collaborate and share work with your colleagues.
Finally, as an ML team lead, you’ll be interested in:
- General business-related stuff like pricing model, security, and support;
- How much infrastructure the tool requires, how easy it is to integrate it into your current workflow;
- Whether the product is delivered as commercial software, open-source software, or a managed cloud service;
- What collaboration, sharing, and review features it has.
I made sure to keep these motivations in mind when reviewing the tools on the market. So let’s take a closer look at them.