Why should you monitor your model? There are many reasons. It can help you understand the accuracy of your predictions, prevent prediction errors, and tweak your models to perfect them.
Overall, ML model monitoring is necessary for making your model successful. One of the easiest ways to ensure things work smoothly is to use ML model monitoring tools.
They’ll allow you to automate work and streamline smaller processes. It’s often possible to run two models simultaneously to compare their performance, see the connection between your model and the input data, and perform advanced tests.
Dedicated tools also make it easier to collaborate with your team: they give everyone a shared space to participate in model creation and further monitoring. It’s easier to exchange ideas, thoughts, and observations, and to spot errors, when you have real-time insight into what’s happening with your models.
Here are the best tools to help you monitor your ML models as effectively as possible. Let’s dive in!
1. Neptune

Neptune is a lightweight experiment management tool that helps you keep track of your machine learning experiments and manage all your model metadata. It is very flexible, works with many other frameworks, and its stable user interface supports great scalability.
Here’s what Neptune offers to monitor your ML models:
- Fast and beautiful UI with a lot of capabilities to organize runs in groups, save custom dashboard views and share them with the team
- Version, store, organize, and query models, and model development metadata including dataset, code, env config versions, parameters and evaluation metrics, model binaries, description, and other details
- Filter, sort, and group model training runs in a dashboard to better organize your work
- Compare metrics and parameters in a table that automatically finds what changed between runs and flags anomalies
- Automatically record the code, environment, parameters, model binaries, and evaluation metrics every time you run an experiment
- Your team can track experiments that are executed in scripts (Python, R, other), notebooks (local, Google Colab, AWS SageMaker) and do that on any infrastructure (cloud, laptop, cluster)
- Extensive experiment tracking and visualization capabilities (resource consumption, scrolling through lists of images)
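The “what changed between runs” comparison in the list above boils down to diffing run metadata. Here’s a minimal sketch of the idea in plain Python (a conceptual illustration, not Neptune’s API): given the parameter dicts of two runs, report the keys whose values differ.

```python
# Conceptual sketch of run comparison (not Neptune's actual API):
# find which parameters changed between two training runs.
def diff_runs(run_a: dict, run_b: dict) -> dict:
    keys = set(run_a) | set(run_b)
    return {
        k: (run_a.get(k), run_b.get(k))
        for k in sorted(keys)
        if run_a.get(k) != run_b.get(k)
    }

run_1 = {"lr": 0.001, "batch_size": 64, "optimizer": "Adam"}
run_2 = {"lr": 0.01, "batch_size": 64, "optimizer": "SGD"}

print(diff_runs(run_1, run_2))
# {'lr': (0.001, 0.01), 'optimizer': ('Adam', 'SGD')}
```

A tool like Neptune does this automatically across all logged metadata and renders the diff in a comparison table, but the underlying operation is exactly this kind of key-by-key diff.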
Neptune is a robust software that lets you store all your data in one place, easily collaborate, and flexibly experiment with your models.
2. Qualdo

Qualdo is a machine learning model performance monitoring tool available in Azure, Google Cloud, and AWS. It has solid basic features that let you observe your models throughout their entire lifecycle.
With Qualdo, you can gain insights from production ML input/prediction data, logs, and application data to watch and improve your model performance. It supports model deployment and automatic monitoring of data drifts and data anomalies, along with quality metrics and visualizations.
It also offers tools to monitor ML pipeline performance in Tensorflow and leverages Tensorflow’s data validation and model evaluation capabilities.
Additionally, it integrates with many AI, machine learning, and communication tools to improve your workflow and make collaboration easier.
It’s a rather simple tool and doesn’t offer many advanced features. Hence, it’s best if you’re looking for a straightforward model performance monitoring solution.
3. Fiddler

Fiddler is a model monitoring tool with a user-friendly, clear, and simple interface. It lets you monitor model performance, explain and debug model predictions, analyze model behavior for entire datasets and slices, deploy machine learning models at scale, and manage your machine learning models and datasets.
Here are Fiddler’s ML model monitoring features:
- Performance monitoring—a visual way to explore data drift and identify what data is drifting, when it’s drifting, and how it’s drifting
- Data integrity—to ensure incorrect data doesn’t get into your model and negatively impact the end-user experience
- Tracking outliers—Fiddler shows both Univariate and Multivariate Outliers in the Outlier Detection tab
- Service metrics—give you basic insights into the operational health of your ML service in production
- Alerts—Fiddler allows you to set up alerts for a model or group of models in a project to warn about issues in production
Overall, it’s a great tool for monitoring machine learning models with all the necessary features.
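To give a feel for the univariate case from the outlier-tracking feature above, here is a minimal z-score outlier check in plain Python. This is a conceptual sketch, not Fiddler’s implementation; multivariate detection requires more machinery (e.g., Mahalanobis distance over the feature covariance).

```python
import statistics

def univariate_outliers(values, z_threshold=3.0):
    """Flag values whose z-score (distance from the mean in
    standard deviations) exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > z_threshold]

latencies = [102, 98, 101, 99, 100, 103, 97, 500]  # one obvious outlier
# Small sample: the max attainable z-score is (n-1)/sqrt(n),
# so a lower threshold than the usual 3.0 is needed here.
print(univariate_outliers(latencies, z_threshold=2.0))  # [500]
```

Production tools compute statistics like these over sliding time windows of traffic rather than a fixed list, and surface the flagged points in a UI instead of printing them.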
4. MLWatcher

MLWatcher is a Python agent designed by Anodot to close a critical visibility gap in the ML development process. The tool helps you monitor running ML algorithms in real time, recording a large variety of time-series metrics of your running ML classification algorithm.
It enables you to monitor in real-time the following:
- Predictions: monitor the distribution of classes, the distribution of the predict_proba_matrix values, and anomalies in your predictions
- Features: monitor concept drift and anomalies in your data
- Labels: monitor the accuracy, precision, recall, and F1 score of your predictions vs. labels, if applicable
With MLWatcher metrics you can alert on concept drift, analyze the performance of the model, check if your model is numerically stable, canary process new models, and, if labels are available, analyze the evolution of the classic ML metrics and correlate with other time series (features, predictions).
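The label-based metrics in the list above (accuracy, precision, recall, F1) can be computed directly from predictions and ground-truth labels once labels arrive. A minimal binary-classification sketch in plain Python, illustrating the concept rather than MLWatcher’s internals:

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels
    from the confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

A monitoring agent emits these as time series, so you can watch the curves for degradation and correlate them with feature and prediction drift.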
5. Evidently

Evidently is an open-source ML model monitoring system. It helps analyze machine learning models during development, validation, or production monitoring. The tool generates interactive reports from pandas DataFrames.
Currently, 6 reports are available:
- Data Drift: detects changes in feature distribution
- Numerical Target Drift: detects changes in the numerical target and feature behavior
- Categorical Target Drift: detects changes in categorical target and feature behavior
- Regression Model Performance: analyzes the performance of a regression model and model errors
- Classification Model Performance: analyzes the performance and errors of a classification model. Works both for binary and multi-class models
- Probabilistic Classification Model Performance: analyzes the performance of a probabilistic classification model, quality of model calibration, and model errors. Works both for binary and multi-class models
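To illustrate what a data-drift check does under the hood (this is the general idea, not Evidently’s implementation), here is a population stability index (PSI) sketch that compares a feature’s reference distribution against its current distribution:

```python
import math

def psi(reference, current, bins=5):
    """Population stability index between two samples of one feature.
    Rule of thumb: < 0.1 no drift, 0.1-0.25 moderate, > 0.25 significant."""
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch current values above the reference max

    def fractions(sample):
        counts = [0] * bins
        for v in sample:
            for i in range(bins):
                if edges[i] <= v < edges[i + 1] or (i == 0 and v < edges[0]):
                    counts[i] += 1
                    break
        # small epsilon avoids log(0) for empty bins
        return [max(c / len(sample), 1e-6) for c in counts]

    ref_f, cur_f = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_f, cur_f))

reference = [0.1 * i for i in range(100)]      # roughly uniform on [0, 10)
shifted = [0.1 * i + 4.0 for i in range(100)]  # same shape, shifted right
print(round(psi(reference, reference), 4))  # 0.0: no drift
print(psi(reference, shifted) > 0.25)       # True: significant drift
```

Evidently’s Data Drift report runs checks in this spirit per feature (it uses proper statistical tests) and renders the results as an interactive HTML report rather than raw numbers.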
6. Amazon SageMaker Model Monitor

Amazon SageMaker Model Monitor is one of the Amazon SageMaker tools. It automatically detects and alerts on inaccurate predictions from models deployed in production, so you can maintain the accuracy of your models.
Here’s a summary of SageMaker Model Monitor’s features:
- Customizable data collection and monitoring – you can select the data you want to monitor and analyze without the need to write any code
- Built-in analysis in the form of statistical rules, to detect drifts in data and model quality
- You can write custom rules and specify thresholds for each rule. The rules can then be used to analyze model performance
- Visualization of metrics, and running ad-hoc analysis in a SageMaker notebook instance
- Model prediction – import your data to compute model performance
- Schedule monitoring jobs
- The tool is integrated with Amazon SageMaker Clarify so you can identify potential bias in your ML models
When used with other tools for ML, SageMaker Model Monitor gives you full control of your experiments.
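The custom-rules idea from the feature list (define a rule, attach a threshold, alert when it’s breached) can be sketched in plain Python. This is a conceptual illustration, not SageMaker’s actual rule syntax:

```python
# Conceptual sketch of threshold-based monitoring rules
# (not SageMaker Model Monitor's actual rule syntax).
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    metric: str
    threshold: float
    direction: str  # "above" fires when metric > threshold, "below" when <

    def check(self, metrics: dict) -> bool:
        value = metrics[self.metric]
        return value > self.threshold if self.direction == "above" else value < self.threshold

rules = [
    Rule("accuracy drop", "accuracy", 0.90, "below"),
    Rule("latency spike", "p99_latency_ms", 250.0, "above"),
]

production_metrics = {"accuracy": 0.87, "p99_latency_ms": 120.0}
alerts = [r.name for r in rules if r.check(production_metrics)]
print(alerts)  # ['accuracy drop']
```

In SageMaker the equivalent rules run on a schedule against captured inference data, and breaches surface as CloudWatch-style alerts instead of a printed list.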
👉 See the comparison between Neptune and SageMaker.
7. Seldon Core
Seldon Core is an open-source platform for deploying machine learning models on Kubernetes. It’s an MLOps framework that lets you package, deploy, monitor and manage thousands of production machine learning models.
It runs on any cloud and on-premises, is framework agnostic, supports top ML libraries, toolkits, and languages. Also, it converts your ML models (e.g., Tensorflow, Pytorch, H2o) or language wrappers (Python, Java) into production REST/GRPC microservices.
Basically, Seldon Core has all the necessary functions to serve a high number of ML models at scale. You can expect features such as advanced metrics, outlier detectors, canaries, rich inference graphs made out of predictors, transformers, routers, or combiners, and more.
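Seldon’s Python language wrapper works by exposing your model as a class with a predict method, which the tooling then packages into a REST/gRPC microservice. A minimal sketch (the MyModel class and its toy linear model are illustrative; actual packaging requires the seldon-core tooling and a container build):

```python
# Minimal sketch of a Seldon Core Python language-wrapper model class.
# MyModel and its toy linear scoring are illustrative placeholders;
# Seldon's tooling wraps a class like this into a REST/gRPC microservice.
class MyModel:
    def __init__(self):
        # In a real service you would load trained model weights here.
        self.weights = [0.5, -0.25]

    def predict(self, X, features_names=None):
        """Score a batch of rows; the wrapper calls this on each request."""
        return [
            sum(w * x for w, x in zip(self.weights, row))
            for row in X
        ]

model = MyModel()
print(model.predict([[2.0, 4.0], [1.0, 0.0]]))  # [0.0, 0.5]
```

The appeal of this design is that your model code stays plain Python; scaling, routing, canarying, and metrics are layered on by the platform around the class.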
8. KFServing

KFServing provides a Kubernetes Custom Resource Definition (CRD) for serving machine learning models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high-abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, scikit-learn, PyTorch, and ONNX.
The tool provides a serverless machine learning inference solution that allows a consistent and simple interface to deploy your models.
Main features of KFServing:
- Provides a simple, pluggable, and complete story for your production ML inference server by enabling prediction, pre-processing, post-processing and explainability
- Customizable InferenceService to add your resource requests for CPU, GPU, TPU and memory requests and limits
- Batching individual model inference requests
- Traffic management
- Revision management
- Request/Response logging
- Scalable Multi-Model Serving
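In practice, deploying a model through the CRD means applying a short InferenceService manifest. A representative sketch, based on KFServing’s well-known sklearn-iris sample (exact apiVersion and fields depend on your KFServing version):

```yaml
# Representative KFServing InferenceService manifest (version-dependent);
# points the sklearn predictor at a model stored in a GCS bucket.
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-samples/models/sklearn/iris"
```

Applying this with kubectl gives you a served, autoscaled prediction endpoint; the batching, traffic management, and logging features above are configured on the same resource.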
As you can see, every tool is distinct, even though they share the same ‘theme’—effective model monitoring. They all have slightly different functionalities with different purposes.
No matter which one you choose, it’ll help you monitor your models closely and get the most out of them.