Best Tools to Do ML Model Monitoring

If you deploy models to production, sooner or later you will start looking for ML model monitoring tools.

When your ML models impact the business (and they should), you just need visibility into “how things work”.

The first moment you really feel this is when things stop working. With no model monitoring set up, you may have no idea what is wrong or where to start looking for problems and solutions. And people want you to fix it ASAP.

But what do “things” and “work” mean in this context?

Interestingly, depending on the team/problem/pipeline/setup, people mean entirely different things.

One benefit of working at an MLOps company is that you can talk to many ML teams and get this info firsthand. So it turns out that when people say “I want to monitor ML models” they may want to:

  • monitor model performance in production: see how accurate your model’s predictions are, whether performance decays over time, and whether you should re-train it.
  • monitor model input/output distribution: see whether the distribution of the input data and features that go into the model has changed, and whether the predicted class distribution has changed over time. Such changes can point to data drift or concept drift.
  • monitor model training and re-training: see learning curves, the distribution of the trained model’s predictions, or the confusion matrix during training and re-training.
  • monitor model evaluation and testing: log metrics, charts, predictions, and other metadata for your automated evaluation or testing pipelines.
  • monitor hardware metrics: see how much CPU/GPU or memory your models use during training and inference.
  • monitor CI/CD pipelines for ML: see the evaluations from your CI/CD pipeline jobs and compare them visually. In ML, metrics often tell you only so much, and someone needs to actually look at the results.
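
To make the input distribution use case concrete, here is a minimal sketch of one common drift score, the Population Stability Index (PSI), computed for a single numeric feature. It is only an illustration under simple assumptions: the function name psi, the equal-width binning, and the toy data are all made up for this example, and none of the tools below necessarily compute drift this way.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when the feature is constant

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the reference range land in the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = bucket_shares(reference), bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Toy data (assumption): a feature that was uniform on 0.00..9.99 at training time.
reference = [i / 100 for i in range(1000)]
same = list(reference)                   # production data that looks like training data
shifted = [v + 3.0 for v in reference]   # production data shifted upward

print(psi(reference, same))      # prints 0.0
print(psi(reference, shifted))   # a much larger score, signalling drift
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on your data and on how costly a false alarm is.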

Which ML model monitoring did you mean?

Either way, we’ll look into tools that help with all of those use cases.

But first…

How to compare ML model monitoring tools

Obviously, your needs will change depending on what you want to monitor, but there are some things you should definitely consider before choosing an ML model monitoring tool:

  • ease of integration: how easy is it to connect it to your model training and deployment tools?
  • flexibility and expressiveness: can you log and see what you want, how you want it?
  • overhead: how much overhead does the logging impose on your model training and deployment infrastructure?
  • monitoring functionality: can you monitor data/feature/concept/model drift? Can you compare multiple models that are running at the same time (A/B tests)?
  • alerting: does it provide automated alerts when performance drops or the inputs start to look abnormal?
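
As a rough illustration of what “automated alerts” can boil down to, the sketch below tracks accuracy over a sliding window of labelled predictions and returns a message once it falls below a threshold. The class name RollingAccuracyMonitor, the window size, and the threshold are assumptions made for this example; real tools add scheduling, notification channels, and delayed-label handling on top.

```python
from collections import deque
from typing import Deque, Optional


class RollingAccuracyMonitor:
    """Tracks accuracy over the last `window` labelled predictions."""

    def __init__(self, window: int, threshold: float) -> None:
        self.window = window
        self.threshold = threshold
        self.outcomes: Deque[bool] = deque(maxlen=window)

    def update(self, prediction: int, label: int) -> Optional[str]:
        """Record one outcome; return an alert message if accuracy is too low."""
        self.outcomes.append(prediction == label)
        if len(self.outcomes) < self.window:
            return None  # not enough data yet for a stable estimate
        accuracy = sum(self.outcomes) / len(self.outcomes)
        if accuracy < self.threshold:
            return f"ALERT: rolling accuracy {accuracy:.2f} below {self.threshold:.2f}"
        return None


# Toy stream of (prediction, true label) pairs; the last two are wrong.
monitor = RollingAccuracyMonitor(window=4, threshold=0.75)
stream = [(1, 1), (0, 0), (1, 1), (1, 1), (0, 1), (0, 1)]
alerts = [monitor.update(p, y) for p, y in stream]
print(alerts[-1])  # prints ALERT: rolling accuracy 0.50 below 0.75
```

The window trades off reaction time against noise: a short window alerts quickly but fires on random fluctuations, while a long one is steadier but slower to notice a real drop.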

OK, now let’s look into the actual model monitoring tools!

ML model monitoring tools

1. Neptune

Neptune is a metadata store for MLOps built for research and production teams that run a lot of experiments.

You can log and display pretty much any ML metadata, from metrics and losses, prediction images, and hardware metrics to interactive visualizations.

When it comes to monitoring ML models, people mostly use it for:

  • model training, evaluation, and testing
  • hardware metrics display
  • logging performance metrics from production jobs and viewing metadata from ML CI/CD pipelines (less common, but some teams do this)

It has a flexible metadata structure that allows you to organize training and production metadata the way you want to. You can think of it as a dictionary or a folder structure that you create in code and display in the UI.
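
To picture the “dictionary or folder structure” idea, here is a small standalone sketch that maps slash-separated paths onto a nested dictionary. This is not the Neptune client: the helper log_value and the example paths are hypothetical and exist only to show how a path such as training/metrics/val_accuracy can act like a folder hierarchy.

```python
from typing import Any, Dict


def log_value(store: Dict[str, Any], path: str, value: Any) -> None:
    """Write `value` at a slash-separated `path`, creating folders as needed."""
    *folders, leaf = path.split("/")
    node = store
    for name in folders:
        node = node.setdefault(name, {})  # descend, creating the folder if missing
    node[leaf] = value


# Hypothetical layout: training and production metadata live side by side.
run: Dict[str, Any] = {}
log_value(run, "training/params/lr", 0.001)
log_value(run, "training/metrics/val_accuracy", 0.93)
log_value(run, "production/metrics/latency_ms", 41)

print(sorted(run))                 # prints ['production', 'training']
print(run["training"]["params"])   # prints {'lr': 0.001}
```

Because the structure is defined by the paths you choose in code, you can keep training and production metadata under separate top-level folders or group them per model, whichever matches how your team looks at the data.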

You can build dashboards that display the performance and hardware metrics you want to see to better organize your model monitoring information.

You can compare metrics between models and runs to see how a model update changed performance or hardware consumption, and whether you should abort a live training run because it just won’t beat the baseline.

You can log the metadata you want to monitor via an easy-to-use API and 25+ integrations with tools from the ML ecosystem.

2. Evidently

Evidently is an open-source ML model monitoring system. It helps analyze machine learning models during development, validation, or production monitoring. The tool generates interactive reports from pandas DataFrames.

Currently, 6 reports are available:

  1. Data Drift: detects changes in feature distribution
  2. Numerical Target Drift: detects changes in the numerical target and feature behavior
  3. Categorical Target Drift: detects changes in categorical target and feature behavior
  4. Regression Model Performance: analyzes the performance of a regression model and model errors
  5. Classification Model Performance: analyzes the performance and errors of a classification model. Works both for binary and multi-class models
  6. Probabilistic Classification Model Performance: analyzes the performance of a probabilistic classification model, quality of model calibration, and model errors. Works both for binary and multi-class models
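
To give a feel for what a categorical target drift check measures, the sketch below compares class frequencies between a reference sample and a current sample using the total variation distance. It does not use Evidently’s API and is not a description of how Evidently computes its reports; the function names and toy labels are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def class_shares(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fraction of samples that fall into each class."""
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}


def total_variation_distance(reference: Sequence[Hashable],
                             current: Sequence[Hashable]) -> float:
    """Half the sum of absolute differences in class shares (0 = same, 1 = disjoint)."""
    ref, cur = class_shares(reference), class_shares(current)
    classes = set(ref) | set(cur)  # include classes seen in only one sample
    return 0.5 * sum(abs(ref.get(c, 0.0) - cur.get(c, 0.0)) for c in classes)


# Toy labels (assumption): a balanced target at training time, skewed in production.
reference_labels = ["cat"] * 50 + ["dog"] * 50
current_labels = ["cat"] * 80 + ["dog"] * 20

drift = total_variation_distance(reference_labels, current_labels)
print(f"{drift:.2f}")  # prints 0.30
```

A distance of 0 means the class mix is unchanged and 1 means the two samples share no classes at all. Where to draw the line for “drifted” is a judgment call for your use case.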

3. Qualdo

Qualdo is a machine learning model performance monitoring tool that works in Azure, Google, and AWS. The tool has some nice, basic features that allow you to observe your models throughout their entire lifecycle.

With Qualdo, you can gain insights from production ML input and prediction data, logs, and application data to watch and improve your model performance. It supports model deployment and automatic monitoring of data drift and data anomalies, and it shows quality metrics and visualizations.

It also offers tools to monitor ML pipeline performance in TensorFlow and leverages TensorFlow’s data validation and model evaluation capabilities.

Additionally, it integrates with many AI, machine learning, and communication tools to improve your workflow and make collaboration easier.

It’s a rather simple tool and doesn’t offer many advanced features. Hence, it’s best if you’re looking for an easy solution for monitoring ML model performance.

4. Fiddler

Fiddler is a model monitoring tool with a clear, simple, user-friendly interface. It lets you monitor model performance, explain and debug model predictions, analyze model behavior for entire datasets and for slices, deploy machine learning models at scale, and manage your machine learning models and datasets.

Here are Fiddler’s ML model monitoring features:

  • Performance monitoring: a visual way to explore data drift and identify what data is drifting, when it’s drifting, and how it’s drifting
  • Data integrity: checks that keep incorrect data from getting into your model and hurting the end-user experience
  • Tracking outliers: Fiddler shows both univariate and multivariate outliers in the Outlier Detection tab
  • Service metrics: basic insights into the operational health of your ML service in production
  • Alerts: Fiddler lets you set up alerts for a model or a group of models in a project to warn about issues in production
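
For a sense of what a data integrity check involves, here is a minimal standalone sketch that validates one incoming row against an expected schema before it reaches the model. It is not Fiddler’s API: the SCHEMA dictionary, the feature names, and the integrity_issues helper are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical schema: expected type plus an allowed range or set of categories.
SCHEMA: Dict[str, Dict[str, Any]] = {
    "age": {"type": (int, float), "min": 0, "max": 120},
    "country": {"type": str, "allowed": {"PL", "US", "DE"}},
}


def integrity_issues(row: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in one input row."""
    issues = []
    for name, rule in SCHEMA.items():
        value = row.get(name)
        if value is None:
            issues.append(f"{name}: missing")
        elif not isinstance(value, rule["type"]):
            issues.append(f"{name}: wrong type {type(value).__name__}")
        elif "allowed" in rule and value not in rule["allowed"]:
            issues.append(f"{name}: unexpected value {value!r}")
        elif "min" in rule and not rule["min"] <= value <= rule["max"]:
            issues.append(f"{name}: out of range {value!r}")
    return issues


print(integrity_issues({"age": 34, "country": "PL"}))   # prints []
print(integrity_issues({"age": 250, "country": "XX"}))
# prints ['age: out of range 250', "country: unexpected value 'XX'"]
```

Counting how often each kind of issue shows up over time is usually more useful than rejecting single rows, because a sudden spike tends to point to a broken upstream pipeline.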

Overall, it’s a great tool for monitoring machine learning models with all the necessary features.

5. Amazon SageMaker Model Monitor

Amazon SageMaker Model Monitor is one of the Amazon SageMaker tools. It automatically detects and alerts on inaccurate predictions from models deployed in production so you can maintain the accuracy of your models.

Here’s the summary of SageMaker Model Monitoring features:

  • Customizable data collection and monitoring – you can select the data you want to monitor and analyze without the need to write any code
  • Built-in analysis in the form of statistical rules, to detect drifts in data and model quality
  • You can write custom rules and specify thresholds for each rule. The rules can then be used to analyze model performance
  • Visualization of metrics and ad-hoc analysis in a SageMaker notebook instance
  • Model prediction – import your data to compute model performance
  • Schedule monitoring jobs
  • The tool is integrated with Amazon SageMaker Clarify so you can identify potential bias in your ML models

When used with other tools for ML, SageMaker Model Monitor gives you full control over your experiments.

👉 See the comparison between Neptune and SageMaker.

6. Seldon Core

Seldon Core is an open-source platform for deploying machine learning models on Kubernetes. It’s an MLOps framework that lets you package, deploy, monitor and manage thousands of production machine learning models.

It runs on any cloud and on-premises, is framework-agnostic, and supports top ML libraries, toolkits, and languages. It also converts your ML models (e.g., TensorFlow, PyTorch, H2O) or language wrappers (Python, Java) into production REST/gRPC microservices.

Basically, Seldon Core has all the necessary functions to scale a high number of ML models. You can expect features such as advanced metrics, outlier detectors, canaries, and rich inference graphs made out of predictors, transformers, routers, or combiners.
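
If “canaries” and “routers” sound abstract, the sketch below shows the underlying idea in plain Python: send a small, random share of requests to a new model while the stable one keeps serving the rest, and count who served what. This is not Seldon Core’s API; the function make_canary_router, the toy models, and the 10% share are assumptions for illustration only.

```python
import random
from collections import Counter
from typing import Callable, Tuple

Model = Callable[[float], float]


def make_canary_router(stable: Model, canary: Model,
                       canary_share: float, seed: int = 0):
    """Build a router that sends roughly `canary_share` of requests to the canary."""
    rng = random.Random(seed)     # seeded so the example is reproducible
    served: Counter = Counter()   # how many requests each model handled

    def route(x: float) -> Tuple[str, float]:
        if rng.random() < canary_share:
            name, model = "canary", canary
        else:
            name, model = "stable", stable
        served[name] += 1
        return name, model(x)

    return route, served


# Toy models (assumption): the canary is a slightly adjusted version of the stable one.
route, served = make_canary_router(
    stable=lambda x: 2 * x,
    canary=lambda x: 2 * x + 0.1,
    canary_share=0.1,
)
for i in range(1000):
    route(float(i))

print(served["stable"] + served["canary"])  # prints 1000
```

Tagging each prediction with the model that produced it is what makes the monitoring side work: you can then compare metrics for the canary and the stable model on live traffic before shifting more requests over.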


Now that you know how to evaluate tools for ML model monitoring and what is out there, the best way to go forward is to test out the ones you liked!

You can also continue evaluating tools by checking out this great resource, ml model monitoring tools comparison prepared by the

Either way, happy monitoring!

