If you deploy models to production, sooner or later you will start looking for ML model monitoring tools.
When your ML models impact the business (and they should), you need visibility into “how things work”.
The first moment you really feel this is when things stop working. With no model monitoring set up, you may have no idea what is wrong or where to start looking for problems and solutions. And people want you to fix it ASAP.
But what do “things” and “work” mean in this context?
Interestingly, depending on the team/problem/pipeline/setup, people mean entirely different things.
One benefit of working at an MLOps company is that you can talk to many ML teams and get this info firsthand. So it turns out that when people say “I want to monitor ML models” they may want to:
- monitor model performance in production: see how accurate your model’s predictions are, whether performance decays over time, and whether you should re-train it.
- monitor model input/output distribution: see whether the distribution of the input data and features going into the model has changed, and whether the predicted class distribution has shifted over time. These changes may be connected to data and concept drift.
- monitor model training and re-training: see learning curves, trained model prediction distributions, or confusion matrices during training and re-training.
- monitor model evaluation and testing: log metrics, charts, predictions, and other metadata for your automated evaluation or testing pipelines.
- monitor hardware metrics: see how much CPU/GPU or memory your models use during training and inference.
- monitor CI/CD pipelines for ML: see the evaluations from your CI/CD pipeline jobs and compare them visually. In ML, metrics only tell you so much, and someone needs to actually see the results.
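Several of these use cases boil down to comparing a production distribution against a reference one. As a tool-agnostic sketch, here is a Population Stability Index (PSI) drift check in plain Python; the binning scheme and the 0.1/0.25 interpretation thresholds are common rules of thumb, not a standard:

```python
import math

def psi(reference, production, bins=10):
    """Population Stability Index between two samples of a numeric feature.

    Common rule of thumb (an assumption, not a universal standard):
    PSI < 0.1 -> no significant shift, 0.1-0.25 -> moderate, > 0.25 -> large.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)  # clip above range
            i = max(i, 0)                             # clip below range
            counts[i] += 1
        # small floor so empty bins don't blow up the log term
        return [max(c / len(sample), 1e-6) for c in counts]

    ref, prod = fractions(reference), fractions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))

reference = [x / 100 for x in range(1000)]      # feature values 0.00..9.99
identical = psi(reference, list(reference))     # same distribution
shifted = psi(reference, [5 + x for x in reference])  # mean shifted by +5
```

Running this, `identical` stays near zero while `shifted` lands well above 0.25, which is the kind of signal drift monitors turn into alerts.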
Which ML model monitoring did you mean?
Either way, we’ll look into tools that help with all of those use cases.
How to compare ML model monitoring tools
Obviously, depending on what you want to monitor, your needs will change, but there are some things you should definitely consider before choosing an ML model monitoring tool:
- ease of integration: how easy is it to connect it to your model training and deployment tools?
- flexibility and expressiveness: can you log and see what you want, the way you want it?
- overhead: how much overhead does logging impose on your model training and deployment infrastructure?
- monitoring functionality: can you monitor data/feature/concept/model drift? Can you compare multiple models that are running at the same time (A/B tests)?
- alerting: does it provide automated alerts when the performance or input goes crazy?
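On the alerting point, most tools implement some variant of “watch a rolling metric and fire when it crosses a threshold”. A toy sketch of that idea in plain Python (the window size and threshold here are arbitrary illustrative choices, not defaults of any particular tool):

```python
from collections import deque

class MetricAlert:
    """Fire an alert when the rolling mean of a metric drops below a threshold."""

    def __init__(self, threshold, window=50):
        self.threshold = threshold
        self.values = deque(maxlen=window)  # only keep the most recent window

    def record(self, value):
        """Record one metric observation; return True if the alert fires."""
        self.values.append(value)
        rolling_mean = sum(self.values) / len(self.values)
        return rolling_mean < self.threshold

# e.g. daily accuracy of a deployed model; alert when the 5-day mean < 0.8
alert = MetricAlert(threshold=0.8, window=5)
fired = [alert.record(v) for v in [0.9, 0.88, 0.91, 0.6, 0.55, 0.5]]
# fired -> [False, False, False, False, True, True]
```

Real monitoring tools add persistence, notification channels, and drift-aware statistics on top, but the core loop looks like this.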
Ok now, let’s look into the actual model monitoring tools!
ML model monitoring tools
Neptune is a metadata store for MLOps, built for research and production teams that run a lot of experiments.
You can log and display pretty much any ML metadata, from metrics and losses, prediction images, and hardware metrics to interactive visualizations.
When it comes to monitoring ML models, people mostly use it for:
- model training, evaluation, and testing,
- hardware metrics display,
- but you can also (and some teams do) log performance metrics from production jobs and view metadata from ML CI/CD pipelines.
It has a flexible metadata structure that allows you to organize training and production metadata the way you want to. You can think of it as a dictionary or a folder structure that you create in code and display in the UI.
You can build dashboards that display the performance and hardware metrics you want to see to better organize your model monitoring information.
You can compare metrics between models and runs to see how a model update changed performance or hardware consumption, and whether you should abort a live model training run because it just won’t beat the baseline.
If you are wondering if it will fit your workflow:
- check out case studies of how people set up their MLOps tool stack with Neptune
- explore an example public project
- run a model monitoring example in Colab and see for yourself
Evidently is an open-source ML model monitoring system. It helps analyze machine learning models during development, validation, or production monitoring. The tool generates interactive reports from a pandas DataFrame.
Currently, 6 reports are available:
- Data Drift: detects changes in feature distribution
- Numerical Target Drift: detects changes in the numerical target and feature behavior
- Categorical Target Drift: detects changes in categorical target and feature behavior
- Regression Model Performance: analyzes the performance of a regression model and model errors
- Classification Model Performance: analyzes the performance and errors of a classification model. Works both for binary and multi-class models
- Probabilistic Classification Model Performance: analyzes the performance of a probabilistic classification model, quality of model calibration, and model errors. Works both for binary and multi-class models
Qualdo is a machine learning model performance monitoring tool available in Azure, Google Cloud, and AWS. The tool has some nice basic features that let you observe your models throughout their entire lifecycle.
With Qualdo, you can gain insights from production ML input/prediction data, logs, and application data to watch and improve your model performance. There’s model deployment with automatic monitoring of data drifts and data anomalies, and you can see quality metrics and visualizations.
It also offers tools to monitor ML pipeline performance in Tensorflow and leverages Tensorflow’s data validation and model evaluation capabilities.
Additionally, it integrates with many AI, machine learning, and communication tools to improve your workflow and make collaboration easier.
It’s a rather simple tool and doesn’t offer many advanced features, so it’s best if you’re looking for a straightforward model performance monitoring solution.
Fiddler is a model monitoring tool with a user-friendly, clear, and simple interface. It lets you monitor model performance, explain and debug model predictions, analyze model behavior across entire datasets and slices, deploy machine learning models at scale, and manage your machine learning models and datasets.
Here are Fiddler’s ML model monitoring features:
- Performance monitoring—a visual way to explore data drift and identify what data is drifting, when it’s drifting, and how it’s drifting
- Data integrity—to ensure that incorrect data doesn’t get into your model and negatively impact the end-user experience
- Tracking outliers—Fiddler shows both Univariate and Multivariate Outliers in the Outlier Detection tab
- Service metrics—give you basic insights into the operational health of your ML service in production
- Alerts—Fiddler allows you to set up alerts for a model or group of models in a project to warn about issues in production
Overall, it’s a great tool for monitoring machine learning models with all the necessary features.
Amazon SageMaker Model Monitor is one of the Amazon SageMaker tools. It automatically detects and alerts on inaccurate predictions from models deployed in production, so you can maintain the accuracy of your models.
Here’s a summary of SageMaker Model Monitor’s features:
- Customizable data collection and monitoring – you can select the data you want to monitor and analyze without the need to write any code
- Built-in analysis in the form of statistical rules, to detect drifts in data and model quality
- You can write custom rules and specify thresholds for each rule. The rules can then be used to analyze model performance
- Visualization of metrics, and running ad-hoc analysis in a SageMaker notebook instance
- Model prediction – import your data to compute model performance
- Schedule monitoring jobs
- The tool is integrated with Amazon SageMaker Clarify so you can identify potential bias in your ML models
When used with other tools for ML, SageMaker Model Monitor gives you full control of your experiments.
👉 See the comparison between Neptune and SageMaker.
Seldon Core
Seldon Core is an open-source platform for deploying machine learning models on Kubernetes. It’s an MLOps framework that lets you package, deploy, monitor and manage thousands of production machine learning models.
It runs on any cloud and on-premises, is framework-agnostic, and supports top ML libraries, toolkits, and languages. It converts your ML models (e.g., TensorFlow, PyTorch, H2O) or language wrappers (Python, Java) into production REST/gRPC microservices.
Basically, Seldon Core has all the necessary functions to scale to a high number of ML models. You can expect features like advanced metrics, outlier detectors, canaries, rich inference graphs made out of predictors, transformers, routers, or combiners, and more.
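To make the REST microservice part concrete, here is a sketch of a prediction request in the shape of Seldon Core’s v1 REST protocol; the host, namespace, and deployment name are hypothetical placeholders:

```python
import json

# Hypothetical ingress host, namespace, and deployment name; the URL and
# payload shape follow Seldon Core's v1 prediction protocol.
host = "http://seldon.example.com"
namespace, deployment = "default", "income-model"
url = f"{host}/seldon/{namespace}/{deployment}/api/v1.0/predictions"

payload = {
    "data": {
        "names": ["age", "hours_per_week"],  # optional feature names
        "ndarray": [[39, 40], [52, 45]],     # one row per prediction
    }
}
body = json.dumps(payload)
# POST `body` with Content-Type: application/json, e.g. via requests:
# requests.post(url, data=body, headers={"Content-Type": "application/json"})
```

The response comes back in the same `data`/`ndarray` envelope, which is what makes it easy to chain predictors, transformers, and routers into the inference graphs mentioned above.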
Now that you know how to evaluate tools for ML model monitoring and what is out there, the best way to go forward is to test out the ones you liked!
If you want to give Neptune a try, the case studies, example project, and Colab notebook mentioned above are good next steps.
You can also continue evaluating tools by checking out the ML model monitoring tools comparison prepared by the mlops.community.
Either way, happy monitoring!