DataRobot is an enterprise AI platform that provides tools for building, managing, and deploying models at scale, and for automating the end-to-end ML lifecycle. It bundles a range of models, data connectors, and the latest open-source algorithms, and can run on-premise or as a fully managed AI service. One important component of DataRobot is its model registry.
The DataRobot Model Registry is the central organizational hub for the variety of models used on the DataRobot platform. It houses each model and registers it as a deployment-ready model package.
DataRobot model registry offers the following:
- A central hub to register all your models regardless of their origin or deployment location.
- Every package functions the same way, regardless of the origin of its model.
- It provides a unified deployment experience for your machine learning models across the entire enterprise. You can also deploy models from DataRobot to external environments.
- DataRobot model registry has a Custom Model Workshop, where you can create and deploy custom models.
- You can also create custom model inferences and add external model packages for use in the model registry.
- DataRobot provides a leaderboard for comparing all the models in the model registry. This leaderboard also provides explainability on the features and techniques used during model training, and lets you learn how much data a model was trained on and its overall accuracy scores.
The main aim of the DataRobot Model Registry is to store all model metadata needed for reproducibility and deployment across the enterprise. In turn, this allows data science teams to easily find all model-related metadata whenever they need it.
However, the DataRobot model registry might not be the best option for your team, and here’s why:
- The DataRobot Model Registry is one of the components of the DataRobot platform, so to use it your team has to move the entire machine learning infrastructure to the DataRobot platform.
- DataRobot's API integrations with other frameworks and libraries are not easy to use.
- DataRobot only stores deployment-ready models, i.e., experimental models that require continuous training are not stored.
- When it comes to data handling, DataRobot only handles small datasets unless you're using the enterprise version. And every model is only as good as the data it is trained on.
- Data cannot be edited within DataRobot, i.e., there are no data cleansing or ETL capabilities. The data has to be edited outside the system and then imported again.
- You cannot export the underlying code used in DataRobot.
- DataRobot is less suitable for unsupervised learning and scenarios where enough data is not available.
- On DataRobot, there are constraints on how many models can run at the same time: no more than two at a time.
So it is worth looking at some alternative model registry tools.
1. Neptune
Neptune is a metadata store for MLOps, built for research and production teams that run a lot of experiments. It provides a central place to log, store, display, organize, compare, and query all metadata generated during the machine learning lifecycle.
- It supports and stores many ML model-related metadata types and you can version, display, and query most metadata generated during model building.
- Neptune model registry helps you store and log metadata on your models, dataset, environment configuration versions, hyperparameters, and model metrics.
- From the Neptune dashboard you can:
- search based on metric and parameter values,
- find runs with user-defined tags,
- group model training runs based on the parameter or tag values.
- Neptune versions and stores all your model training runs in its central registry.
- Neptune versions and stores all types of model metadata, such as .pkl files, dataset metadata, images, charts, etc.
- Neptune allows you to log model information via the UI or the client library.
- Combined with its model registry capability, Neptune allows you to track changes in model metadata.
- You can make comparisons between model training runs and get insights into the differences between them.
- Neptune has integrations with ML frameworks and libraries including PyTorch, TensorFlow, Keras, XGBoost, etc.
- The Neptune model registry enables collaboration within ML teams, as everyone can explore all the model-related metadata from its central store.
Neptune offers pricing plans for both its cloud version and deployment on your private infrastructure. There are a few plans available:
- Individual: Free (+ usage above free quota)
- Academia: Free
- Team: Paid
Check out Neptune’s pricing to learn more.
2. SageMaker Model Registry
SageMaker is a fully managed tool that can be used at every stage of ML development, and it includes a model registry. It helps users build, debug, deploy, and monitor ML models, all within a unified visual interface.
SageMaker Model Registry provides the following:
- A searchable centralized store for models and model-related metadata for reproducibility.
- Versioning and tracking for every experimental training run.
- Deploying a model from the model registry into an existing SageMaker endpoint.
- SageMaker Model Registry connects to Amazon SageMaker Model Monitor to continuously monitor the quality of your machine learning models in real time after deployment.
- SageMaker has a system that catalogs models for production, sorting and storing them according to model type and content.
- In the SageMaker model registry, models can be moved to production quickly through automated model deployment with CI/CD.
- SageMaker allows you to switch from one hardware configuration to another, as it provisions and manages the hardware infrastructure of your model's environment.
- Jupyter notebooks are easily created and shared.
- It has over 150 preset models available for several use cases.
- Model groups can be created to monitor models that have already been trained to solve certain problems. These models can then be stored and made into new model versions.
- SageMaker Studio supports several other frameworks such as TensorFlow, PyTorch, MXNet, etc.
SageMaker pricing is pay-per-use, and there are two available payment options:
- On-demand pricing: This is billed by the second, with no minimum fees or upfront commitments.
- The SageMaker savings plan: This offers a flexible, usage-based pricing model in exchange for a commitment to a consistent amount of usage.
3. MLflow Model Registry
The MLflow model registry is a self-sufficient system comprising a UI, a set of APIs, and concepts that enable efficient management of the MLflow model life cycle, including stage transitions, model versioning, and annotations. It stores model artifacts, metadata, parameters, and metrics.
- MLflow Model Registry works hand in hand with the MLflow tracking component. This allows you to backtrack to the original source from which the model and data artifacts were generated, including its source code version, thereby producing a thorough lineage of the lifecycle of all models and data transformations.
- The MLflow Model Registry comprises APIs and a smart UI that enable easy registration and sharing of new model versions, as well as life cycle management of existing models.
- MLflow has the ability to reproduce experiments and reports/results.
- With MLflow, you can instantly version the data stored in your data lake as it is written to a Delta table or directory.
- Each model version can be assigned a preset stage, such as "Staging" or "Production", that depicts where the model is in its life cycle.
MLflow is open source and free to use.
4. Comet
Comet is a cloud-based machine learning experiment management platform. The Comet model registry provides a system for logging, registering, versioning, and deploying machine learning models via the Experiment objects of its Python SDK.
Comet simplifies the process of documenting the history of the experiments and model versions.
Model training runs are registered in Comet via:
- The Assets tab of the Comet.ml Experiment user interface
- Programmatically through the Comet Python SDK
- Comet is compatible with the majority of platforms and machine learning libraries.
- Comet makes it easy to compare experiments by making code, hyperparameters, metrics, and dependencies accessible to you in one user interface.
- The automated notification system enables users to constantly monitor model performance, thereby boosting the overall quality of the project.
- With its built-in features for visualization and reporting, Comet allows both individual developers and professional teams to communicate, and to run and track experiments as they occur.
Comet offers the following pricing plans:
- Individual: Free (+ usage above free quota)
- Academia: Free
- Team: Paid
You can read about their pricing in detail here.
5. Verta
Verta is an AI/ML model management and operations tool with model registry features, where you can manage and deploy your machine learning models in a central space.
Verta enables you to streamline your data science and ML workflows. It also facilitates faster deployment of models into production, while ensuring real-time model health.
- Verta model registry has a central base for the publishing of release-ready models.
- Verta model registry is connected to the experiment component making it easy to reuse trained models.
- Verta model registry helps you with managing model governance.
- Verta works well on Docker and Kubernetes, and can be integrated with several ML tools like TensorFlow, PyTorch, Spark ML, and R.
- Verta can be easily integrated with CI/CD pipelines such as Jenkins, Chef, and GitOps.
- Verta is also known for its Git-like environment, making it very convenient and familiar to experienced Git users (and other developers).
Verta offers the following pricing plans:
- Open-source: Free
- SaaS: Paid
- Enterprise: Paid
You can read about their pricing in detail here.
6. Dataiku
Dataiku provides a central solution for the design, deployment, and management of ML models for businesses. It provides self-service analytics and collaboration so teams can work efficiently.
When models are designed and trained in Dataiku DSS, they are stored in the Lab, which serves as its model registry.
The Dataiku model registry is the central store for models in Dataiku. It also allows the storage of external models via MLflow. The Dataiku model registry provides a central point to:
- Explore models and their versions.
- Track the state of a model version.
- Track simple input drifts over time.
- Store model parameters and data lineage.
- Dataiku provides a dashboard for monitoring model operations and training.
- Dataiku provides both model and data lineage.
- Dataiku provides a set of interactive features for data cleaning, analysis, and validation.
- It provides end-to-end management for models from model building to deployment.
- Dataiku DSS is data agnostic.
- Dataiku has a robust deployment API that ships models to production.
- Data flow is very easy to visualize. Data models can be represented in picture and table form, as opposed to having different files everywhere.
- You are free to choose your database format; available options include SQL, NoSQL, and Hadoop.
- Dataiku integrates with frameworks like TensorFlow, Keras, and XGBoost, and with external libraries (H2O, Data, etc.) using its API.
Dataiku can be used online, installed on-premise, or run on your cloud stack. It provides a 14-day trial version; after that, pricing starts at $499/month. Read more about the pricing here.
Comparison of DataRobot alternatives for Model Registry
The DataRobot model registry is a good component of the DataRobot platform that promotes reproducibility and collaboration within teams, but as stated above, it might not always be the best fit for your team.
Alternative model registry tools like Neptune, SageMaker Model Registry, MLflow, Comet, Verta, and Dataiku can allow ML teams to achieve better model registration and governance.
Continuum Industries Case Study: How to Track, Monitor & Visualize CI/CD Pipelines
7 mins read | Updated August 9th, 2021
Continuum Industries is a company in the infrastructure industry that wants to automate and optimize the design of linear infrastructure assets like water pipelines, overhead transmission lines, subsea power lines, or telecommunication cables.
Its core product Optioneer lets customers input the engineering design assumptions and the geospatial data and uses evolutionary optimization algorithms to find possible solutions to connect point A to B given the constraints.
As Chief Scientist Andreas Malekos, who works on the Optioneer AI-powered engine, explains:
“Building something like a power line is a huge project, so you have to get the design right before you start. The more reasonable designs you see, the better decision you can make. Optioneer can get you design assets in minutes at a fraction of the cost of traditional design methods.”
But creating and operating the Optioneer engine is more challenging than it seems:
- The objective function does not represent reality
- There are a lot of assumptions that civil engineers don’t know in advance
- Different customers feed it completely different problems, and the algorithm needs to be robust enough to handle those
Instead of building the perfect solution, it's better to present users with a list of interesting design options so that they can make informed decisions.
The engine team leverages a diverse skillset from mechanical engineering, electrical engineering, computational physics, applied mathematics, and software engineering to pull this off.
A side effect of building a successful software product, whether it uses AI or not, is that people rely on it working. And when people rely on your optimization engine with million-dollar infrastructure design decisions, you need to have a robust quality assurance (QA) in place.
As Andreas pointed out, they have to be able to say that the solutions they return to the users are:
- Good, meaning that it is a result that a civil engineer can look at and agree with
- Correct, meaning that all the different engineering quantities that are calculated and returned to the end-user are as accurate as possible
On top of that, the team is constantly working on improving the optimization engine. But to do that, you have to make sure that the changes:
- Don't break the algorithm in some way or another
- Actually improve the results, not just on one infrastructure problem but across the board
Basically, you need to set up proper validation and testing, but the nature of the problem the team is trying to solve presents additional challenges:
- You cannot automatically tell whether an algorithm output is correct or not. It is not like in ML where you have labeled data to compute accuracy or recall on your evaluation set.
- You need a set of example problems that is representative of the kind of problem that the algorithm will be asked to solve in production. Furthermore, these problems need to be versioned so that repeatability is as easily achievable as possible.