
Model Registry Makes MLOps Work: Here’s Why

6 min
Harshil Patel
19th April, 2023

A model registry is a part of the machine learning lifecycle, or MLOps. It is a service that manages multiple model artifacts and tracks and governs models at different stages of the ML lifecycle. The model registry is a collaborative hub where teams can work together across the machine learning lifecycle, from the experimentation phase through to production. It makes approval, governance, and monitoring seamless, improves workflow performance, helps you manage the full development lifecycle of an ML model, and standardizes deployment. In the next few years, 75% of companies plan to take their machine learning projects from pilot to operations, yet the tools used in production are sometimes completely unfit for data scientists or ML workflows. The model registry is a kind of linchpin that makes MLOps work. In this article, we’ll explore a few platforms and the key features of a model registry.

What is a Model Registry?

The model registry is a system that allows machine learning engineers and data scientists to publish, test, monitor, govern, and share models, and to collaborate on them with other teams. Essentially, you use the model registry once you’re done with the experimentation phase and are ready to share your model with the team and stakeholders.

Model registry overview
Source: Author

Why do we need a Model Registry?

Let’s say you’ve spent a lot of resources developing an ML algorithm that works well and has good potential to impact your business outcomes. Rolling out ML to production is painfully slow: companies take one, sometimes two, years to release a single model. What’s usually lacking is transparency and a way to collaborate with other team members. So, what if you had a central repository for staging all your production-ready models? That would streamline the entire production workflow. With a model registry, you can ensure that all the key assets (including data, configurations, environment variables, model code, versions, and docs) are in one place, where everyone has access.

Lack of governance and security is a major issue in many industries. It slows down production, and companies then have to go back to the whiteboard to figure out what went wrong and how to fix it. There are many real-life cases where a lack of proper management led to serious issues that proper management, governance, and testing would have prevented in the first place. Thanks to the model registry, you can:

  • Manage the model lifecycle
  • Manage model risk and approval workflows
  • Roll out faster and more seamlessly
  • Collaborate and manage models easily

Key features

  • Central Repository: Seamlessly manage all your experiments in one place, along with registered models, their versions, and other metadata. This collaborative hub helps teams easily access and share all kinds of information.
  • Model Versioning: Automatically keep track of versions. A machine learning model has many moving parts: the data used, hyperparameters, a pre-built or custom algorithm, and the model architecture. With model versioning, you can manage all of these components. Versioning is an important part of the machine learning governance process, and there are many versioning tools that can improve your ML workflow. Below are a few top tools for data version control.
    • Neptune: Its intuitive UI and easy-to-use management system help you automate, track, and monitor your experiments.
    • DVC: An open-source system that lets you version different data types, configurations, and code, and track full code and data provenance.
    • Git LFS: Git itself tracks and versions your code, letting you store, merge, and roll back changes, while the Git LFS extension adds versioning for large files and datasets.
  • CI/CD Workflow Integration: A workflow that lets developers and data scientists change, update, and merge code into a central repository. You can govern the staging process and approve and review changes. Teams can either transition models to production automatically based on predefined conditions, or manually control and validate lifecycle stages.
    • Every time you update the code, CI runs an automated test suite. The suite tells you when something is not working, making it easy for teams to fix the issue (see the sketch after this list).
    • The workflow is built around small iterations and iterative releases, keeping the code in a deployable state at all times.
    • The result is improved team productivity, faster deployments, and more frequent releases with minimal risk.
  • Model Management: Serve ML models as APIs for online deployment or testing. Companies often have thousands of machine learning models at different stages, and a model registry makes it easy to govern, track, and manage them across the testing, experimentation, and production stages. Keeping track of all your machine learning experiments can be challenging, but with proper management things become much simpler: you get proper insights, easy collaboration, and a detailed record of your experiments.
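
To make the CI/CD point concrete, here is a minimal sketch of an automated check that could gate model promotion in a CI pipeline. The pytest-style test, the file paths, and the accuracy threshold are illustrative assumptions, not part of any specific registry:

# test_model_gate.py: a minimal CI promotion gate (hypothetical paths and threshold)
import joblib
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.90  # assumption: your team's promotion criterion

def test_candidate_model_meets_threshold():
    # Load the candidate model and a held-out validation set (hypothetical files)
    model = joblib.load("artifacts/candidate_model.joblib")
    X_val, y_val = joblib.load("artifacts/validation_set.joblib")

    accuracy = accuracy_score(y_val, model.predict(X_val))

    # Failing the assertion fails the build and blocks promotion
    assert accuracy >= ACCURACY_THRESHOLD, f"accuracy {accuracy:.3f} is below threshold"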

Model Registry platforms

Let’s discuss a few of the best and most widely used tools for model registry. We’ll also compare some key features for better understanding and run a small model registry demo.

1. neptune.ai

neptune.ai is a metadata store for MLOps, built for research and production teams that run a lot of experiments. 

It gives you a central place to log, store, display, organize, compare, and query all metadata generated during the machine learning lifecycle. Individuals and organizations use Neptune for experiment tracking and model registry to have control over their experimentation and model development. 

Model registry Neptune
  • Record and version all your machine learning model development metadata in one place.
  • Seamlessly version notebooks, .git info, model code, datasets, and images.
  • Organize models in the central ML model registry: version, store, filter, sort, and group all your model training runs in a dashboard.
  • Neptune provides an easy way to debug and compare your model metrics. It automatically generates comparison tables between runs, making comparison effortless.
  • Collaborate on models as a team: securely share everything with your teammates and get info on what was changed, when, and by whom.
  • You can re-run your ML models and keep track of all the runs and their results.

Getting started with Neptune is pretty easy:

  1. Installation
pip install neptune
  2. Training script preparation

Create a file (main.py) and paste in the code below:

import neptune

run = neptune.init_run(project="your_workspace/your_project")

# Track metadata and hyperparameters of your run
run["JIRA"] = "NPT-952"
run["parameters"] = {"learning_rate": 0.001, "optimizer": "Adam"}

# Track the training process by logging your training metrics
for epoch in range(100):
    run["train/accuracy"].append(epoch * 0.6)
    run["train/loss"].append(epoch * 0.4)

run["f1_score"] = 0.66
  3. Running your script

Go to the terminal and run:

python main.py

You will see a web link where you can explore your experiment data in the Neptune UI.

Check out their documentation page to see how everything works. 

Log model building metadata with Neptune:

Make sure you have the proper libraries installed and your Neptune API token configured.

  1. Connecting to Neptune
import neptune

run = neptune.init_run(project='common/quickstarts',
                       api_token='ANONYMOUS')
  2. Logging parameters
PARAMS = {'lr': 0.1, 'epoch_nr': 10, 'batch_size': 32}
run['parameters'] = PARAMS
  3. Adding metrics and losses
loss = ...
run["train/loss"].append(loss)
  4. Logging the test score
run['test/acc'] = 0.76
  5. Logging model files
run["model"].upload('my_model.pkl')

Try Neptune now or read more about its Model Registry features.

2. Azure Machine Learning

Azure ML is a cloud-based platform for training, deploying, automating, managing, and monitoring all your machine learning experiments. Azure ML uses an MLOps approach to improve the quality and performance of your machine learning experiments. In Azure, you can create and register a model through the UI or via the API.

  • Azure lets you create reusable pipelines and environments that make it easy to handle model training, data preparation, and the deployment of machine learning models.
  • Register and deploy ML models, monitor metadata associated with your model from anywhere.
  • Govern the whole machine learning lifecycle, including when, where, and who made changes to the models.
  • Custom notifications for different events, for example when your experiment completes or a model is deployed.
  • Monitor and explore different types of metrics, get custom alerts for your machine learning experiments.

The Azure ML workflow is the same regardless of where you deploy your model:

  1. Registering the model
  2. Preparing entry script and configuration
  3. Deployment of the model (Cloud/Local)
  4. Monitoring and analysis 
  5. Re-deploying the model to the cloud
  6. Testing the performance
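
To give a taste of step 2 before we get to registration, an entry (scoring) script in Azure ML conventionally defines an init() function that loads the model and a run() function that handles requests. Here is a minimal sketch following the SDK v1 conventions; the model name and input format are illustrative assumptions:

# score.py
import json

import joblib
from azureml.core.model import Model

def init():
    # Called once when the service starts: load the registered model
    global model
    model_path = Model.get_model_path("my-sklearn-model")  # hypothetical model name
    model = joblib.load(model_path)

def run(raw_data):
    # Called per request: parse the JSON payload and return predictions
    data = json.loads(raw_data)["data"]
    return model.predict(data).tolist()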

Registering a model from a local machine

wget https://aka.ms/bidaf-9-model -O model.onnx
az ml model register -n bidaf_onnx -p ./model.onnx

Set -p to the path of a folder or a file that you want to register.
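
If you prefer Python over the CLI, the Azure ML SDK (v1) exposes an equivalent registration call. A minimal sketch, assuming you have a workspace config.json downloaded from the Azure portal and the same local model file:

from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()  # reads config.json from the working directory

model = Model.register(
    workspace=ws,
    model_path="./model.onnx",  # local file (or folder) to upload
    model_name="bidaf_onnx",
    tags={"area": "qna"},
)
print(model.name, model.version)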

Registering a model from an Azure Machine Learning training run

az ml model register -n bidaf_onnx --asset-path outputs/model.onnx --experiment-name myexperiment --run-id myrunid --tag area=qna

Register a model using the API

There are three ways to register a model with Azure. The first is to register the model while logging it, by passing registered_model_name to log_model():

with mlflow.start_run(run_name=<run-name>) as run:
  ...
  mlflow.<model-flavor>.log_model(<model-flavor>=<model>,
    artifact_path="<model-path>",
    registered_model_name="<model-name>"
  )

To register a model under a specific name after all your experiment runs are complete, use the mlflow.register_model() method.

result = mlflow.register_model("runs:/<run-id>/<model-path>", "<model-name>")

To create a new registered model with a unique name, you can use the client API method create_registered_model():

from mlflow.tracking import MlflowClient

client = MlflowClient()
result = client.create_registered_model("<model-name>")
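
Note that create_registered_model() only creates an empty registered model entry. To attach an actual model to it, you would typically add a version with create_model_version(); here the run ID and artifact path are placeholders:

result = client.create_model_version(
    name="<model-name>",
    source="runs:/<run-id>/<model-path>",  # where the model artifact was logged
    run_id="<run-id>",
)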

Reference: Deploy machine learning models to Azure

3. MLflow

MLflow is an open-source platform for managing the machine learning model lifecycle. It provides a centralized model store with APIs and a UI to manage the MLOps lifecycle, including model lineage, model versioning, stage transitions (for example, from staging to production), and annotations.

  • Register a model under a name, with versions, stages, and other metadata, taking it from proof of concept to deployment.
  • The versioning feature tracks the versions of your machine learning models as they get updated.
  • Each model version can be assigned to one stage at a time, chosen from predefined stages such as Staging, Production, or Archived.
  • Automatically log transition metrics, events, or changes made to your machine learning experiments.
  • Annotations and model descriptions: you can annotate top-level models and versions with any description or information useful for other team members, for example algorithm details or dataset info.
  • MLflow lets you view and manage every stage transition as part of CI/CD pipelines for better governance and monitoring.

MLflow Model Registry Workflow

You can access the model registry via the UI or the API. If you’re running your own MLflow server, you must use a database-backed backend store to access the model registry.

UI Workflow

  • Register a model from the Artifacts page in the MLflow run details section.
  • Enter a new, unique name in the Model Name field, or choose an existing model.
  • Once your model is registered, you can view its details in the Registered Models section.
  • Each model has a details page where all of its active versions are shown.
  • You can move a model version from staging to production simply by choosing the new stage from a drop-down menu.
Model registry MLflow
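
The stage transition you make from the drop-down menu in the UI can also be performed programmatically. A minimal sketch using MLflow's client API (the model name and version match the API workflow example below):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Move version 1 of the registered model into the Production stage
client.transition_model_version_stage(
    name="sk-learn-random-forest-reg-model",
    version=1,
    stage="Production",
)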

API Workflow

The API workflow is an alternative way to use the model registry. You can register a model during an MLflow experiment run or after all your experiment runs are complete.

  • Adding an MLflow model
from random import random, randint

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor

with mlflow.start_run(run_name="YOUR_RUN_NAME") as run:
    params = {"n_estimators": 5, "random_state": 42}
    sk_learn_rfr = RandomForestRegressor(**params)

    # Log parameters and metrics using the MLflow APIs
    mlflow.log_params(params)
    mlflow.log_param("param_1", randint(0, 100))
    mlflow.log_metrics({"metric_1": random(), "metric_2": random() + 1})

    # Log the sklearn model and register as version 1
    mlflow.sklearn.log_model(
        sk_model=sk_learn_rfr,
        artifact_path="sklearn-model",
        registered_model_name="sk-learn-random-forest-reg-model"
    )
  • Fetching an MLflow model from the registry
model_name = "sk-learn-random-forest-reg-model"
model_version = 1

model = mlflow.pyfunc.load_model(
    model_uri=f"models:/{model_name}/{model_version}"
)

model.predict(data)  # data: model input, e.g. a pandas DataFrame of features
  • Serving an MLflow model from the registry
export MLFLOW_TRACKING_URI=http://localhost:5000

mlflow models serve -m "models:/sk-learn-random-forest-reg-model/Production"
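
Once the model server is up, you can query it over HTTP. Below is a minimal sketch using the requests library; it assumes you started the server on port 1234 (for example by adding -p 1234 to the serve command above, so it doesn't clash with a tracking server on port 5000) and that the model takes two numeric features. The exact payload format depends on your MLflow version; recent releases expect the dataframe_split wrapper shown here:

import requests

# Placeholder feature names and values; replace with your model's real inputs
payload = {
    "dataframe_split": {
        "columns": ["feature_1", "feature_2"],
        "data": [[0.5, 1.2]],
    }
}

response = requests.post("http://localhost:1234/invocations", json=payload)
print(response.json())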
  • Updating your model information/description
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.update_model_version(
    name="sk-learn-random-forest-reg-model",
    version=1,
    description="This model version is a scikit-learn random forest containing 100 decision trees"
)

Read also

neptune.ai vs MLflow: Which tool is better?
Best Alternatives to MLflow Model Registry

Are there any problems with model registries?

Most model registries, including the ones we discussed here, are flexible and easy to use with other MLOps frameworks. However, not every platform will meet all your machine learning experiment needs. 

  • MLflow doesn’t have an option to add user permissions, which can make things difficult once you reach the deployment stage.
  • Logging is easier and more seamless in Azure ML and Neptune; the MLflow UI can become laggy as the number of experiments grows. Meanwhile, SageMaker uses CloudWatch to log metrics, and CloudWatch’s metric visualization leaves something to be desired.
  • Experiment tracking for analysis is more limited in MLflow, while Neptune and Azure provide a seamless tracking experience.

Here’s how the tools we talked about compare. All three were evaluated on managing workspaces, using data stores, logging and viewing metrics, notebook and data versioning, user management, grouping experiments, monitoring model performance, deploying models, and uploading artifacts. On pricing:

  • Azure*: Free Trial, Pay As You Go
  • neptune.ai: Free For Individuals, Teams paid
  • MLflow: Free

*Azure Machine Learning studio

No matter which platform you use, a model registry will help you speed up your roll-out process, make your ML experiments easier to manage, and make collaboration smoother. It creates a seamless hand-off for your team and increases security and governance. Every platform has its own features, and depending on what you need and how you want to track your ML experiments, you can always opt for a free trial. It’s a good way to see which platform suits your machine learning experiments and helps you move forward with deployment.