The Best Vertex ML Metadata Alternatives

7th August, 2023

Last year, Google announced Vertex AI, a new managed machine learning platform aimed at making it easier for developers to deploy and maintain AI models. So, what are the benefits and drawbacks of Vertex AI? In this article, we’ll discuss Vertex AI and take a look at its alternatives. But first, let’s see why there is a need for such a tool.

Why would you need an ML Metadata Store?

The information that characterizes the dataset, computing environment, and model is known as metadata. In simple terms, a metadata store is a one-stop store for all you need to know about building and deploying machine learning models. This is crucial for ML reproducibility: if you don’t record and save this information, you won’t be able to recreate your experiments. An ML metadata store will help you:

  • Keep track of all experiments and model metadata.
  • Provide tools to visualize and compare experiments.
  • Log all metadata that might be relevant.
  • Get information on code, data, parameters, and the version of the environment.
  • Analyze, monitor, and alert you if any unexpected changes are detected to the model’s input/output distribution.

Data and artifact lineage, reproducibility, and comparability are all aided by collecting data about each ML pipeline run. When transitioning projects from development to production, reproducibility ensures data consistency and error reduction. It also helps in the investigation of faults and anomalies. In general, an ML metadata store will record the following metadata:

  • The versions of the pipeline and components that were used.
  • The start and finish dates, as well as the time it took the pipeline to complete each stage.
  • The parameters supplied to the pipeline as arguments.
  • The model evaluation metrics generated for both the training and test sets during the model evaluation process. During the model validation process, these metrics allow you to compare the performance of a freshly trained model to the performance of the prior model.
Basic workflow of Metadata Store | Image by author
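
The list above can be sketched as a small record type. This is a minimal illustration in plain Python, not the schema of any particular metadata store: the class `PipelineRunRecord`, its field names, and the helper `improved_on` are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class PipelineRunRecord:
    """Hypothetical record of one pipeline run; field names are illustrative."""
    pipeline_version: str
    component_versions: dict[str, str]
    started_at: datetime
    finished_at: datetime
    parameters: dict[str, object]
    train_metrics: dict[str, float] = field(default_factory=dict)
    test_metrics: dict[str, float] = field(default_factory=dict)

    @property
    def duration(self) -> timedelta:
        """Time the run took from start to finish."""
        return self.finished_at - self.started_at


def improved_on(new: PipelineRunRecord, previous: PipelineRunRecord,
                metric: str, higher_is_better: bool = True) -> bool:
    """Compare a test-set metric of a fresh run against the prior run."""
    new_value = new.test_metrics[metric]
    old_value = previous.test_metrics[metric]
    return new_value > old_value if higher_is_better else new_value < old_value


# Two made-up runs: the prior model and a freshly trained candidate.
previous = PipelineRunRecord(
    pipeline_version="1.0", component_versions={"trainer": "0.3"},
    started_at=datetime(2023, 8, 1, 9, 0), finished_at=datetime(2023, 8, 1, 9, 10),
    parameters={"learning_rate": 0.01}, test_metrics={"accuracy": 0.91},
)
candidate = PipelineRunRecord(
    pipeline_version="1.1", component_versions={"trainer": "0.4"},
    started_at=datetime(2023, 8, 2, 9, 0), finished_at=datetime(2023, 8, 2, 9, 12),
    parameters={"learning_rate": 0.005}, test_metrics={"accuracy": 0.93},
)

print(candidate.duration)
print(improved_on(candidate, previous, "accuracy"))
```

Keeping versions, timings, parameters, and metrics in one record is what makes the comparison against the prior model a one-line check during validation.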

Vertex AI

Vertex AI is Google’s unified machine learning platform, which uses Google’s AI technologies to help you deploy models faster. Vertex AI combines the Google Cloud services for machine learning development into a single UI and API. It allows you to train and compare models with AutoML or custom code, and all of your models are kept in a single model repository.

Vertex AI Dashboard | Screenshot by Author 

Features of Vertex AI

With Vertex AI, you now have a single workflow as an end user that covers the full development lifecycle, from experimentation to deployment. Here are a handful of the features of Vertex AI:

  • All popular open-source frameworks are supported.

It supports most ML frameworks via custom training and prediction containers. This allows you to treat all models in the same way, regardless of whether they are custom-written or created with AutoML.

  • Pre-trained APIs

You get pre-trained APIs for vision, video, and more, not just to make your process easier, but also to make things quicker. You can simply integrate them into your existing apps or use them to create a new one. As a result, you may not need to hunt for additional AI API platforms to complete your task.

  • Seamless data-to-AI Integration

BigQuery ML is widely used to develop and execute machine learning models using SQL queries. With Vertex AI, you can access it and export datasets into the platform to connect them with the rest of the workflow. As a result, you’ll have end-to-end integration.

  • A simplified Machine Learning process

You’ll be able to work on ML models using tools like AutoML, Explainable AI, Edge Manager, and a few more. You can also train with custom code while keeping everything in one place.

Vertex AI offers tools like:

  • Model Monitoring
  • Matching Engine
  • ML Metadata
  • TensorBoard
  • Pipelines and a lot more.

Vertex ML Metadata

Vertex AI uses MLMD principles for its metadata store. It presents the metadata as a navigable graph, with nodes representing executions and artifacts and edges linking them (as seen in the graphic below). Executions and artifacts are further grouped by contexts, which are represented by subgraphs. Vertex ML Metadata will assist with the analysis of runs, ML experiments, and the tracking of ML artifacts, among other things. Let’s take a look at some of the most important entities that are used to record metadata.

  • Artifacts
  • Executions
  • Events
  • Contexts
Vertex AI: example lineage graph | Source
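
To make the four entities concrete, here is a minimal sketch of such a graph in plain Python. It does not use the Vertex AI SDK; the classes below and the `upstream_of` helper are hypothetical and only mirror the concepts named above, assuming the graph is acyclic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    name: str
    kind: str  # e.g. "dataset" or "model"


@dataclass(frozen=True)
class Execution:
    name: str  # one step of the workflow, e.g. "train"


@dataclass(frozen=True)
class Event:
    """Edge linking an artifact to an execution as its input or output."""
    artifact: Artifact
    execution: Execution
    direction: str  # "INPUT" or "OUTPUT"


@dataclass
class Context:
    """Groups the events of one experiment or pipeline run (a subgraph)."""
    name: str
    events: list[Event] = field(default_factory=list)

    def record(self, artifact: Artifact, execution: Execution, direction: str) -> None:
        self.events.append(Event(artifact, execution, direction))

    def upstream_of(self, artifact: Artifact) -> set[Artifact]:
        """Return every artifact the given artifact was derived from."""
        producers = {e.execution for e in self.events
                     if e.artifact == artifact and e.direction == "OUTPUT"}
        parents = {e.artifact for e in self.events
                   if e.execution in producers and e.direction == "INPUT"}
        lineage = set(parents)
        for parent in parents:
            lineage |= self.upstream_of(parent)
        return lineage


raw = Artifact("raw.csv", "dataset")
features = Artifact("features.parquet", "dataset")
model = Artifact("model-v1", "model")
preprocess = Execution("preprocess")
train = Execution("train")

run = Context("experiment-1")
run.record(raw, preprocess, "INPUT")
run.record(features, preprocess, "OUTPUT")
run.record(features, train, "INPUT")
run.record(model, train, "OUTPUT")

lineage_names = sorted(a.name for a in run.upstream_of(model))
print(lineage_names)
```

Walking the events backwards from the model recovers the datasets it depends on, which is the kind of lineage question the graph view is meant to answer.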

Limitations of Vertex AI

Vertex AI offers a lot of good characteristics, but it still has a few drawbacks that might be a cause of concern for many: 

  • Limited features 

Vertex ML Metadata is based on ML Metadata, which does not have a client Python SDK; such an SDK is needed for ad hoc training operations, for example within notebooks. Ad hoc operations can save organizations time, money, and IT effort. Vertex ML Metadata also currently lacks features such as automated debugging, a model registry, model artifact compilation, and Kubernetes support, which may be a deal-breaker for some users. It also lacks the ability to sync online and offline functions.

  • Cost 

You will have a free tier for a few months as a new user, but after that, you pay as you go. Compared to other solutions, Vertex AI is relatively pricey, and the price may rise substantially as clients use more services. You also have to pay for other Google Cloud products that you use with Model Monitoring, such as BigQuery storage or Batch Explain. Vertex ML Metadata costs start at $10 per gibibyte (GiB) per month for metadata storage and can increase if you use other features. You can find complete price information here.

  • Scalability 

Vertex ML Metadata is currently only available for use in the production pipeline. Vertex AI allows you to attach a GPU accelerator to an existing CPU instance; however, it does not yet support the A100 instance type.

  • Documentation & Developer guides 

Although Vertex AI has been available for a while, it’s difficult to find solutions to problems online or on developer community sites. As the product isn’t designed for small or medium-sized businesses, teams will require solid documentation and tutorials to get started.

Things to consider while choosing an ML metadata store

A metadata store is an ideal way to go for speed, automation, and smart insights whether you have a growing team, intend on scaling and upgrading existing solutions, or want to add additional ML solutions to your product line. Metadata is important in machine learning as it may assist you with:

  • Artifact tracking
  • Legal compliance with model audit trails
  • Warm-start training
  • Tracking model sign-offs
  • Learning from previous mistakes

Here’s what you should take into consideration when choosing a Vertex ML Metadata alternative, or just any metadata store.

1. Tracking capabilities 

You’ll be keeping track of hyperparameters, models, code, resources, insights, and much more. The metadata store should offer a wide variety of tracking capabilities, including data versioning, tracking data and model lineage, source code versioning, and even versioning the testing and production environments. As a result, be sure the tool you choose includes all of the tracking features you’ll require for your project.

2. Integrations 

To capture the metadata created at each stage of the pipeline, the ideal metadata store should integrate with most of the major tools and frameworks.

3. Easy collaboration 

To design a successful ML solution, the development team, operations teams, and even business teams need to collaborate. The metadata store should be reliable and make it simple for team members to work together on ML experiments, for example by sharing results and views.

4. Detailed insights 

While the metadata store collects data at different stages, it should also give intelligent insights that can speed up experiments and enhance reporting. An ideal metadata store platform should give developers dynamic visualizations that they can tailor to highlight the parts of the pipeline data they care about.

5. Visualizations

A decent visual representation will make it easier to assess the results. It simplifies complicated concepts and allows you to convey visual outcomes to your stakeholders. It can also assist you in performing an error analysis and identifying areas for improvement.

Because of all the reasons discussed above, Vertex AI may not be the perfect fit for everyone. So, let’s have a look at some of the top solutions with appealing features currently in the market:

The best alternatives to Vertex ML Metadata

1. Neptune

Neptune is a metadata store that connects several components of the MLOps workflow, such as data versioning, experiment tracking, model registry, and monitoring. It simplifies the storage, management, visualization, and comparison of all information created during the model development process.

Neptune offers a Python client library that lets users log and keep track of any metadata type in their ML experiments whether those run in Python scripts, Jupyter Notebooks, Amazon SageMaker Notebooks, or Google Colab.

Example dashboard with logged metadata | Source

Neptune summary

Neptune enhances the management of machine learning projects for teams. Its easy-to-use interface lets you aggregate runs, save custom dashboard views, and quickly share them with your team.

  • All metadata types, including parameters, model weights, and media files, are logged and displayed.
  • The user interface is simple to use and offers a variety of options for grouping runs.
  • Compare insights and parameters.
  • Automatically record the code, environment, parameters, model binaries, and much more.
  • Track experiments that are executed in scripts, notebooks and on any infrastructure.
  • Extensive experiment tracking and visualization capabilities.
  • You can monitor the hardware for your experiment runs automatically. Examine the amount of GPU/CPU and memory your model training runs consume.

Vertex ML Metadata vs Neptune

  • Neptune offers a Python client library. 
  • It offers a user interface that is very intuitive and flexible, allowing users to see and arrange data in the way they choose.
  • Neptune saves the majority of the metadata and its versions, making it easier for users to recreate the models.
  • It allows for smooth integration with over 25 different tools.
  • Cost-effective.

2. ML Metadata (MLMD)

TensorFlow’s ML Metadata (MLMD) is part of TensorFlow Extended (TFX), which is an end-to-end framework for deploying machine learning solutions. Every time a production ML pipeline is run, metadata is generated that contains information on the pipeline components, their executions (for example, training runs), and the artifacts produced (e.g. trained models). 

The MLMD architecture consists of three parts:

  • The driver, which supplies the required metadata to the executor.
  • The executor, where the component’s functionality is coded.
  • The publisher, which stores the result in metadata.
MLMD Architecture | Source
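
The driver, executor, and publisher roles can be sketched in plain Python. This is not the MLMD or TFX API: `InMemoryMetadataStore`, `driver`, `publisher`, and `run_component` are hypothetical names used only to show how the three parts hand data to each other.

```python
from typing import Callable, Optional


class InMemoryMetadataStore:
    """Hypothetical stand-in for a metadata store backend."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def put(self, record: dict) -> None:
        self.records.append(record)

    def latest(self, key: str) -> Optional[object]:
        """Return the most recently stored value for a key, if any."""
        for record in reversed(self.records):
            if key in record:
                return record[key]
        return None


def driver(store: InMemoryMetadataStore, input_keys: list[str]) -> dict:
    """Look up the metadata the executor needs."""
    return {key: store.latest(key) for key in input_keys}


def publisher(store: InMemoryMetadataStore, outputs: dict) -> None:
    """Write the executor's results back to the store."""
    store.put(outputs)


def run_component(store: InMemoryMetadataStore, input_keys: list[str],
                  executor: Callable[[dict], dict]) -> dict:
    """Run one component: driver, then executor, then publisher."""
    inputs = driver(store, input_keys)
    outputs = executor(inputs)
    publisher(store, outputs)
    return outputs


def train(inputs: dict) -> dict:
    # The executor holds the component's actual logic; here it is a stub.
    return {"model": f"model trained on {inputs['examples']}"}


store = InMemoryMetadataStore()
store.put({"examples": "data/train.csv"})
result = run_component(store, ["examples"], train)
print(store.latest("model"))
```

Because only the driver and publisher touch the store, the executor stays a plain function of its inputs, which keeps component logic easy to test in isolation.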

MLMD summary 

  • Tracks metadata flowing between components in the pipeline. 
  • Supports multiple storage backends.
  • You can compare two Artifacts of the same type by loading both of them.
  • It stores metadata about the lineage of the pipeline components.
  • It lists all Artifacts of a specific type.
  • It stores information on the pipeline’s executions.
  • It has APIs for storing and retrieving metadata to and from the storage.
  • The storage backend is expandable and pluggable.
  • It records and queries the context of workflow runs.

This tutorial will help you understand MLMD better.

ML Metadata (MLMD) vs Vertex ML Metadata

  • Both Vertex ML Metadata and MLMD are built in the same way, although there are some differences in their APIs, workflows, and other aspects.
  • Vertex AI is an enterprise level AI platform, whereas MLMD is a library for recording and retrieving information related to machine learning applications. 
  • MLMD is a powerful open-source model debugging library.
  • It prioritizes the pipeline view over the model-first view.

3. MLflow

MLflow is an open-source ML lifecycle management tool. It helps data scientists and ML engineers with experiments, deployment, and model registry, and it can be used with a variety of ML libraries and tools. It also serves as a model-first machine learning metadata store that you can use to monitor your experiments or package models for production.

MLflow example dashboard | Source

MLflow summary

  • It can work with any machine learning library, language or any existing code. It runs in the same manner in any cloud.
  • It packs an ML model in a standardized format that may be utilized by downstream tools.
  • The four primary components of MLflow are MLflow Tracking, MLflow Projects, MLflow Models, and the MLflow Model Registry.
  • You can record and query your experiments, including code and data, using MLflow Tracking.
  • MLflow Tracking lets you record artifacts, metrics, parameters, source, time, and a lot more.
  • MLflow Projects is a format for packaging data science code in a reusable and reproducible way. It also comes with an API and a command-line tool for ML and data science tasks.
  • Different types of ML models may be deployed using MLflow models. Each model is stored as a directory that contains any number of files.

MLflow vs Vertex ML Metadata

  • It is an open-source platform.
  • MLflow is highly customizable.
  • Provides real-time experiment tracking.
  • MLflow can work with any cloud service provider.

4. Kubeflow

Kubeflow is an open-source machine learning toolkit for Kubernetes. Kubeflow transforms stages in your data science process into Kubernetes tasks, giving your machine learning libraries, frameworks, pipelines, and notebooks a cloud-native interface.

By default, Kubeflow includes a metadata component, which is used to store and serve metadata. During a pipeline run, it logs metadata automatically. This allows you to keep track of things like pipeline versions, when they were last updated, and metadata in order to analyze a pipeline run.

Kubeflow UI | Source 

Kubeflow summary

  • Deployments are repeatable and portable across a variety of infrastructure.
  • Many frameworks and platforms are supported.
  • In particular, Kubeflow Pipelines automatically logs information about a run, including workflow artifacts, executions, and lineage.
  • Kubernetes users will find Kubeflow to be an excellent fit.
  • You can manually write to the metadata server to collect additional metadata in addition to automated tracking.
  • It’s scalable and offers a wide range of hyperparameter tuning options.

Learn more from the Kubeflow documentation. 

Kubeflow vs Vertex ML Metadata

  • It is an open-source platform.
  • Even small and medium-sized businesses can benefit from its scalability.
  • For managing and tracking model experiments, tasks, and runs, it offers an excellent user interface.
  • Kubeflow allows users to quickly connect to other platforms and, if needed, users can easily migrate to another platform.

5. Valohai

Valohai is an MLOps platform that automates everything from data extraction to model deployment. Its end-to-end platform enables teams to create and deploy production machine learning systems quickly and with confidence.

Valohai UI | Source 

Valohai summary

  • You can conduct your tests on any cloud or on-premise system using Valohai, and you won’t have to worry about any of the regular DevOps duties.
  • Everything you do with Valohai is saved and versioned on the platform. 
  • Everything from models to measurements may be easily shared between you and your team.
  • You can keep track of your executions’ main KPIs and arrange them by your unique metrics with ease.
  • Anything you publish as JSON from your code is potentially captured as Valohai metadata.
  • You can compare execution metrics as a time series or a scatter plot.
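
As a rough illustration of the JSON point above, the sketch below prints one JSON object per line from training code. It uses only the Python standard library; the helper names `format_metadata` and `log_metadata` and the metric names are hypothetical, and the assumption that each printed JSON line is picked up as metadata should be checked against the Valohai documentation.

```python
import json


def format_metadata(**values: float) -> str:
    """Serialize one set of metrics as a single-line JSON object."""
    return json.dumps(values, sort_keys=True)


def log_metadata(**values: float) -> None:
    """Print the JSON line to standard output, one object per call."""
    print(format_metadata(**values))


# Made-up training loop: emit one metadata line per epoch.
for epoch, accuracy in enumerate([0.81, 0.88, 0.92], start=1):
    log_metadata(epoch=epoch, accuracy=accuracy)

last_line = format_metadata(epoch=3, accuracy=0.92)
```

Emitting one flat object per step keeps each line independently parseable, which suits time-series or scatter-plot comparisons of execution metrics.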

You’ll find more information about the features on the Valohai website. 

Valohai vs Vertex ML Metadata

  • Valohai offers a Python utility library called valohai-utils to help with the everyday boilerplate.
  • It ensures that the process is consistent and that everyone in the team is aware of what is going on.
  • You can compare the results of the experiments in a graph or tabular style.
  • Compatible with any programming language and ML framework.
  • You can download the metadata as a CSV or JSON file.

6. SageMaker Studio

Amazon SageMaker lets data scientists prepare, build, train, tune, deploy, and manage all their ML experiments. It has a user-friendly interface that makes the tasks of ML engineers and data scientists considerably easier. If you currently use AWS, SageMaker Studio is a natural option, since it offers excellent integration support for all AWS products.

Amazon SageMaker Studio UI | Source 

SageMaker Studio summary

  • It works seamlessly with other AWS tools.
  • Easy to use interface.
  • You can track the inputs and outputs of the container.
  • Track and visualize thousands of experiments.
  • SageMaker includes a Python library integration with RoboMaker, allowing the two systems to communicate throughout the training process.
  • It comes with built-in algorithms for training and running your experiments.
  • SageMaker has a built-in debugger to help you find and fix issues.

Amazon SageMaker vs Vertex ML Metadata

  • SageMaker can monitor the container’s inputs and outputs.
  • It has the ability to visualize metrics.
  • It also allows you to start with smaller instances.
  • Model registry and model artifact recompilation are both supported by SageMaker.

Vertex ML Metadata alternatives comparison

Let’s have a quick comparison between these platforms.

  • Neptune: flexible and works well with other frameworks; intuitive UI; easy collaboration with team and stakeholders. Deployment: on-premise, cloud. Pricing: individual free (+ usage above free quota), academia: free, team: paid.
  • ML Metadata: the storage is expandable and pluggable; allows you to store a wide range of metadata. Deployment: cloud.
  • MLflow: highly customizable; fits perfectly for data science workflow; can work with any machine learning library, language or any existing code. Deployment: on-premise, cloud.
  • Kubeflow: perfect fit for Kubernetes users; highly scalable; automatically logs information about a run including workflow artifacts. Deployment: on-premise, cloud.
  • Valohai: easy to use collaboration feature; everything you do with Valohai is saved and versioned on the platform. Deployment: on-premise, cloud.
  • SageMaker Studio: works well with the SageMaker platform; easy to use interface. Deployment: on AWS.

Capabilities compared: hyperparameters tracking, input/output artifacts, visual comparisons, dataset versioning.

Final thoughts

Metadata is an important part of any end-to-end ML development process since it not only speeds up the process but also enhances the quality of the final pipeline.

Vertex AI is a relatively new machine learning platform among ML professionals. It has a lot of potential, but it also has certain limitations, which is why businesses are searching for more open and simple integration solutions. We discussed a few ML metadata stores, and you may choose one based on your machine learning requirements. I hope you liked the article.

Happy experimenting!
