Best Metadata Store Solutions: Kubeflow Metadata vs TensorFlow Extended (TFX) ML Metadata (MLMD) vs MLflow vs Neptune

How do you get the most precise machine learning model? Through experiments, of course! Whether you’re testing which algorithm to use, changing variable values, or choosing features to include, ML experiments help you decide. 

But there’s a downside: experiments produce massive amounts of artifacts. The output could be a trained model, a model checkpoint, or a file created during the training process. Data scientists need a standardized way to manage these artifacts – otherwise things can become hectic very quickly. Here is a basic list of the variables and artifacts that typically flow through an experiment: 

  • Parameters: hyperparameters, model architectures, training algorithms
  • Jobs: pre-processing job, training job, post-processing job — these consume other infrastructure resources such as compute, networking and storage
  • Artifacts: training scripts, dependencies, datasets, checkpoints, trained models
  • Metrics: training and evaluation accuracy, loss
  • Debug data: weights, biases, gradients, losses, optimizer state
  • Metadata: experiment, trial and job names, job parameters (CPU, GPU and instance type), artifact locations (e.g. S3 bucket)

If data scientists don’t store all this experimental metadata, they will not be able to achieve reproducibility or compare ML experiment results. 

Why you need to store the metadata from ML experiments: 

Creating a machine learning model is a bit like a scientific experiment: you have a hypothesis, you test it using various methods, and then pick the best method based on data. With ML, you start out with a hypothesis about which input data might produce accurate results, and train multiple models using various features. 

Going back and forth between error analysis and input from domain experts, you can build new features meant to increase performance. However, there’s no surefire way to tell whether the new model is being compared fairly with the previous version – unless you store metadata. 


Storing machine learning experiment metadata helps you with comparability – being able to compare results between experiments. When one project has large teams of data scientists, it becomes difficult to use the same training and test set splits, the same validation schemes, etc. 

Figure: Machine Learning Development Lifecycle

For example, individual data scientists within a team could be taking different ML approaches to the problem, with their own libraries and languages – with these differences, you need a standardized method to collect and store experiment metadata if you want to compare results. 

Another great thing about storing metadata is reproducibility – being able to repeatedly run your algorithm on the same datasets and achieve the same results. Reproducibility brings data consistency, fewer errors, and less ambiguity when moving projects from development to production. Even if you lose some trained model objects, or the data changes, stored metadata lets you retrain the model and deploy it to production. 

The 4 types of metadata to store during training: 

So, storing metadata lets you compare results during experiments and reproduce them. There are four types of metadata to store during training: 


Data

You should store the datasets used for model training and evaluation, most likely through a pointer to the data’s location. For each dataset, make sure to keep track of: 

  • Name
  • Version 
  • Columns
  • Statistics (distributions, etc.) 
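As a rough illustration of the dataset fields listed above, here is a minimal sketch that builds a metadata record for a small CSV held in memory. The function name `describe_dataset`, the record layout, and the added content hash are assumptions made for this example, not part of any particular metadata store.

```python
import csv
import hashlib
import io
import statistics


def describe_dataset(name, version, csv_text):
    """Build a metadata record (name, version, columns, statistics) for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    stats = {}
    for col in columns:
        try:
            values = [float(row[col]) for row in rows]
        except ValueError:
            continue  # non-numeric column: no numeric statistics recorded
        if not values:
            continue  # header-only file: nothing to summarize
        stats[col] = {
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }

    return {
        "name": name,
        "version": version,
        "columns": columns,
        "n_rows": len(rows),
        # A content hash (an addition for this sketch) helps detect silent data changes.
        "sha256": hashlib.sha256(csv_text.encode("utf-8")).hexdigest(),
        "statistics": stats,
    }


sample = "age,income,city\n30,1000,Oslo\n40,3000,Rome\n"
record = describe_dataset("customers", "v1", sample)
print(record["columns"])
print(record["statistics"]["age"])
```

In practice you would store this record next to a pointer to the data’s location rather than the data itself.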


Model

You should store the following attributes: 

  • Model Type: The algorithm used to train the model is a classic piece of model metadata. Whether it is a random forest, an elastic net, or a gradient-boosted tree classifier, store the name of the framework and the class associated with the model, so that you can instantiate new objects of the same class later. 
  • Data Preprocessing Steps: Data preprocessing converts raw data into usable training data. A series of feature preprocessing steps, such as encoding categorical variables, imputation, centering, or scaling, transforms the raw data until the machine learning algorithm can accept it. Higher-level steps, such as merging data from different databases, should also be stored in the model metadata. 
    A model trained on preprocessed data will expect data in that format from then on. To ensure reproducibility, store all data preprocessing steps as a single object together with the fitted model. This makes re-instantiating the fitted model at inference time much simpler. 
  • Hyperparameters: Store the hyperparameters used in the model training process for reproducibility. The topology and size of a neural network are examples of hyperparameters. 
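The model attributes above can be sketched with two toy classes. Everything here (`MeanCenterer`, `ThresholdClassifier`, `MODEL_REGISTRY`, and the record layout) is hypothetical and only meant to show the idea of storing the model type, the fitted preprocessing step, and the hyperparameters together, then rebuilding the model from that record.

```python
import json


class MeanCenterer:
    """Toy preprocessing step: subtracts the mean learned during fit."""

    def __init__(self, mean=None):
        self.mean_ = mean

    def fit(self, xs):
        self.mean_ = sum(xs) / len(xs)
        return self

    def transform(self, xs):
        return [x - self.mean_ for x in xs]


class ThresholdClassifier:
    """Toy model: predicts 1 when the input exceeds a threshold."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def predict(self, xs):
        return [int(x > self.threshold) for x in xs]


# Hypothetical registry mapping a stored class name back to a class.
MODEL_REGISTRY = {"ThresholdClassifier": ThresholdClassifier}


def model_metadata(model, preprocessor):
    """Collect model type, hyperparameters, and fitted preprocessing state."""
    return {
        "model_type": type(model).__name__,
        "hyperparameters": {"threshold": model.threshold},
        "preprocessing": {"step": type(preprocessor).__name__, "mean": preprocessor.mean_},
    }


def rebuild(metadata):
    """Re-instantiate the model and its preprocessing step from stored metadata."""
    model = MODEL_REGISTRY[metadata["model_type"]](**metadata["hyperparameters"])
    preprocessor = MeanCenterer(mean=metadata["preprocessing"]["mean"])
    return model, preprocessor


prep = MeanCenterer().fit([1.0, 2.0, 3.0])
model = ThresholdClassifier(threshold=0.5)
stored = json.dumps(model_metadata(model, prep))  # what you would write to the store

restored_model, restored_prep = rebuild(json.loads(stored))
predictions = restored_model.predict(restored_prep.transform([1.0, 3.0]))
print(predictions)
```

With a real framework you would more likely serialize a single pipeline object; the JSON record here just keeps the sketch dependency-free.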


Metrics

Evaluation metrics are the result of testing model performance on data the model has not seen during training. They measure the quality of the ML model; examples include classification accuracy, logarithmic loss, and the confusion matrix. 

It helps to store more than one metric, because a model may score well on one evaluation metric and poorly on another. 

Storing these will help you determine if your ML model is overfitting to the training set, and conduct in-depth error analysis. You can also optimize your hyperparameters by switching out hyperparameter settings to explore how they affect evaluation metrics. 
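To make the three metrics named above concrete, here is a small dependency-free sketch for a binary classifier. The function name `evaluate`, the 0.5 decision threshold, and the toy labels and probabilities are assumptions for illustration only.

```python
import math


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute accuracy, log loss, and a 2x2 confusion matrix for binary labels."""
    eps = 1e-15  # clip probabilities so log() never receives 0
    y_pred = [int(p >= threshold) for p in y_prob]
    n = len(y_true)

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    log_loss = -total / n

    # Rows are the true class, columns are the predicted class.
    confusion = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    return {"accuracy": accuracy, "log_loss": log_loss, "confusion_matrix": confusion}


metrics = evaluate([1, 0, 1, 0, 1], [0.9, 0.2, 0.4, 0.6, 0.8])
print(metrics)
```

Storing the whole dictionary per run, rather than a single number, is what makes later error analysis and run-to-run comparison possible.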


Context

Reproducibility can only be achieved by storing context: information about the ML experiment’s environment that has the potential to affect its output.

Examples of context: 

  • Source code
  • Programming language + version
  • Dependencies + packages
  • Host info including system packages, CPU and OS information, environment variables
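Most of the context items above can be collected with the Python standard library. The sketch below assumes a hypothetical helper named `capture_context`; the default package and environment variable names are placeholders, and the source code version (for example a commit hash) would have to be captured separately.

```python
import os
import platform
from importlib import metadata


def capture_context(packages=("pip",), env_vars=("PYTHONHASHSEED",)):
    """Collect language, dependency, and host information for one experiment run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment

    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "packages": versions,
        "env": {key: os.environ.get(key) for key in env_vars},
    }


context = capture_context()
print(context)
```

Only list the environment variables you actually want stored, since dumping the whole environment can leak secrets into the metadata store.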

The selection criteria for ML metadata store

With so many ML metadata stores to choose from, it can be hard to know which one to pick. Here are 6 things to keep in mind when selecting your solution: 

  1. Experiment Tracking Features: Does the platform have a variety of tools like data versioning, notebook versioning, model versioning, code versioning, or environment versioning for tracking experiments? 
  2. Extensibility/ Integrations: Given you want to incorporate features, data, metadata, etc. from multiple platforms, does your central storage solution let you do that? 
  3. UI Features: Does your platform have a rich, intuitive, user-friendly UI that lets you manage experiments clearly? 
  4. Team Collaboration: Is there a way for you to assign roles and collaborate as a large team on a project? 
  5. Product Features: What product features does it include? Examples could be APIs or dedicated user support.
  6. Scalability: Does your chosen platform scale to millions of runs? 

The Best Software for Collaborating on Machine Learning Projects


Neptune

Logging metadata

Neptune is a lightweight experiment management tool, and one of the best tracking platforms for data scientists. It integrates easily with your workflow and provides a range of tracking features. Besides tracking, retrieving, and analysing experiments, you can collaborate in large teams by assigning roles. Neptune has a beautiful, intuitive UI, and you can use the hosted web platform, so you don’t have to deploy it on your own hardware. 

Neptune’s main features are: 

  • Experiment Management: keep track of all your team’s experiments, also tag, filter, group, sort, and compare them 
  • Notebook versioning and diffing: compare two notebooks or checkpoints in the same notebook; similarly to source code, you can do a side-by-side comparison 
  • Team Collaboration: add comments, mention teammates, and compare experiment results

Neptune lets you track basically anything that happens during experiment and model training: 

  • Metrics
  • Hyperparameters
  • Learning curves
  • Training code and configuration files
  • Predictions (images, tables, etc)
  • Diagnostic charts (Confusion matrix, ROC curve, etc)
  • Console logs
  • Hardware logs

You can also track artifact metadata: 

  • Paths to the dataset or model (s3 bucket, filesystem)
  • Dataset hash
  • Dataset/prediction preview (head of the table, snapshot of the image folder)
  • Description
  • Feature column names (for tabular data)
  • Who created/modified
  • When last modified
  • Size of the dataset

And finally trained model metadata:

  • Model binary or location to your model asset
  • Dataset versions 
  • Links to recorded model training runs and experiments 
  • Who trained the model
  • Model descriptions and notes
  • Links to observability dashboards 

Neptune has a very easy setup: 

  1. Sign up for a Neptune AI account first. It’s free for individuals and non-organizations, and you get a generous 100 GB of storage. 
  2. Create a project. In your Projects dashboard, click “New Project” and fill in the following information. Pay attention to the privacy settings!
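  3. Install the Neptune client library:
pip install neptune-client
  4. Add logging to your script:
import neptune

# Legacy neptune-client API; newer client versions use a different interface.
# 'my_workspace/my_project' is a placeholder; the API token is read from
# the NEPTUNE_API_TOKEN environment variable.
neptune.init(project_qualified_name='my_workspace/my_project')

neptune.create_experiment(params={'lr': 0.1, 'dropout': 0.4})
# training and evaluation logic
neptune.log_metric('test_accuracy', 0.84)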

Here is what your metadata database and dashboard would look like: 

Metadata database

The metadata database is a place to store experiment, model, and dataset metadata so that it can be logged and queried efficiently.


Dashboard

The dashboard is a visual interface to the metadata database, so you can see all your experiment metadata, models, and datasets in one place. 

TensorFlow Extended ML Metadata

ML Metadata (MLMD) is a library from TensorFlow Extended, but you can use it independently. ML Metadata helps you store information about your ML pipeline, such as: 

  • Dataset the model was trained on
  • Pipelines, other lineage information
  • Hyperparameters used to train the model 
  • TensorFlow version 
  • Failed models, errors
  • Training runs
  • Artifacts generated
  • Executions

MLMD stores metadata in the Metadata store, and uses APIs in order to record and retrieve the metadata from the storage backend; it also has reference implementations for SQLite and MySQL out of the box.
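To give a feel for what a lineage record in a SQLite-backed store can look like, here is a deliberately simplified sketch using Python’s built-in `sqlite3` module. This is not the MLMD API or its actual schema: the table layout, the helper names (`put_artifact`, `put_execution`, `put_event`, `inputs_of`), and the paths are all assumptions for illustration.

```python
import sqlite3

# In-memory database standing in for the storage backend.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artifacts  (id INTEGER PRIMARY KEY, type TEXT, uri TEXT);
    CREATE TABLE executions (id INTEGER PRIMARY KEY, type TEXT, state TEXT);
    CREATE TABLE events     (execution_id INTEGER, artifact_id INTEGER, direction TEXT);
    """
)


def put_artifact(type_name, uri):
    cur = conn.execute("INSERT INTO artifacts (type, uri) VALUES (?, ?)", (type_name, uri))
    return cur.lastrowid


def put_execution(type_name, state):
    cur = conn.execute("INSERT INTO executions (type, state) VALUES (?, ?)", (type_name, state))
    return cur.lastrowid


def put_event(execution_id, artifact_id, direction):
    conn.execute(
        "INSERT INTO events (execution_id, artifact_id, direction) VALUES (?, ?, ?)",
        (execution_id, artifact_id, direction),
    )


def inputs_of(output_artifact_id):
    """Return URIs of the artifacts consumed by the execution that produced this artifact."""
    rows = conn.execute(
        """
        SELECT a.uri
        FROM events AS out_ev
        JOIN events AS in_ev
          ON in_ev.execution_id = out_ev.execution_id AND in_ev.direction = 'INPUT'
        JOIN artifacts AS a ON a.id = in_ev.artifact_id
        WHERE out_ev.artifact_id = ? AND out_ev.direction = 'OUTPUT'
        """,
        (output_artifact_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Record one training run: a dataset goes in, a model comes out.
dataset_id = put_artifact("DataSet", "path/to/data")
run_id = put_execution("Trainer", "COMPLETE")
model_id = put_artifact("SavedModel", "path/to/model")
put_event(run_id, dataset_id, "INPUT")
put_event(run_id, model_id, "OUTPUT")

print(inputs_of(model_id))
```

The point of the sketch is the query at the end: once inputs and outputs are linked through an execution, you can ask which dataset a given model was trained on.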

👉 Check Neptune’s integration with TensorFlow

Kubeflow Metadata

Kubeflow is a standardized solution for deploying the entire lifecycle of enterprise ML apps. Because ML systems differ in their applications, platforms, and resource considerations, they can be especially hard to maintain. Kubeflow is an open source project that provides various tools and frameworks for ML, and eases the process of developing, deploying, and managing ML projects. 

Kubeflow Metadata helps data scientists track and manage the huge amounts of metadata produced by their workflows. Metadata refers to information about runs, models, datasets, and artifacts (files and objects in the ML workflow). 

The Kubeflow UI lets you view logged artifacts and corresponding details:

Within the Artifacts screen, you can view things like model metadata, metrics metadata, or dataset metadata.

MLflow Tracking

MLflow Tracking lets you log parameters, code versions, metrics, output files, and more. It is organized around runs, which are executions of a piece of code. 

Runs record code versions, start and end times, sources, parameters, metrics, and artifacts. These runs are recorded to local files, a SQLAlchemy-compatible database, or a remote tracking server. The backend store holds MLflow entities such as run metadata, and the artifact store holds the artifacts.
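The split between run metadata and artifacts can be sketched with plain local files. The code below is not MLflow’s API; `start_run`, the `meta.json` file name, and the directory layout are hypothetical and only illustrate the idea of a run that records start and end times, parameters, and metrics, with artifacts kept in a separate location.

```python
import json
import tempfile
import time
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def start_run(root):
    """Open a run, let the caller log into it, and persist the metadata on exit."""
    run_id = uuid.uuid4().hex
    run_dir = Path(root) / run_id
    artifact_dir = run_dir / "artifacts"
    artifact_dir.mkdir(parents=True)

    run = {
        "run_id": run_id,
        "start_time": time.time(),
        "params": {},
        "metrics": {},
        "artifact_dir": str(artifact_dir),
    }
    try:
        yield run
    finally:
        run["end_time"] = time.time()
        # Metadata goes to one file; artifacts live in their own directory.
        (run_dir / "meta.json").write_text(json.dumps(run))


root = tempfile.mkdtemp()
with start_run(root) as run:
    run["params"]["lr"] = 0.1
    run["metrics"]["test_accuracy"] = 0.84
    Path(run["artifact_dir"], "model.txt").write_text("weights go here")

saved = json.loads((Path(root) / run["run_id"] / "meta.json").read_text())
print(saved["params"], saved["metrics"])
```

Keeping the small, queryable metadata apart from the large artifact files is what lets the two be stored in different places.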

👉 Check Neptune’s integration with MLflow

Conclusion + Learning more about Neptune

As you can see, ML Metadata Store Solutions are necessary for data scientists. Take a look at the features these tools offer when selecting one for your needs. For me, Neptune provides the most thorough, all-inclusive solution. If you want to learn more about Neptune, check out the official documentation. If you want to try it out, create your account and start tracking your machine learning experiments with Neptune.


15 Best Tools for Tracking Machine Learning Experiments

Pawel Kijko | Posted February 17, 2020

While working on a machine learning project, getting good results from a single model-training run is one thing, but keeping all of your machine learning experiments organized and having a process that lets you draw valid conclusions from them is quite another. That’s what machine learning experiment management helps with. 

In this article, I will explain why you, as data scientists and machine learning engineers, need a tool for tracking machine learning experiments, and which software is best for the job.

Tools for tracking machine learning experiments – who needs them and why?

  • Data Scientists: In many organizations, machine learning engineers and data scientists tend to work alone. That makes some people think that keeping track of their experimentation process is not that important as long as they can deliver that one last model. This is true to an extent, but when you want to come back to an idea, re-run a model from a couple of months ago or simply compare and visualize the differences between runs, the need for a system or tool for tracking ML experiments becomes (painfully) apparent. 
  • Teams of Data Scientists: A specialized tool for tracking ML experiments is even more useful for the whole team of data scientists. It allows them to see what others are doing, share the ideas and insights, store experiment metadata, retrieve it at any time and analyze it whenever they need to. It makes the teamwork much more efficient, prevents situations where several people work on the same task, and makes onboarding of new members way easier.
  • Managers/Business people: tracking software creates an opportunity to involve other team members, such as managers or business stakeholders, in your machine learning projects. Because you can prepare visualizations, add comments, and share the work, managers and co-workers can easily track progress and cooperate with the machine learning team.

Here is an in-depth article about experiment management for those of you who want to learn more.
