
Best Tools to Log and Manage ML Model Building Metadata

21st August, 2023

When you’re developing machine learning models, you need to be able to reproduce your experiments. It would be unfortunate to get a model with great results that you can’t reproduce because you didn’t log the experiment.

You can make your experiments reproducible by logging everything. There are several tools you can use for this. In this article, let’s look at some of the most popular tools, and see how to start logging your experiments’ metadata with them. You’ll learn how you can run a LightGBM experiment using these tools. 
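The snippets in this article assume that `X_train`, `y_train`, `X_test`, and `y_test` already exist. The article doesn't say which dataset was used, so here is a minimal sketch that builds them from synthetic data with scikit-learn. The sample count, feature count, and split ratio are arbitrary assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic regression data stands in for a real dataset (an assumption).
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (800, 10) (200, 10)
```

Fixing `random_state` keeps the split identical between runs, which is itself part of making an experiment reproducible.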

Let’s dive right in.

Neptune

Neptune is a platform that can be used for logging and managing ML model building metadata. You can use it to log:

  • Model versions,
  • Data versions,
  • Model hyperparameters,
  • Charts,
  • and a lot more.

Neptune is hosted on the cloud, so you don’t need any setup, and you can access your experiments anytime, anywhere. You can organize all your experiments in one place, and collaborate on them with your team. You can invite your teammates to view and work on any experiment.

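To start using Neptune, you need to initialize a run, log the parameters, and pass the Neptune callback to LightGBM’s `train` function:

import neptune
import lightgbm as lgb
from neptune.integrations.lightgbm import NeptuneCallback

params = {'boosting_type': 'gbdt',
          'objective': 'regression',
          'num_leaves': 40,
          'learning_rate': 0.09,
          'feature_fraction': 0.8}

# Assumes the Neptune project and API token are configured,
# for example through the NEPTUNE_PROJECT and NEPTUNE_API_TOKEN environment variables.
run = neptune.init_run(name='LightGBM-training')
run['parameters'] = params  # log the hyperparameters

lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

gbm = lgb.train(params,
    lgb_train,
    valid_sets=[lgb_train, lgb_eval],
    callbacks=[NeptuneCallback(run=run)])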

Once the training process is over, you can head back to the Neptune web UI to see the experiments and compare results. 

Neptune new UI

Neptune also logs model metrics. For instance, let’s look at how you can log the mean absolute error, mean squared error and the root mean squared error. You can log all sorts of metrics by appending values to a field of the run with the `append` method.

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error
predictions = gbm.predict(X_test)
run['Root Mean Squared Error'].append(np.sqrt(mean_squared_error(y_test, predictions)))
run['Mean Squared Error'].append(mean_squared_error(y_test, predictions))
run['Mean Absolute Error'].append(mean_absolute_error(y_test, predictions))
Neptune logs
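To make the three regression metrics concrete, here is a small worked example in plain Python. The target and prediction values are made up so that the arithmetic can be checked by hand.

```python
import math

# Tiny hand-checkable example; the values are made up for illustration.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

errors = [p - t for p, t in zip(y_pred, y_true)]  # [-1.0, 0.0, 2.0, 1.0]

mae = sum(abs(e) for e in errors) / len(errors)   # (1 + 0 + 2 + 1) / 4 = 1.0
mse = sum(e * e for e in errors) / len(errors)    # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = math.sqrt(mse)                             # sqrt(1.5), roughly 1.2247

print(mae, mse, round(rmse, 4))  # 1.0 1.5 1.2247
```

The root mean squared error is expressed in the same units as the target, which usually makes it easier to interpret than the mean squared error.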

Neptune also automatically logs the training and validation learning curves. You will find these under the Charts menu in the web UI. 

Neptune training curve
Neptune validation curve


MLflow

MLflow is an open-source platform for tracking machine learning models and for logging and managing ML model building metadata. It also integrates with popular data science tools. Let’s take a look at the LightGBM integration (this integration is still experimental in MLflow).

The first thing we do while running experiments with MLflow is enable automatic logging of parameters and metrics.

import mlflow
mlflow.lightgbm.autolog()  # enable automatic logging for LightGBM

You can also log the parameters manually, using the `log_param` function. The method logs parameters under the current run. It creates a new run if none is active. 

params = {'boosting_type': 'gbdt',
          'objective': 'regression',
          'num_leaves': 67,
          'learning_rate': 0.01,
          'feature_fraction': 0.8}

mlflow.log_param("boosting_type", params["boosting_type"])
mlflow.log_param("objective", params["objective"])
mlflow.log_param("num_leaves", params["num_leaves"])
mlflow.log_param("learning_rate", params["learning_rate"])
mlflow.log_param("feature_fraction", params["feature_fraction"])
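The repeated calls above can be collapsed into a loop over the dictionary. The sketch below uses a hypothetical helper, `log_all_params`, and a plain list in place of the tracking backend so that it runs without MLflow installed. MLflow also provides `mlflow.log_params`, which takes the whole dictionary in one call.

```python
def log_all_params(params, log_param):
    """Call log_param(name, value) once per hyperparameter."""
    for name, value in params.items():
        log_param(name, value)


params = {'boosting_type': 'gbdt',
          'objective': 'regression',
          'num_leaves': 67,
          'learning_rate': 0.01,
          'feature_fraction': 0.8}

# A list stands in for the tracking backend so the sketch runs without MLflow;
# with MLflow installed you would pass mlflow.log_param instead.
logged = []
log_all_params(params, lambda name, value: logged.append((name, value)))

print(logged[0])  # ('boosting_type', 'gbdt')
```

Looping over the dictionary keeps the logged parameters in sync with the ones passed to LightGBM, so a parameter added later is not silently left out of the run.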

The trained LightGBM model can also be logged manually using the `log_model` function. MLflow will automatically log this when auto logging is enabled.

from mlflow.lightgbm import log_model
log_model(gbm, 'lightgbm-model')  # trained booster, artifact path

You can then use this model to run predictions on new data. Load the model, and use the `predict` function to make predictions. 

import mlflow
logged_model = 'file:///Users/derrickmwiti/Downloads/mlruns/0/56cb6b76c6824ec0bc58d4426eb92b91/artifacts/lightgbm-model'

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)

# Predict on a Pandas DataFrame.
import pandas as pd
predictions = loaded_model.predict(pd.DataFrame(X_test))
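The `logged_model` path above is tied to one machine’s file system. MLflow also accepts model URIs of the form `runs:/<run_id>/<artifact_path>`, which it resolves against the configured tracking store. The sketch below builds such a URI from the run ID and artifact path that appear in the path above; `runs_uri` is a hypothetical helper, not part of MLflow.

```python
def runs_uri(run_id, artifact_path):
    """Build an MLflow 'runs:/' model URI from a run ID and an artifact path."""
    return f"runs:/{run_id}/{artifact_path}"


# Run ID and artifact path taken from the file path used in this article.
logged_model = runs_uri('56cb6b76c6824ec0bc58d4426eb92b91', 'lightgbm-model')

print(logged_model)  # runs:/56cb6b76c6824ec0bc58d4426eb92b91/lightgbm-model
```

The resulting string can be passed to `mlflow.pyfunc.load_model` in place of the file path, provided the tracking URI points at the store that holds the run.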

You can also perform predictions on a Spark DataFrame if you load the model as a Spark UDF.

import mlflow
logged_model = 'file:///Users/derrickmwiti/Downloads/mlruns/0/56cb6b76c6824ec0bc58d4426eb92b91/artifacts/lightgbm-model'

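# Load model as a Spark UDF (`spark` is an active SparkSession).
loaded_model = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model)

# Predict on a Spark DataFrame, passing the feature columns to the UDF.
df = df.withColumn('my_predictions', loaded_model(*df.columns))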

You can end an active MLflow run using the `mlflow.end_run()` function. Runs can be viewed anytime from the web UI. You can start the web UI by executing `mlflow ui` in the terminal.

MLflow dashboard

The web UI makes it easy to compare different runs. This will show a comparison of the different parameters and metrics. 

Mlflow web UI

You can also see a comparison of the training and validation learning curves for different runs. 

Mlflow metrics

Under the artifacts section, you will find the logged model and charts. For instance, here’s the logged feature importance for one LightGBM training run. 

MLflow feature importance

Check out the complete MLflow example here


Weights and Biases

Weights and Biases is a platform for experiment tracking, model and dataset versioning, and managing ML model building metadata. To start using it, you will have to create an account and create a project. You will then initialize the project in your Python code.

Let’s now import `wandb` and initialize a project. At this point, you can pass the parameters that you will use for the LightGBM algorithm. These will be logged and you will see them on the web UI.

import wandb
params = {'boosting_type': 'gbdt',
          'objective': 'regression',
          'num_leaves': 40,
          'learning_rate': 0.1,
          'feature_fraction': 0.9}
run = wandb.init(config=params, project='light', entity='mwitiderrick', name='light')
Wandb configuration

The next step is to use the LightGBM callback from `wandb` to visualize and log the model’s training process. Pass the `wandb_callback` to LightGBM’s `train` function.

import lightgbm as lgb
from wandb.lightgbm import wandb_callback

gbm = lgb.train(params,
    lgb_train,
    valid_sets=[lgb_train, lgb_eval],
    callbacks=[wandb_callback()])

Weights and Biases logs scalars, such as accuracy and regression metrics. Let’s take a look at how you can log the regression metrics for each LightGBM run. Use the `wandb.log` function. 

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error
predictions = gbm.predict(X_test)
wandb.log({'Root Mean Squared Error': np.sqrt(mean_squared_error(y_test, predictions))})
wandb.log({'Mean Squared Error': mean_squared_error(y_test, predictions)})
wandb.log({'Mean Absolute Error': mean_absolute_error(y_test, predictions)})

Weights and Biases also logs the training and validation learning curves automatically, and you can view them in the web UI.

Wandb training and validation

You can also quickly create reports from runs. 

Wandb reports

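You can save the trained LightGBM model, and log it to Weights and Biases at every run. Instantiate an empty `Artifact` instance, add the saved model file to it, and then log it from the run. Datasets can be logged similarly.

gbm.save_model('model.pkl')  # write the trained model to disk
artifact = wandb.Artifact('model.pkl', type='model')
artifact.add_file('model.pkl')
run.log_artifact(artifact)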

You will see the logged model under the Artifacts section of the web UI.

Wandb artifacts

You can end a particular experiment using `wandb.finish()`. Check the complete LightGBM with Weights and Biases example here



Sacred

Sacred is an open-source machine learning experimentation tool. The tool can also be used for logging and managing ML model building metadata. When using Sacred, you first need to create an experiment. You’ll need to pass `interactive=True` if you’re running the experiment in a Jupyter notebook.

from sacred import Experiment
ex = Experiment('lightgbm',interactive=True)

Next, define the experiment configuration using the `@ex.config` decorator. The configuration is used to define and log the parameters for the algorithm.

@ex.config
def cfg():
    params = {'boosting_type': 'gbdt',
              'objective': 'regression',
              'num_leaves': 40,
              'learning_rate': 0.01,
              'feature_fraction': 0.9}

Next, define the run function. When running in interactive mode, this function has to be decorated with `@ex.main`. Otherwise, use `ex.automain`. This decorator is responsible for figuring out the file name where the main file is located. The function with either of these decorators is the one that’s executed when you run the experiment. For this experiment, a couple of things happen in the `run` function:

  • Training of the LightGBM model,
  • Saving the model,
  • Making predictions using the model,
  • Logging the regression metrics using the `log_scalar` method,
  • Logging the model using the `add_artifact` function.

You can also log a resource, such as a Python file, using the `add_resource` function. 

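import lightgbm as lgb
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

@ex.main
def run(params):
    lgb_train = lgb.Dataset(X_train, y_train)
    lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
    gbm = lgb.train(params,
        lgb_train,
        valid_sets=[lgb_train, lgb_eval])
    # Save the model and attach it to the run (the file name is illustrative).
    gbm.save_model('lightgbm_model.txt')
    ex.add_artifact('lightgbm_model.txt')
    predictions = gbm.predict(X_test)

    ex.log_scalar('Root Mean Squared Error', np.sqrt(mean_squared_error(y_test, predictions)))
    ex.log_scalar('Mean Squared Error', mean_squared_error(y_test, predictions))
    ex.log_scalar('Mean Absolute Error', mean_absolute_error(y_test, predictions))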

The next step is to run the experiment. 

r = ex.run()

Unfortunately, Sacred doesn’t ship with a web UI that you can use to view the experiments. You have to use an external tool for this. That brings us to the next library, Omniboard. 


Omniboard

Omniboard is a web-based user interface for Sacred. The tool connects to the MongoDB database used by Sacred. It then visualizes the metrics and logs collected for each experiment. To view all the information that Sacred collects, you have to create an observer. The `MongoObserver` is the default observer. It connects to the MongoDB database and creates a collection with all this information. Add the observer to the experiment before you run it.

from sacred.observers import MongoObserver
ex.observers.append(MongoObserver(url='localhost:27017', db_name='sacred'))

With that in place, you can run Omniboard from the terminal.

$ omniboard -m localhost:27017:sacred

You can then access the web UI in your browser.
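The `-m` argument in the command above has the form `host:port:database`, and it has to point at the same MongoDB database that the Sacred observer writes to. As a rough sketch, the hypothetical helper below splits that string into the `url` and `db_name` settings that `MongoObserver` accepts, so both tools can be configured from a single value.

```python
def parse_mongo_target(target):
    """Split an Omniboard 'host:port:database' string into observer settings."""
    host, port, database = target.split(':')
    return {'url': f'{host}:{port}', 'db_name': database}


settings = parse_mongo_target('localhost:27017:sacred')

print(settings)  # {'url': 'localhost:27017', 'db_name': 'sacred'}

# With Sacred installed, the same settings could configure the observer:
# ex.observers.append(MongoObserver(**settings))
```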

Omniboard dashboard

Clicking a run shows more information about it, for instance the metrics.

Omniboard run info

You can also see the model that was logged during the run.

Omniboard artifacts

The configuration of that run is there as well. 

Omniboard configuration

The complete example using Sacred + Omniboard is here. You will have to run the notebook on a server or your local machine, so that you can do all the setup required to run Sacred with Omniboard. 


Final thoughts 

In this article, we ran machine learning experiments using various experiment tracking tools. You saw how to: 

  • Create runs and experiments,
  • Log models and datasets,
  • Capture all experiment metadata,
  • Compare different runs,
  • Log model parameters and metrics.

Hope you’ve learned something new. Thanks for reading!
