
Best Tools to Log and Manage ML Model Building Metadata

When you’re developing machine learning models, you need to be able to reproduce your experiments. Few things are more frustrating than getting a model with great results that you can’t reproduce because you didn’t log the experiment.

You can make your experiments reproducible by logging everything. There are several tools you can use for this. In this article, we’ll look at some of the most popular ones and see how to start logging your experiments’ metadata with each of them, using a LightGBM experiment as the running example.

Let’s dive right in. 
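All the snippets below assume a regression dataset that has already been split into `X_train`, `X_test`, `y_train`, and `y_test`. As a minimal, self-contained stand-in (a synthetic dataset, not the one used in the original runs), you could prepare something like:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for a real dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# 80/20 train/test split used by all the examples below
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```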


Neptune

Neptune is a platform for logging and managing ML model building metadata. You can use it to log:

  • Model versions,
  • Data versions,
  • Model hyperparameters,
  • Charts,
  • and a lot more.

Neptune is hosted on the cloud, so you don’t need any setup, and you can access your experiments anytime, anywhere. You can organize all your experiments in one place, and collaborate on them with your team. You can invite your teammates to view and work on any experiment.

To start using Neptune, you need to install `neptune-client`. You also need to set up a project. This is the project that you’ll use from Neptune’s Python API. 

Neptune projects

A new version of Neptune was just released. The new version supports more workflows, like offline mode, ML pipelines, and resuming runs. However, some integrations, like the LightGBM one, are still being ported to the new version. So, in this article, I used an older version. Keep an eye on this page for the new version.

The next step is to initialize the `neptune-client` to work with this project. Apart from the project, you will also need your API key. You can get this under your profile picture, as shown below. 

Neptune API

With those two things in play, you can now initialize the project. 

import neptune
neptune.init(project_qualified_name='mwitiderrick/LIGHTSAT', api_token='YOUR_TOKEN')

The next step is to create an experiment and name it. The parameters for this experiment are also passed in at this stage. 

params = {'boosting_type': 'gbdt',
          'objective': 'regression',
          'num_leaves': 40,
          'learning_rate': 0.09,
          'feature_fraction': 0.8}

exp = neptune.create_experiment(name='LightGBM-training', params=params)
Neptune parameters

You’re now set to train the LightGBM model. While training, you’ll use Neptune’s LightGBM callback to log the training process. You need to install `neptune-contrib[monitoring]`. 

Next, you’ll import the `neptune_monitor` callback and pass it to LightGBM’s `train` method.

from neptunecontrib.monitoring.lightgbm import neptune_monitor
import lightgbm as lgb

lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

gbm = lgb.train(params,
                lgb_train,
                valid_sets=[lgb_train, lgb_eval],
                callbacks=[neptune_monitor()])

Once the training process is over, you can head back to the Neptune web UI to see the experiments and compare results. 

Neptune new UI

Neptune also logs model metrics. For instance, let’s look at how you can log the mean absolute error, mean squared error and the root mean squared error. You can log all sorts of metrics using the `log_metric` function.

predictions = gbm.predict(X_test)
import numpy as np 
from sklearn.metrics import mean_squared_error, mean_absolute_error
neptune.log_metric('Root Mean Squared Error', np.sqrt(mean_squared_error(y_test, predictions)))
neptune.log_metric('Mean Squared Error', mean_squared_error(y_test, predictions))
neptune.log_metric('Mean Absolute Error', mean_absolute_error(y_test, predictions))
Neptune logs

Neptune also automatically logs the training and validation learning curves. You will find these under the Charts menu in the web UI. 

Neptune training curve
Neptune validation curve

It’s very important to log your trained model. This way, you can quickly move it to production. You can log all models using the `log_artifact` function.


Neptune lets you access this experiment and download the model. Access the experiment using the `get_experiments` method, and use the `download_artifact` function to download the model. This function expects the name of the artifact and the path where you would like to store it.

project = neptune.init('mwitiderrick/LIGHTSAT', api_token='YOUR_TOKEN')
experiment = project.get_experiments(id='LIGHTSAT-5')[0]
# The artifact name must match the file name you logged with log_artifact
experiment.download_artifact('lightgbm.pkl', './')

Check out this Notebook to see more functionalities of Neptune, and play with the complete LightGBM experiment. 


MLflow

MLflow is an open-source platform for tracking machine learning models, logging, and managing ML model building metadata. It also integrates with popular data science tools. Let’s take a look at the LightGBM integration (note that this integration is still experimental in MLflow).

The first thing we do while running experiments with MLflow is enable automatic logging of parameters and metrics.

import mlflow
import mlflow.lightgbm

mlflow.lightgbm.autolog()

You can also log the parameters manually, using the `log_param` function. The method logs parameters under the current run. It creates a new run if none is active. 

params = {'boosting_type': 'gbdt',
          'objective': 'regression',
          'num_leaves': 67,
          'learning_rate': 0.01,
          'feature_fraction': 0.8}

mlflow.log_param("boosting_type", params["boosting_type"])
mlflow.log_param("objective", params["objective"])
mlflow.log_param("num_leaves", params["num_leaves"])
mlflow.log_param("learning_rate", params["learning_rate"])
mlflow.log_param("feature_fraction", params["feature_fraction"])

The trained LightGBM model can also be logged manually using the `log_model` function. MLflow will log it automatically when autologging is enabled.

from mlflow.lightgbm import log_model

log_model(gbm, artifact_path='lightgbm-model')

You can then use this model to run predictions on new data. Load the model, and use the `predict` function to make predictions. 

import mlflow
logged_model = 'file:///Users/derrickmwiti/Downloads/mlruns/0/56cb6b76c6824ec0bc58d4426eb92b91/artifacts/lightgbm-model'

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)

# Predict on a Pandas DataFrame.
import pandas as pd
loaded_model.predict(pd.DataFrame(X_test))

You can also perform predictions on a Spark DataFrame if you load the model as a Spark UDF.

import mlflow
logged_model = 'file:///Users/derrickmwiti/Downloads/mlruns/0/56cb6b76c6824ec0bc58d4426eb92b91/artifacts/lightgbm-model'

# Load model as a Spark UDF (requires an active SparkSession `spark`).
loaded_model = mlflow.pyfunc.spark_udf(spark, logged_model)

# Predict on a Spark DataFrame.
df.withColumn('my_predictions', loaded_model(*df.columns))

You can end an active MLflow run by calling `mlflow.end_run()`. Runs can be viewed anytime from the web UI, which you can start by executing `mlflow ui` in the terminal.

MLflow dashboard

The web UI makes it easy to compare different runs. This will show a comparison of the different parameters and metrics. 

Mlflow web UI

You can also see a comparison of the training and validation learning curves for different runs. 

Mlflow metrics

Under the artifacts section, you will find the logged model and charts. For instance, here’s the logged feature importance for one LightGBM training run. 

MLflow feature importance

Check out the complete MLflow example here.

👉 Check how MLflow compares with Neptune.

Weights and Biases

Weights and Biases is a platform for experiment tracking, model and dataset versioning, and managing ML model building metadata. To start using it, you have to create an account and a project. You will then initialize the project in your Python code.

Let’s now import `wandb` and initialize a project. At this point, you can pass the parameters that you will use for the LightGBM algorithm. These will be logged, and you will see them on the web UI.

import wandb

params = {'boosting_type': 'gbdt',
          'objective': 'regression',
          'num_leaves': 40,
          'learning_rate': 0.1,
          'feature_fraction': 0.9}

run = wandb.init(config=params, project='light', entity='mwitiderrick', name='light')
Wandb configuration

The next step is to use the LightGBM callback from `wandb` to visualize and log the model’s training process. Pass the `wandb_callback` to LightGBM’s `train` function.

from wandb.lightgbm import wandb_callback

gbm = lgb.train(params,
                lgb_train,
                valid_sets=[lgb_train, lgb_eval],
                callbacks=[wandb_callback()])

Weights and Biases logs scalars, such as accuracy and regression metrics. Let’s take a look at how you can log the regression metrics for each LightGBM run. Use the `wandb.log` function. 

import numpy as np 
from sklearn.metrics import mean_squared_error, mean_absolute_error
predictions = gbm.predict(X_test)
wandb.log({'Root Mean Squared Error': np.sqrt(mean_squared_error(y_test, predictions))})
wandb.log({'Mean Squared Error': mean_squared_error(y_test, predictions)})
wandb.log({'Mean Absolute Error': mean_absolute_error(y_test, predictions)})

The training and validation learning curves are also logged automatically, and you will find them in the web UI.

Wandb training and validation

You can also quickly create reports from runs. 

Wandb reports

You can save the trained LightGBM model, and log it to Weights and Biases at every run. Instantiate an empty `Artifact` instance, and then use it to log the model. Datasets can be logged similarly.

artifact = wandb.Artifact('model.pkl', type='model')
artifact.add_file('model.pkl')  # path of the saved model file
run.log_artifact(artifact)

You will see the logged model under the Artifacts section of the web UI.

Wandb artifacts

You can end a particular experiment using `wandb.finish()`. Check out the complete LightGBM with Weights and Biases example here.

👉 Check how Weights & Biases compares with Neptune.


Sacred

Sacred is an open-source machine learning experimentation tool. It can also be used for logging and managing ML model building metadata. When using Sacred, you first need to create an experiment. You’ll need to pass `interactive=True` if you’re running the experiment in a Jupyter Notebook.

from sacred import Experiment
ex = Experiment('lightgbm', interactive=True)

Next, define the experiment configuration using the `@ex.config` decorator. The configuration is used to define and log the parameters for the algorithm.

@ex.config
def cfg():
    params = {'boosting_type': 'gbdt',
              'objective': 'regression',
              'num_leaves': 40,
              'learning_rate': 0.01,
              'feature_fraction': 0.9}

Next, define the run function. When running in interactive mode, this function has to be decorated with `@ex.main`. Otherwise, use `@ex.automain`, which also figures out the file in which the main function is located. The function with either of these decorators is the one that’s executed when you run the experiment. For this experiment, a couple of things happen in the `run` function:

  • Training of the LightGBM model,
  • Saving the model,
  • Making predictions using the model,
  • Logging the regression metrics using the `log_scalar` method,
  • Logging the model using the `add_artifact` function.

You can also log a resource, such as a Python file, using the `add_resource` function. 

import numpy as np
import lightgbm as lgb
from sklearn.metrics import mean_squared_error, mean_absolute_error

@ex.main
def run(params):
    lgb_train = lgb.Dataset(X_train, y_train)
    lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
    gbm = lgb.train(params,
                    lgb_train,
                    valid_sets=[lgb_train, lgb_eval])
    gbm.save_model('lightgbm_model.txt')
    predictions = gbm.predict(X_test)
    ex.log_scalar('Root Mean Squared Error', np.sqrt(mean_squared_error(y_test, predictions)))
    ex.log_scalar('Mean Squared Error', mean_squared_error(y_test, predictions))
    ex.log_scalar('Mean Absolute Error', mean_absolute_error(y_test, predictions))
    ex.add_artifact('lightgbm_model.txt')

The next step is to run the experiment. 

r = ex.run()

Unfortunately, Sacred doesn’t ship with a web UI that you can use to view the experiments. You have to use an external tool for this. That brings us to the next library, Omniboard. 


Omniboard

Omniboard is a web-based user interface for Sacred. The tool connects to the MongoDB database used by Sacred and visualizes the metrics and logs collected for each experiment. To view all the information that Sacred collects, you have to create an observer. The `MongoObserver` is the default observer. It connects to the MongoDB database and creates a collection with all this information.

from sacred.observers import MongoObserver

ex.observers.append(MongoObserver(url='localhost:27017', db_name='sacred'))

With that in place, you can run Omniboard from the terminal.

$ omniboard -m localhost:27017:sacred

You can then access the web UI in your browser; by default, Omniboard serves it on port 9000 (`http://localhost:9000`).

Omniboard dashboard

Clicking a run will show more information about it, for instance, the metrics.

Omniboard run info

You also see the model that was logged during the run.

Omniboard artifacts

The configuration of that run is there as well. 

Omniboard configuration

The complete example using Sacred + Omniboard is here. You will have to run the notebook on a server or your local machine so that you can do all the setup required to run Sacred with Omniboard.

👉 Check how Sacred+Omniboard compares with Neptune.

Final thoughts 

In this article, we ran machine learning experiments using various experiment tracking tools. You saw how to: 

  • Create runs and experiments,
  • Log models and datasets,
  • Capture all experiment metadata,
  • Compare different runs,
  • Log model parameters and metrics.

Hope you’ve learned something new. Thanks for reading!

