
A Complete Guide to Monitoring ML Experiments Live in Neptune

5 min read
18th March, 2024

Training machine learning or deep learning models can take a really long time.

If you are like me, you like to know what’s happening during that time and you’re probably interested in:

  • monitoring your training and validation losses,
  • looking at the GPU consumption,
  • seeing image predictions after every other epoch,
  • and a bunch of other things.

Neptune lets you do all that, and in this post, I will show you how to make it happen. Step by step.

Check out this example run to see what this can look like in the Neptune app. 

Note:

If you want to try Neptune monitoring without registration, check this Quickstart tutorial.

Set up your Neptune account

Setting up a project and connecting your scripts to Neptune is super easy, but you still need to do it 🙂

Let’s take care of that quickly.

1. Create a project

Let’s create a project first. 

To do that:

  • go to the Neptune app
  • click the New project button on the left,
  • give it a name,
  • decide whether you want it to be public or private,
  • done.

2. Get your API token

You will need a Neptune API token (your personal key) to connect the scripts you run with Neptune.

To do that:

  • click on your user logo on the right
  • click on Get Your API token
  • copy your API token
  • paste it into an environment variable, a config file, or directly into your script if you feel really adventurous 🙂

A token is like a password, so I try to keep it safe. 

Since I am a Linux guy, I put it in my ~/.bashrc file. If you are using a different system, check the API token section in the documentation.

With that, whenever you run your training scripts, Neptune will know who you are and log things appropriately.

3. Install the client library

To work with Neptune, you need a client library that deals with logging everything you care about. 

Since I am using Python, I will use the Python client, but you can use Neptune with R as well.

You can install it with pip:

pip install neptune

4. Initialize Neptune

Now that you have everything set up, you can start monitoring things!

First, connect your script to Neptune by adding the following towards the top of your script:

import neptune

run = neptune.init_run(
    project="workspace-name/project-name",
    api_token="Your Neptune API token",
)
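
If you have saved the token in the NEPTUNE_API_TOKEN environment variable (as described above), you can skip the api_token argument entirely; Neptune picks it up from the environment. A minimal sketch of that variant:

import neptune

# api_token is omitted on purpose: Neptune falls back to the
# NEPTUNE_API_TOKEN environment variable set earlier
run = neptune.init_run(project="workspace-name/project-name")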

5. Create a run

Runs are created with the init_run() method, so we already started one when we executed neptune.init_run() above.

The started run then tracks some system metrics in the background, plus whatever metadata you log in your code. By default, Neptune periodically synchronizes the data with the servers in the background. Check what exactly Neptune logs automatically.

The connection to Neptune remains open until the run is stopped or the script finishes executing. You can explicitly stop the run by calling run.stop().
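
Putting it together, a bare-bones script might look like the sketch below (the logged metric is just a placeholder):

import neptune

run = neptune.init_run(project="workspace-name/project-name")

# ... your training code logs metadata to `run` here ...
run["train/final_loss"] = 0.15  # placeholder value

run.stop()  # flushes any remaining data and closes the connection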

But what’s a run?

A ‘run’ is a namespace inside a project where you can log model-building metadata. 

Typically, you create a run every time you execute a script that does model training, re-training, or inference. Runs can be viewed as dictionary-like structures that you define in your code.

They have:

  • Fields, where you can log your ML metadata
  • Namespaces, which organize your fields

Whatever hierarchical metadata structure you create, Neptune reflects it in the UI.

To create a structured namespace, use a forward slash / like this:

run["metrics/f1_score"] = 0.67
run["metrics/test/roc"] = 0.82

The snippet above:

  • Creates two namespaces: metrics and metrics/test.
  • Assigns values to fields f1_score and roc.

For the full list of run arguments, you can refer to Neptune’s API documentation.
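
For example, two arguments that help organize runs are name and tags (the values below are just illustrative):

run = neptune.init_run(
    project="workspace-name/project-name",
    name="baseline-model",        # display name shown in the runs table
    tags=["monitoring", "demo"],  # tags make filtering runs easier later
)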

Monitoring experiments in Neptune: methods

Logging basic stuff

In a nutshell, logging to Neptune is as simple as:

run["WHAT_YOU_WANT_TO_LOG"] = ITS_VALUE

Let’s take a look at some different ways in which you can log important things to Neptune.

You can log:

  • Metrics and losses -> run["accuracy"]=0.90
  • Images and charts -> run["images"].upload("bboxes.png")
  • Artifacts like model files -> run["model_checkpoints"].upload("my_model.pt")
  • And many other things.

Sometimes you may just want to log something once before or after the training is done.

In that case, just do:

params = {
    "activation": "sigmoid",
    "dropout": 0.25,
    "learning_rate": 0.1,
    "n_epochs": 100,
}

run["parameters"] = params

In other scenarios, you might want to log a series of values from inside a training loop. For this, we use the append() method.

for epoch in range(params["n_epochs"]):
    # this would normally be your training loop
    run["train/loss"].append(0.99*epoch)
    run["train/acc"].append(1.01*epoch)
    run["eval/loss"].append(0.98*epoch)
    run["eval/acc"].append(1.02*epoch)

This creates the namespaces “train” and “eval”, each with a loss and acc field.

You can see these visualized as charts in the app later.

Logging with integrations

To make logging easier, we created integrations for most of the Python ML libraries, including PyTorch, TensorFlow, Keras, scikit-learn, and more. You can see all the Neptune integrations here. These integrations give you out-of-the-box utilities that log most of the ML metadata you would normally track with those libraries. Let’s check a few examples.

Monitor TensorFlow/Keras models

The Neptune–Keras integration logs the following metadata automatically:

  • Model summary
  • Parameters of the optimizer used for training the model
  • Parameters passed to model.fit during the training
  • Current learning rate at every epoch
  • Hardware consumption and stdout/stderr output during training
  • Training code and Git information

To log metadata as you train your model with Keras, you can use NeptuneCallback in the following manner:

import neptune
from neptune.integrations.tensorflow_keras import NeptuneCallback

run = neptune.init_run()
neptune_cbk = NeptuneCallback(run=run)

# model, x_train, and y_train come from your regular Keras training code
model.fit(
    x_train,
    y_train,
    epochs=5,
    batch_size=64,
    callbacks=[neptune_cbk],
)

Your training metrics will be logged to Neptune automatically:

Keras training dashboard in the Neptune web app
See this example in the Neptune app

Check the docs to learn more about what you can do with the Neptune-Keras integration.
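
One more option worth mentioning: the callback accepts a base_namespace argument, so you can group everything it logs under a namespace of your choice (the name below is just an example; see the integration docs for details):

neptune_cbk = NeptuneCallback(run=run, base_namespace="finetuning")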

Monitor time series Prophet models

Prophet is a popular time-series forecasting library. With the Neptune–Prophet integration, you can keep track of parameters, forecast data frames, residual diagnostic charts, cross-validation folds, and other metadata while training models with Prophet.

Here’s an example of how to log relevant metadata regarding your Prophet model all at once.

import pandas as pd
from prophet import Prophet
import neptune
import neptune.integrations.prophet as npt_utils

run = neptune.init_run()

dataset = pd.read_csv(
    "https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_peyton_manning.csv"
)
model = Prophet()
model.fit(dataset)

run["prophet_summary"] = npt_utils.create_summary(
    model, dataset, log_interactive=True
)
See this example in the Neptune app

Check the docs to learn more about the Neptune-Prophet integration.
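
If you prefer not to log everything in one go, the integration also ships smaller helpers. A short sketch, assuming the get_model_config() and get_serialized_model() utilities described in the integration docs:

run["model_config"] = npt_utils.get_model_config(model)
run["serialized_model"] = npt_utils.get_serialized_model(model)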

Monitor Optuna hyperparameter optimization

The hyperparameter tuning framework Optuna also has a callback system that Neptune plugs into nicely. All the results are logged and updated after every parameter search iteration.

import optuna

import neptune
import neptune.integrations.optuna as optuna_utils

run = neptune.init_run()
neptune_callback = optuna_utils.NeptuneCallback(run)

# objective is your regular Optuna objective function
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20, callbacks=[neptune_callback])
See this example in the Neptune app

Visit the docs to learn more about the Neptune-Optuna integration.

Most ML frameworks have some callback system in place. They vary slightly, but the idea is the same. You can take a look at the entire list of tools that Neptune supports. In case you are unable to find your framework in the list, you can always resort to the good old way of logging via the Neptune client, as discussed above.
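
As a sketch of that manual fallback, here is what logging from a custom training loop could look like (train_step, model, optimizer, dataloader, and n_epochs are placeholders for your own code):

for epoch in range(n_epochs):
    for batch in dataloader:
        loss = train_step(model, batch, optimizer)  # your own training step
    run["train/epoch_loss"].append(loss)  # loss of the last batch in the epoch
    run["train/lr"].append(optimizer.param_groups[0]["lr"])  # PyTorch-style current learning rate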

What can you monitor in Neptune?

There are a ton of different things that you can log to Neptune and monitor live. 

Metrics and learning curves, hardware consumption, model predictions, ROC curves, console logs, and more can be logged for every experiment and explored live.

Let’s go over a few of them, one by one.

Monitor ML metrics and losses

You can log scores and metrics as single values, with the = assignment, or as series of values, with the append() method.

# Log scores (single value)
run["score"] = 0.97
run["test/acc"] = 0.97

# Log metrics (series of values)
for epoch in range(100):
    # your training loop
    acc = ...
    loss = ...
    metric = ...

    run["train/accuracy"].append(acc)
    run["train/loss"].append(loss)
    run["metric"].append(metric)
See this example in the Neptune app

Monitor hardware resources and console logs

These are actually logged to Neptune automatically:

run = neptune.init_run(capture_hardware_metrics=True)

Just go to the Monitoring section to see it:

See this example in the app
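
And if you ever want to switch some of this off, for example, on a machine where console capture gets too noisy, init_run exposes capture flags. A minimal sketch:

run = neptune.init_run(
    capture_hardware_metrics=False,  # skip CPU/GPU/memory monitoring
    capture_stdout=False,            # don't capture console stdout
    capture_stderr=False,            # don't capture console stderr
)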

Monitor image predictions

You can log either a single image or a series of images (example below).

from neptune.types import File

for name in misclassified_images_names:
    y_pred = ...
    y_true = ...
    # name is the path to the misclassified image file
    run["misclassified_imgs"].append(File(name))

They will be visible in the image gallery in the app:

See this example in the app

Monitor file updates

You can save model weights from any deep learning framework by using the upload() method. In the example below, they’re logged under a field called my_model in the namespace model_checkpoints.

import torch

# Log PyTorch model weights
my_model = ...
torch.save(my_model, "my_model.pt")
run["model_checkpoints/my_model"].upload("my_model.pt")

Model checkpoints appear in the All metadata section.

See this example in the Neptune app
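
Later on, you can also pull a checkpoint back from a run, for example, to resume training or run inference elsewhere. A short sketch (the run ID below is a placeholder):

import neptune

# Reopen an existing run in read-only mode and download the uploaded checkpoint
run = neptune.init_run(with_id="PROJ-123", mode="read-only")
run["model_checkpoints/my_model"].download()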

Compare running experiments with previous ones

The cool thing about monitoring ML experiments in Neptune is that you can compare running experiments with your previous ones. 

It makes it easy to decide whether the model you are training shows promise of improvement. If it doesn’t, you can even abort the experiment from the UI.

To do that:

  • go to the experiment dashboard
  • select a few experiments
  • click compare to overlay learning curves and show diffs in parameters and metrics
  • click abort on the running ones if you no longer see the point in training

Apart from comparing experiments using charts, you can also compare them in the side-by-side table format view or as parallel coordinates. And if you log any images, it’s also possible to compare them. See the docs about comparison options.

Finally, you can share your running experiments by copying the link to the experiment and sending it to someone. 

Just like I am sharing this experiment with you here:

https://ui.neptune.ai/o/shared/org/step-by-step-monitoring-experiments-live/e/STEP-22

The cool thing is you can send people directly to a part of your experiment that you want to show them, like code, hardware consumption charts, or learning curves. You can share the experiment comparisons with links as well. 

Final thoughts

With all this information, you should be able to monitor every piece of the machine learning experiment that you care about.

For even more info, you can check out the Neptune documentation.

Happy experiment monitoring!
