A Complete Guide to Monitoring ML Experiments Live in Neptune

Posted July 21, 2020

Training machine learning or deep learning models can take a really long time.

If you are like me, you like to know what is happening during that time. You may want to:

  • monitor your training and validation losses,
  • take a look at the GPU consumption,
  • see image predictions after every other epoch,
  • and a bunch of other things.

Neptune lets you do all that, and in this post I will show you how to make it happen. Step by step.

Check out this example monitoring experiment to see what this can look like.

Note:

If you want to try Neptune monitoring without registration just jump to the Initialize Neptune section and start from there as an anonymous user.

Set up your Neptune account

Setting up a project and connecting your scripts to Neptune is super easy but you still need to do it 🙂

Let’s take care of that quickly.

Create a project

Let’s create a project first. 

To do that:

  • go to the Neptune app,
  • click on the New project button on the left,
  • give it a name,
  • decide whether you want it to be public or private,
  • done.

Get your API token

You will need a Neptune API token (your personal key) to connect the scripts you run with Neptune.

To do that:

  • click on your user logo on the right,
  • click on Get Your API token,
  • copy your API token,
  • paste it into an environment variable (preferably in ~/.bashrc or your system's equivalent), a config file, or directly into your script if you feel really adventurous 🙂

A token is like a password, so I try to keep it safe.

Since I am a Linux guy, I put it in my environment file ~/.bashrc. If you are using a different system, just click on the operating system box up top and see what is recommended.
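On Linux that boils down to adding a single line to ~/.bashrc. The client reads the NEPTUNE_API_TOKEN environment variable, so something like this (with your own token pasted in) is enough:

export NEPTUNE_API_TOKEN="your-api-token-goes-here"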

With that, whenever I run my training scripts Neptune will know who I am and log things appropriately.

Install client library 

To work with Neptune you need a client library that deals with logging everything you care about. 

Since I am using Python I will use the Python client but you can use Neptune with other languages as well. 

You can install it with pip:

pip install neptune-client

Many monitoring helpers are available in the neptune-contrib extension library, so I suggest that you install that as well.

pip install neptune-contrib

Initialize Neptune

Now that you have everything set up you can start monitoring things!

First, connect your script to Neptune by adding the following towards the top of your script:

import neptune

neptune.init(project_qualified_name="You/Your-Project")

Note:

If you want to try Neptune without registration you can simply use this open project and the anonymous user neptuner, whose API token is 'ANONYMOUS':

import neptune

neptune.init(api_token='ANONYMOUS',
             project_qualified_name="shared/step-by-step-monitoring-experiments-live")

Create an experiment

With Neptune you log things to experiments. Experiments are basically dictionaries that understand machine learning things. You can log whatever you want to them and Neptune will present it nicely in the UI.

To start an experiment just run:

neptune.create_experiment('step-by-step-guide')

Once you run this, a link to an experiment will appear. You can click on it and go to the Neptune UI or you can simply find your experiment directly in the Neptune app. 

Of course, there is not much there yet because we didn’t log anything. 

Let’s change that!

Note:

You can organize experiments by adding names and tags or track your model hyperparameters. Read about it here but if you simply want to monitor your runs live forget I ever said anything 🙂
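For example, a minimal sketch of creating an experiment with a name, tags, and some hypothetical hyperparameters could look like this:

neptune.create_experiment(
    name='step-by-step-guide',
    tags=['monitoring', 'blog-post'],        # any tags you like
    params={'lr': 0.001, 'batch_size': 64},  # hypothetical hyperparameters
)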

Monitoring basics 

In a nutshell, logging to Neptune is as simple as calling:

neptune.log_TYPE('LOG_NAME', THING_I_WANT_TO_LOG)

Those could be:

  • Metrics and losses -> neptune.log_metric 
  • Images and charts -> neptune.log_image 
  • Artifacts like model files -> neptune.log_artifact
  • And others.

We’ll go into those in detail in the next sections but first let’s talk about the basics. 

Logging single values and logging in loops

Sometimes you may just want to log something once before or after the training is done.

In that case, you just run

...
neptune.log_metric('test_auc', 0.93)
neptune.log_artifact('my_model.pkl')

In other scenarios, there is a training loop inside of which you want to log things. 

Like in PyTorch:

for inputs, labels in trainloader:
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    neptune.log_metric('loss', loss.item())  # log a plain float, not a tensor

Your loss is logged to Neptune after every iteration and the learning curves are created automatically.

See it in Neptune

Usually, you will not interact with the training loop directly but use a callback system that makes the logging and monitoring cleaner.

We’ll talk about those next.

Logging with callbacks

When your framework has a callback system, it lets you hook your monitoring functions into different places in the training loop without actually changing the training loop.

It could be epoch end, batch end, or training start. You just specify at which point of training it should be executed. 

for epoch in epochs:
    callback.on_epoch_start()
    for batch in dataloader:
        callback.on_batch_start()
        do_stuff(batch)
        callback.on_batch_end()
    callback.on_epoch_end()

For example, in Keras you can create your Callback and specify that you want to log metrics after every epoch. 

from keras.callbacks import Callback

class MonitoringCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        for metric_name, metric_value in (logs or {}).items():
            neptune.log_metric(metric_name, metric_value)

And pass that callback to the fit method.

...
model.fit(..., callbacks=[MonitoringCallback()])

Your training metrics will be logged to Neptune automatically:

See it in Neptune

Most ML frameworks have some callback system in place. They vary slightly but the idea is the same. 

OK, now you know how you can monitor things, but a more interesting question is: what can you monitor in Neptune?

What can you monitor in Neptune?

There are a ton of different things that you can log to Neptune and monitor live. 

Metrics and learning curves, hardware consumption, model predictions, ROC curves, console logs, and more can be logged for every experiment and explored live.

Let’s go over those one by one.

Monitor ML metrics and losses

Log evaluation metrics or losses to a log section with the neptune.log_metric method:

neptune.log_metric('test_accuracy', 0.76)
neptune.log_metric('test_f1_score', 0.62)

If you want to log those metrics after every training iteration simply call neptune.log_metric many times on the same metric name:

for epoch in range(epoch_nr):
    # training logic
    f1_score = model.score(valid_data)
    neptune.log_metric('train_f1_score', f1_score)

A chart with the learning curve will be created automatically.

See it in Neptune

Monitor hardware resources

Those are logged automatically if you have psutil installed. 

To install it just run:

pip install psutil

and have your hardware monitored for every experiment.

Just go to the Monitoring section to see it:

See it in Neptune

Monitor console logs

Those are captured automatically for every experiment.

You can see both stderr and stdout in the Monitoring section:

See it in Neptune

Monitor image predictions

Log images to a log section with the neptune.log_image method. They can be grouped into named log sections like best_image_predictions or validation_predictions:

neptune.log_image('best_image_predictions', image)
neptune.log_image('worst_image_predictions', image)

If you want to log image predictions after every epoch you can log multiple images to the same log name:

for epoch in range(epoch_nr):
    # logic for plotting validation predictions on images
    neptune.log_image('validation_predictions_epoch', image)

Your images will be browsable in the validation_predictions_epoch tab of the logs section in the UI.
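For context, here is a minimal sketch of one way the image passed to log_image could be produced, assuming matplotlib (validation_image and pred_label are hypothetical):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.imshow(validation_image)                         # hypothetical image array
ax.set_title('prediction: {}'.format(pred_label))   # hypothetical predicted label
neptune.log_image('validation_predictions_epoch', fig)
plt.close(fig)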

See it in Neptune

Monitor performance charts

Log and monitor charts like a confusion matrix, ROC curve, precision-recall curve, or anything else you want. There are two ways to do it.

You can either log them as images with neptune.log_image and they will be presented in the logs section of the UI under a name you choose:

for epoch in range(epochs):
    # chart plotting logic
    neptune.log_image('ROC curves epoch', fig)

See it in Neptune

You can also log and update interactive HTML charts from bokeh, plotly, or altair with the log_chart function from neptune-contrib. If you log matplotlib charts, they will be automatically converted to plotly.

from neptunecontrib.api import log_chart

for epoch in range(epochs):
    # chart plotting logic
    log_chart('ROC curve', fig)

See it in Neptune
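In both cases fig is just a regular chart object. Here is a minimal sketch of how it could be built, assuming scikit-learn and matplotlib (y_valid and y_proba are hypothetical validation labels and predicted scores):

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

fpr, tpr, _ = roc_curve(y_valid, y_proba)  # hypothetical labels and scores
fig, ax = plt.subplots()
ax.plot(fpr, tpr)
ax.set_xlabel('False positive rate')
ax.set_ylabel('True positive rate')
ax.set_title('ROC curve')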

Note:

If you want to create a new interactive chart after every epoch you need to give them different names:

for epoch in range(epochs):
    # chart plotting logic
    log_chart('ROC curve epoch: {}'.format(epoch), fig)

Monitor text predictions

Log text to a log section with the neptune.log_text method. You can have as many named text logs as you want:

neptune.log_text('preds_head', str(text_predictions.head()))
neptune.log_text('preds_tail', str(text_predictions.tail()))

As before, you can log things after every training epoch or batch if you want to. Just call neptune.log_text multiple times:

for iteration in range(iter_nr):
    # logic for getting parameters tried during this run
    neptune.log_text('run parameters', str(param_dictionary))

Your text logs will be browsable in the logs section of the UI.

See it in Neptune

Monitor file updates

You can log model checkpoints, prediction .csv files or anything else after every epoch and see it in the app:

for epoch in epochs:
    # create predictions.csv
    neptune.log_artifact('predictions.csv')
    neptune.log_artifact('model_checkpoint.pkl')

See it in Neptune

Compare running experiments with previous ones

The cool thing about monitoring ML experiments in Neptune is that you can compare running experiments with your previous ones. 

It makes it easy to decide whether the model that is training shows promise of improvement. If it doesn't, you can even abort the experiment from the UI.

To do that:

  • go to the experiment dashboard
  • select a few experiments
  • click compare to overlay learning curves and show diffs in parameters and metrics
  • click abort on the running ones if you no longer see the point in training

See it in Neptune

Share running experiments with others with a link

You can share your running experiments by copying the link to the experiment and sending it to someone. 

Just like I am sharing this experiment with you here:

https://ui.neptune.ai/o/shared/org/step-by-step-monitoring-experiments-live/e/STEP-22

The cool thing is you can send people directly to a part of your experiment that is interesting like code, hardware consumption charts, or learning curves. You can share the experiment comparisons with links as well. 

See it in Neptune

Use integrations to monitor training in your frameworks

Neptune comes with a bunch of framework integrations to make the monitoring even easier. 

Let me show you how it usually works with two examples: Keras and Optuna.

Monitor deep learning models: Keras

Instead of creating the monitoring callback in Keras yourself, you can use the one available in neptune-contrib.

Simply import it and pass it to model.fit. Don't forget to create the experiment first.

import neptune
from neptunecontrib.monitoring.keras import NeptuneMonitor

neptune.init()
neptune.create_experiment('my-keras-experiment')
#
# your logic
#
model.fit(x_train,
          y_train,
          epochs=42,
          callbacks=[NeptuneMonitor()])

Monitor hyperparameter optimization: Optuna

The hyperparameter tuning framework Optuna also has a callback system into which you can plug Neptune nicely. All the results are logged and updated after every parameter search iteration.

import neptune
import optuna
from neptunecontrib.monitoring.optuna import NeptuneCallback

neptune.init()
neptune.create_experiment('my-optuna-experiment')
#
# your logic
#
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100,
               callbacks=[NeptuneCallback(log_charts=True)])
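The objective above is just a regular Optuna objective function. A minimal, purely illustrative sketch might look like this (train_and_evaluate is a hypothetical helper returning a validation score):

def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)  # sample a learning rate
    return train_and_evaluate(lr)  # hypothetical training + evaluation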

Note:

Neptune has 20+ framework integrations (and counting) so check them out and see if your frameworks are available or drop us a comment and we may just build it for you!

Final thoughts

With all this information you should be able to monitor every piece of the machine learning experiment that you care about.

Happy experiment monitoring!

Jakub Czakon
Senior Data Scientist
