How to Monitor Machine Learning and Deep Learning Experiments

Posted June 22, 2020

Training machine learning and deep learning models can take hours or even days, so it is crucial to understand what is happening while your model trains.

Typically you can monitor:

  • Metrics and losses
  • Hardware resource consumption
  • Errors, warnings, and other logs (stderr and stdout)

Depending on the library or framework, this can be easier or harder, but it is almost always doable.
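
For the resource-consumption item, here is a minimal sketch that uses only the Python standard library. It is an illustration under assumptions, not a full hardware monitor: track_resources and fake_training_step are hypothetical names, the report dict stands in for an experiment tracker, and tracemalloc only sees memory allocated by Python itself (not GPU memory or memory held by native libraries).

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def track_resources(report):
    """Measure wall time, CPU time, and peak Python memory of a code block.

    `report` is a plain dict that receives the measurements; it stands in
    for whatever experiment tracker you use (an assumption of this sketch).
    """
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield report
    finally:
        report["wall_seconds"] = time.perf_counter() - wall_start
        report["cpu_seconds"] = time.process_time() - cpu_start
        # get_traced_memory() returns (current, peak) in bytes
        _, peak_bytes = tracemalloc.get_traced_memory()
        report["peak_python_memory_mb"] = peak_bytes / 1e6
        tracemalloc.stop()


def fake_training_step():
    # Placeholder workload standing in for a real training step
    values = [float(i) for i in range(100_000)]
    return sum(v * v for v in values)


usage = {}
with track_resources(usage):
    fake_training_step()

print(usage)
```

Wrapping a whole epoch (rather than a single step) in the context manager keeps the measurement overhead small relative to the work being measured.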

Most libraries allow you to monitor your model training in one of the following ways:

  • You can add a monitoring function at the end of the training loop.
  • You can add a monitoring callback that runs at the end of each iteration (batch) or epoch.
  • Some monitoring tools can hook into the training loop “magically” by parsing logs or monkey-patching.
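
As a rough illustration of the log-parsing route from the last bullet, the sketch below pulls metrics out of text lines with a regular expression. The log format, LOG_PATTERN, and parse_training_log are all assumptions made up for this example; a real tool would need a pattern that matches whatever your training script actually prints.

```python
import re

# Hypothetical log format; adjust the pattern to what your script prints
LOG_PATTERN = re.compile(
    r"epoch\s+(?P<epoch>\d+).*?"
    r"loss[=:]\s*(?P<loss>\d+(?:\.\d+)?).*?"
    r"acc[=:]\s*(?P<acc>\d+(?:\.\d+)?)"
)


def parse_training_log(lines):
    """Extract epoch, loss, and accuracy records from an iterable of log lines."""
    records = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip warnings and other unrelated output
        records.append(
            {
                "epoch": int(match.group("epoch")),
                "loss": float(match.group("loss")),
                "acc": float(match.group("acc")),
            }
        )
    return records


sample_log = [
    "INFO epoch 1 - loss: 0.693 - acc: 0.51",
    "WARNING learning rate is high",
    "INFO epoch 2 - loss: 0.412 - acc: 0.83",
]
metrics = parse_training_log(sample_log)
print(metrics)
```

The same function works on a file object or on lines read from a subprocess pipe, since it only needs an iterable of strings.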

Let me show how to monitor machine learning models in each case.

How to add a monitoring function to the training loop

Some frameworks, especially lower-level ones, don’t have an elaborate callback system in place, and you have direct access to the training loop.

PyTorch is one such framework.

A typical training loop looks like this:

for inputs, labels in trainloader:
   optimizer.zero_grad()
   outputs = net(inputs)
   loss = criterion(outputs, labels)
   loss.backward()
   optimizer.step()

And you can add monitoring in the following way:

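for inputs, labels in trainloader:
   optimizer.zero_grad()
   outputs = net(inputs)
   loss = criterion(outputs, labels)
   loss.backward()
   optimizer.step()
   # .item() converts the loss tensor to a plain Python float
   neptune.log_metric('loss', loss.item())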

Of course, you can monitor more things than just the loss. 

In that case, you should create a function that takes outputs and labels and creates all the metrics you care about, like accuracy, confusion matrix, and others.

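import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, confusion_matrix

def monitoring_function(outputs, labels):
   # Assumes outputs holds per-class scores; take the top class as the prediction
   preds = outputs.argmax(dim=1).cpu().numpy()
   targets = labels.cpu().numpy()

   # scikit-learn metrics expect (y_true, y_pred)
   acc = accuracy_score(targets, preds)
   loss = criterion(outputs, labels).item()

   # Draw the confusion matrix on the figure that gets logged
   fig, ax = plt.subplots()
   ax.imshow(confusion_matrix(targets, preds))

   neptune.log_metric('accuracy', acc)
   neptune.log_metric('loss', loss)
   neptune.log_image('performance_charts', fig)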

Then call it right after optimizer.step().

Inserting your monitoring function directly inside the training loop is not the most convenient option, but it gives you a lot of flexibility, and sometimes, there is just no other way.
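
If you want to try the pattern without PyTorch or Neptune installed, here is a self-contained sketch. It is only an illustration: the model is a tiny logistic regression written in plain Python, log_metric and the logged dict are stand-ins for a real tracker, and the data is a made-up toy set.

```python
import math
from collections import defaultdict

# Stand-in for an experiment tracker: metric name -> list of logged values
logged = defaultdict(list)


def log_metric(name, value):
    logged[name].append(value)


def predict_proba(w, b, x):
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def monitoring_function(probs, labels):
    """Compute and log loss, accuracy, and a 2x2 confusion matrix."""
    eps = 1e-12  # avoids log(0)
    loss = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    ) / len(labels)
    preds = [int(p >= 0.5) for p in probs]
    acc = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
    # confusion[true_label][predicted_label]
    confusion = [[0, 0], [0, 0]]
    for p, y in zip(preds, labels):
        confusion[y][p] += 1
    log_metric("loss", loss)
    log_metric("accuracy", acc)
    log_metric("confusion_matrix", confusion)


# Toy, linearly separable data: the label is 1 when x > 0
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
for epoch in range(20):
    probs = [predict_proba(w, b, x) for x in xs]
    # Gradient of the mean binary cross-entropy with respect to w and b
    grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / len(xs)
    grad_b = sum(p - y for p, y in zip(probs, ys)) / len(xs)
    w -= lr * grad_w  # the "optimizer step"
    b -= lr * grad_b
    # Monitoring goes right after the optimizer step
    monitoring_function([predict_proba(w, b, x) for x in xs], ys)

print(logged["loss"][0], logged["loss"][-1])
```

Keeping all metric computation inside one function means the training loop gains a single extra line, which makes it easy to remove or swap later.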

How to add a monitoring callback to a machine/deep learning framework

Most machine learning frameworks have a callback system that lets you hook in your monitoring functions in different places of the training loop without actually changing the training loop.

Let me show you how it works.

The typical training loop looks like this:

for epoch in epochs:
    for batch in dataloader:
        ...  # forward pass, loss, backward pass, optimizer step

And you can create places in that loop where the callback object will be called:

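for epoch in epochs:
    for batch in dataloader:
        callback.on_batch_end()   # called after every batch
    callback.on_epoch_end()       # called after every epoch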

Then, when you create your monitoring callback, you override the callback methods you need.

For example, in Keras, you can create a custom monitoring callback by inheriting from the keras.callbacks.Callback class and overriding the .on_epoch_end() or .on_batch_end() methods.

from keras.callbacks import Callback

class MonitoringCallback(Callback):

    def on_epoch_end(self, epoch, logs=None):
        # logs may be None, so fall back to an empty dict
        for metric_name, metric_value in (logs or {}).items():
            neptune.log_metric(metric_name, metric_value)

And pass it to the appropriate fit method:

model.fit(..., callbacks=[MonitoringCallback()])
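
To see what a callback system does under the hood, here is a minimal, framework-free sketch. The names Callback, HistoryCallback, and fit are hypothetical and only mirror the general shape of such APIs; the "training step" is a placeholder that averages the numbers in each batch.

```python
class Callback:
    """Base class: every hook is a no-op unless a subclass overrides it."""

    def on_batch_end(self, batch, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass


class HistoryCallback(Callback):
    """Collects epoch-level logs; a stand-in for sending them to a tracker."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, epoch, logs=None):
        self.history.append((epoch, dict(logs or {})))


def fit(batches, epochs, callbacks=()):
    """Toy training loop that calls the hooks at fixed places."""
    for epoch in range(epochs):
        total = 0.0
        for batch_index, batch in enumerate(batches):
            # Placeholder for a real training step
            batch_loss = sum(batch) / len(batch)
            total += batch_loss
            for callback in callbacks:
                callback.on_batch_end(batch_index, {"loss": batch_loss})
        epoch_logs = {"loss": total / len(batches)}
        for callback in callbacks:
            callback.on_epoch_end(epoch, epoch_logs)


monitor = HistoryCallback()
fit(batches=[[1.0, 3.0], [2.0, 4.0]], epochs=2, callbacks=[monitor])
print(monitor.history)
```

Because the base class defines every hook as a no-op, a monitoring callback only has to override the hooks it cares about, and the loop itself never changes.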


Neptune has callback implementations for most major machine learning frameworks, so you don’t have to implement those callbacks and can use the ones we created.

For example, in the popular Catalyst deep learning framework, you need to import the logger:

from catalyst.contrib.dl.callbacks.neptune import NeptuneLogger

neptune_logger = NeptuneLogger(...)

And pass it to the runner:

from catalyst.dl import SupervisedRunner

runner = SupervisedRunner()
runner.train(..., callbacks=[neptune_logger])

For a full list of supported integrations, go to the documentation.

How to track your machine/deep learning models “magically”

In some frameworks, you can “magically” hook into the framework training loop by monkey-patching default loggers.

For example, you could take the Keras callback we implemented before and make it the default.

We just need to overwrite (monkey-patch) what Keras thinks is the default BaseLogger.

def use_monitoring_magic():
    import keras
    from keras.callbacks import Callback

    class MonitoringCallback(Callback):
        def on_epoch_end(self, epoch, logs=None):
            for metric_name, metric_value in (logs or {}).items():
                neptune.log_metric(metric_name, metric_value)

    # Replace the logger class that Keras treats as its default
    keras.callbacks.BaseLogger = MonitoringCallback


This is exactly how we implemented the Neptune integration with Keras.

import neptune_tensorboard as neptune_tb

neptune_tb.integrate_with_keras()

# your training logic
model.fit(..., y_train)

You can check out the full code example in the docs.
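
To see the mechanics of monkey-patching without Keras or Neptune installed, here is a minimal sketch. Everything in it is hypothetical: toy_framework, DefaultLogger, and MonitoringLogger are made-up names, and the captured list stands in for a real tracker.

```python
import types

# A toy "framework" module, built in place so the example is self-contained
toy_framework = types.ModuleType("toy_framework")


class DefaultLogger:
    def on_epoch_end(self, epoch, logs):
        pass  # the stock logger does nothing interesting


def train(epochs):
    # The class is looked up on the module at call time, so a patched
    # replacement is picked up without touching this function.
    logger = toy_framework.DefaultLogger()
    for epoch in range(epochs):
        logger.on_epoch_end(epoch, {"loss": 1.0 / (epoch + 1)})


toy_framework.DefaultLogger = DefaultLogger
toy_framework.train = train

captured = []  # stand-in for a real experiment tracker


class MonitoringLogger:
    def on_epoch_end(self, epoch, logs):
        captured.append((epoch, logs["loss"]))


def use_monitoring_magic():
    # Monkey-patch: swap the framework's default logger for ours
    toy_framework.DefaultLogger = MonitoringLogger


use_monitoring_magic()
toy_framework.train(epochs=3)
print(captured)
```

The patch only takes effect for code that looks the class up on the module after the swap. Code that imported DefaultLogger by name beforehand keeps its reference to the original class, which is why this kind of patching is usually applied before any training code runs.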

Final thoughts

In this article, you’ve learned:

  • How to add monitoring callbacks to deep learning frameworks
  • How to add a monitoring function to the model training loop
  • How to add model monitoring “magically” in some frameworks

I hope that with all that knowledge, you will be able to monitor your machine learning models however you train them!



Get started with Neptune in 5 minutes

If you are looking for an experiment tracking tool, you may want to take a look at Neptune.

It takes 5 minutes to set up, and as one of our happy users said:

“Within the first few tens of runs, I realized how complete the tracking was – not just one or two numbers, but also the exact state of the code, the best-quality model snapshot stored to the cloud, the ability to quickly add notes on a particular experiment. My old methods were such a mess by comparison.” – Edward Dixon, Data Scientist @intel

To get started, follow these 4 simple steps.

Step 1

Install the client library.

pip install neptune-client

Step 2

Connect to the tool by adding a snippet to your training code. 

For example:

import neptune

neptune.init(...) # credentials
neptune.create_experiment() # start logger

Step 3

Specify what you want to log:

neptune.log_metric('accuracy', 0.92)

for prediction_image in worst_predictions:
    neptune.log_image('worst predictions', prediction_image)

Step 4

Run your experiment as you normally would.


And that’s it!

Your experiment is logged to a central experiment database and displayed in the experiment dashboard, where you can search, compare, and drill down to whatever information you need.
