
How to Keep Track of Experiments in PyTorch Using Neptune

Machine Learning development can look a lot like conventional software development, since both require writing a lot of code. But it isn’t the same! Let’s go through a few points to understand why.

  • Machine Learning code doesn’t throw errors (semantically speaking, at least). Even if you wire a wrong equation into a neural network, it will still run, but the results won’t match your expectations. In the words of Andrej Karpathy, “Neural Networks fail silently”.
  • Machine Learning projects rely heavily on the reproducibility of results. If a hyperparameter is nudged or the training data changes, the model’s performance can shift in many ways. This means you have to record every change to hyperparameters and training data to be able to reproduce your work.
    When the network is small this can be done in a text file, but what if it’s a bigger project with tens or hundreds of hyperparameters? The text file isn’t so easy now, huh!
  • Increased complexity in Machine Learning projects means more experiment branches, which have to be tracked and stored for future analysis.
  • Machine Learning also requires heavy computation, and that computation comes at a cost. You definitely don’t want your cloud costs to skyrocket.

Tracking experiments in an organized way helps with all of these core issues. Neptune is a complete tool that helps individuals and teams track their experiments smoothly. It offers a host of features and presentation options that make tracking and collaboration easier.

Experiment tracking with Neptune

Conventional tracking procedures involved saving the logging output as a text or CSV file, which is super convenient but of little use for future analysis because of the messy structure of the output logs. The image below tells this story in a pictorial format:

[Image: experiment logs dumped into a plain text file]

Although readable, you’ll quickly lose interest. After some time you may even lose the file; nobody expects sudden disk failures or overzealous cleanups!

So, in a nutshell, the text-file way is convenient but not recommended. To solve this, Neptune tracks every hyperparameter, both of the model and of the training procedure, in a way that lets you communicate with your team efficiently and analyze past training runs to optimize them further. Below is a similar experiment, but tracked with Neptune:

[Image: the same experiment tracked in the Neptune dashboard]

Setting up a Neptune experiment in PyTorch

The setup process is trivial. First, sign up for an account here; this creates a unique ID and a dashboard where you can see all your experiments. You can always add your team members and collaborate on experiments. Follow these steps to get your unique ID (to be used during setup).
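A small tip before moving on: rather than pasting the API token directly into your scripts (as the snippet in the next section does for simplicity), you can keep it in an environment variable and read it at runtime. A minimal sketch; using NEPTUNE_API_TOKEN as the variable name is just a common convention:

# Read the Neptune API token from an environment variable instead of
# hard-coding it in the script (export it in your shell beforehand)
import os

NEPTUNE_API_TOKEN = os.getenv('NEPTUNE_API_TOKEN')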



Neptune API

To use this dashboard from your Python training script, the Neptune team provides an easy-to-use client package which you can install via pip:

pip install neptune-client

After completing installation you need to initialize Neptune like this:

import neptune

# Your API token, copied from the Neptune dashboard
NEPTUNE_API_TOKEN = "<api-token-here>"

# init() points the client at your project; create_experiment() starts a new experiment
neptune.init('<username>/sandbox', api_token=NEPTUNE_API_TOKEN)
neptune.create_experiment('Pytorch-Neptune-Example')
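One more detail worth knowing: create_experiment() starts an experiment that keeps running until it is closed. With the legacy neptune-client used throughout this post, the usual pattern (a minimal sketch, not the full script) is to stop it explicitly once training is done:

# ... training, metric logging, and artifact uploads go here ...

# Close the experiment when the script is finished
neptune.stop()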

Now let’s see how you can utilize Neptune’s dashboard from your PyTorch script:

Basic metrics integration

Let’s start with tracking the usual metrics like train/test loss, epoch loss, and gradients. To do this, you just call neptune.log_metric() with a metric name and value. In your PyTorch training loop it goes something like this:

import torch.nn.functional as F

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)

        optimizer.zero_grad()

        output = model(data)

        loss = F.nll_loss(output, target)
        loss.backward()

        optimizer.step()

        # log the batch loss so it shows up on the Neptune dashboard
        neptune.log_metric('Train loss', loss.item())

        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

After running the code above, check your Neptune dashboard; you’ll see the loss metric tracked and plotted for you to analyze.

[Image: the train loss chart in the Neptune dashboard]
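The start of this section also mentioned test loss and gradients; the pattern is exactly the same, you just call neptune.log_metric() with a different channel name. Below is a minimal sketch, assuming the same MNIST-style model and a test_loader similar to the train_loader used above:

import torch
import torch.nn.functional as F
import neptune

def test(model, device, test_loader):
    # Evaluate on the test set and log the average loss for this epoch
    model.eval()
    test_loss = 0.0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
    test_loss /= len(test_loader.dataset)
    neptune.log_metric('Test loss', test_loss)

def log_gradient_norm(model):
    # Call this right after loss.backward() to track the global gradient norm
    total_norm = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total_norm += p.grad.detach().norm(2).item() ** 2
    neptune.log_metric('Gradient norm', total_norm ** 0.5)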

To track hyperparameters (which you always should!), simply pass them in during the create_experiment() call.

# Define parameters
PARAMS = {'batch_size_train': 64,
          'batch_size_test': 1000,
          'momentum': 0.5,
          'learning_rate': 0.01,
          'log_interval': 10,
          'optimizer': 'Adam'}

# Pass parameters to create_experiment
neptune.create_experiment('Pytorch-Neptune-Example',
                          params=PARAMS,
                          tags=['classification', 'pytorch', 'neptune'])

After running the experiment again with these changes you’ll see all your parameters in the dashboard like this:

[Image: the logged hyperparameters in the Neptune dashboard]

The single biggest benefit of adding parameters and tags is that everything is plugged into one dashboard, so future analysis for optimization or feature changes can be done easily without scouring through the code.
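It also pays to build the actual training objects from the same PARAMS dict that gets logged, so what you see in Neptune is exactly what ran. A minimal sketch; model, train_dataset and test_dataset are assumed to be defined elsewhere:

import torch
from torch.utils.data import DataLoader

# Build the data loaders and optimizer from the logged PARAMS dict
train_loader = DataLoader(train_dataset, batch_size=PARAMS['batch_size_train'], shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=PARAMS['batch_size_test'])

if PARAMS['optimizer'] == 'Adam':
    optimizer = torch.optim.Adam(model.parameters(), lr=PARAMS['learning_rate'])
else:
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=PARAMS['learning_rate'],
                                momentum=PARAMS['momentum'])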

Advanced options

Neptune gives you a lot of customization options and you can simply log more experiment-specific things, like image predictions, model weights, performance charts, and more.

All of that functionality integrates easily with your current PyTorch script, and in the next sections I will show you how to leverage Neptune to the fullest.

While running the experiment you can log additional useful information:

  • Code: snapshot scripts, Jupyter notebooks, config files, and more
  • Hyperparameters: log the learning rate, number of epochs, and other settings
  • Properties: log data locations, data versions, or other metadata
  • Tags: add tags like “resnet50” or “no-augmentation” to organize your runs
  • Name: every experiment deserves a meaningful name, so let’s not use “default” every time

Just pass these on as parameters; that’s how easy it is:

neptune.create_experiment('Pytorch-Neptune-Example',
                          params=PARAMS,  # optional
                          tags=['classification', 'pytorch', 'neptune'],
                          upload_source_files=["**/*.ipynb", "*.yaml"])

The code excerpt above uploads the code files matching the given patterns and adds tags that you can use to identify the experiment in the dashboard.
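Properties and tags don’t have to be fixed at creation time either; with the legacy client they can also be added while the experiment is running. A minimal sketch (the property names and values below are hypothetical):

# Attach extra metadata to the running experiment; the values are placeholders
neptune.set_property('data_version', 'mnist-2021-01')
neptune.set_property('data_path', '/datasets/mnist')
neptune.append_tag('baseline')

Now let’s see how you can log other experiment-specific things like images and model weight files: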

Logging images

import torch
import torch.nn.functional as F
from PIL import Image

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)

        optimizer.zero_grad()

        output = model(data)

        loss = F.nll_loss(output, target)
        loss.backward()

        optimizer.step()

        # log the batch loss so it shows up on the Neptune dashboard
        neptune.log_metric('Train loss', loss.item())

        # log predicted images with their class probabilities as the description
        if batch_idx % 50 == 1:
            for image, prediction in zip(data, output):
                description = '\n'.join(['class {}: {}'.format(i, pred)
                                         for i, pred in enumerate(F.softmax(prediction, dim=0))])
                img = image.mul(255).add_(0.5).clamp_(0, 255).to('cpu', torch.uint8).numpy()
                img = Image.fromarray(img.reshape(28, 28))
                neptune.log_image('predictions',
                                  img,
                                  description=description)

        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

After running the code with these logging changes, you’ll see the logged images in your Neptune dashboard.

Extra things that you can log in experiments

A lot of interesting information can be logged during training. You may be interested in monitoring things like:

  • model predictions after each epoch (think prediction masks or overlaid bounding boxes)
  • diagnostic charts like ROC AUC curve or Confusion Matrix
  • model checkpoints, or other objects

For instance, we can save our model weights and configuration to local disk using torch.save(), and log the file to Neptune’s dashboard as an artifact:

torch.save(model.state_dict(), 'model_dict.ckpt')

# log model
neptune.log_artifact('model_dict.ckpt')
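If you checkpoint after every epoch, each file can be logged as its own artifact, so you can go back to any point of training. A small sketch, assuming n_epochs and the train() function from earlier are defined:

# Save and upload a checkpoint after every epoch (n_epochs, train(), model,
# device, train_loader and optimizer are assumed to exist already)
for epoch in range(1, n_epochs + 1):
    train(model, device, train_loader, optimizer, epoch)

    ckpt_path = 'model_epoch_{}.ckpt'.format(epoch)
    torch.save(model.state_dict(), ckpt_path)
    neptune.log_artifact(ckpt_path)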

As for post-training analysis like ROC curves and confusion matrices, you can plot them with your favorite plotting library and log the figure with neptune.log_image():

from scikitplot.metrics import plot_confusion_matrix
import matplotlib.pyplot as plt
...
fig, ax = plt.subplots(figsize=(16, 12))
plot_confusion_matrix(y_true, y_pred, ax=ax)
neptune.log_image('confusion_matrix', fig)
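An ROC curve can be logged the same way; here is a sketch assuming y_probas holds the per-class predicted probabilities (for example softmax outputs) collected on the test set:

from scikitplot.metrics import plot_roc
import matplotlib.pyplot as plt

# Plot ROC curves from true labels and per-class probabilities, then log the figure
fig, ax = plt.subplots(figsize=(16, 12))
plot_roc(y_true, y_probas, ax=ax)
neptune.log_image('roc_curve', fig)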

If you wish to see each and every functionality of this awesome API, head over to the documentation which contains examples with code.

You’ve reached the end!

We saw why experiment tracking is practically a necessity in Machine Learning systems, given how silently they fail and how much future analysis depends on it. We also saw how Neptune can be just the right tool for this task. With Neptune’s API:

  • you can monitor and keep track of your deep learning experiments
  • you can share your research with other people easily
  • you and your team can access experiment metadata and collaborate more efficiently.

You can find the code used in this notebook here.

That’s it for now, stay tuned for more! Adios!


