Working with PyTorch Lightning and wondering which logger you should choose to keep track of your experiments?
Want to find a good way to save hyperparameters, metrics, and other model-building metadata?
Thinking of using PyTorch Lightning to structure your Deep Learning code and wouldn’t mind learning about its logging functionality?
Didn’t know that Lightning has a pretty awesome Neptune integration?
This article is (very likely) for you.
Why PyTorch Lightning and Neptune?
If you have never heard of it, PyTorch Lightning is a very lightweight wrapper on top of PyTorch which is more like a coding standard than a framework. The format allows you to get rid of a ton of boilerplate code while keeping it easy to follow.
The result is a framework that gives researchers, students, and production teams the ultimate flexibility to try crazy ideas without having to learn yet another framework while automating away all the engineering details.
Some great features that you can get out-of-the-box are:
- Train on CPU, GPU, or TPUs without changing your code,
- Trivial multi-GPU and multi-node training
- Trivial 16-bit precision support
- Built-in performance profiler (Trainer(profiler="simple"))
and a ton of other great functionalities.
But with this great power of running experiments easily and flexibility in tweaking anything you want, comes a problem.
How to keep track of all the changes like:
- losses and metrics,
- hyperparameters
- model binaries
- validation predictions
and other things that will help you organize your experimentation process?
PyTorch Lightning loggers
Fortunately, PyTorch Lightning lets you easily connect loggers to the pl.Trainer. One of the supported loggers that can track all of the things mentioned above (and many others) is the NeptuneLogger, which saves your experiments in... you guessed it, Neptune.
Neptune not only tracks your experiment artifacts but also:
- lets you monitor everything live,
- gives you a nice UI where you can filter, group, and compare various experiment runs
- lets you access experiment data that you logged programmatically from a Python script or Jupyter Notebook
The best part is that this integration really is trivial to use.
Let me show you how it looks.
TIP
You can also check out this Colab notebook and try the examples from this article yourself.
PyTorch Lightning logging: basic integration (save hyperparameters, metrics, and more)
In the simplest case, you just create the NeptuneLogger:
from pytorch_lightning.loggers import NeptuneLogger
neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project="shared/pytorch-lightning-integration")
and pass it to the logger argument of Trainer and fit your model.
from pytorch_lightning import Trainer
trainer = Trainer(logger=neptune_logger)
trainer.fit(model)
By doing so you automatically:
- Log metrics and losses (and get the charts created),
- Log and save hyperparameters (if defined via lightning hparams),
- Log hardware utilization
- Log Git info and execution script
Check out this experiment.
You can monitor your experiments, compare them, and share them with others.
Not too bad for a 4-liner.
But with just a bit more effort you can get a lot more.
PyTorch Lightning logging: advanced options
Neptune gives you a lot of customization options and you can simply log more experiment-specific things, like image predictions, model weights, performance charts, and more.
All of that functionality is available for Lightning users and in the next sections, I will show you how to leverage Neptune to the fullest.
Logging extra information at NeptuneLogger creation
When you are creating the logger you can log additional useful information:
- code: snapshot scripts, Jupyter notebooks, config files, and more,
- hyperparameters: log learning rate, number of epochs, and other things (if you are using the Lightning hparams object, it will be logged automatically),
- properties: log data locations, data versions, or other things,
- tags: add tags like "resnet50" or "no-augmentation" to organize your runs.
Just pass this information to your logger:
neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project="shared/pytorch-lightning-integration",
    tags=["pytorch-lightning", "mlp"],
)
Logging extra things during training with PyTorch Lightning
A lot of interesting information can be logged during training.
You may be interested in monitoring things like:
- model predictions after each epoch (think prediction masks or overlaid bounding boxes)
- diagnostic charts like ROC AUC curve or Confusion Matrix
- model checkpoints, or other objects
It is really simple. Just go to your LightningModule and call methods of the Neptune experiment available as self.logger.experiment.
For example, we can log histograms of losses after each epoch:
import matplotlib.pyplot as plt
import numpy as np
import pytorch_lightning as pl
import torch
from neptune.new.types import File

class CoolSystem(pl.LightningModule):
    def validation_epoch_end(self, outputs):
        # OPTIONAL: average the per-batch validation losses and log the result
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        self.log('val/avg_loss', avg_loss)
        # log debugging images like a histogram of losses
        fig = plt.figure()
        losses = np.stack([x['val_loss'].cpu().numpy() for x in outputs])
        plt.hist(losses)
        self.logger.experiment['loss_histograms'].log(File.as_image(fig))
        plt.close(fig)
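One caveat about the avg_loss line above: averaging per-batch means gives every batch the same weight, so a smaller final batch counts as much as a full one. The plain-Python sketch below (no PyTorch needed; the loss values and batch sizes are made up purely for illustration) compares that average with a sample-weighted one.

```python
# Hypothetical per-batch mean losses and batch sizes (illustrative numbers only).
batch_losses = [0.50, 0.40, 0.90]
batch_sizes = [32, 32, 8]  # the last batch is smaller than the others

# Equivalent of stacking the batch losses and calling .mean():
# every batch gets equal weight.
mean_of_batch_means = sum(batch_losses) / len(batch_losses)

# Sample-weighted mean: every example gets equal weight.
weighted_mean = (
    sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    / sum(batch_sizes)
)

print(f"mean of batch means:  {mean_of_batch_means:.4f}")
print(f"sample-weighted mean: {weighted_mean:.4f}")
```

The two numbers coincide whenever all batches have the same size, so the difference only matters when the last batch is incomplete.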
Other things you may want to log during training are:
# log custom metrics
neptune_logger.experiment["your/metadata/metric"].log(metric)
# log text values
neptune_logger.experiment["your/metadata/text"].log(text)
# log files
neptune_logger.experiment["your/metadata/file"].upload(artifact)
# log images, charts
neptune_logger.experiment["your/metadata/figure"].upload(File.as_image(artifact))
# add key value pairs
neptune_logger.experiment["properties/key"] = value
# add tags for organization
neptune_logger.experiment["sys/tags"].add(['tag1', 'tag2'])
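The field names used above ("your/metadata/metric", "sys/tags") are slash-separated paths, and related values end up grouped under a common prefix. As a rough mental model only, here is a toy in-memory stand-in. ToyRun and ToyField are hypothetical names invented for this sketch; they are not part of the Neptune client and say nothing about how Neptune stores data internally.

```python
class ToyField:
    """One leaf in the toy metadata store (hypothetical, for illustration)."""

    def __init__(self, store, path):
        self._store = store
        self._path = path

    def log(self, value):
        # Append to a series, the way repeated metric logging builds up a chart.
        self._store.setdefault(self._path, []).append(value)

    def add(self, values):
        # Add items to a set-like field such as tags.
        self._store.setdefault(self._path, set()).update(values)


class ToyRun:
    """Minimal dict-backed stand-in for a run object."""

    def __init__(self):
        self.store = {}

    def __getitem__(self, path):
        return ToyField(self.store, path)

    def __setitem__(self, path, value):
        # Plain assignment stores a single value under the path.
        self.store[path] = value

    def tree(self):
        # Rebuild a folder-like hierarchy from the slash-separated paths.
        root = {}
        for path, value in self.store.items():
            *folders, leaf = path.split("/")
            node = root
            for folder in folders:
                node = node.setdefault(folder, {})
            node[leaf] = value
        return root


run = ToyRun()
run["train/loss"].log(0.9)
run["train/loss"].log(0.7)
run["properties/data_version"] = "v1"
run["sys/tags"].add(["tag1", "tag2"])
print(run.tree())
```

Picking consistent prefixes such as train/, val/, and test/ up front keeps related charts next to each other when you browse a run.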
Pretty cool right?
But... that is not all you can do!
Logging things after PyTorch Lightning training has finished
Tracking your experiment doesn't have to finish after your .fit loop ends.
You may want to track the metrics of trainer.test(model) or calculate some additional validation metrics and log them.
To do that you just need to tell NeptuneLogger not to close after fit:
neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project="shared/pytorch-lightning-integration",
    ...
)
... and you can keep logging.
Test metrics:
trainer.test(model)
Additional (external) metrics:
from sklearn.metrics import accuracy_score
...
accuracy = accuracy_score(y_true, y_pred)
neptune_logger.experiment['test/accuracy'].log(accuracy)
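For reference, accuracy is simply the fraction of predictions that match the true labels. The plain-Python sketch below reproduces that calculation on made-up labels; simple_accuracy is a helper written for this example, not a scikit-learn function.

```python
def simple_accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of empty inputs")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up digit labels, for illustration only.
y_true = [7, 2, 1, 0, 4, 1, 4, 9]
y_pred = [7, 2, 1, 0, 4, 1, 9, 9]
print(simple_accuracy(y_true, y_pred))  # 7 of 8 correct -> 0.875
```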
Performance charts on test set:
from scikitplot.metrics import plot_confusion_matrix
import matplotlib.pyplot as plt
from neptune.new.types import File
...
fig, ax = plt.subplots(figsize=(16, 12))
plot_confusion_matrix(y_true, y_pred, ax=ax)
neptune_logger.experiment['test/confusion_matrix'].upload(File.as_image(fig))
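The confusion matrix chart is built from a table of counts: row i, column j holds the number of examples whose true class is i and whose predicted class is j. Here is a minimal plain-Python sketch of that table, using made-up labels and a hypothetical helper name, count_confusions:

```python
def count_confusions(y_true, y_pred, num_classes):
    """Return a num_classes x num_classes table of counts.

    matrix[i][j] is the number of examples with true class i predicted as class j.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Made-up labels for a 3-class problem, for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
for row in count_confusions(y_true, y_pred, num_classes=3):
    print(row)
```

Correct predictions sit on the diagonal, so anything off the diagonal shows which classes the model mixes up.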
The whole model checkpoints directory:
neptune_logger.experiment['checkpoints'].upload_files('my/checkpoints')
Go to this experiment to see how those objects are logged.
But... there is even more!
Neptune lets you fetch experiments after training.
Let me show you how.
Fetching your PyTorch Lightning experiment information directly to the notebooks
You can fetch experiments after they have finished, analyze the results, and update metrics, artifacts, or other things if you want to.
For example, let's fetch the experiments dashboard to a pandas DataFrame:
import neptune.new as neptune
project = neptune.get_project(name='shared/pytorch-lightning-integration')
project.fetch_runs_table().to_pandas()
or fetch a single experiment and update it with some external metric calculated after training:
exp = neptune.init(project='shared/pytorch-lightning-integration', run='PYTOR-63')
exp['some_external_metric'].log(0.92)
As you can see, there are a lot of things you can log to Neptune from PyTorch Lightning.
If you want to go deeper into this:
- read the integration docs
- go check out Neptune to see other things it can do,
- try out Lightning + Neptune on Colab
Final thoughts
PyTorch Lightning is a great library that helps you with:
- organizing your deep learning code to make it easily understandable to other people,
- outsourcing development boilerplate to a team of seasoned engineers,
- accessing a lot of state-of-the-art functionalities with almost no changes to your code
With Neptune integration, you get some additional things for free:
- you can monitor and keep track of your deep learning experiments
- you can share your research with other people easily
- you and your team can access experiment metadata and collaborate more efficiently.
Hopefully, with all that power you will know exactly what you (and other people) tried, and your deep learning research will be moving at lightning speed.
Full PyTorch Lightning tracking script
pip install --upgrade torch torchvision "pytorch-lightning>=1.5.0" neptune-client matplotlib scikit-plot
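Before running the script it can be worth confirming that the packages actually import. The sketch below uses only the standard library. The import names in REQUIRED (for example pytorch_lightning for pytorch-lightning and scikitplot for scikit-plot) are the module names the script below imports, and missing_modules is a helper made up for this example.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Top-level modules imported by the tracking script (assumed import names).
REQUIRED = ["torch", "torchvision", "pytorch_lightning", "neptune",
            "matplotlib", "scikitplot", "sklearn"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```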
import os
import numpy as np
import neptune.new as neptune
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms
import matplotlib.pyplot as plt
import pytorch_lightning as pl
MAX_EPOCHS=15
LR=0.02
BATCHSIZE=32
CHECKPOINTS_DIR = 'my_models/checkpoints'
class CoolSystem(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # not the best model...
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        # REQUIRED
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        self.log('train/loss', loss)
        return {'loss': loss}

    def validation_step(self, batch, batch_idx):
        # OPTIONAL
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        self.log('val/loss', loss)
        return {'val_loss': loss}

    def validation_epoch_end(self, outputs):
        # OPTIONAL
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        self.log('val/avg_loss', avg_loss)
        # log a histogram of the per-batch validation losses
        fig = plt.figure()
        losses = np.stack([x['val_loss'].cpu().numpy() for x in outputs])
        plt.hist(losses)
        self.logger.experiment['imgs/loss_histograms'].log(neptune.types.File.as_image(fig))
        plt.close(fig)

    def test_step(self, batch, batch_idx):
        # OPTIONAL
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        self.log('test/loss', loss)
        return {'test_loss': loss}

    def test_epoch_end(self, outputs):
        # OPTIONAL
        avg_loss = torch.stack([x['test_loss'] for x in outputs]).mean()
        self.log('test/avg_loss', avg_loss)

    def configure_optimizers(self):
        # REQUIRED
        # can return multiple optimizers and learning_rate schedulers
        # (LBFGS is automatically supported, no need for a closure function)
        return torch.optim.Adam(self.parameters(), lr=LR)

    def train_dataloader(self):
        # REQUIRED
        return DataLoader(
            MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()),
            batch_size=BATCHSIZE)

    def val_dataloader(self):
        # OPTIONAL
        return DataLoader(
            MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()),
            batch_size=BATCHSIZE)

    def test_dataloader(self):
        # OPTIONAL
        return DataLoader(
            MNIST(os.getcwd(), train=False, download=True, transform=transforms.ToTensor()),
            batch_size=BATCHSIZE)

from pytorch_lightning.loggers.neptune import NeptuneLogger

neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project="shared/pytorch-lightning-integration",
    tags=["pytorch-lightning", "mlp"],
)
# save model checkpoints to CHECKPOINTS_DIR
model_checkpoint = pl.callbacks.ModelCheckpoint(dirpath=CHECKPOINTS_DIR)

from pytorch_lightning import Trainer

model = CoolSystem()
trainer = Trainer(
    max_epochs=MAX_EPOCHS,
    logger=neptune_logger,
    callbacks=[model_checkpoint],
)
trainer.fit(model)
trainer.test(model)

# Get predictions on external test
model.freeze()
test_loader = DataLoader(
    MNIST(os.getcwd(), train=False, download=True, transform=transforms.ToTensor()),
    batch_size=256)
y_true, y_pred = [], []
for x, y in test_loader:
    y_hat = model.forward(x).argmax(dim=1).cpu().detach().numpy()
    y = y.cpu().detach().numpy()
    y_true.append(y)
    y_pred.append(y_hat)
y_true = np.hstack(y_true)
y_pred = np.hstack(y_pred)

# Log additional metrics
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_true, y_pred)
neptune_logger.experiment['test/accuracy'].log(accuracy)

# Log charts
from scikitplot.metrics import plot_confusion_matrix
fig, ax = plt.subplots(figsize=(16, 12))
plot_confusion_matrix(y_true, y_pred, ax=ax)
neptune_logger.experiment['confusion_matrix'].log(neptune.types.File.as_image(fig))

# Save checkpoints folder
neptune_logger.experiment['checkpoints'].upload_files(CHECKPOINTS_DIR)

# You can stop the experiment
neptune_logger.experiment.stop()