Training machine learning or deep learning models can take a really long time.
If you are like me, you like to know what is happening during that time:
- want to monitor your training and validation losses,
- take a look at the GPU consumption,
- see image predictions after every other epoch,
- and a bunch of other things.
Neptune lets you do all that, and in this post I will show you how to make it happen. Step by step.
Check out this example run monitoring experiment to see what this can look like.
Note:
If you want to try Neptune monitoring without registration just jump to the Initialize Neptune
section and start from there as an anonymous user.
Please note that due to the recent API update, this post needs some changes as well – we’re working on it! In the meantime, please check the Neptune documentation, where everything is up to date! 🥳
Set up your Neptune account
Setting up a project and connecting your scripts to Neptune is super easy but you still need to do it 🙂
Let’s take care of that quickly.
Create a project
Let’s create a project first.
To do that:
- go to the Neptune app,
- click on the New project button on the left,
- give it a name,
- decide whether you want it to be public or private,
- done!

Get your API token
You will need a Neptune API token (your personal key) to connect the scripts you run with Neptune.
To do that:
- click on your user logo on the right
- click on Get Your API token,
- copy your API token,
- paste it into an environment variable (preferably in ~/.bashrc or its system equivalent), a config file, or directly into your script if you feel really adventurous 🙂

A token is like a password, so I try to keep it safe.
Since I am a Linux guy, I put it in my environment file ~/.bashrc. If you are using a different system, just click on the operating system box up top and see what is recommended.
With that, whenever I run my training scripts, Neptune will know who I am and log things appropriately.
Install client library
To work with Neptune you need a client library that deals with logging everything you care about.
Since I am using Python I will use the Python client but you can use Neptune with other languages as well.
You can install it with pip:
pip install neptune-client
Many monitoring helpers are available in the extension library, so I suggest installing that as well.
pip install neptune-contrib
Initialize Neptune
Now that you have everything set up you can start monitoring things!
First, connect your script to Neptune by adding the following towards the top of your script:
import neptune
neptune.init(project_qualified_name="You/Your-Project")
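If you prefer to pass the token explicitly instead of relying on the environment, here is a minimal sketch, assuming you exported it as NEPTUNE_API_TOKEN (the variable the client looks for by default):
import os

import neptune

# Read the token from the environment; the client also picks up
# NEPTUNE_API_TOKEN automatically when api_token is not passed.
neptune.init(
    api_token=os.getenv('NEPTUNE_API_TOKEN'),
    project_qualified_name='You/Your-Project',
)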
Note:
If you want to try Neptune without registration, you can simply use this open project and the anonymous user neptuner, whose API token is 'ANONYMOUS':
import neptune

neptune.init(api_token='ANONYMOUS',
             project_qualified_name='shared/step-by-step-monitoring-experiments-live')
Create an experiment
With Neptune you log things to experiments. Experiments are basically dictionaries that understand machine learning things. You can log whatever you want to them and Neptune will present it nicely in the UI.
To start an experiment just run:
neptune.create_experiment('step-by-step-guide')
Once you run this, a link to an experiment will appear. You can click on it and go to the Neptune UI or you can simply find your experiment directly in the Neptune app.
Of course, there is not much there yet because we didn’t log anything.
Let’s change that!
Note:
You can organize experiments by adding names and tags, or track your model hyperparameters. Read about it here, but if you simply want to monitor your runs live, forget I ever said anything 🙂
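For reference, here is a minimal sketch of a named, tagged experiment with logged hyperparameters; the parameter values are just placeholders:
import neptune

neptune.init(project_qualified_name='You/Your-Project')

# Hypothetical hyperparameters, just to illustrate the call
params = {'lr': 0.001, 'batch_size': 64, 'dropout': 0.4}

neptune.create_experiment(
    name='step-by-step-guide',
    params=params,                        # shows up in the Parameters section
    tags=['monitoring', 'step-by-step'],  # tags make filtering experiments easier
)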
Monitoring basics
In a nutshell, logging to Neptune is as simple as:
neptune.log_TYPE('LOG_NAME', THING_I_WANT_TO_LOG)
Those could be:
- Metrics and losses -> neptune.log_metric
- Images and charts -> neptune.log_image
- Artifacts like model files -> neptune.log_artifact
- And others.
We’ll go into those in detail in the next sections but first let’s talk about the basics.
Logging single values and logging in loops
Sometimes you may just want to log something once before or after the training is done.
In that case, you just run:
...
neptune.log_metric('test_auc', 0.93)
neptune.log_artifact('my_model.pkl')
In other scenarios, there is a training loop inside of which you want to log things.
Like in PyTorch:
for inputs, labels in trainloader:
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    neptune.log_metric('loss', loss.item())  # log the scalar value of the loss
Your loss is logged to Neptune after every iteration and the learning curves are created automatically.
Usually, you will not interact with the training loop directly but use a callback system that makes the logging and monitoring cleaner.
We’ll talk about those next.
Logging with callbacks
When your framework has a callback system, it lets you hook your monitoring functions into different places of the training loop without actually changing the training loop.
It could be epoch end, batch end, or training start. You just specify at which point of training it should be executed.
for epoch in epochs:
    callback.on_epoch_start()
    for batch in dataloader:
        callback.on_batch_start()
        do_stuff(batch)
        callback.on_batch_end()
    callback.on_epoch_end()
For example, in Keras you can create your Callback and specify that you want to log metrics after every epoch.
from tensorflow.keras.callbacks import Callback

class MonitoringCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        for metric_name, metric_value in (logs or {}).items():
            neptune.log_metric(metric_name, metric_value)
And pass that callback to the fit method.
model.fit(..., callbacks=[MonitoringCallback()])
Your training metrics will be logged to Neptune automatically:
Most ML frameworks have some callback system in place. They vary slightly but the idea is the same.
OK, now you know how to monitor things, but a more interesting question is: what can you monitor in Neptune?
What can you monitor in Neptune?
There are a ton of different things that you can log to Neptune and monitor live.
Metrics and learning curves, hardware consumption, model predictions, ROC curves, console logs, and more can be logged for every experiment and explored live.
Let’s go over those one by one.
Monitor ML metrics and losses
Log evaluation metrics or losses to a log section with the neptune.log_metric method:
neptune.log_metric('test_accuracy', 0.76)
neptune.log_metric('test_f1_score', 0.62)
If you want to log those metrics after every training iteration, simply call neptune.log_metric many times on the same metric name:
for epoch in range(epoch_nr):
    # training logic
    f1_score = model.score(valid_data)
    neptune.log_metric('train_f1_score', f1_score)
A chart with the learning curve will be created automatically.
Monitor hardware resources
Those are logged automatically if you have psutil installed.
To install it just run:
pip install psutil
and have your hardware monitored for every experiment.
Just go to the Monitoring section to see it:
Monitor console logs
Those are captured automatically for every experiment as well.
You can see both stderr and stdout in the Monitoring section:
Monitor image predictions
Log images to a log section with the neptune.log_image method. They can be grouped into named log sections like best_image_predictions or validation_predictions:
neptune.log_image('best_image_predictions', image)
neptune.log_image('worst_image_predictions', image)
If you want to log image predictions after every epoch you can log multiple images to the same log name:
for epoch in range(epoch_nr):
    # logic for plotting validation predictions on images
    neptune.log_image('validation_predictions_epoch', image)
Your images will be browsable in the validation_predictions_epoch tab of the logs section in the UI.
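If your predictions live in matplotlib figures rather than image files, you can log the figure object directly. A minimal sketch, assuming an experiment has already been created as above (the random array below just stands in for a real prediction):
import matplotlib.pyplot as plt
import numpy as np
import neptune

# Plot a fake "prediction" and log the figure under a named log section
fig, ax = plt.subplots()
ax.imshow(np.random.rand(28, 28), cmap='gray')
ax.set_title('sample prediction')

neptune.log_image('validation_predictions_epoch', fig)
plt.close(fig)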
Monitor performance charts
Log and monitor charts like the Confusion Matrix, ROC Curve, Precision-Recall curve, or anything else you want. There are two ways to do it.
You can either log them as images with neptune.log_image, and they will be presented in the logs section of the UI under a name you choose:
for epoch in range(epochs):
    # chart plotting logic
    neptune.log_image('ROC curves epoch', fig)
You can also log and update interactive HTML charts from bokeh, plotly, or altair with the log_chart function from the neptunecontrib extension library. If you log matplotlib charts, they will be automatically converted to plotly.
from neptunecontrib.api import log_chart

for epoch in range(epochs):
    # chart plotting logic
    log_chart('ROC curve', fig)
Note:
If you want to create a new interactive chart after every epoch you need to give them different names:
for epoch in range(epochs):
    # chart plotting logic
    log_chart('ROC curve epoch:{}'.format(epoch), fig)
Monitor text predictions
Log text to a log section with the neptune.log_text method. There can be many named text log subsections if you want to:
neptune.log_text('preds_head', str(text_predictions.head()))
neptune.log_text('preds_tail', str(text_predictions.tail()))
As before, you can log things after every training epoch or batch if you want to. Just call neptune.log_text multiple times:
for iteration in range(iter_nr):
    # logic for getting parameters tried during this run
    neptune.log_text('run parameters', str(param_dictionary))
Your text logs will be browsable in the "run parameters" tab of the "logs" section of the UI.
Monitor file updates
You can log model checkpoints, prediction .csv files or anything else after every epoch and see it in the app:
for epoch in epochs:
    # create predictions.csv
    neptune.log_artifact('predictions.csv')
    neptune.log_artifact('model_checkpoint.pkl')
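For example, here is a minimal sketch of creating that predictions file with pandas before logging it; the DataFrame contents are purely illustrative:
import pandas as pd
import neptune

# Hypothetical predictions; in practice these come from your model
preds = pd.DataFrame({'id': [1, 2, 3], 'prediction': [0.91, 0.15, 0.67]})
preds.to_csv('predictions.csv', index=False)

# Upload the file to the experiment's artifacts
neptune.log_artifact('predictions.csv')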
Compare running experiments with previous ones
The cool thing about monitoring ML experiments in Neptune is that you can compare running experiments with your previous ones.
It makes it easy to decide whether the model you are training shows promise of improvement. If it doesn't, you can even abort the experiment from the UI.
To do that:
- go to the experiment dashboard
- select a few experiments
- click compare to overlay learning curves and show diffs in parameters and metrics
- click abort on the running ones if you no longer see the point in training
Share running experiments with others with a link
You can share your running experiments by copying the link to the experiment and sending it to someone.
Just like I am sharing this experiment with you here:
https://ui.neptune.ai/o/shared/org/step-by-step-monitoring-experiments-live/e/STEP-22
The cool thing is that you can send people directly to the part of your experiment that is interesting, like the code, hardware consumption charts, or learning curves. You can share experiment comparisons with links as well.
Use integrations to monitor training in your frameworks
Neptune comes with a bunch of framework integrations to make the monitoring even easier.
Let me show you how it usually works with two examples: Keras and Optuna.
Monitor deep learning models: Keras
Instead of creating the monitoring callback yourself, you can use the Neptune + Keras integration.
Simply import it and pass it to model.fit. Don't forget to create the Neptune run first (this snippet uses the updated neptune.new API mentioned in the note above):
import neptune.new as neptune
from neptune.new.integrations.tensorflow_keras import NeptuneCallback

run = neptune.init(project='You/Your-Project')
neptune_cbk = NeptuneCallback(run=run, base_namespace='metrics')

model.fit(x_train, y_train,
          epochs=5,
          batch_size=64,
          callbacks=[neptune_cbk])
Monitor hyperparameter optimization: Optuna
The hyperparameter tuning framework Optuna also has a callback system that you can plug Neptune into nicely. All the results are logged and updated after every parameter search iteration.
import optuna
import neptune
from neptunecontrib.monitoring.optuna import NeptuneCallback

neptune.init()
neptune.create_experiment('my-optuna-experiment')

#
# your logic
#

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100,
               callbacks=[NeptuneCallback(log_charts=True)])
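The objective here is whatever function you are tuning. Purely as an illustration (this toy example is not part of the original setup and assumes Optuna 1.3+ for suggest_float), it could look like this:
# A toy objective: Optuna suggests a value for `x` and we maximize
# a simple quadratic whose optimum is at x = 2.
def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return -(x - 2) ** 2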
Note:
Neptune has 25+ framework integrations (and counting) so check them out and see if your frameworks are available or drop us a comment and we may just build it for you!
Final thoughts
With all this information you should be able to monitor every piece of the machine learning experiment that you care about.
For even more info you can:
- See how the monitoring works in this Google Colab notebook that comes with snippets for logging all sorts of things to Neptune
- Check out this example run monitoring experiment to see what this can look like
- Read the updated list of things that you can log
- Check out the full list of our integrations with ML frameworks
- Talk to us on Intercom (that blue thing in the corner) or on our Discourse Forum.
Happy experiment monitoring!