How to Keep Track of Deep Learning Experiments in Notebooks

Jupyter notebooks are cool. They’re language-independent, great for collaboration, easy to customize, you can add extensions—the list goes on. 

Issues begin when you need to track training hyperparameters, metrics, test results, or graphs. That’s when the chaos starts. 

Then there are spreadsheets, which can quickly become unmanageable, especially in a team environment where you have multiple people who need to edit them at the same time. 

Managing spreadsheets is painful, and not great for productivity. Once you finish tracking and recording your experiments, you then need to:

  • Search through and visualize those experiments,
  • Organize the experiments / spreadsheet into something all team members can understand,
  • Make data sharable and accessible.

This isn’t a new issue: ML practitioners have long needed tools to track and manage experiments in one place, and multiple products solve this problem in their own ways.

In this article, we’ll focus on Neptune. This platform lets you visualize and compare experiment runs, and keeps your results and metadata all in one place.

It stores, retrieves, and analyzes vast amounts of data. There are easy collaboration tools inside, along with Jupyter notebook tracking. In my experience, Neptune is the most lightweight experiment management tool. 

If you want to follow along with this article, make a free Neptune account first.

Please note that due to the recent API update, this post needs some changes as well – we’re working on it! In the meantime, please check the Neptune documentation, where everything is up to date! 🥳

Deep Learning experiment tracking with Neptune

Neptune’s Jupyter notebook tracking and collaboration features make it easy to use, and to scale your projects. In fact, the best thing about Neptune is how easy it is to cooperate with your team. 

It happens seamlessly, whereas other platforms I’ve used often lag when multiple teams or developers work on the same project at once. 

Another great thing is that Neptune gives you a single place to organize your machine learning experiments. Your team can keep Jupyter notebooks, runs, and more in one convenient space. Few other platforms give you that. Another distinctive Neptune feature is auto snapshots: whenever you run an experiment in a notebook, the current state of the notebook is sent to Neptune automatically.

All of this means that you can focus on developing, rather than having to sift through several apps to find different parts of your experiment. 

Neptune’s recording features allow your team to see statistics and run data right from the platform. You don’t have to run Matplotlib operations manually; the statistics relevant to your project are available to view automatically. 

Neptune combines all essential functionalities of competing products, and it offers a generous free tier. 

Pros and cons of Jupyter Notebooks

Jupyter notebooks are awesome tools, but there are some issues which make them a bit annoying to work with. Neptune solves these issues, making Jupyter notebooks a stress-free tool. 

Developers like to run experiments in Jupyter notebooks because they’re a proven way of showcasing work, and they’re easy to host server-side. This makes them great for security and communication within the developer community. 

So what are the issues? For one thing, you can’t collaborate with multiple people when working on a Jupyter notebook. Real-time collaboration doesn’t come with Jupyter notebooks, which can be a big problem. 

Luckily, Neptune allows multiple developers to collaborate on Jupyter notebooks in an interactive way. Link sharing lets you see and download notebooks that your teammates create. Neptune lets you share views of experiments, notebooks, and projects by sharing the URL to a specific view.

The other issue with Jupyter notebooks is inefficient plotting. In theory, you can copy-paste plots from Jupyter directly into external editors like PowerPoint. 

But having to copy-paste plots every time your data changes is inefficient because data changes all the time. Neptune’s built-in plotting features resolve this problem completely. 

Parts of a Machine Learning experiment 

There are many entities within machine learning experiments: parameters, artifacts, jobs, relationships, and so on. 

By definition, an experiment is a procedure to test a hypothesis. For example, is model 1 better than model 2? Do hyperparameters XYZ have a negative effect on response ABC? 

In your experiments, you’re dealing with variables, trials, and trial components. 

Variables are the controllable factors that you vary while recording the response (for example model architectures, or hyperparameters). 

Trials are simply training iterations on a specific variable set. 

Trial components include various parameters, jobs, datasets, models, metadata, and other artifacts. These can be associated with trials, or they can be independent. 
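To make these terms concrete, here’s a minimal sketch in plain Python: the variables form a grid, and every combination of values is one trial. The variable names and values are hypothetical, chosen only for illustration, and the helper `enumerate_trials` is not part of any library.

```python
from itertools import product

# Hypothetical variables: the factors we deliberately vary between trials.
variables = {
    "architecture": ["cnn_small", "cnn_large"],
    "learning_rate": [0.1, 0.01],
    "batch_size": [32, 64],
}


def enumerate_trials(variables):
    """Return one dict per trial, covering every combination of variable values."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


trials = enumerate_trials(variables)
print(len(trials))  # 2 * 2 * 2 combinations
print(trials[0])    # the first variable set to train on
```

Each dictionary in `trials` is a variable set that a single training run would use. A full grid grows quickly, which is one reason why keeping track of trials by hand gets out of control.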

Furthermore, MLflow tracking is an essential part of the experiments we conduct. It’s worth understanding how MLflow tracking works before continuing on to the tutorial. 

There are two components of MLflow tracking: experiments and runs. 

An experiment is the primary unit of organization and access control for MLflow runs. All MLflow runs belong to an experiment. Experiments let you visualize, search for, and compare runs, as well as download run artifacts and metadata for analysis in other tools. 

Runs, on the other hand, correspond to a single execution of model code. Each run records the source, version, start and end time, parameters, metrics, tags, and artifacts. The source is the name of the notebook that launched the run, or the project name and entry point for the run.

Here’s what the remaining fields hold:

  • Version: Notebook revision if run from a notebook or Git commit hash if run from an MLflow Project. 
  • Start & end time: Start and end time of a run.
  • Parameters: Model parameters saved as key-value pairs. Both keys and values are strings.
  • Metrics: Model evaluation metrics saved as key-value pairs. The value is numeric. Each metric can be updated throughout the course of the run (for example, to track how your model’s loss function is converging), and MLflow records and lets you visualize the metric’s history.
  • Tags: Run metadata saved as key-value pairs. You can update tags during and after a run completes. Both keys and values are strings.
  • Artifacts: Output files in any format. For example, you can record images, models (like a pickled scikit-learn model), and data files (like a Parquet file) as an artifact.
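The fields above can be pictured as a simple record. The sketch below is only an illustration in plain Python: `RunRecord` and its methods are hypothetical names, not the data model of MLflow or Neptune. It shows the two details that matter most, namely that parameters and tags are stored as strings, and that a metric keeps its whole history rather than only the latest value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunRecord:
    """Illustrative stand-in for the fields a tracked run records."""
    source: str                      # notebook name or project entry point
    version: str                     # notebook revision or Git commit hash
    start_time: float
    end_time: Optional[float] = None
    parameters: Dict[str, str] = field(default_factory=dict)
    metrics: Dict[str, List[float]] = field(default_factory=dict)
    tags: Dict[str, str] = field(default_factory=dict)
    artifacts: List[str] = field(default_factory=list)

    def log_param(self, key, value):
        # Parameters are stored as strings.
        self.parameters[key] = str(value)

    def log_metric(self, key, value):
        # Each call appends, so the full history of a metric is kept.
        self.metrics.setdefault(key, []).append(float(value))

    def set_tag(self, key, value):
        # Tags can be updated during or after the run.
        self.tags[key] = str(value)


# Hypothetical values, for illustration only.
run = RunRecord(source="train.ipynb", version="rev-1", start_time=0.0)
run.log_param("batch_size", 40)
for loss in (0.9, 0.5, 0.3):
    run.log_metric("loss", loss)
run.set_tag("stage", "baseline")
run.artifacts.append("confusion_matrix.png")
```

A real tracking tool adds storage, a UI, and access control on top, but the shape of the data it records per run is roughly this.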

Tutorial: tracking Deep Learning experiments in Jupyter Notebooks with Neptune

First, let’s create an experiment in Neptune. Neptune helps you with experiment management, or basically tracking experiment metadata (such as code versions, data versions, hyperparameters, environments, and metrics).

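    # Legacy neptune-client API; assumes neptune.init(...) has already selected your project.
    neptune.create_experiment(name='experiment-example')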

Logging as many metrics as you can will save your future self a lot of trouble when metrics change with new discoveries or specifications. 

Common evaluation metrics include classification accuracy, logarithmic loss, confusion matrix, area under curve, F1 score, mean absolute error, and mean squared error. Tracking these can help you evaluate the performance of your machine learning algorithm and model. 

For example, your DL model may give you an appropriate accuracy score, but a poor logarithmic loss. Neptune keeps all these metrics nicely organized for each experiment. 
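Here’s a small worked example of that situation, written in plain Python so it runs without any libraries. The labels and predicted probabilities are made up for illustration. Both sets of predictions classify every sample correctly, but the second one is barely on the right side of the 0.5 threshold, so its logarithmic loss is much worse.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Made-up labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
hesitant = [0.55, 0.45, 0.51, 0.49]

for name, probs in (("confident", confident), ("hesitant", hesitant)):
    print(name, accuracy(y_true, probs), round(log_loss(y_true, probs), 3))
```

Accuracy alone can’t tell these two models apart, which is why it pays to log both metrics for every experiment.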

You can log a simple metric such as accuracy with Neptune:  

neptune.log_metric('classification_accuracy', 0.99)

You can also log more complex diagnostics, such as a confusion matrix saved as an image: 

neptune.log_image('diagnostics', 'confusion_matrix.png')

Now, let’s go through a deep learning example where we log metrics after each batch and epoch. 

More advanced DL models will go through hundreds of epochs. Manually tracking accuracy, batch loss, and other metrics after each epoch is impractical. 

If your model performance is not great, you must figure out whether to collect more data, reduce the model complexity, or find another solution. 

To determine the next step, you must know if your model has a bias or variance problem. If it does, we can use data preprocessing techniques to troubleshoot it. 
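One rough way to tell the two apart is to compare the training and validation loss you logged per epoch. The sketch below is a rule of thumb, not a definitive test: the function name `diagnose`, the thresholds `target_loss` and `gap_tolerance`, and the loss values are all hypothetical and would need tuning for a real problem.

```python
def diagnose(train_loss, val_loss, target_loss=0.1, gap_tolerance=0.05):
    """Rough rule of thumb based on the final points of the two loss histories."""
    final_train, final_val = train_loss[-1], val_loss[-1]
    if final_train > target_loss:
        # The model can't even fit the training data well: underfitting.
        return "high bias"
    if final_val - final_train > gap_tolerance:
        # Fits the training data but doesn't generalize: overfitting.
        return "high variance"
    return "ok"


# Made-up per-epoch losses for three imaginary models.
underfit = diagnose([0.9, 0.7, 0.6], [0.95, 0.75, 0.65])
overfit = diagnose([0.8, 0.3, 0.05], [0.85, 0.5, 0.45])
healthy = diagnose([0.8, 0.3, 0.08], [0.85, 0.35, 0.1])
print(underfit, overfit, healthy, sep=" | ")
```

High bias usually points towards a more expressive model or better features, while high variance points towards more data or a simpler model. Either way, you need the per-epoch history to see it.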

Learning curves show the relationship between training set size and a chosen evaluation metric, and can be useful for diagnosing model performance. Neptune automatically generates learning curves as the model trains, and logs metrics after each epoch or batch. 

But first, you must create a NeptuneLogger callback:

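import neptune  # legacy neptune-client API, as used throughout this post
from tensorflow.keras.callbacks import Callback

class NeptuneLogger(Callback):
    """Send Keras batch and epoch logs to the active Neptune experiment."""

    def on_batch_end(self, batch, logs=None):
        # logs holds the metrics Keras computed for this batch (for example, loss)
        for log_name, log_value in (logs or {}).items():
            neptune.log_metric(f'batch_{log_name}', log_value)

    def on_epoch_end(self, epoch, logs=None):
        # logs holds the epoch-level metrics, including validation ones if present
        for log_name, log_value in (logs or {}).items():
            neptune.log_metric(f'epoch_{log_name}', log_value)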

Next, create an experiment, give it a name, and log some hyperparameters. 

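Here, we set the number of epochs to 7, and the batch size to 40.  

EPOCH_NR, BATCH_SIZE = 7, 40  # the values stated above
neptune.create_experiment(name='experiment-example',  # same name as before; pick your own
                          params={'epoch_nr': EPOCH_NR,
                                  'batch_size': BATCH_SIZE})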

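Finally, pass the Neptune logger as a Keras callback: 

history = model.fit(x=x_train, y=y_train,  # y_train: assumed name for the training labels
                    epochs=EPOCH_NR, batch_size=BATCH_SIZE,
                    validation_data=(x_test, y_test),
                    callbacks=[NeptuneLogger()])  # the callback defined earlier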

Inside Neptune, you can monitor the learning curves while the model trains, which makes it easy to see how efficiently training is progressing.

What’s next?

Jupyter notebooks are great tools for machine learning, but the best way to work with them is to use a platform like Neptune. 

The features and affordability make Neptune the least problematic platform on the market. 

If you want to learn more about it, and about experiment management within Neptune, check out the Neptune documentation.

That’s it from me. Thanks for reading!

Co-founder @ Tonnelier Technologies | Tech Contributor

