MLOps Blog

How to Version, Debug, Compare and Share Jupyter Notebooks

25th August, 2023

ML model development has improved by leaps and bounds, and Jupyter Notebooks have been a big factor in this change. Thanks to their interactive development environment, support for Markdown and LaTeX, and a huge ecosystem of plugins, they have become a go-to tool for any data scientist or ML practitioner.

The popularity of Notebooks in this space has led to many ML experiment tracking offerings. Notebooks don’t come with a native tracking feature (as of this writing), so practitioners have to look for solutions elsewhere. In this blog, we’re going to cover:

  1. Why is it important to version Notebooks?
  2. Different ways to version, debug, and compare experiments done in Notebooks.
  3. How can Neptune help track, debug, and compare Jupyter Notebooks?

Why should you version Notebooks?

Building ML models is experimental by nature. It’s common to run numerous experiments in search of the combination of algorithm, parameters, and data preprocessing steps that yields the best model for the task at hand. Once the problem grows in complexity, this requires some form of organization.

While running experiments in Notebooks, you’ll feel the need for versioning and tracking just as you would if you were building your ML models in another IDE. Here are some key reasons to adopt the best practice of setting up some form of versioning for your ML experiments:

  1. Collaboration: Working in a team requires collaborative decision-making. That becomes cumbersome when experiment details like model metadata and metrics are not logged centrally.
  2. Reproducibility: Logging model configurations somewhere saves a lot of time in retraining and testing. By taking snapshots of the entire machine learning pipeline, it becomes possible to reproduce the same output again.
  3. Dependency tracking: With version control, you can do several things:
     • track different versions of the datasets (training, validation, and test);
     • test more than one model on different branches or repositories;
     • tune model parameters and hyperparameters;
     • monitor the accuracy of each change.
  4. Model updates: Model development is not a one-step process; it happens in cycles. With version control, you can control which version is released while continuing development for the next release.
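The reproducibility and dependency-tracking points can be made concrete with a small sketch. The code below builds a "run snapshot" containing the hyperparameters, the random seed, the Python and platform versions, and a SHA-256 fingerprint of each dataset file. A later run can then be checked against that snapshot. The helper names `file_sha256` and `snapshot_run` and the dictionary layout are hypothetical. They are not part of any tool discussed in this article.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_run(params: dict, seed: int, data_files: list) -> dict:
    """Hypothetical helper: gather what is needed to rerun and audit an experiment."""
    return {
        "params": params,
        # Recorded only: the notebook must still seed its own RNGs with this value.
        "seed": seed,
        "python": platform.python_version(),
        "platform": platform.platform(),
        # If a dataset file changes between runs, its hash changes too.
        "data": {str(p): file_sha256(p) for p in data_files},
    }


if __name__ == "__main__":
    # Demo with a throwaway dataset file in a temporary directory.
    data_dir = Path(tempfile.mkdtemp())
    train_csv = data_dir / "train.csv"
    train_csv.write_text("x,y\n1,2\n")
    snapshot = snapshot_run({"lr": 0.01, "epochs": 5}, seed=42, data_files=[train_csv])
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

You could save this dictionary next to each run or log it to whichever tracker you use. If the hashes or versions differ on a rerun, that explains why the outputs might differ too.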

How to version Jupyter Notebooks?

There are many ways to version experiments you run in notebooks, ranging from simple log files to full-scale experiment tracking tools that offer a lot of features. Let’s talk about some from each category and understand what would be the right choice, given your requirements.

1. Tracking Notebooks in spreadsheets


Tracking ML experiments in Excel or Google Sheets is a fast but brute-force solution. Spreadsheets offer a comfortable, easy-to-use experience: you can paste your metadata directly and create a separate sheet for each run. But this approach comes with plenty of caveats, so let’s see where it shines and where it doesn’t:

Pros:

  1. Easy to use with a familiar interface.
  2. Reports for stakeholders can be directly created within the tool.
  3. It makes it easier for non-technical folks on the team to contribute.

Cons:

  1. Tracking experiments in spreadsheets is tedious. You either need to copy and paste model metadata and metrics into the spreadsheet, or use a library like pandas to log the information and later save it to a spreadsheet.
  2. Once the number of experiments grows, logging each run in a separate sheet becomes unmanageable.
  3. Tracking and managing countless variables and artifacts in a simple spreadsheet is not the best way to approach the problem.
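The pandas route in the first drawback can be sketched as follows. The code keeps one row per run and writes all runs to a CSV file that Excel or Google Sheets can open. It uses Python's standard `csv` module instead of pandas, so it has no dependencies. With pandas, `pd.DataFrame(runs).to_csv(path, index=False)` would produce an equivalent file. The function name `write_runs_csv` and the run values are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_runs_csv(runs: list, path: Path) -> list:
    """Write one row per run to a CSV file and return the column order used.

    Runs often log different keys, so the header is the union of all keys
    in first-seen order; a run that lacks a key gets an empty cell.
    """
    columns = []
    for run in runs:
        for key in run:
            if key not in columns:
                columns.append(key)
    with Path(path).open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(runs)
    return columns


if __name__ == "__main__":
    # Made-up runs for illustration only.
    runs = [
        {"run": "baseline", "model": "logreg", "C": 1.0, "val_acc": 0.81},
        {"run": "tuned", "model": "logreg", "C": 0.1, "val_acc": 0.84},
        {"run": "forest", "model": "random_forest", "n_estimators": 200, "val_acc": 0.86},
    ]
    out = Path(tempfile.mkdtemp()) / "experiments.csv"
    print(write_runs_csv(runs, out))
    print(out.read_text())
```

Notice that every new hyperparameter adds a column that most rows leave empty. This is a small-scale version of the third drawback above.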

2. Versioning Notebooks using Git


Git can be a versatile tool for your project: it can track changes not only in your notebook but across your entire project. You can also push model-related metadata to a central repository, such as trained weights and evaluation reports like confusion matrices. Your data science team can then use that repository to make informed decisions. Let’s look at some pros and cons of using Git for experiment tracking:

Pros:

  1. A single version control system for all code and notebook files.
  2. A popular tool in the tech community.
  3. It gives access to millions of other repositories, which can be used as a starting point.

Cons:

  1. Hard to onboard non-programmers and other stakeholders.
  2. An unintuitive interface that may create friction for collaborative work.
  3. It requires technical expertise to execute and maintain experiment-related repositories.
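One lightweight way to use Git for the metadata described above is to keep it in a small JSON file next to the notebook. Each run's parameters, metrics, and confusion matrix go into that file, which is committed alongside the code. The sketch below assumes that layout; the `save_run_metadata` name and the `metrics.json` filename are hypothetical. Sorting keys and indenting the output keeps the file's line order stable, so `git diff` shows only the values that actually changed between commits.

```python
import json
import tempfile
from pathlib import Path


def save_run_metadata(path: Path, params: dict, metrics: dict, confusion_matrix: list) -> str:
    """Write run metadata as deterministic JSON and return the written text.

    sort_keys and indent make the same content always serialize to the same
    lines, so Git's line-based diffs show only values that actually changed.
    """
    payload = {"params": params, "metrics": metrics, "confusion_matrix": confusion_matrix}
    text = json.dumps(payload, indent=2, sort_keys=True) + "\n"
    Path(path).write_text(text)
    return text


if __name__ == "__main__":
    # In a real project this would be a path inside the Git repository,
    # e.g. next to the notebook; a temp dir keeps the demo side-effect free.
    out = Path(tempfile.mkdtemp()) / "metrics.json"
    print(save_run_metadata(out, {"lr": 0.01}, {"val_acc": 0.9}, [[50, 5], [3, 42]]))
```

You would then commit the file together with the notebook, for example with `git add analysis.ipynb metrics.json` and `git commit -m "Tune learning rate"` (the notebook name is a placeholder). Later, `git log -p metrics.json` shows how the metrics changed from commit to commit.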

3. Versioning Notebooks with experiment tracking tools


Experiment tracking tools are tailor-made for this use case. They cover almost everything you might want from a metadata management tool, from experiment tracking to a model registry. Many tools have emerged in this space over the last few years, with prominent players including Weights and Biases and MLflow. Let’s look at some advantages and disadvantages of these tools:

Pros:

  1. Covers all the functionalities you need while organizing your experiment runs.
  2. All of these tools come with a dedicated interactive UI that can be used for comparisons, debugging, or report generation.
  3. Each tool offers a plethora of features for team collaboration.

Cons:

  1. Unlike Git or spreadsheets, experiment tracking tools usually charge a fee. Almost all of them offer a free tier for single users, but it has its limitations. On the other hand, paying for a tool means you don’t have to worry about setup, maintenance, or developing features yourself.
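At their core, the comparison views these tools provide line up runs and show what differs between them. The sketch below is tool-agnostic and does not reflect how any particular product implements comparison. The hypothetical `compare_runs` function reports which parameters differ between two runs and how each metric logged by both runs moved. The run layout (`params` and `metrics` dictionaries) is an assumption made for illustration.

```python
def compare_runs(run_a: dict, run_b: dict) -> dict:
    """Compare two runs shaped like {"params": {...}, "metrics": {...}}.

    Returns the parameters whose values differ (a parameter missing from one
    run shows up as None on that side) and, for metrics logged by both runs,
    the change from run_a to run_b.
    """
    params_a, params_b = run_a["params"], run_b["params"]
    changed_params = {
        key: (params_a.get(key), params_b.get(key))
        for key in sorted(params_a.keys() | params_b.keys())
        if params_a.get(key) != params_b.get(key)
    }
    metrics_a, metrics_b = run_a["metrics"], run_b["metrics"]
    metric_deltas = {
        key: metrics_b[key] - metrics_a[key]
        for key in sorted(metrics_a.keys() & metrics_b.keys())
    }
    return {"changed_params": changed_params, "metric_deltas": metric_deltas}


if __name__ == "__main__":
    # Made-up runs for illustration only.
    baseline = {"params": {"lr": 0.01, "batch_size": 32}, "metrics": {"val_loss": 0.5}}
    tuned = {"params": {"lr": 0.001, "batch_size": 32, "dropout": 0.2}, "metrics": {"val_loss": 0.25}}
    print(compare_runs(baseline, tuned))
```

A dedicated tool does this across many runs at once, adds charts, and stores the runs for you. That is much of what you pay for.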

There may be numerous makeshift solutions to your specific experiment-tracking problem. Plenty of legacy tools can also cover a few aspects of tracking and organizing your ML experiments. But if you want a full-fledged solution for your organization’s needs, you should ideally choose an experiment tracking tool.

Tracking, debugging, and comparing Jupyter Notebooks in Neptune

Neptune is an ML metadata store built for research and production teams that run many experiments. It has a flexible metadata structure that lets you organize training and production metadata the way you want.

It gives you a central place to log, store, display, organize, compare, and query all metadata generated during the machine learning lifecycle. Individuals and organizations use Neptune for experiment tracking and model registry to keep their experimentation and model development under control.

The web app was built for managing ML model metadata and it lets you:

  • filter experiments and models with an advanced query language.
  • customize which metadata you see with flexible table views and dashboards.
  • monitor, visualize, and compare experiments and models.
An example dashboard in the Neptune app with various metadata displayed

Neptune-Jupyter extension

Neptune offers seamless integration with Jupyter Notebooks through the Neptune-Jupyter extension, so you can use experiment tracking directly without juggling multiple tools. Head over to the Neptune-Jupyter Notebooks docs to get started.

With the Neptune-Jupyter integration, you can:

  • Log and display notebook checkpoints either manually or automatically during model training.
  • Connect notebook checkpoints with model training runs in Neptune.
  • Organize checkpoints with names and descriptions.
  • Browse checkpoint history across all Notebooks in the project.
  • Compare notebooks side-by-side, with diffs for source, markdown, output, and execution count cells.
Comparing Notebooks in the Neptune app

Here’s an open example in the Neptune app with a few notebooks logged.  
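The idea behind a side-by-side notebook diff can be shown with the standard library. On disk, a Jupyter notebook (nbformat version 4) is JSON with a `cells` list. Each cell has a `cell_type` and a `source`, and code cells also carry `outputs` and an `execution_count`. The sketch below is not Neptune's implementation. It only illustrates the technique: `difflib.unified_diff` on each pair of cells' source, plus a flag for changed execution counts. Pairing cells by position is a simplifying assumption, so inserting or deleting a cell shifts every later pair.

```python
import difflib
import json
from itertools import zip_longest


def cell_lines(cell) -> list:
    """Return a cell's source as a list of lines (nbformat allows a string or a list)."""
    if cell is None:  # the cell exists in only one of the two notebooks
        return []
    source = cell.get("source", "")
    text = source if isinstance(source, str) else "".join(source)
    return text.splitlines()


def diff_notebooks(old_nb: dict, new_nb: dict) -> list:
    """Pair cells by position and describe what changed in each pair."""
    report = []
    pairs = zip_longest(old_nb.get("cells", []), new_nb.get("cells", []))
    for index, (old, new) in enumerate(pairs):
        source_diff = list(difflib.unified_diff(
            cell_lines(old), cell_lines(new),
            fromfile=f"old cell {index}", tofile=f"new cell {index}", lineterm="",
        ))
        if source_diff:
            report.append("\n".join(source_diff))
        # Markdown cells have no execution_count, so .get() returns None on both sides.
        old_count = (old or {}).get("execution_count")
        new_count = (new or {}).get("execution_count")
        if old_count != new_count:
            report.append(f"cell {index}: execution_count {old_count} -> {new_count}")
    return report


def load_notebook(path: str) -> dict:
    """Read an .ipynb file, which is plain JSON on disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    old = {"cells": [{"cell_type": "code", "source": "lr = 0.01\nfit(lr)", "execution_count": 1}]}
    new = {"cells": [{"cell_type": "code", "source": ["lr = 0.001\n", "fit(lr)"], "execution_count": 4}]}
    for change in diff_notebooks(old, new):
        print(change)
```

With real files you would call `diff_notebooks(load_notebook("v1.ipynb"), load_notebook("v2.ipynb"))`; the filenames are placeholders. Each cell's `outputs` list could be compared the same way.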

Why should you use Neptune with Jupyter Notebook?

The features above make Neptune a great choice for tracking and versioning experiments run in Jupyter Notebooks. Beyond the technical features discussed in the previous section, here’s what makes it a top contender for the role:

  1. Seamless integration: With the Neptune-Jupyter extension, you can connect your Notebook to the Neptune dashboard and get versioning and sharing capabilities. This reduces friction compared to the other methods.
  2. An abundance of features: Features offered by Neptune give you the freedom to monitor/log/store/compare whatever you want to make your experiment successful.
  3. Availability of free tier: A free tier is available for single users and offers important features at no cost.
  4. User and customer support: Thanks to the quick and helpful support team, you can get your problems fixed faster and focus on building models.

You’ve reached the end!

Congratulations! You are now well equipped to decide which method best fits your needs for organizing your Notebook experiments. In this article, we explored straightforward ad-hoc methods like spreadsheets and Git, as well as more sophisticated approaches like experiment tracking tools. Here are a few bonus tips to help you choose your next tool:

  1. Stick to what you need! It’s easy to get lost in the sea of tools and methods, but sticking strictly to your requirements will help you make better decisions.
  2. I’d recommend using the “Try for Free” feature of every tool before you commit to a single solution.

Thanks for reading! Stay tuned for more! Adios!
