TensorBoard vs Neptune: How Are They ACTUALLY Different

Posted November 18, 2020

ML model development typically involves a tedious workflow of data management, feature engineering, model training, and evaluation. 

A data scientist can easily run hundreds of combinations of these steps before converging on a final model that solves the business problem. Managing those experiments, tracking progress, and comparing them is an uphill battle that most data scientists fight every day. 

There are multiple tools available to make this process easier, and today we will take a look at two of them. This write-up is a deep comparison of TensorBoard with Neptune, one of the modern experiment management tools. We will walk through a model development cycle and compare the utility of TensorBoard and Neptune at each step of the process. For the purpose of this comparison, we will use the digit recognition problem with the MNIST dataset.


EDITOR’S NOTE
See also: 
Deep Dive into TensorBoard: Tutorial With Examples
The Best TensorBoard Alternatives


Areas of comparison

We will compare TensorBoard and Neptune by dividing the ML model development process into the following parts:

  • Exploratory Data Analysis: perform ad-hoc analysis on data to help in deciding training parameters, feature engineering, etc.
  • Experiment Setup: provide means to store multiple experiments together as an entity to allow easy comparison in the future.
  • Model Training & Evaluation: train a model and look at the evaluation metrics to debug and compare performance.
  • Model Debugging: dig deeper into the training process and figure out what went wrong.
  • Hyperparameter Tuning: train multiple models, compare them easily, and pick a winner.
  • Versioning: add data/code/feature/model metadata for comparison.
  • Collaboration: allow multiple users to work together and manage access.

Exploratory Data Analysis (EDA)

EDA is typically the first step in the process of developing an ML model. This is usually very custom to the problem at hand and requires high flexibility. Both TensorBoard and Neptune have light support for this.

Even though Neptune does not have dedicated EDA tools per se, it lets you surface ad-hoc analysis done in Jupyter notebooks on the Neptune UI. This can be done in two ways:

  • You can save notebook checkpoints directly into Neptune projects. Here is an example of a public notebook containing EDA. If you want to check out how this works, here are the docs.
  • You can attach custom charts or visualizations to the Neptune experiment. We will go through an example of this later.
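
For context, here is a rough sketch of what a small piece of ad-hoc EDA on MNIST might look like in a notebook; the actual analysis is, of course, entirely problem-specific:

import matplotlib.pyplot as plt
import tensorflow as tf

# peek at a few training digits and their labels
(images, labels), _ = tf.keras.datasets.mnist.load_data()

fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, image, label in zip(axes, images[:5], labels[:5]):
    ax.imshow(image, cmap='gray')
    ax.set_title(str(label))
    ax.axis('off')
plt.show()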

Experiment Setup

Before you start using any experiment management tool, you need to set up an experiment so that it can be easily identified and referred to in the future. Here we will compare how TensorBoard and Neptune handle this. 

First, let’s define a project and experiment name which we will use for tracking.

PROJECT = 'blog-neptune-vs-tensorboard'
EXPERIMENT = 'model-1'

Download the dataset:

import tensorflow as tf

mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0

TensorBoard

TensorBoard has no direct support for setting up an experiment. We need to create a directory structure ourselves and store the logs in it. The good thing is that it works with Google Cloud Storage and AWS S3 paths, so all data can be stored and read directly from the cloud.

Here, we will create a local directory for a project and then another for an experiment.

from pathlib import Path

# base directory where all TensorBoard logs will be stored
DATA_BASE = Path('../neptune.ai')

# create directories for the project and the experiment using pathlib
project_dir = DATA_BASE / 'blog-tensorboard-vs-neptune'
exp_dir = project_dir / 'first-model'
exp_dir.mkdir(parents=True, exist_ok=True)
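
If you prefer to keep the logs in the cloud instead, the same structure works with a remote path. A minimal sketch with a hypothetical GCS bucket:

# hypothetical bucket; any GCS or S3 path your environment can access works
cloud_project_dir = 'gs://my-ml-experiments/blog-tensorboard-vs-neptune'
cloud_exp_dir = f'{cloud_project_dir}/first-model'
# this string can be passed as the log_dir of the TensorBoard callback later on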

Neptune

Experiment creation and management are first-class citizens in Neptune. It allows us to create a project within which multiple experiments can be added and compared. All this is managed seamlessly on the Neptune server.

First, we created a project called “blog-tensorboard-vs-neptune” on the Neptune UI (docs on how to do it). Now we can initialize the neptune client pointing to that project and create experiments in it.

import neptune
neptune_project = neptune.init(f'aarshay/{PROJECT}')
neptune_experiment = neptune_project.create_experiment(name=EXPERIMENT)

As we can see, Neptune assigned the experiment an automated ID, `BLOG-1`. We can see it in the Neptune UI:

neptune vs tensorboard

Model Training & Evaluation

Model training is typically carried out using off-the-shelf tools like Keras which we’ll be using here. Model evaluation involves looking at the loss function and other metrics of concern. Both TensorBoard and Neptune provide fairly good support for model evaluation.

Let’s train a simple model using Keras.

model = tf.keras.models.Sequential([
   tf.keras.layers.Flatten(input_shape=(28, 28)),
   tf.keras.layers.Dense(64, activation='relu'),
   tf.keras.layers.Dense(64, activation='relu'),
   tf.keras.layers.Dense(10, activation='softmax')])

model.compile(optimizer='sgd', 
   loss='sparse_categorical_crossentropy',
   metrics=['accuracy'])

Both Neptune and TensorBoard logging can be achieved using Keras callbacks:

from tensorflow.keras.callbacks import TensorBoard, Callback

TensorBoard

There is a predefined TensorBoard callback that allows you to log model metrics to a specific location. Note that this is the same experiment directory we defined in the setup step above. You can read more about the parameters in the official docs.

tb_callback = TensorBoard(
    log_dir=exp_dir,
    histogram_freq=1,
    write_graph=True,
    write_images=True,
    update_freq='epoch',
    profile_batch=2,
    embeddings_freq=1)

Neptune

For Neptune, a custom callback can be defined that logs the model metrics to Neptune at the end of every epoch (you can also use the predefined Neptune callback for Keras, sketched right after the snippet below).

class NeptuneLoggingCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        # push every Keras metric from this epoch to the current Neptune experiment
        for metric_name, metric_value in logs.items():
            neptune.log_metric(metric_name, metric_value)

neptune_callback = NeptuneLoggingCallback()
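
The predefined callback mentioned above ships with the neptune-contrib package (the same package we use later for log_chart). A minimal sketch, assuming neptunecontrib is installed, which you could pass to model.fit instead of the custom callback:

from neptunecontrib.monitoring.keras import NeptuneMonitor

# logs epoch- and batch-level Keras metrics to the current Neptune experiment
predefined_neptune_callback = NeptuneMonitor()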

Now we will pass both of these callbacks to the model fit method and train the model:

callbacks = [tb_callback, neptune_callback]
model.fit(X_train, y_train,
          epochs=10,
          validation_split=0.2,
          callbacks=callbacks)

Test set performance

test_loss, test_accuracy = model.evaluate(X_test, y_test)
neptune.log_metric('test-loss', test_loss)
neptune.log_metric('test-accuracy', test_accuracy)

Both TensorBoard and Neptune show the loss and accuracy metrics for training and evaluation runs.
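
To view the TensorBoard UI you first need to launch it against the log directory. Assuming you are working in a Jupyter notebook, the TensorBoard notebook magics are the easiest way (running the equivalent `tensorboard --logdir` command from a terminal works just as well); the path below is the project directory we created earlier:

%load_ext tensorboard
%tensorboard --logdir ../neptune.ai/blog-tensorboard-vs-neptune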

The TensorBoard UI looks like this:

tensorboard UI

Here, each plot contains one line for the training run and one for the validation run.

In Neptune, the project page now shows the additional metrics:

neptune vs tensorboard metrics

Also, inside the experiment, we see these metrics as both charts and logs:

Neptune metrics

Neptune metrics

Model Debugging

Simply knowing the eval metrics is not always enough. If the model does not perform as expected, we enter the hell of neural network debugging (if you do get there, check out this resource on DL troubleshooting). This typically involves evaluating the model on more metrics and understanding whether something went wrong during training itself.

TensorBoard surfaces a lot of model training characteristics right off the bat. 

  • We can visualize the network architecture and the distributions of weights and gradients over time. This gives us intuition for tuning training parameters like the learning rate and weight initialization. 
  • It provides a projector for visualizing vector embeddings.
  • There is a profiler which can be used to debug model training times. This is particularly useful when we're using a GPU and trying to minimize training time by overlapping data reads with GPU computation.
  • TensorBoard also has the flexibility to add custom images, which can be used to inspect training data, confusion matrices, or any other ad-hoc information.

You can refer to our dedicated blog on TensorBoard for detailed examples of these. For illustration, the network graphs and histograms are shown below:

tensorboard graphs

tensorboard histograms

Neptune does not provide these debugging views out of the box, but it makes it really easy to add custom images and dynamic charts through its integrations with third-party libraries like Matplotlib, Optuna, Dalex, Altair, etc.

Image Logging with Matplotlib

Let's compare the process of logging a custom image to TensorBoard and Neptune. We need an image first, so let's use the scikit-plot library to create ROC curves for all the labels and push them as an image.

import numpy as np
import matplotlib.pyplot as plt
from scikitplot.metrics import plot_roc

predicted_probs = model.predict(X_test)
predicted_labels = np.argmax(predicted_probs, axis=1)
figure, ax = plt.subplots(1, 1, figsize=(8, 8))
plot_roc(y_test, predicted_probs, ax=ax)

Pushing this to TensorBoard requires the following code:

import io

def figure_to_tf_image(figure):
    # serialize the matplotlib figure to PNG and wrap it in a batched image tensor
    buffer = io.BytesIO()
    figure.savefig(buffer, format='png')
    buffer.seek(0)

    tf_image = tf.image.decode_png(buffer.getvalue(), channels=4)
    tf_image = tf.expand_dims(tf_image, 0)

    return tf_image

file_writer = tf.summary.create_file_writer(str(exp_dir))
with file_writer.as_default():
    tf.summary.image("ROC Curves", figure_to_tf_image(figure), step=0)

This shows up in the TensorBoard UI as:

tensorboard image logging

However, pushing this to Neptune requires just 2 lines:

from neptunecontrib.api import log_chart
log_chart(name='ROC Curves', chart=figure, experiment=neptune_experiment)

This appears as shown below. Note that the chart is actually an interactive plot; Neptune handles that under the hood.

neptune image logging

You can find this chart on Neptune here.

Along with images, Neptune lets us log almost anything to the experiment for future reference. Neptune has integrations with some powerful third-party ML libraries which make logging much easier, including dalex, plotly, bokeh, pandas, etc. Let's see a couple of examples here.

Table Logging with Pandas

Let's log the confusion matrix itself as a table. This can be done in Neptune with the following code:

import pandas as pd
from neptunecontrib.api import log_table
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, predicted_labels)
cm_df = pd.DataFrame(cm)
log_table('confusion matrix table', cm_df)

This would show up in the tables directory under Artifacts as:

neptune artifacts

Interactive Confusion Matrix using Altair

Neptune has an integration with Altair, which we will now use to log an interactive confusion matrix:

import altair as alt

# reshape the confusion matrix into a long-format dataframe with one row per
# (actual, predicted) cell; cm_df is the dataframe from the previous snippet
cm_long_df = (
    cm_df.stack()
         .reset_index()
         .rename(columns={'level_0': 'actual', 'level_1': 'predicted', 0: 'confusion_matrix'})
)

def make_example(selector, df):
    base = alt.Chart(df).encode(
        x="predicted:O",
        y="actual:O",
    )
    heatmap = base.mark_rect().encode(
        color=alt.condition(selector, 'confusion_matrix:Q', alt.value('lightgray'))
    ).properties(
        width=600,
        height=480
    ).add_selection(
        selector
    )

    text = base.mark_text(baseline='middle').encode(text='confusion_matrix:Q')

    return heatmap + text

interval_x = alt.selection_interval()
chart = make_example(interval_x, cm_long_df)
log_chart(name='altair_confusion_matrix', chart=chart)

This would appear in the UI as:

neptune confusion matrix

You can find this chart on Neptune here. More information about this integration can be found in the official docs.

Hyperparameter Tuning

Hyperparameter tuning is a key part of the ML development lifecycle. Both TensorBoard and Neptune have good support for it, with some subtle differences. We will go through an example by tuning the sizes of the two hidden layers we've been using in our model so far.

# set values of the hyperparameters to be tuned
# (for brevity, we will sweep them in matching pairs rather than as a full grid)
NUM_UNITS_LAYER_1 = [8, 16, 32, 64, 128]
NUM_UNITS_LAYER_2 = [8, 16, 32, 64, 128]

# define metric names
METRIC_LOSS = 'loss'
METRIC_ACCURACY = 'accuracy'

TensorBoard

TensorBoard has an HParams API which allows us to log metrics in a specific way so that they appear in a dedicated HPARAMS tab in the TensorBoard UI. The following code block shows how this can be done for our use case.

from tensorboard.plugins.hparams import api as hp
def setup_tensorboard(layer_1_units, layer_2_units, tuning_dir):
    hp_units_layer_1 = hp.HParam('num_units_layer1', hp.Discrete(layer_1_units))
    hp_units_layer_2 = hp.HParam('num_units_layer2', hp.Discrete(layer_2_units))
    
    with tf.summary.create_file_writer(tuning_dir).as_default():
        hp.hparams_config(
            hparams=[hp_units_layer_1, hp_units_layer_2],
            metrics=[hp.Metric(METRIC_ACCURACY), hp.Metric(METRIC_LOSS)])
    
    return hp_units_layer_1, hp_units_layer_2

def train_model(units_layer1, units_layer2):
    model = tf.keras.models.Sequential([
       tf.keras.layers.Flatten(input_shape=(28, 28)),
       tf.keras.layers.Dense(units_layer1, activation='relu'),
       tf.keras.layers.Dense(units_layer2, activation='relu'),
       tf.keras.layers.Dense(10, activation='softmax')])

    model.compile(optimizer='sgd', 
       loss='sparse_categorical_crossentropy',
       metrics=['accuracy'])
    
    model.fit(X_train, y_train, epochs=10)
    loss, accuracy = model.evaluate(X_test, y_test)
    return loss, accuracy

param_tuning_dir = f'{project_dir}/hparam_tuning'
hp_units_layer_1, hp_units_layer_2 = setup_tensorboard(NUM_UNITS_LAYER_1,
                                                       NUM_UNITS_LAYER_2,
                                                       param_tuning_dir)
for n_layer1, n_layer2 in zip(hp_units_layer_1.domain.values, hp_units_layer_2.domain.values):
    hparams = {
        hp_units_layer_1: n_layer1,
        hp_units_layer_2: n_layer2
    }
    exp_id = f'model-{n_layer1}-{n_layer2}'
    exp_dir = f'{param_tuning_dir}/{exp_id}'
    print(f'\nrunning experiment {exp_id}')
    with tf.summary.create_file_writer(exp_dir).as_default():
        hp.hparams(hparams)
        loss, accuracy = train_model(n_layer1, n_layer2)
        tf.summary.scalar(METRIC_LOSS, loss, step=1)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)

The resulting UI would look like this:

tensorboard hparams

It allows us to compare the metrics we've logged against the hyperparameter values in a simple UI. There are a couple of other views which could be interesting to look at. However, note that we have lost all other details about each run except for the values we explicitly logged.
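
If you did want those per-run training curves in TensorBoard as well, you would have to log them explicitly, for example by attaching a TensorBoard callback that writes into each run's directory. A sketch only; it assumes train_model is extended to accept a callbacks argument:

# sketch: inside the hparams loop, also write per-epoch curves for each run
# (assumes train_model is modified to pass callbacks through to model.fit)
run_tb_callback = TensorBoard(log_dir=exp_dir)  # exp_dir is the per-run directory
loss, accuracy = train_model(n_layer1, n_layer2, callbacks=[run_tb_callback])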

Neptune

Now we will go through the same tuning exercise, but with Neptune logging.

def create_new_experiment(layer_1_units, layer_2_units):
    exp_id = f'model-{layer_1_units}-{layer_2_units}'
    print(f'\nrunning experiment {exp_id}')
    params = {
        'num_units_layer1': layer_1_units,
        'num_units_layer2': layer_2_units
    }
    return neptune.create_experiment(name=EXPERIMENT, params=params)

def train_model(units_layer1, units_layer2):
    model = tf.keras.models.Sequential([
       tf.keras.layers.Flatten(input_shape=(28, 28)),
       tf.keras.layers.Dense(units_layer1, activation='relu'),
       tf.keras.layers.Dense(units_layer2, activation='relu'),
       tf.keras.layers.Dense(10, activation='softmax')])

    model.compile(optimizer='sgd', 
       loss='sparse_categorical_crossentropy',
       metrics=['accuracy'])
    
    model.fit(X_train, y_train, epochs=10, validation_split=0.2, callbacks=[NeptuneLoggingCallback()])
    loss, accuracy = model.evaluate(X_test, y_test)
    return loss, accuracy

for n_layer1, n_layer2 in zip(NUM_UNITS_LAYER_1, NUM_UNITS_LAYER_2):
    nexp = create_new_experiment(n_layer1, n_layer2)
    test_loss, test_accuracy = train_model(n_layer1, n_layer2)
    nexp.log_metric('test-loss', test_loss)
    nexp.log_metric('test-accuracy', test_accuracy)

The summary of experiments looks like this:

neptune summary of experiments

The UI looks much nicer and provides additional features like sorting on a metric; e.g., the image above is sorted on test_accuracy. On top of this, each experiment is logged in its entirety, i.e. we can open any of them and see all related metrics or custom charts.

Comparing the Neptune UI with the TensorBoard UI, you would notice:

  • The Neptune UI itself is much cleaner and more intuitive than TensorBoard's.
  • We get training metrics for free in the comparison chart, without any additional explicit logging.
  • Each row on the UI links to the full experiment with more details. So not only do we get a better comparative view, but also the ability to expand into each individual run for more details.
  • Neptune provides a cleaner and simpler python API. We were able to achieve all of the above with less code.
  • TensorBoard provides some additional charts off the bat, like the network architecture and histograms of weights and gradients. These can be crucial for model debugging during development.

Versioning

ML projects are usually ongoing efforts, and different runs can vary not only in the hyperparameters but also in the data, feature engineering, modeling library, etc. This calls for versioning of the data, features, model, and code. 

TensorBoard has no direct support for this. Neptune provides handy utilities which fulfill the basic versioning requirements of ML experiments. These are:

  • The data, feature, and model versions can be logged as experiment parameters, similar to the way we logged the sizes of the hidden layers. This exposes them in the same UI we saw for model comparison.
  • The model itself can be serialized and stored as an experiment artifact, which can be downloaded later on demand (both of these are shown in the sketch after this list).
  • There is explicit support for code versioning through two means:
    1. Source Code: we can upload the source code used to run an experiment. Neptune also reads the .git directory and logs the Git SHA and other information for every experiment.
    2. Notebook Checkpoints: we can cache any given state of a notebook onto the Neptune UI using the neptune-notebooks extension. Neptune also provides an intuitive interface to compare notebooks stored in an experiment.
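
Putting the first two points together, here is a minimal sketch of what basic versioning could look like with the API used throughout this post (the version tags and file names below are hypothetical):

# hypothetical version tags logged as ordinary experiment parameters
params = {
    'num_units_layer1': 64,
    'num_units_layer2': 64,
    'data_version': 'mnist-2020-11',      # hypothetical data version tag
    'feature_version': 'raw-pixels-v1',   # hypothetical feature version tag
}

versioned_exp = neptune.create_experiment(
    name=EXPERIMENT,
    params=params,
    upload_source_files=['train.py'],     # assumes your training code lives in train.py
)

# serialize the trained model and attach it as an experiment artifact
model.save('model.h5')
versioned_exp.log_artifact('model.h5')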

Collaboration

It is common for multiple people to work together on developing an ML model, which calls for ease of collaboration. Neptune makes this much easier than TensorBoard:

  • Neptune provides a managed server for storing and viewing all experiments, which is much simpler than maintaining a TensorBoard server yourself.
  • The Neptune Team product allows easy access management by letting you add multiple members to a project.

Comparison summary

Attribute                   | TensorBoard                | Neptune
Experiment Setup            | self-managed               | Neptune-managed
Exploratory Data Analysis   | no support                 | save Jupyter notebooks, save custom charts
Model Training & Evaluation | off-the-shelf callback     | custom callback or off-the-shelf callback
Model Debugging             | advanced built-in support  | good support with custom logging and third-party integrations
Hyperparameter Tuning       | basic comparison           | advanced comparison
Versioning                  | no support                 | good code versioning support, basic general support
Collaboration               | self-managed               | Neptune-managed

I'm sure you are wondering: why can't I get everything? Don't worry, you can use the TensorBoard + Neptune integration to get the best of both tools. 

TensorBoard + Neptune

We have designed a neptune-tensorboard extension which will pick up experiment metrics directly from the TensorBoard output. 

With it, you don't even need to create a custom callback or do any explicit logging. You can carry out your regular workflow and get Neptune's logging and UI capabilities for free.

Installation

The plugin can be installed using:

pip install neptune-tensorboard

Integration

The plugin is enabled with a simple global setup step:

import neptune
import neptune_tensorboard

neptune.init(api_token='ANONYMOUS', project_qualified_name='shared/showroom')
neptune.create_experiment('debug')

neptune_tensorboard.integrate_with_tensorflow()

Now we can log results to Neptune without a custom callback using just the following code:

model = tf.keras.models.Sequential([
   tf.keras.layers.Flatten(input_shape=(28, 28)),
   tf.keras.layers.Dense(64, activation='relu'),
   tf.keras.layers.Dense(64, activation='relu'),
   tf.keras.layers.Dense(10, activation='softmax')])

model.compile(optimizer='sgd', 
   loss='sparse_categorical_crossentropy',
   metrics=['accuracy'])

tb_callback = TensorBoard(
    log_dir=exp_dir,
    histogram_freq=1,
    write_graph=True,
    write_images=True,
    update_freq='epoch',
    profile_batch=2,
    embeddings_freq=1)

callbacks = [tb_callback]
model.fit(X_train, y_train,
          epochs=10,
          validation_split=0.2,
          callbacks=callbacks)

test_loss, test_accuracy = model.evaluate(X_test, y_test)
neptune.log_metric('test-loss', test_loss)
neptune.log_metric('test-accuracy', test_accuracy)

The Neptune experiment logs look like this:

neptune experiments log

We can see that all the metrics are present, similar to what we got with the custom callback.

What's great is that if you don't need anything more, you can keep your logging basic with no additional code. But when you want to log something extra, like interactive charts or model checkpoints, you can easily add it (as I showed you before). 

In the next post, I'll dive into more detail and show you everything you can get with the TensorBoard + Neptune integration. 

Until then, happy training!

ML Expert | Ex-Spotify | MS in Data Science, Columbia University
