How to Track and Organize ML Experiments That You Run in Google Colab

Have you ever tried training machine learning or deep learning models on a large dataset in Jupyter Notebooks? If so, you have probably run into the all-too-familiar and annoying memory error.

Let’s also not forget the financial pain of owning a decent enough GPU. 

Luckily for us, Google Colab comes to the rescue and lets you train complex machine learning models using free GPU computing power. Machine learning has never been easier! 

Benefits of Google Colab include:

  • Built-in version control via Git
  • A Jupyter Notebook environment that leverages Google Docs collaboration features
  • No need to install any packages: it is an online, browser-based platform

Sounds pretty good! 

However, Google Colab lacks the tools to help you organize your experimentation process

Seemingly endless columns and rows, random color coding, and no idea where to find any values: I'm describing the all-too-familiar, chaotic spreadsheet of experiments that you, as a machine learning developer, may have had to deal with.

Tracking and managing countless variables and artifacts in a simple spreadsheet can be exhausting. Just to name a few things to keep in mind during the process:

  • Parameters: hyperparameters, model architectures, training algorithms
  • Jobs: pre-processing job, training job, post-processing job — these consume other infrastructure resources such as compute, networking, and storage
  • Artifacts: training scripts, dependencies, datasets, checkpoints, trained models
  • Metrics: training and evaluation accuracy, loss
  • Data used for debugging: Weights, biases, gradients, losses, optimizer state
  • Metadata: experiment, trial and job names, job parameters (CPU, GPU and instance type), artifact locations (e.g. S3 bucket)

Such a lack of organization isn’t sustainable in the long run. Luckily, Neptune AI lets you manage your machine learning experiments in a natural, robust fashion. In fact, Neptune allows you to streamline and organize your experimentation process by integrating with your experiments on Google Colab.

You can: 

  • Log metrics, hyperparameters, data version, hardware usage, and more
  • Monitor your experiments live
  • Filter, group, and compare experiment runs in an intuitive UI 
  • Collaborate with your team and organize projects, experiments, etc.

Additionally, creating and tracking experiments with Google Colab and Neptune is extremely easy.

In this article, you will learn:

  • The basics of Neptune/quick introduction
  • Basic model development in Google Colab 
  • Hooking up Neptune to track your machine learning experiments in Google Colab notebooks

Keep on reading!

Basic model development in Google Colab 

Let’s walk through creating a basic model in Google Colab before setting it up with Neptune. If you already know how to navigate Google Colab and have done a project with it before, you can skip this section. 

We will be using the classic Iris flower dataset. Our model will classify an iris flower into one of three species (Iris setosa, Iris virginica, Iris versicolor) based on the width and length measurements of its sepals and petals.

Our steps:

  1. Import necessary libraries
  2. Import and prepare the dataset
  3. Encode the dependent variable
  4. Split the dataset
  5. Use an algorithm to train the model 

Import the libraries

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

How to import and prepare the dataset in Google Colab

After loading the dataset, we must split it into the independent and dependent variables. The independent variables are the sepal and petal widths and lengths, while the dependent variable is the species name.

dataset = pd.read_csv('') #import dataset through link 
#Split up dataset by columns
x = dataset.iloc[:, 0:4].values # input attributes
y = dataset.iloc[:, -1].values  # target attributes

Encode the dependent variable

Because the species names are strings rather than numbers, we must encode them as numerical values so the model can process them.

le = LabelEncoder()
y = le.fit_transform(y)

Split the dataset

We must split the dataset into training and testing sets. For our purposes, we will use 80% of the dataset for training and 20% for testing. Setting aside 10-20% of a dataset for testing is common practice in machine learning.

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 1) #20% of dataset will be used for testing
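
As a rough illustration of what the split does, the sketch below shuffles row indices with a fixed seed and cuts off a test portion. It is not scikit-learn's implementation and will not reproduce the same rows as train_test_split; the function name `split_indices` is hypothetical, and the 150 rows assume the classic Iris dataset.

```python
import random

def split_indices(n_samples, test_size=0.2, seed=1):
    """Shuffle row indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # fixed seed, so the split is repeatable
    n_test = round(n_samples * test_size)      # number of rows held out for testing
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(150)       # the classic Iris dataset has 150 rows
print(len(train_idx), len(test_idx))           # 120 30
```

Fixing the seed plays the same role as random_state above: rerunning the notebook gives the same split, so results stay comparable between runs.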

Using an algorithm to train the model 

We will now fit a KNeighborsClassifier to our training data. 

import numpy as np
model = KNeighborsClassifier() #model = KNN classifier object
model.fit(x_train, np.ravel(y_train)) #fit the model using x as training data and y as target values
model.predict(x_test[0:10]) #predict the first 10 rows of the x_test dataset (returns an array containing the estimated categories)

KNeighborsClassifier is part of the sklearn library. 

Finally, let’s print out the accuracy of our model. 
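
Accuracy is the fraction of test samples whose predicted class matches the true class. In scikit-learn, calling model.score(x_test, y_test) on a classifier returns this value. The plain-Python sketch below shows the calculation itself on made-up labels; the helper name `accuracy` is just for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up encoded labels, for illustration only
y_true = [0, 1, 2, 2, 1, 0, 1, 2, 0, 1]
y_pred = [0, 1, 2, 1, 1, 0, 1, 2, 0, 2]
print(accuracy(y_true, y_pred))  # 0.8
```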

We can use Neptune with Google Colab to make our experimentation organization much cleaner. 

Incorporating Neptune into Google Colab 

Now, we are going to switch gears and hook up Neptune with Google Colab. To do this, we’re going to start a new project in Neptune. In this new project, we are going to predict if a hypothetical customer will leave a bank based on variables such as their credit score, gender, age, number of products, and more. Go ahead and download the dataset at this URL

If you don’t already have a Neptune account, you need to create one. After creating an account, create a new project by going to the projects tab (second from left) and clicking the “new project” button in the top right. 

Make sure to take note of your API token, which is needed to authorize communication between the training scripts and Neptune. 

In Google Colab, create a new notebook. In a new code cell, import Neptune with the following statement:

import neptune

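Next, we must initialize Neptune. Paste in your personal API token and the project qualified name (your-username/project-name) in place of the placeholders:

neptune.init(api_token='YOUR_API_TOKEN', project_qualified_name='your-username/project-name')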


Here we will create a deep learning model and integrate it with Neptune. We need to import libraries as we did in the previous section. 

import numpy as np
import pandas as pd
import tensorflow as tf

The next step is to use pandas to read the CSV file. We also store the input attributes in the variable X and the target attribute in the variable y. Make a new code cell and type the following:

dataset = pd.read_csv('Churn_Modelling.csv')#read dataset
X = dataset.iloc[:, 3:-1].values 
y = dataset.iloc[:, -1].values

Next, we need to encode the categorical variables, because the model can only process numerical values. We label-encode the column at index 2 (gender), which turns its two string values into 0 and 1, and we one-hot encode the column at index 1 (geography) with scikit-learn's OneHotEncoder, which replaces it with one 0/1 column per category.

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2]) #label-encode column 2
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough') #one-hot encode column 1, keep the other columns
X = np.array(ct.fit_transform(X))
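
To show what one-hot encoding produces, here is a small plain-Python sketch, independent of scikit-learn. Each category becomes its own 0/1 column, so the model does not read an artificial order into integer codes. The helper name `one_hot` and the country values are hypothetical.

```python
def one_hot(values):
    """Map each category to a 0/1 vector that contains a single 1."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

# Hypothetical country values, for illustration only
rows, categories = one_hot(["France", "Spain", "Germany", "France"])
print(categories)  # ['France', 'Germany', 'Spain']
print(rows)        # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```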

Then we will split up the training and testing data into portions so that we can train using 80% of our dataset and test using 20%. 

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

Next, we’re going to apply feature scaling. The purpose of feature scaling is to put all features on a comparable scale, so that no feature dominates the model simply because its values are larger. 

There are two common methods of feature scaling: 

  • Standardization: x_stand = (x - mean(x)) / std(x)
  • Normalization: x_norm = (x - min(x)) / (max(x) - min(x))

Normalization rescales each feature to the range 0 to 1. It is useful when features have a bounded range and do not follow a normal distribution, but it is sensitive to outliers.

Standardization works well on most datasets, especially when features are roughly normally distributed, which is why we’re using it.  
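
The two formulas can be sketched in a few lines of plain Python. This is only an illustration on made-up numbers, not what StandardScaler runs internally, although it uses the population standard deviation just as StandardScaler does. The function names and the `ages` list are hypothetical.

```python
from statistics import mean, pstdev

def standardize(xs):
    """(x - mean) / standard deviation: zero mean, unit variance."""
    mu, sigma = mean(xs), pstdev(xs)   # population standard deviation
    return [(x - mu) / sigma for x in xs]

def normalize(xs):
    """(x - min) / (max - min): values rescaled to the range 0 to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

ages = [20, 30, 40, 50, 60]            # made-up values
print(normalize(ages))                 # [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(z, 3) for z in standardize(ages)])
# [-1.414, -0.707, 0.0, 0.707, 1.414]
```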

from sklearn.preprocessing import StandardScaler #import the scaler
sc = StandardScaler() #create the scaler object
X_train = sc.fit_transform(X_train) #learn mean and standard deviation from X_train and scale it
X_test = sc.transform(X_test) #scale X_test with the statistics learned from X_train

Now we will use TensorFlow to create a deep learning model that takes the input features and estimates the probability of a customer leaving the bank. 

ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
ann.fit(X_train, y_train, batch_size = 32, epochs = 100)

Next, we will create a sample Neptune experiment. Neptune will help organize the outputs and tags of our project. 

Without a tracking tool, results end up scattered, and it’s hard to find the accuracy or other essential outputs of a given run. Neptune helps us keep everything in one place. 

Tags are labels that you attach to an experiment in code so that they show up in Neptune. They reflect aspects of the run that you’d want to remember. We’ll log accuracy to Neptune via the “.log_metric” method:

neptune.create_experiment(name = 'experiment-example')
neptune.log_metric('accuracy', 0.93)
neptune.append_tags(['basic', 'finished_successfully'])

To log metrics after every batch and epoch, let’s create a “NeptuneLogger” callback using the following lines of code:

from tensorflow.keras.callbacks import Callback
class NeptuneLogger(Callback):
   def on_batch_end(self, batch, logs=None):
       for log_name, log_value in (logs or {}).items(): #logs may be None
           neptune.log_metric(f'batch_{log_name}', log_value)
   def on_epoch_end(self, epoch, logs=None):
       for log_name, log_value in (logs or {}).items():
           neptune.log_metric(f'epoch_{log_name}', log_value)
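
To see which metric names this kind of callback produces without running Keras or Neptune, the sketch below replays the same prefixing logic against a stand-in logging function that just records calls in a list. `PrefixLogger`, `log_metric`, and the metric values are hypothetical and exist only for this illustration.

```python
# Stand-in for neptune.log_metric: records (name, value) pairs in a list
logged = []

def log_metric(name, value):
    logged.append((name, value))

class PrefixLogger:
    """Same prefixing logic as the callback above, without Keras or Neptune."""
    def on_batch_end(self, batch, logs=None):
        for log_name, log_value in (logs or {}).items():
            log_metric(f'batch_{log_name}', log_value)

    def on_epoch_end(self, epoch, logs=None):
        for log_name, log_value in (logs or {}).items():
            log_metric(f'epoch_{log_name}', log_value)

logger = PrefixLogger()
logger.on_batch_end(0, {'loss': 0.69, 'accuracy': 0.55})   # made-up values
logger.on_epoch_end(0, {'loss': 0.52, 'val_loss': 0.50})   # made-up values
print([name for name, _ in logged])
# ['batch_loss', 'batch_accuracy', 'epoch_loss', 'epoch_val_loss']
```

Names such as epoch_loss and epoch_val_loss are the ones you later look up when fetching logged values back from an experiment.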

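We will now train for 6 epochs and create an experiment called keras-metrics, which will track the metrics logged after each batch and epoch. 

EPOCH_NR, BATCH_SIZE = 6, 32 #the batch size is an assumed value, matching the earlier fit call
neptune.create_experiment(name='keras-metrics',
                          params={'epoch_nr': EPOCH_NR,
                                  'batch_size': BATCH_SIZE})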

Charts such as the ones below are generated automatically within Neptune. The batch loss is the loss computed on a single batch, and the batch size is the number of samples processed before the model is updated. The number of epochs is the number of complete passes through the training dataset. 

You can monitor your learning curves while the model trains! Scroll down in the charts section of your experiment to see batch loss, batch accuracy, epoch loss, epoch accuracy, epoch val loss, and any additional metrics you log.

Figure: batch loss and batch accuracy charts in Neptune
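
To make the batch and epoch terminology concrete, here is a small arithmetic sketch. The training set size, batch size, and epoch count are hypothetical numbers chosen for illustration, and `updates_per_epoch` is a made-up helper.

```python
import math

def updates_per_epoch(n_train, batch_size):
    """Number of batches, and so weight updates, in one pass over the training set."""
    return math.ceil(n_train / batch_size)

# Hypothetical sizes, for illustration only
n_train, batch_size, epochs = 8000, 32, 6
per_epoch = updates_per_epoch(n_train, batch_size)
print(per_epoch)            # 250
print(per_epoch * epochs)   # 1500
```

With these numbers, a per-batch chart would receive 1500 points and a per-epoch chart 6, which is why batch curves look much noisier than epoch curves.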

Now we pass our `NeptuneLogger` as a Keras callback and we’re done!

history = ann.fit(X_train, y_train, epochs=EPOCH_NR, batch_size=BATCH_SIZE,
                   validation_data=(X_test, y_test),
                   callbacks=[NeptuneLogger()])

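To log more metrics into Neptune during or after the training, first compute predictions on the test set. Because the model has a single sigmoid output, we turn each predicted probability into a class label by thresholding at 0.5: 

import numpy as np
y_test_pred = np.asarray(ann.predict(X_test)) #predicted probabilities, shape (n_samples, 1)
y_test_pred_class = (y_test_pred > 0.5).astype(int).ravel() #1 if the probability is above 0.5, else 0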

The F1 score is a measure of a model’s performance on a dataset that combines precision and recall into a single number (their harmonic mean). 

from sklearn.metrics import f1_score
f1 = f1_score(y_test, y_test_pred_class, average='micro')
neptune.log_metric('test_f1', f1)
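
As a rough sketch of what the score measures, the plain-Python function below computes F1 for the positive class from counts of true positives, false positives, and false negatives. Note that this is the binary variant. The call above uses average='micro', which for a single-label problem like this one gives the same value as plain accuracy. The helper name `f1_binary` and the labels are made up.

```python
def f1_binary(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0                      # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 1 = customer left, 0 = customer stayed
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(f1_binary(y_true, y_pred))  # 0.75
```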

Now we will log diagnostic charts into Neptune as images. The charts logged here are a confusion matrix and a ROC curve.

!pip install scikit-plot
import matplotlib.pyplot as plt
from scikitplot.metrics import plot_confusion_matrix, plot_roc
fig, ax = plt.subplots()
plot_confusion_matrix(y_test, y_test_pred_class, ax=ax)
neptune.log_image('diagnostic_charts', fig)
y_test_probas = np.hstack([1 - y_test_pred, y_test_pred]) #plot_roc expects one probability column per class
fig, ax = plt.subplots()
plot_roc(y_test, y_test_probas, ax=ax)
neptune.log_image('diagnostic_charts', fig)

You can find these charts in the logs section of your experiment by clicking on diagnostic charts.


After we are done logging everything, we must stop the experiment: 

neptune.stop()


But we are not done yet! We can also fetch data that was logged into Neptune earlier programmatically!

First, connect to the project: 

my_project = neptune.init(api_token="ANONYMOUS", project_qualified_name="asrithabodepudi1/tester")

For example, we can fetch the experiments dashboard data, restricted to experiments with the tag “basic”: 

my_project.get_leaderboard(tag=['basic'])


You can also fetch back numerical values such as the epoch loss and epoch val loss. Here exp is an experiment object, for example one returned by my_project.get_experiments(): 

exp.get_numeric_channels_values("epoch_loss", "epoch_val_loss")

Doing more with Neptune and Google Colab…

Why stop here? You can learn more about Neptune features and integration with Google Colab by reading through the documentation. Make sure to create a free account and begin tracking your machine learning experiments with Neptune in an organized and easy to read fashion!

