
How to Set Up Continuous Integration for Machine Learning With Github Actions and Neptune: Step by Step Guide

In software development, Continuous Integration (CI) is the practice of frequently merging code changes from the entire team into a shared codebase. Before any new code is merged, it is automatically tested and checked for quality.

CI keeps the codebase up-to-date, clean, and tested by design, and helps you find any problems with it quickly.

But what does Continuous Integration mean for machine learning?

The way I see it:

Continuous Integration in machine learning extends the concept to running model training or evaluation jobs for each trigger event (like merge request or commit).

This should be done in a way that is versioned and reproducible to ensure that when things are added to the shared codebase they are properly tested and available for audit when needed.  

Some examples of CI workflows in machine learning could be:

  • running and versioning the training and evaluation for every commit to the repository,
  • running and comparing experiment runs for each Pull Request to a certain branch,
  • creating model predictions on a test set and saving them somewhere on every PR to the feature branch,
  • about a million other model training and testing scenarios that could be automated.

The good news is that today there are tools for this, and in this article I will show you how to set up a Continuous Integration workflow with two of them:

  • Github Actions: lets you run CI workflows directly from Github
  • Neptune: makes experiment tracking and model versioning easy


You will learn

How to set up a CI pipeline that automates the following scenario.

On every Pull Request from branch develop to master:

  • Run model training and log all the experiment information to Neptune for both branches
  • Create a comment that contains a table showing diffs in parameters, properties, and metrics, links to experiments and experiment comparison in Neptune 

See this Pull Request on Github

CI for machine learning: Step-by-step guide

Before you start

Make sure you meet the following prerequisites before starting the how-to steps:

  • You have a Github repository for your project
  • You have a Neptune account and your NEPTUNE_API_TOKEN
  • You have a Python environment where you can run the training script

You can see this example project with the markdown table in the Pull Request on Github. Workflow config, environment file, and the training script are all there.

Step 1: Add Neptune logging to your training scripts

In this example project, we will be training a lightGBM multiclass classification model. 

Since we want to properly keep track of models, we will also save the learning curves, evaluation metrics on the test set, and performance charts like the ROC curve.

1. Add Neptune tracking to your training script

Let me show you first and explain later.

import os

import lightgbm as lgb
import matplotlib.pyplot as plt
import neptune
from neptunecontrib.monitoring.lightgbm import neptune_monitor
from scikitplot.metrics import plot_roc, plot_confusion_matrix, plot_precision_recall
from sklearn.datasets import load_wine
from sklearn.metrics import f1_score, accuracy_score
from sklearn.model_selection import train_test_split

NUM_BOOSTING_ROUNDS = 300

PARAMS = {'boosting_type': 'gbdt',
          'objective': 'multiclass',
          'num_class': 3,
          'num_leaves': 8,
          'learning_rate': 0.01,
          'feature_fraction': 0.9,
          'seed': 1234}

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target,
                                                    test_size=0.2,
                                                    random_state=1234)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# Connect your script to Neptune
neptune.init(api_token=os.getenv('NEPTUNE_API_TOKEN'),
             project_qualified_name=os.getenv('NEPTUNE_PROJECT_NAME'))

# Create an experiment and log hyperparameters
neptune.create_experiment(params={**PARAMS,
                                  'num_boosting_round': NUM_BOOSTING_ROUNDS})

gbm = lgb.train(PARAMS,
                lgb_train,
                num_boost_round=NUM_BOOSTING_ROUNDS,
                valid_sets=[lgb_train, lgb_eval],
                valid_names=['train', 'valid'],
                callbacks=[neptune_monitor()])  # monitor learning curves

y_test_pred = gbm.predict(X_test)

f1 = f1_score(y_test, y_test_pred.argmax(axis=1), average='macro')
accuracy = accuracy_score(y_test, y_test_pred.argmax(axis=1))

# Log metrics to Neptune
neptune.log_metric('accuracy', accuracy)
neptune.log_metric('f1_score', f1)

fig_roc, ax = plt.subplots(figsize=(12, 10))
plot_roc(y_test, y_test_pred, ax=ax)

fig_cm, ax = plt.subplots(figsize=(12, 10))
plot_confusion_matrix(y_test, y_test_pred.argmax(axis=1), ax=ax)

fig_pr, ax = plt.subplots(figsize=(12, 10))
plot_precision_recall(y_test, y_test_pred, ax=ax)

# Log performance charts to Neptune
neptune.log_image('performance charts', fig_roc)
neptune.log_image('performance charts', fig_cm)
neptune.log_image('performance charts', fig_pr)

It is a typical model training script with a few additions:

  • We connected Neptune to the script with neptune.init() and passed our API token and the project name
  • We created an experiment and saved parameters with neptune.create_experiment(params=PARAMS)
  • We added learning curves callback with callbacks=neptune_monitor()
  • We logged test evaluation metrics with neptune.log_metric()  
  • We logged performance charts with neptune.log_image()

Now when you run your script:

python train.py

You should get something like this:

See this experiment in Neptune

2. Add a snippet that is run only in the CI environment. 

Add the following snippet at the bottom of your training script.

if os.getenv('CI') == "true":
    neptune.append_tag('ci-pipeline', os.getenv('NEPTUNE_EXPERIMENT_TAG_ID'))

What this does is:

  • fetch the CI environment variable to check whether the code is running inside the Github Actions workflow
  • add a ci-pipeline tag to the experiment so that it is easier to filter experiments in the Neptune UI
  • get the NEPTUNE_EXPERIMENT_TAG_ID environment variable used to identify the experiment in the CI workflow and log it to Neptune (this will become clear later).
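To sanity-check this logic without pushing a commit, you can mimic the CI environment locally. This is just an illustrative sketch: the variable values below are stand-ins for what Github Actions and the workflow export at run time.

```python
import os
import uuid

# Github Actions sets CI=true in every workflow run;
# the workflow itself exports a fresh NEPTUNE_EXPERIMENT_TAG_ID
os.environ['CI'] = 'true'
os.environ['NEPTUNE_EXPERIMENT_TAG_ID'] = str(uuid.uuid4())

# the same check train.py performs at the bottom of the script
if os.getenv('CI') == "true":
    tags = ['ci-pipeline', os.getenv('NEPTUNE_EXPERIMENT_TAG_ID')]
    print(tags)  # the tags that would be appended to the experiment
```

Running this prints the ci-pipeline tag plus a unique id, which is exactly what lets the comparison step find each run later.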

Step 2: Create an environment file

Having an environment setup file that makes it easy to recreate your training or evaluation environment from scratch is generally good practice.

But when you train models in a CI workflow (like Github Actions), it is a must: the environment where the workflow is executed (and models are trained) is created from scratch every time the workflow is triggered.

There are a few choices when it comes to environment setup files. You can use:

  • Pip and a requirements.txt file,
  • Conda and an environment.yaml file,
  • Docker and a Dockerfile (this is often the best option).

Let’s go with the simplest solution and create a requirements.txt file with all the packages we need:

lightgbm
matplotlib
neptune-client
neptunecontrib
scikit-learn
scikit-plot

Now, whenever you need to run your training install all the packages with:

pip install -r requirements.txt
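If you prefer conda, an equivalent environment.yaml could look like the sketch below. The package list simply mirrors the training script's imports, and versions are left unpinned on purpose:

```yaml
name: ci-for-ml
dependencies:
  - python=3.7
  - pip
  - pip:
      - lightgbm
      - matplotlib
      - neptune-client
      - neptunecontrib
      - scikit-learn
      - scikit-plot
```

You would then recreate the environment with conda env create -f environment.yaml.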

Step 3: Set up Github Secrets

GitHub Secrets allow you to pass sensitive information like keys or passwords to the Github CI workflow runners so that your automated tests can be executed. 

In our case, two sensitive things are needed:

  • NEPTUNE_API_TOKEN: I’ll set it to the key of anonymous Neptune user ANONYMOUS
  • NEPTUNE_PROJECT_NAME: I’ll set it to the open project shared/github-actions


You can set those to your API token and the Neptune project you created.

Without those, Github wouldn’t know where to send the experiments and Neptune wouldn’t know who is sending them (and whether this should be allowed). 

To set up GitHub Secrets:

  • Go to your Github project
  • Go to the Settings tab
  • Go to the Secrets section
  • Click on New secret 
  • Specify the name and value of the secret (similarly to environment variables)
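Once saved, secrets are not injected automatically; a workflow step has to map them into environment variables explicitly. For example, a step could expose both secrets to a training command like this (step name and command are placeholders):

```yaml
steps:
  - name: Run training
    env:
      NEPTUNE_API_TOKEN: ${{ secrets.NEPTUNE_API_TOKEN }}
      NEPTUNE_PROJECT_NAME: ${{ secrets.NEPTUNE_PROJECT_NAME }}
    run: python train.py
```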

Step 4: Create .github/workflows directory and a .yml action file

Github runs all workflows that you define in the .github/workflows directory with .yml configuration files, which means you need to:

1. Create .github/workflows directory:

Go to your project repository and create both .github and .github/workflows directories.

mkdir -p .github/workflows

2. Create neptune_action.yml

Workflow configs that define actions are .yml files of a certain structure that you put in the .github/workflows directory. You can have multiple .yml files to fire multiple workflows. 

I will create just one: neptune_action.yml

touch .github/workflows/neptune_action.yml

As a result, you should see:

├── .github
│   └── workflows
│       └── neptune_action.yml
├── .gitignore
├── README.md
├── requirements.txt
└── train.py

Step 5:  Define your workflow .yml config

Workflow configs are .yml files where you specify what you want to happen and when. 

In a nutshell, you define:

  • on which Github event you would like to trigger the workflow, for example, on a commit to the master branch,
  • what jobs you would like to perform (this is mostly to organize the config),
  • where those jobs should be performed, for example, runs-on: ubuntu-latest will run your workflow on the latest version of Ubuntu,
  • what steps within each job you would like to run sequentially, for example, create an environment, run training, and run evaluation of the model.
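For orientation, the four bullets above map onto config keys roughly like this (a minimal sketch; the job name, step name, and command are placeholders):

```yaml
# on: which event triggers the workflow
on:
  pull_request:
    branches: [master]

# jobs: what to do, and where it runs
jobs:
  my-job:
    runs-on: ubuntu-latest
    # steps: commands executed sequentially inside the job
    steps:
      - name: Run a terminal command
        run: echo "hello"
```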

In our machine learning CI workflow we need to run the following sequence of steps:

  1. Check out branch develop
  2. Set up the environment and run model training on branch develop
  3. Check out branch master
  4. Set up the environment and run model training on branch master
  5. Fetch data from Neptune and create an experiment comparison markdown table
  6. Comment on the PR with that markdown table

Here is the neptune_action.yml that does all that. I know it seems complex but in reality, it’s just a bunch of steps that run terminal commands with some boilerplate around it.  


name: Neptune actions

on:
  pull_request:
    branches: [master]

jobs:
  run-experiments:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.7]

    steps:
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}

      - name: Checkout pull request branch
        uses: actions/checkout@v2
        with:
          ref: develop

      - name: Setup pull request branch environment and run experiment
        id: experiment_pr
        env:
          NEPTUNE_API_TOKEN: ${{ secrets.NEPTUNE_API_TOKEN }}
          NEPTUNE_PROJECT_NAME: ${{ secrets.NEPTUNE_PROJECT_NAME }}
        run: |
          pip install -r requirements.txt
          export NEPTUNE_EXPERIMENT_TAG_ID=$(uuidgen)
          python train.py
          echo ::set-output name=experiment_tag_id::$NEPTUNE_EXPERIMENT_TAG_ID

      - name: Checkout main branch
        uses: actions/checkout@v2
        with:
          ref: master

      - name: Setup main branch environment and run experiment
        id: experiment_main
        env:
          NEPTUNE_API_TOKEN: ${{ secrets.NEPTUNE_API_TOKEN }}
          NEPTUNE_PROJECT_NAME: ${{ secrets.NEPTUNE_PROJECT_NAME }}
        run: |
          pip install -r requirements.txt
          export NEPTUNE_EXPERIMENT_TAG_ID=$(uuidgen)
          python train.py
          echo ::set-output name=experiment_tag_id::$NEPTUNE_EXPERIMENT_TAG_ID

      - name: Get Neptune experiments
        id: compare
        env:
          NEPTUNE_API_TOKEN: ${{ secrets.NEPTUNE_API_TOKEN }}
          NEPTUNE_PROJECT_NAME: ${{ secrets.NEPTUNE_PROJECT_NAME }}
          MAIN_BRANCH_EXPERIMENT_TAG_ID: ${{ steps.experiment_main.outputs.experiment_tag_id }}
          PR_BRANCH_EXPERIMENT_TAG_ID: ${{ steps.experiment_pr.outputs.experiment_tag_id }}
        run: |
          pip install -r requirements.txt
          python -m neptunecontrib.create_experiment_comparison_comment \
            --api_token $NEPTUNE_API_TOKEN \
            --project_name $NEPTUNE_PROJECT_NAME \
            --filepath comment_body.md
          result=$(cat comment_body.md)
          echo ::set-output name=result::$result

      - name: Create a comment
        uses: peter-evans/commit-comment@v1
        with:
          body: |
            ${{ steps.compare.outputs.result }}

You can just copy this file and paste it into your .github/workflows directory and it will work out of the box. 

That said, there are some things that you may need to adjust to your setup:

  • branch names, if you want to trigger your workflow on a PR from a branch other than develop or to a branch other than master,
  • environment setup steps, if you are using anything other than pip and requirements.txt,
  • the command that runs your training scripts.


Explaining this config in detail would make this post really long so I decided not to :). If you’d like to understand everything about it check out Github Actions Documentation (which is great by the way).

Step 6: Push it to Github 

Now you need to push this workflow to GitHub. 

git add .github/workflows train.py requirements.txt
git commit -m "added continuous integration"
git push origin master

Since our workflow will be triggered on every Pull Request to master, nothing will happen just yet. 

Step 7: Create a Pull Request 

Now everything is ready and you just need to create a PR from branch develop to master.

  1. Check out a new branch develop

git checkout -b develop

2. Change some parameters in train.py


PARAMS = {'boosting_type': 'gbdt',
          'objective': 'multiclass',
          'num_class': 3,
          'num_leaves': 15,  # previously 8
          'learning_rate': 0.01,
          'feature_fraction': 0.85,  # previously 0.9
          'seed': 1234}

3. Add, commit and push your changes to the previously created branch develop

git add train.py
git commit -m "tweaked parameters"
git push origin develop

4. Go to Github and create a Pull Request from branch develop to master.

The workflow is triggered and it goes through all the steps one by one. 

Explore the result

If everything worked correctly you should see a Pull Request comment that shows:

  • Diffs in parameters, properties, and evaluation metrics.
  • Experiment IDs and links to both the main and PR branch runs in Neptune. You can go and see all the details of those experiments including learning curves and performance charts that were logged for those experiments.
  • A link to a full comparison between those runs in Neptune.

See this Pull Request on Github

Final thoughts

Ok, so in this how-to guide, you learned how to set up a Continuous Integration workflow that creates a comparison table for every Pull Request to master. 

Hopefully, with this information, you will be able to create the CI workflow that works for your machine learning project!

