In software development, Continuous Integration (CI) is the practice of frequently merging code changes from the entire team into a shared codebase. Before any new code is merged, it is automatically tested and checked for quality.
CI keeps the codebase up-to-date, clean, and tested by design, and helps you find problems with it quickly.
But what does Continuous Integration mean for machine learning?
The way I see it:
Continuous Integration in machine learning extends this concept to running model training or evaluation jobs for each trigger event (like a merge request or a commit).
This should be done in a way that is versioned and reproducible, so that whatever is added to the shared codebase is properly tested and available for audit when needed.
Some examples of CI workflows in machine learning could be:
- running and versioning the training and evaluation for every commit to the repository,
- running and comparing experiment runs for each Pull Request to a certain branch,
- creating model predictions on a test set and saving them somewhere on every PR to the feature branch,
- about a million other model training and testing scenarios that could be automated.
The good news is that today there are tools for that, and in this article, I will show you how to set up a Continuous Integration workflow with two of them:
- GitHub Actions, which lets you run CI workflows directly from GitHub,
- Neptune, which makes experiment tracking and model versioning easy.
You will learn
How to set up a CI pipeline that automates the following scenario.
On every Pull Request from branch develop to master:
- Run model training and log all the experiment information to Neptune for both branches
- Create a comment that contains a table showing diffs in parameters, properties, and metrics, as well as links to the experiments and the experiment comparison in Neptune
See this Pull Request on Github

CI for machine learning: Step-by-step guide
Before you start
Make sure you meet the following prerequisites before starting the how-to steps:
Note:
You can see this example project with the markdown table in the Pull Request on Github. Workflow config, environment file, and the training script are all there.
Step 1: Add Neptune logging to your training scripts
In this example project, we will train a LightGBM multiclass classification model.
Since we want to properly keep track of models, we will also save the learning curves, evaluation metrics on the test set, and performance charts like the ROC curve.
1. Add Neptune tracking to your training script
Let me show you first and explain later.
import os

import lightgbm as lgb
import matplotlib.pyplot as plt
import neptune
from neptunecontrib.monitoring.lightgbm import neptune_monitor
from scikitplot.metrics import plot_roc, plot_confusion_matrix, plot_precision_recall
from sklearn.datasets import load_wine
from sklearn.metrics import f1_score, accuracy_score
from sklearn.model_selection import train_test_split

PARAMS = {'boosting_type': 'gbdt',
          'objective': 'multiclass',
          'num_class': 3,
          'num_leaves': 8,
          'learning_rate': 0.01,
          'feature_fraction': 0.9,
          'seed': 1234
          }
NUM_BOOSTING_ROUNDS = 10

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(data.data,
                                                    data.target,
                                                    test_size=0.25,
                                                    random_state=1234)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# Connect your script to Neptune
neptune.init(api_token=os.getenv('NEPTUNE_API_TOKEN'),
             project_qualified_name=os.getenv('NEPTUNE_PROJECT_NAME'))

# Create an experiment and log hyperparameters
neptune.create_experiment('lightGBM-on-wine',
                          params={**PARAMS,
                                  'num_boosting_round': NUM_BOOSTING_ROUNDS})

gbm = lgb.train(PARAMS,
                lgb_train,
                num_boost_round=NUM_BOOSTING_ROUNDS,
                valid_sets=[lgb_train, lgb_eval],
                valid_names=['train', 'valid'],
                callbacks=[neptune_monitor()],  # monitor learning curves
                )

y_test_pred = gbm.predict(X_test)
f1 = f1_score(y_test, y_test_pred.argmax(axis=1), average='macro')
accuracy = accuracy_score(y_test, y_test_pred.argmax(axis=1))

# Log metrics to Neptune
neptune.log_metric('accuracy', accuracy)
neptune.log_metric('f1_score', f1)

fig_roc, ax = plt.subplots(figsize=(12, 10))
plot_roc(y_test, y_test_pred, ax=ax)

fig_cm, ax = plt.subplots(figsize=(12, 10))
plot_confusion_matrix(y_test, y_test_pred.argmax(axis=1), ax=ax)

fig_pr, ax = plt.subplots(figsize=(12, 10))
plot_precision_recall(y_test, y_test_pred, ax=ax)

# Log performance charts to Neptune
neptune.log_image('performance charts', fig_roc)
neptune.log_image('performance charts', fig_cm)
neptune.log_image('performance charts', fig_pr)
It is a typical model training script with a few additions:
- We connected Neptune to the script with neptune.init() and passed our API token and the project name.
- We created an experiment and logged hyperparameters with neptune.create_experiment(params=PARAMS).
- We added a learning curves callback with callbacks=[neptune_monitor()].
- We logged test evaluation metrics with neptune.log_metric().
- We logged performance charts with neptune.log_image().
Now when you run your script:
python train.py
You should get something like this:
See this experiment in Neptune
2. Add a snippet that is run only in the CI environment.
Add the following snippet at the bottom of your training script.
if os.getenv('CI') == "true":
    neptune.append_tag('ci-pipeline', os.getenv('NEPTUNE_EXPERIMENT_TAG_ID'))
What this does is:
- fetch the CI environment variable to see whether the code is run inside the GitHub Actions workflow,
- add a ci-pipeline tag to the experiment so that it is easier to filter things out in the Neptune UI,
- get the NEPTUNE_EXPERIMENT_TAG_ID environment variable used to identify the experiment in the CI workflow and log it to Neptune as a tag (this will become clear later).
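With those tags in place, you can later pull CI runs programmatically. Here is a minimal sketch, assuming the old neptune-client (0.4.x) Project API used in this post; 'SOME_TAG_ID' is just a placeholder for a value produced by uuidgen in the workflow:
import os

import neptune

# Connect to the same project the CI workflow logs to
project = neptune.init(api_token=os.getenv('NEPTUNE_API_TOKEN'),
                       project_qualified_name=os.getenv('NEPTUNE_PROJECT_NAME'))

# Fetch all experiments created by the CI pipeline
ci_experiments = project.get_experiments(tag='ci-pipeline')

# Or pin down the single run from one workflow step by its unique tag
single_run = project.get_experiments(tag='SOME_TAG_ID')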
Step 2: Create an environment file
Having an environment setup file that makes it easy to create your training or evaluation environment from scratch is generally good practice.
But when you are training models in a CI workflow (like GitHub Actions), it is a must: the environment where the workflow is executed (and models are trained) is created from scratch every time your workflow is triggered.
There are a few choices when it comes to environment setup files. You can use:
- pip and a requirements.txt file,
- conda and an environment.yaml file,
- Docker and a Dockerfile (this is often the best option).
Let’s go with the simplest solution and create a requirements.txt file with all the packages we need:
requirements.txt
lightgbm==2.3.1
neptune-client==0.4.125
neptune-contrib==0.24.8
numpy==1.19.0
scikit-learn==0.23.1
scikit-plot==0.3.7
Now, whenever you need to run your training, install all the packages with:
pip install -r requirements.txt
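If you want the workflow to fail loudly whenever the installed environment drifts from the pinned versions, you could add a small sanity check at the start of your script. This is just an optional sketch using setuptools' pkg_resources, not something the example project requires:
import pkg_resources

# Raise VersionConflict or DistributionNotFound if any installed
# package does not satisfy the version pinned in requirements.txt
with open('requirements.txt') as f:
    for requirement in f:
        requirement = requirement.strip()
        if requirement:
            pkg_resources.require(requirement)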
Step 3: Set up Github Secrets
GitHub Secrets allow you to pass sensitive information like keys or passwords to the Github CI workflow runners so that your automated tests can be executed.
In our case, two sensitive things are needed:
- NEPTUNE_API_TOKEN: I’ll set it to the key of the anonymous Neptune user, ANONYMOUS,
- NEPTUNE_PROJECT_NAME: I’ll set it to the open project shared/github-actions.
Note:
You can set those to your API token and the Neptune project you created.
Without those, Github wouldn’t know where to send the experiments and Neptune wouldn’t know who is sending them (and whether this should be allowed).
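To make misconfigured secrets easier to debug, you could have the training script fail fast when they are missing. This is a hypothetical guard, not part of the original train.py:
import os

# Fail fast with a clear message if the secrets did not reach the runner
for variable in ('NEPTUNE_API_TOKEN', 'NEPTUNE_PROJECT_NAME'):
    if not os.getenv(variable):
        raise RuntimeError(
            f'{variable} is not set. Check the Secrets section '
            'of your repository settings.')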
To set up GitHub Secrets:
- Go to your Github project
- Go to the Settings tab
- Go to the Secrets section
- Click on New secret
- Specify the name and value of the secret (similarly to environment variables)
Step 4: Create .github/workflows directory and a .yml action file
GitHub will run all workflows that you define with .yml configuration files in the .github/workflows directory, which means you need to:
1. Create the .github/workflows directory:
Go to your project repository and create both the .github and .github/workflows directories.
mkdir -p .github/workflows
2. Create neptune_action.yml
Workflow configs that define actions are .yml files of a certain structure that you put in the .github/workflows directory. You can have multiple .yml files to fire multiple workflows.
I will create just one, neptune_action.yml:
touch .github/workflows/neptune_action.yml
As a result, you should see:
your-repository
├── .github
│   └── workflows
│       └── neptune_action.yml
├── .gitignore
├── README.md
├── requirements.txt
└── train.py
Step 5: Define your workflow .yml config
Workflow configs are .yml files where you specify what you want to happen and when.
In a nutshell, you define:
- on: which GitHub event should trigger the workflow, for example, a commit to the master branch,
- jobs: which jobs you would like to perform (this is mostly to organize the config),
- runs-on: where those jobs should be performed, for example, runs-on: ubuntu-latest will run your workflow on the latest version of Ubuntu,
- steps: which steps within each job you would like to run sequentially, for example, create an environment, run training, and run evaluation of the model.
In our machine learning CI workflow we need to run the following sequence of steps:
- Check out branch develop
- Set up the environment and run model training on branch develop
- Check out branch master
- Set up the environment and run model training on branch master
- Fetch data from Neptune and create an experiment comparison markdown table
- Comment on the PR with that markdown table
Here is the neptune_action.yml that does all that. I know it seems complex, but in reality it’s just a bunch of steps that run terminal commands, with some boilerplate around them.
neptune_action.yml
name: Neptune actions
on:
  pull_request:
    branches: [master]
jobs:
  compare-experiments:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.7]
    env:
      NEPTUNE_API_TOKEN: ${{ secrets.NEPTUNE_API_TOKEN }}
      NEPTUNE_PROJECT_NAME: ${{ secrets.NEPTUNE_PROJECT_NAME }}
    steps:
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Checkout pull request branch
        uses: actions/checkout@v2
        with:
          ref: develop
      - name: Setup pull request branch environment and run experiment
        id: experiment_pr
        run: |
          pip install -r requirements.txt
          export NEPTUNE_EXPERIMENT_TAG_ID=$(uuidgen)
          python train.py
          echo ::set-output name=experiment_tag_id::$NEPTUNE_EXPERIMENT_TAG_ID
      - name: Checkout main branch
        uses: actions/checkout@v2
        with:
          ref: master
      - name: Setup main branch environment and run experiment
        id: experiment_main
        run: |
          pip install -r requirements.txt
          export NEPTUNE_EXPERIMENT_TAG_ID=$(uuidgen)
          python train.py
          echo ::set-output name=experiment_tag_id::$NEPTUNE_EXPERIMENT_TAG_ID
      - name: Get Neptune experiments
        env:
          MAIN_BRANCH_EXPERIMENT_TAG_ID: ${{ steps.experiment_main.outputs.experiment_tag_id }}
          PR_BRANCH_EXPERIMENT_TAG_ID: ${{ steps.experiment_pr.outputs.experiment_tag_id }}
        id: compare
        run: |
          pip install -r requirements.txt
          python -m neptunecontrib.create_experiment_comparison_comment \
            --api_token $NEPTUNE_API_TOKEN \
            --project_name $NEPTUNE_PROJECT_NAME \
            --tag_names $MAIN_BRANCH_EXPERIMENT_TAG_ID $PR_BRANCH_EXPERIMENT_TAG_ID \
            --filepath comment_body.md
          result=$(cat comment_body.md)
          echo ::set-output name=result::$result
      - name: Create a comment
        uses: peter-evans/commit-comment@v1
        with:
          body: |
            ${{ steps.compare.outputs.result }}
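Under the hood, the "Get Neptune experiments" step delegates the table-building to neptunecontrib.create_experiment_comparison_comment. Roughly speaking, it fetches the two tagged runs and diffs them. Here is a simplified sketch of the idea (not the actual neptunecontrib source), assuming the old neptune-client Project API with its get_leaderboard method and its parameter_/channel_ column naming:
import os

import neptune

# Connect to the project the CI runs logged to
project = neptune.init(api_token=os.getenv('NEPTUNE_API_TOKEN'),
                       project_qualified_name=os.getenv('NEPTUNE_PROJECT_NAME'))

# get_leaderboard returns a pandas DataFrame with one row per experiment
# and columns like 'id', 'parameter_num_leaves', 'channel_accuracy'
main_run = project.get_leaderboard(
    tag=os.getenv('MAIN_BRANCH_EXPERIMENT_TAG_ID')).iloc[0]
pr_run = project.get_leaderboard(
    tag=os.getenv('PR_BRANCH_EXPERIMENT_TAG_ID')).iloc[0]

# Build a markdown table containing only the values that differ
rows = ['| name | master | PR |', '| --- | --- | --- |']
for column in ['parameter_num_leaves', 'parameter_feature_fraction',
               'channel_accuracy', 'channel_f1_score']:
    if main_run.get(column) != pr_run.get(column):
        rows.append(f'| {column} | {main_run.get(column)} | {pr_run.get(column)} |')

with open('comment_body.md', 'w') as f:
    f.write('\n'.join(rows))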
You can just copy this file and paste it into your .github/workflows directory and it will work out of the box.
That said, there are some things that you may need to adjust to your setup:
- branch names, if you want to trigger your workflow on a PR from a branch different than develop or to a branch different than master,
- environment setup steps, if you are using anything different than pip and requirements.txt,
- the command that runs your training scripts.
Note:
Explaining this config in detail would make this post really long, so I decided not to :). If you’d like to understand everything about it, check out the GitHub Actions Documentation (which is great, by the way).
Step 6: Push it to Github
Now you need to push this workflow to GitHub.
git add .github/workflows train.py requirements.txt
git commit -m "added continuous integration"
git push
Since our workflow will be triggered on every Pull Request to master, nothing will happen just yet.
Step 7: Create a Pull Request
Now everything is ready and you just need to create a PR from branch develop to master.
1. Check out a new branch develop:
git checkout -b develop
2. Change some parameters in train.py
train.py
PARAMS = {'boosting_type': 'gbdt',
          'objective': 'multiclass',
          'num_class': 3,
          'num_leaves': 15,  # previously 8
          'learning_rate': 0.01,
          'feature_fraction': 0.85,  # previously 0.9
          'seed': 1234
          }
3. Add, commit, and push your changes to the previously created branch develop:
git add train.py
git commit -m "tweaked parameters"
git push origin develop
4. Go to GitHub and create a Pull Request from branch develop to master.
The workflow is triggered and it goes through all the steps one by one.
Explore the result
If everything worked correctly you should see a Pull Request comment that shows:
- Diffs in parameters, properties, and evaluation metrics.
- Experiment IDs and links to both the main and PR branch runs in Neptune. You can go and see all the details of those experiments, including the learning curves and performance charts that were logged.
- A link to a full comparison between those runs in Neptune.
See this Pull Request on Github

Final thoughts
Ok, so in this how-to guide, you learned how to set up a Continuous Integration workflow that creates a comparison table for every Pull Request to master.
Hopefully, with this information, you will be able to create the CI workflow that works for your machine learning project!
NEXT STEPS
Get started with Neptune in 5 minutes
If you are looking for an experiment tracking tool you may want to take a look at Neptune.
It takes literally 5 minutes to set up and as one of our happy users said:
“Within the first few tens of runs, I realized how complete the tracking was – not just one or two numbers, but also the exact state of the code, the best-quality model snapshot stored to the cloud, the ability to quickly add notes on a particular experiment. My old methods were such a mess by comparison.” – Edward Dixon, Data Scientist @intel
To get started follow these 4 simple steps.
Step 1
Install the client library.
pip install neptune-client
Step 2
Connect to the tool by adding a snippet to your training code.
For example:
import neptune
neptune.init(...) # credentials
neptune.create_experiment() # start logger
Step 3
Specify what you want to log:
neptune.log_metric('accuracy', 0.92)

for prediction_image in worst_predictions:
    neptune.log_image('worst predictions', prediction_image)
Step 4
Run your experiment as you normally would:
python train.py
And that’s it!
Your experiment is logged to a central experiment database and displayed in the experiment dashboard, where you can search, compare, and drill down to whatever information you need.
