Neptune Blog

4 Ways Machine Learning Teams Use CI/CD in Production

Stephen Oladele

9 min

31st July, 2023

MLOps

One of the core concepts in DevOps that is now making its way to machine learning operations (MLOps) is CI/CD—Continuous Integration and Continuous Delivery or Continuous Deployment. CI/CD as a core DevOps practice embraces tools and methods to deliver software applications reliably by streamlining the building, testing, and deployment of your applications to production. Let’s define these concepts below:

Continuous integration (CI) is the practice of automatically building and testing code every time a change is committed to version control and pushed to a code repository.

Continuous delivery (CD) is the practice of deploying every build to a production-like environment and performing automated integration and testing of the application before it is deployed.

Continuous deployment (CD) complements continuous integration with additional steps that automate the configuration and deployment of the application to a production environment.

*Continuous Integration vs Continuous Delivery vs Continuous Deployment | Source*

Most CI/CD tools developed over the past years have been purpose-built for traditional software applications. As you (probably) know, developing and deploying traditional software applications is quite different from building and deploying machine learning (ML) applications in a number of ways. The questions then become:

How would ML teams adapt existing CI/CD tools to suit their machine learning use cases?
Are there better options out there that are purpose-built for ML applications?

In this article, you will learn about how 4 different teams are using—or have used—CI/CD concepts, tools, and techniques to build and deploy their machine learning applications. The purpose of this article is to give you a broad perspective of CI/CD usage from the different solutions implemented by these teams, including their use cases.

Continuous Integration and Delivery (CI/CD) for Machine Learning (ML) with Azure DevOps

In this section, we walk through the workflow of a team that orchestrates CI/CD processes for their machine learning workloads in Azure DevOps. Their machine learning workloads mostly run on Azure Cloud.

Thanks to Emmanuel Raj for granting me an interview on how his team does CI/CD for their ML workloads. This section draws on both Emmanuel’s responses during the interview and his very practical book on MLOps, Engineering Machine Learning Operations (MLOps).

Industry

Retail and consumer goods.

Use case

This team helps a retail client resolve tickets automatically using machine learning. When users raise tickets, or when tickets are generated by maintenance problems, a machine learning model classifies them into categories so that they can be resolved faster.

Core CI/CD tools

Overview

To orchestrate their CI/CD workflow, the team used the Azure DevOps suite of products. They also configured development and production environments for their ML workloads. These workflows consist of all CI/CD processes that happen before deploying the model to production and after deployment.


CI/CD workflow before deploying the model to production

To automate the dev-to-production cycle, the team set up build and release tasks with Azure DevOps Pipelines. The build pipeline generates the model artifacts from the candidate source code once the model has been serialized (mostly using ONNX).

The artifacts are deployed to infrastructure targets using the release pipelines. The release pipelines move the artifacts to the quality assurance (or QA) stage after they have been tested in the development environment.

The model testing happens in the QA stage where A/B tests and stress tests are performed on the model service by the team to make sure the model is ready to be deployed to the production environment.

A human validator, usually the product owner, confirms that the model has passed the tests and been validated, and then approves its deployment to the production environment through the release pipelines.
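The promotion rules described above can be sketched as a small gate function. This is only an illustration under stated assumptions: the `ModelCandidate` record, its field names, and the stage labels are hypothetical and are not part of Azure DevOps or the team's actual setup, where these gates would be configured in the release pipeline itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelCandidate:
    """Hypothetical record of the checks a serialized model has passed."""
    dev_tests_passed: bool = False
    ab_test_passed: bool = False
    stress_test_passed: bool = False
    approved_by: Optional[str] = None  # e.g. the product owner


def furthest_stage(candidate: ModelCandidate) -> str:
    """Return the furthest stage the candidate may be released to."""
    # Artifacts only leave the development environment once tests pass there.
    if not candidate.dev_tests_passed:
        return "dev"
    # QA requires both the A/B test and the stress test, plus human sign-off.
    qa_passed = candidate.ab_test_passed and candidate.stress_test_passed
    if not qa_passed or candidate.approved_by is None:
        return "qa"
    return "production"
```

A candidate that has passed every automated test but has no approver still stops at `"qa"`, which mirrors the manual approval step in the workflow.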

CI/CD workflow after deploying the model to production

After deploying the model to production, the team sets up cron jobs (as part of their CI/CD pipeline) that check model metrics for data drift and concept drift every week, so that the pipeline can be triggered to retrain the model when drift becomes unacceptable.
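To make the idea of a scheduled drift check concrete, here is a minimal sketch of what such a job might compute. The source does not say which drift metric the team uses; the Population Stability Index (PSI) below is a swapped-in, commonly used choice, and the 0.2 threshold is a rule-of-thumb assumption rather than the team's setting.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a current one."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def should_retrain(psi_value: float, threshold: float = 0.2) -> bool:
    """Assumed policy: trigger the retraining pipeline above the threshold."""
    return psi_value > threshold
```

A weekly cron job could call `psi` on a stored training-time sample and the past week's inputs, then trigger the pipeline when `should_retrain` returns `True`.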

They also monitor the performance of their CI/CD pipeline in production by inspecting the pipeline releases in Azure DevOps Pipelines. The purpose of the inspection is to ensure their CI/CD pipeline is healthy and in a robust state. The guidelines they follow to inspect their CI/CD pipeline, keeping it healthy and robust, include:

Auditing system logs and events periodically.
Integrating automated acceptance tests.
Requiring pull requests to make changes to the pipeline.
Peer code reviews for each story or feature before they are added to the pipeline.
Regularly reporting metrics that are visible to all members of the team.

To summarize, Azure DevOps provides the team with a set of useful tools that enable the development (machine learning; model building) and operations teams for this project to work in harmony.

Continuous Integration and Delivery (CI/CD) for ML with GitOps using Jenkins and Argo workflows

In this section, you will learn how one team built a framework to orchestrate their machine learning workflows and run CI/CD pipelines with GitOps.

Thanks to Tymoteusz Wolodzko, a former ML Engineer at GreenSteam, for granting me an interview. This section draws on both Tymoteusz’s responses during the interview and his case study blog post on the neptune.ai blog.

Industry

Computer software

Use case

GreenSteam – An i4 Insight Company provides software solutions for the marine industry that help reduce fuel usage. Excess fuel usage is both costly and bad for the environment, and vessel operators are obliged by the International Maritime Organization to become greener and reduce CO2 emissions by 50 percent by 2050.

Core CI/CD tools

Argo
Jenkins

Overview

To implement CI/CD, the team leveraged GitOps, using Jenkins to run code quality checks and smoke tests based on production-like runs in the test environment. The team had a single pipeline for model code, where every pull request went through code reviews and automated unit tests.

The pull requests also went through automated smoke tests that trained models and made predictions, running the entire end-to-end pipeline on a small chunk of real data to ensure that each stage of the pipeline did what was expected and nothing broke.
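A smoke test of this kind might look like the sketch below. Everything in it is illustrative: the one-feature least-squares "model" and the function names are stand-ins, not GreenSteam's code. The point is the shape of the test, which runs train and predict end to end on a small slice and checks only basic sanity, not model quality.

```python
import math
from typing import Dict, List, Tuple

Row = Tuple[float, float]  # (feature, target)


def train(rows: List[Row]) -> Dict[str, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = cov_xy / var_x if var_x else 0.0
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model: Dict[str, float], xs: List[float]) -> List[float]:
    return [model["slope"] * x + model["intercept"] for x in xs]


def smoke_test(rows: List[Row], sample_size: int = 20) -> None:
    """Run train -> predict on a small slice and check only basic sanity."""
    sample = rows[:sample_size]
    assert sample, "smoke test needs at least one row"
    model = train(sample)
    preds = predict(model, [x for x, _ in sample])
    assert len(preds) == len(sample), "expected one prediction per input row"
    assert all(math.isfinite(p) for p in preds), "predictions must be finite"
```

Because the slice is small, a check like this can run on every pull request without slowing the pipeline down much.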

For the continuous delivery of models, a model quality report was generated after each model was trained. A domain expert reviewed the report manually, and the model was deployed manually only after the expert had validated it and it had passed all prior checks.

Understanding GitOps

GitOps applies a Git-centric approach on top of some common DevOps principles, practices, and tools. In GitOps the code and configuration stored in a repository are considered as a source of truth, where the infrastructure adapts to the changes in code. GitOps helped deliver their pipelines on Amazon EKS at the pace the team required without operational issues.

Code quality checks and using Jenkins to manage the CI pipeline

Jenkins is one of the most popular tools used for continuous integration among developers. The team adopted Jenkins for continuous integration to make their suite of tests, checks, and reviews more efficient.

To maintain consistency in code quality, they moved all the code checks into the Docker container containing the model code, so versions and configs of tools for code quality checks including flake8, black, mypy, pytest were all unified. This also helped them with unifying the local development setup with what they used on Jenkins.

Docker ensured they had no more problems with different versions of dependencies that could lead to different results locally and in Jenkins or in production.

For local development, they had a Makefile to build the Docker image and run all the checks and tests on the code.

For code reviews, they set up Jenkins to run the same checks as part of the CI pipeline.
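One way to picture the "same checks everywhere" setup is a single list of check commands that both the Makefile and Jenkins run inside the image. The sketch below is an assumption about how that could be wired, not the team's actual Makefile or Jenkinsfile; the image tag `model-code:ci` is hypothetical, and the tool invocations are the standard command-line forms of flake8, black, mypy, and pytest.

```python
import shlex
from typing import List

IMAGE = "model-code:ci"  # hypothetical image tag

# One shared definition of the checks keeps local runs and CI runs identical.
CHECKS: List[List[str]] = [
    ["flake8", "."],
    ["black", "--check", "."],
    ["mypy", "."],
    ["pytest"],
]


def docker_commands(image: str = IMAGE) -> List[List[str]]:
    """Wrap every check in `docker run` so it uses the tool versions baked into the image."""
    return [["docker", "run", "--rm", image, *check] for check in CHECKS]


def render(commands: List[List[str]]) -> str:
    """Render the commands as shell lines, e.g. for logging or a dry run."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

A local `make` target or a Jenkins stage could then execute each command with `subprocess.run(cmd, check=True)`, so a failure in any tool fails the run in both places.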

Using Argo to manage CI/CD pipelines

The team needed to test their model for multiple datasets of different clients in different scenarios. As Tymoteusz Wolodzko admitted in his explainer blog post, that was not something they wanted to set up and run manually.

They needed orchestration and automated pipelines that could be plugged into the production environment easily. Dockerizing their ML code made it easy to move the application across different environments, and that includes the production environment.

For orchestration, the team switched from Airflow to Argo Workflows, so plugging in their container was just a matter of writing a few lines of YAML code.

Argo Workflows is an open-source, container-native workflow engine for orchestrating parallel jobs on Kubernetes. It is a cloud-native solution designed from the ground up for Kubernetes. You can define pipeline workflows in which each step runs as a container.
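To show what "each step runs as a container" looks like in a workflow definition, here is a sketch that builds a minimal Argo `Workflow` manifest as a Python dictionary. Argo manifests are normally written in YAML; the dictionary mirrors the same structure and can be dumped to JSON, which most YAML parsers also accept. The step names, the `pipeline.<step>` module paths, and the image name are hypothetical.

```python
import json
from typing import Any, Dict, List


def container_template(name: str, image: str, command: List[str]) -> Dict[str, Any]:
    """A template that runs one step of the pipeline in its own container."""
    return {"name": name, "container": {"image": image, "command": command}}


def build_workflow(image: str) -> Dict[str, Any]:
    steps = ["train", "evaluate"]  # hypothetical step names
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "ml-pipeline-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {
                    "name": "main",
                    # Outer list entries run one after another;
                    # entries inside an inner list would run in parallel.
                    "steps": [[{"name": s, "template": s}] for s in steps],
                },
                *[
                    container_template(s, image, ["python", "-m", f"pipeline.{s}"])
                    for s in steps
                ],
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_workflow("registry.example.com/model:latest"), indent=2))
```

Plugging in a new Docker image then only means changing the `image` value, which matches the "few lines of YAML" experience described above.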

Argo Workflows allowed the team to easily run compute-intensive jobs for machine learning or data processing on their Amazon EKS clusters. The models in the pipeline would retrain periodically, based on scheduled jobs, and also undergo the necessary tests and checks. But before the models were deployed, they were reviewed and audited by a domain expert. Once the expert validated that the model is good to be deployed, the models would then be deployed manually.

Below is an illustration showing the team’s entire stack for ML workloads:

*MLOps technological stack at GreenSteam | Source*

Continuous Integration and Delivery (CI/CD) for ML with AWS CodePipeline and Step Functions

To orchestrate their CI/CD workflow, the team in this section used a combination of AWS CodePipeline and AWS Step Functions to ensure they are building an automated MLOps pipeline.

Thanks to Phil Basford for granting me an interview on how his team did CI/CD for a public ML use case.

Industry

Transportation and logistics.

Use case

For this use case, the team is from a consulting and professional services company that worked on a public project. Specifically, they built machine learning applications that solved problems like:

Predicting how long it will take to deliver a parcel,
Predicting a location, based on unstructured address data and resolving it to a coordinate system (latitude/longitude).

Core CI/CD tools

AWS CodeBuild – A fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy.
AWS CodePipeline – A fully managed continuous delivery service that helps you automate your release pipelines.
AWS Step Functions – A serverless function orchestrator that makes it easy to sequence AWS Lambda functions and multiple AWS services.

Overview

AWS Cloud provides managed CI/CD workflow tools like AWS CodePipeline and AWS Step Functions, which the team used to carry out continuous integration and continuous delivery for their machine learning projects. For continuous integration, the team used Git to push commits to AWS CodeCommit, which triggers a build step in CodePipeline (through an AWS CodeBuild job), with AWS Step Functions handling the orchestration of the workflows for every action from CodePipeline.

Understanding the architecture

The workflow orchestration provided by AWS Step Functions made it easy for the team to manage the complexities that arise from running multiple models and pipelines with CodePipeline. The team’s multi-model deployments are easier to manage and update because each pipeline job in CodePipeline focuses on one process, and builds are also simpler to deliver and troubleshoot.

Below is an example of a project that uses AWS CodePipeline along with Step Functions for orchestrating ML pipelines that require custom containers. Here, CodePipeline invokes Step Functions and passes the container image URI and the unique container image tag as parameters to Step Functions:

*Architecture to build a CI/CD pipeline for deploying custom machine learning models using AWS services | Source*
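The hand-off of the image URI and tag to Step Functions can be sketched as follows. In the architecture above, CodePipeline invokes Step Functions itself; this sketch only shows the equivalent call from a script, using the `start_execution` operation of the boto3 Step Functions client. The input key names (`container_image_uri`, `container_image_tag`) and the execution naming scheme are assumptions, not taken from the referenced project.

```python
import json
from typing import Any, Dict


def build_execution_input(image_uri: str, image_tag: str) -> str:
    """Serialize the parameters the state machine receives (key names are assumed)."""
    return json.dumps(
        {"container_image_uri": image_uri, "container_image_tag": image_tag}
    )


def start_training_workflow(
    sfn_client: Any, state_machine_arn: str, image_uri: str, image_tag: str
) -> Dict[str, Any]:
    """Start one state machine execution for the given container image.

    `sfn_client` is expected to behave like boto3.client("stepfunctions").
    """
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=f"train-{image_tag}",  # assumed naming scheme, unique per image tag
        input=build_execution_input(image_uri, image_tag),
    )
```

Passing the client in as an argument keeps the function easy to test with a stub, and tying the execution name to the unique image tag makes it easier to trace a run back to the build that produced it.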

You can learn more about the architecture above in this blog post. While this team opted to use those tools to manage and orchestrate, it is worth noting that for continuous integration and continuous delivery (CI/CD) pipelines, AWS released Amazon SageMaker Pipelines, an easy-to-use, specifically designed CI/CD service for ML.

Pipelines is a native workflow orchestration tool for building ML pipelines that takes advantage of direct SageMaker integration. You can learn more about building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines in this blog post.

Continuous Integration and Delivery (CI/CD) for ML with Vertex AI and TFX on Google Cloud

In this section, we will take a look at a team that chose workflow orchestration and management tools that are more native to machine learning projects than to traditional software engineering projects.

This section leverages Hannes Hapke’s (ML Engineer at Digits Financial, Inc.) workshop on “Rapid Iteration with Limited DevOps Resources” during Google Cloud’s Applied ML online summit.

Industry

Business intelligence and financial technology services.

Use case

Digits Financial, Inc. is a fin-tech company offering a visual, machine learning-powered expense monitoring dashboard for startups and small businesses. Their use cases are focused on:

Creating the most powerful finance engine for modern businesses that is able to ingest and convert a company’s financial information into a live model of business.
Extracting information from unstructured documents to predict future events for customers.
Clustering information to surface what’s most important for the customers’ businesses.

Core CI/CD tools

Overview

The team at Digits was able to orchestrate and manage the continuous integration, delivery, and deployment of their machine learning pipelines through the managed Vertex AI Pipelines product and TensorFlow Extended, all running on Google Cloud infrastructure.

Using an ML-native pipeline tool over traditional CI/CD tools helped the team to ensure consistency in the quality of models, and make sure the models are going through the standard workflows of feature engineering, model scoring, model analysis, model validation, and model monitoring in one unified pipeline.

Machine learning pipelines with TFX

With TensorFlow Extended, the team was able to treat each component of their machine learning stack as an individual step that can be orchestrated by third-party tools such as Apache Beam, Apache Airflow, or Kubeflow Pipelines when the pipeline is deployed to a testing environment or to their production environment. They were also able to create custom components and add them to their pipeline, which would have been very difficult to do with traditional CI/CD tools.
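The idea of components as individually orchestrated steps can be illustrated without TFX itself. The sketch below is plain Python, not the TFX API: it declares which component consumes the output of which, and lets the standard library's `graphlib` (Python 3.9+) work out a valid execution order, which is roughly the job an orchestrator does. The component names echo standard TFX components, but the dependency graph is simplified, and `CustomReport` is a hypothetical custom component.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# component -> components whose outputs it consumes (simplified, assumed graph)
DEPENDENCIES: Dict[str, Set[str]] = {
    "ExampleGen": set(),
    "Transform": {"ExampleGen"},
    "Trainer": {"Transform"},
    "Evaluator": {"Trainer"},
    "CustomReport": {"Evaluator"},  # hypothetical custom component
    "Pusher": {"Evaluator"},
}


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every component runs after its inputs exist."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(DEPENDENCIES)))
```

Adding a custom component is then just another entry in the graph, which is the kind of extension that is awkward to express in a CI/CD tool built around build and deploy stages.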

Along with this, they also moved their ML pipelines from Kubeflow to Vertex AI Pipelines on Google Cloud, which helped them tie together model development (ML) and operations (Ops) into high-performance, reproducible steps.

One of the core advantages of Vertex AI Pipelines cited by the team was that it let them move from managing their own pipeline infrastructure (self-hosted Kubeflow Pipelines) to the managed Vertex AI Pipelines service for workflow orchestration. They no longer needed to maintain the databases that store metadata or launch clusters to host and operate the build servers and pipelines.

Orchestrating with Vertex AI Pipelines

Vertex AI is a managed ML platform for every practitioner to speed up the rate of experimentation and accelerate the deployment of machine learning models. It helped the team to automate, monitor, and govern their ML systems by orchestrating their ML workflow in a serverless manner and storing their workflow’s artifacts using Vertex ML Metadata.

By storing the artifacts of their ML workflow in Vertex ML Metadata, they could analyze the lineage of their workflow’s artifacts — for example, an ML model’s lineage may include the training data, hyperparameters, and code used by the team to create the model.
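Lineage analysis of this kind boils down to walking a graph from an artifact back to everything it was derived from. The sketch below shows that traversal in plain Python; it does not use the Vertex ML Metadata API, and the artifact names are invented for illustration.

```python
from typing import Dict, List, Set

# artifact -> artifacts it was produced from (hypothetical names)
PARENTS: Dict[str, List[str]] = {
    "model:v3": ["training-data:2023-07", "hyperparameters:v3", "code:abc123"],
    "training-data:2023-07": ["raw-data:2023-07"],
}


def lineage(artifact: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every upstream artifact that contributed to `artifact`."""
    seen: Set[str] = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:  # skip artifacts already visited
                seen.add(parent)
                stack.append(parent)
    return seen
```

Calling `lineage("model:v3", PARENTS)` collects the training data, hyperparameters, and code, plus the raw data the training set was built from, which is the information needed to audit or reproduce a model.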

*A screenshot of Vertex AI ML pipeline orchestration from the Digits team | Source*

The workflow for the team involved preparing and executing their machine learning pipelines with TensorFlow Extended and shipping them to Vertex AI. They could then manage and orchestrate their pipelines from Vertex AI Pipelines without having to operate their own clusters.

Benefits from using machine learning pipelines

The team was able to benefit from using ML pipelines to orchestrate and manage their ML workloads in a couple of ways. As described in this video by Hannes Hapke, the startup was able to gain the following benefits:

Using ML pipelines reduced the DevOps requirements for the team.
Migrating to managed ML pipelines reduced the expense of running clusters 24/7, which they had incurred while hosting the pipelines on their own infrastructure.
Since ML pipelines are native to ML workflows, model updates were easy to integrate and were automated, freeing up the team to focus on other projects.
Model updates were consistent across all ML projects because the teams could run the same tests and reuse the entire pipeline or components of the pipeline.
One-stop place for all machine learning-related metadata and information.
Models could now automatically be tracked and audited.


Conclusion

One interesting point that this article sheds light on is that using CI/CD tools is not enough to successfully operationalize your machine learning workloads. While the majority of teams in this article still use traditional CI/CD tools, we are beginning to see the emergence of ML-native pipeline tools that could help teams (regardless of their size) deliver better machine learning products faster and more reliably.

If you and your team are considering adopting a CI/CD solution for your machine learning workloads, one of the ML-native pipeline tools may be a better starting point than traditional software engineering-based CI/CD tools, depending, of course, on your team’s circumstances and the tool or vendor you prefer to work with.

For these tools, you can check out:

Kubeflow Pipelines (which has distributions and managed options such as MiniKF on AWS Marketplace and on Google Cloud)
Google Cloud Vertex AI Pipelines
Amazon SageMaker Pipelines
MLflow

Till next time, happy Ops-ing!

