Continuous Integration and Continuous Deployment (CI/CD) Tools for Machine Learning

In modern software development teams, continuous integration (CI) and continuous deployment (CD) are standard practices. 

CI means building and testing the project automatically and continuously, across every runtime it has to support. 

CD is needed so that every new bit of code that passes automated testing can be released into production with no extra effort. 

Adopting CI/CD tools can be very beneficial in ML projects. These tools help you find errors and inconsistencies in code quickly and, in the long run, reduce the cost of downtime. 
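To make this concrete for ML, here is a minimal sketch of a quality gate that a CI job could run after training: it compares the metrics of the new model with minimum acceptable values and fails the build if any of them is missed. The function names, metric names and thresholds are all hypothetical; real projects would load the metrics from their own evaluation step.

```python
from typing import Dict, List


def find_regressions(metrics: Dict[str, float],
                     minimums: Dict[str, float]) -> List[str]:
    """Return one message per metric that is missing or below its minimum."""
    problems = []
    for name, minimum in sorted(minimums.items()):
        value = metrics.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif value < minimum:
            problems.append(f"{name}: {value:.3f} < {minimum:.3f}")
    return problems


def gate(metrics: Dict[str, float], minimums: Dict[str, float]) -> int:
    """Exit code for the CI job: 0 lets the build pass, 1 fails it."""
    problems = find_regressions(metrics, minimums)
    for line in problems:
        print("FAILED", line)
    return 1 if problems else 0
```

A CI job would call `sys.exit(gate(metrics, minimums))` as its last step, so that a model below the agreed quality bar never reaches deployment.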

In real-world projects, it can be difficult for teams to deliver high-quality machine learning models within the project management processes that enterprises already have in place. The result is often missed deadlines and over-budget spending. 

To stay on budget and meet deadlines (especially in the early stages), ML teams have adopted best practices like CI/CD from traditional software development.

Doing everything manually is painful. You have to allocate a separate server and keep it in working order, ensure the availability of necessary software systems, set up runtime environments, make backup copies of data, etc. 

It’s convenient to delegate these responsibilities to third-party services. Many tools make this easier, and a lot of them are aimed specifically at machine learning engineers. 

In this post, we’re going to talk about the seven best tools for CI/CD in your machine learning project. 

Best CI/CD services for Machine Learning

Machine learning engineers spend a lot of time checking their models for problems and optimizing model performance. CI/CD tools cut down on wasted time by automating as much of the manual work as possible.

Let’s take a closer look at them.

| Tool | Plugins | Documentation | Container support | Hosting | Operating system | Pricing |
| --- | --- | --- | --- | --- | --- | --- |
| CML | 2/5 | Reasonable | Yes | On premise/cloud | Any | Free |
| GitHub Actions | 3/5 | Poor | Yes | On premise/cloud | Linux, macOS, Windows | Free |
| GitLab CI/CD | 2/5 | Good | Yes | On premise/cloud | Linux, macOS, Windows | Free/Paid |
| Jenkins | 5/5 | Great | Yes | On premise/cloud | Windows, Linux, macOS and other Unix-like operating systems | Free |
| TeamCity | 5/5 | Great | Yes | On premise/cloud | Linux, macOS, Windows | $45-$1999 |
| CircleCI | 4/5 | Reasonable | Yes | On premise/cloud | Ubuntu, macOS, Linux, Android and Windows | Free/paid for some configurations |
| Travis CI | 5/5 | Good | Yes | On premise/cloud | Ubuntu, macOS | Free for open-source/ $69-$479 for commercial use |

CI/CD Tools Comparison

1. CML 

CML, or Continuous Machine Learning, is an open-source tool for continuous integration and deployment of machine learning projects. It is one of the few projects built specifically for the needs of MLOps, which makes it a natural first tool to try.

This tool was created by the same team that made DVC, an open-source tool for versioning data and ML models. CML aims to make implementing and deploying ML models easier and to bring them to the delivery stage faster, with fewer bugs.

CML was designed to automate some of the routine tasks that ML engineers perform daily, such as training models, evaluating their performance, and creating and labeling datasets. Its features range from generating and sending automated reports, to publishing data, to provisioning cloud resources for a project. For infrastructure tasks, you can use the already-mentioned DVC or Docker. 

CML supports GitFlow for data science projects, generates reports automatically, and spares you from digging into the complex details of external services such as the AWS, Azure, and Google Cloud platforms.

Advantages: 

  • CML uses GitFlow for data science. Whether you’re used to working with GitLab or GitHub, you can manage ML experiments in the environment you’re most familiar with. Large data and model files can be tracked with DVC instead of being pushed to the Git repo.
  • It has automated reports for ML projects. CML automatically monitors changing datasets and compares ML experiments across project history to generate visual reports with metrics and graphs in each Git pull request. This way, your engineering team can stay informed and make data-driven decisions.
  • CML can publish model results from GitHub Actions workflows as comments on the related GitHub issues and pull requests. This keeps the results next to the change that produced them, so reviewers can judge a pull request by its effect on the model and not only by its code.
  • With CML, you don’t have to use additional services. CML enables you to build your own ML platform with the help of GitLab, Bitbucket, or GitHub. If you prefer, you can also use cloud storage (but it’s optional).
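The automated reports described above are, at their core, Markdown files produced during the CI run. The sketch below shows one way such a file could be generated in plain Python; `render_report` and the metric names are hypothetical and not part of CML itself.

```python
from typing import Dict


def render_report(current: Dict[str, float], baseline: Dict[str, float]) -> str:
    """Build a Markdown table comparing this run's metrics with a baseline."""
    lines = [
        "## Model metrics",
        "",
        "| Metric | Baseline | Current | Change |",
        "| --- | --- | --- | --- |",
    ]
    for name in sorted(current):
        now = current[name]
        before = baseline.get(name)
        if before is None:
            # Metric is new in this run, so there is nothing to compare with.
            lines.append(f"| {name} | n/a | {now:.4f} | n/a |")
        else:
            lines.append(
                f"| {name} | {before:.4f} | {now:.4f} | {now - before:+.4f} |"
            )
    return "\n".join(lines) + "\n"
```

In a pipeline, the returned string would be written to a file such as `report.md`, and a later CML step would post that file as a comment. The CML documentation describes the command for this (`cml comment create report.md` at the time of writing); treat the exact invocation as something to check against the version you install.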

Disadvantages: 

  • This project is young. The creators of CML claim that there are hundreds of active contributors, but the reality is that CML is still at the stage of development. It can contain bugs and low-quality contributions, and there’s no guarantee that the project will be supported in the future. 

Summary:

  • Operating system: Any
  • Container support: Yes
  • Documentation: Good
  • Pricing: Free

2. GitHub Actions

GitHub Actions is a workflow automation feature recently introduced on GitHub. Workflows are triggered by repository events such as pushing code, creating a release, or opening an issue.

After you log in to GitHub, you can immediately get access to GitHub Actions. It covers a wide range of CI/CD tasks, including automated testing, building a container, deploying a web service, or automating the onboarding of new users in your open-source project.

GitHub Actions isn’t exclusively an MLOps tool; its use cases are much more universal. However, since many companies write and store code for their ML projects on GitHub, some teams will find it more convenient than any other tool in our list. An undeniable benefit is the community behind it, one of the most active in the world, which keeps improving and developing the project. 

GitHub Actions is definitely worth a look. If you don’t want to spend time on custom configurations, GitHub provides a graphical interface where you can quickly get it all done.

Advantages:

  • You have direct access to the GitHub API with authentication out of the box. No need for complex setup.
  • Container-based actions are essentially consecutive Docker runs. That makes them easy to reason about and debug. 
  • You can isolate the jobs in a workflow for seamless testing. For example, you can run compilation and testing in two different computing environments. 

Disadvantages:

  • No automatic caching for build outputs. You get image and layer caching, but for built artifacts you have to set up and maintain your own cache configuration, which can be a lot of work. 
  • Limited support for pull requests from forks. Workflows triggered from a fork run with restricted permissions and without access to repository secrets, which makes GitHub Actions awkward as a CI/CD tool for fork-based workflows. 
  • The documentation leaves much to be desired. Hopefully, it will get better in the future. 
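Whatever storage you use, caching build artifacts yourself usually comes down to choosing a good cache key: one that stays stable while the inputs are unchanged and changes as soon as they do. Below is a minimal sketch of such a key, derived from the contents of dependency files. The function name and the file names in the usage note are assumptions for illustration.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def cache_key(prefix: str, files: Iterable[Path]) -> str:
    """Derive a cache key that changes whenever any listed file changes."""
    digest = hashlib.sha256()
    # Sort so the key does not depend on the order the files were listed in.
    for path in sorted(Path(p) for p in files):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

For example, `cache_key("deps", [Path("requirements.txt")])` could name the archive that holds installed packages; a job restores the archive if one with that name exists and rebuilds it otherwise.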

Summary:

  • Operating system: Linux, macOS, Windows
  • Container support: Yes
  • Documentation: Poor
  • Pricing: Free

3. GitLab for CI/CD

GitLab is a free, open-source product released under the MIT license and written in Go and Ruby. A single self-hosted server can handle more than 25,000 users. 

GitLab CI/CD enables you to work with repositories and conduct code review. It also has a built-in issue tracker. For tighter control over user access, you can install it locally and connect it to Active Directory and LDAP servers.

A large and involved community makes working with the product rather comfortable. GitLab is used not only to write code but also to thoroughly review it. Almost all build environments and version control systems are supported.

Advantages: 

  • Detailed documentation and easy management. There are many active contributors both to the platform and the documentation, which makes it easier to find what you need.
  • Convenient user interface for monitoring test results. Both large and small teams can use this tool for CI/CD, since you can give read and modify rights to individuals and groups of users.
  • Easy resource management. In GitLab CI/CD, you can easily assign people to project checkpoints and group by tasks.
  • Convenient side-by-side testing of merge requests (GitLab’s name for pull requests) and branches. This makes it a good choice for open-source projects.

Disadvantages: 

  • Artifacts need to be defined and loaded for each job. 
  • Stages within large jobs aren’t supported.

Summary:

  • Operating system: Linux, macOS, Windows
  • Container support: Yes
  • Documentation: Good
  • Pricing: Free/Paid

4. Jenkins

Jenkins is a popular CI/CD application written in Java. It runs on Windows, macOS, Linux, and other Unix-like operating systems, and it is released under the MIT license. It has a rich set of features to automate the tasks of building, testing, deploying, integrating, and releasing software.

In addition to the traditional installation packages, Jenkins can be run stand-alone on any machine with the Java Runtime Environment (JRE) installed, or as a Docker container. There’s also a subproject of the Jenkins team called Jenkins X that specializes in CI/CD within Kubernetes clusters.

About 1,500 plugins are available for Jenkins, so it can be used together with other solutions, for example, with Slack or Jira. Integration is also available for a range of DevOps testing tools. There’s REST API support for remote access to the system. Just like GitLab, the product has a large, passionate community. 
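As an illustration of the remote access mentioned above, the sketch below prepares a request that queues a build through the Jenkins remote access API, using only the Python standard library. It relies on the `/job/<name>/build` and `/job/<name>/buildWithParameters` endpoints; the server URL, job name, user and token in the usage note are placeholders, and the helper assumes a top-level job (jobs inside folders have longer paths).

```python
import base64
import urllib.request
from urllib.parse import quote, urlencode


def build_trigger_request(base_url, job_name, user, api_token, params=None):
    """Prepare (but do not send) a POST request that queues a Jenkins job."""
    # Parameterized jobs use a different endpoint from plain ones.
    endpoint = "buildWithParameters" if params else "build"
    url = f"{base_url.rstrip('/')}/job/{quote(job_name, safe='')}/{endpoint}"
    if params:
        url += "?" + urlencode(params)
    # Jenkins accepts HTTP Basic authentication with a user name and API token.
    raw = f"{user}:{api_token}".encode("utf-8")
    credentials = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url, data=b"", method="POST")
    request.add_header("Authorization", f"Basic {credentials}")
    return request
```

Passing the returned object to `urllib.request.urlopen` would send it. A call such as `build_trigger_request("https://jenkins.example.com", "train-model", "ci-bot", token, {"EPOCHS": "5"})` could, for instance, let a data pipeline start a retraining job when new data arrives.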

Advantages: 

  • Easy to set up and update. In just a few steps, you can install Jenkins and start working on your project (however, there’s a twist here that we describe later on). It can also be set up and configured in the web interface.
  • Detailed, understandable documentation. Jenkins has a professional team that works on keeping the documentation in order. 
  • Easy integration with other tools. Thanks to its large plugin catalog, Jenkins can be integrated with almost any CI/CD tool.
  • Multiple machines for distributed builds. This is useful when you need to test a product in several different environments. It also allows you to run builds under different conditions.

Disadvantages: 

  • Dedicated server. If you want to use Jenkins you’ll have to set up a server, which entails additional costs for the server itself.
  • For larger teams. Jenkins is an enterprise-level tool. The solution is not great for small projects, as the setup takes a lot of work.

Summary:

  • Operating system: Windows, Linux, macOS
  • Container support: Yes
  • Documentation: Good
  • Pricing: Free

5. TeamCity

TeamCity is an enterprise-level continuous integration server created by the JetBrains team. It has powerful functionality and a very robust free version for small projects (up to 100 build configurations).

TeamCity comes with extensive support for many open-source plugins – both JetBrains’ products and third-party applications and tools. Also, TeamCity offers awesome .NET support.

Thanks to all this, this continuous integration server is highly reliable and stays stable regardless of which builds are running. TeamCity integrates with multiple version control systems. 

The software can track commits and immediately trigger builds and unit tests. For example, if tests or compilation fail after some commit, the developer will receive a notification that the code needs to be revised. Thanks to the convenient web interface, you can see what’s happening in real time.

Java code is analyzed by IntelliJ IDEA inspections, so the code quality checks can quickly identify programmer errors and raise alerts. They can also find duplicates that appear when developers reuse code from others in the same project.

In the pre-testing mode, you can upload a patch to the CI system before committing. The system applies it to the project code and all builds, and the checks then start automatically. This feature is especially useful when it’s difficult for developers to run the tests themselves, for example when the tests have to run on different architectures or operating system versions.

Advantages: 

  • Supports multiple platforms and multiple version control systems. JetBrains tried to make their product as versatile as possible, so it supports different programming languages (Ruby, Java, .NET), multiple version control systems (like Git, SVN, Mercurial) and build runners (Rake, MSBuild, NAnt, Ant).
  • Excellent reporting options. TeamCity automatically checks running code and provides reports about bugs even before the build is complete. This way you can identify and fix errors in a reduced amount of time. 
  • Several collaboration options. TeamCity is flexible and efficient to use both for individuals and enterprises. The platform offers subscription packages based on the functionality you need. 

Disadvantages:

  • There can be only one build configuration.
  • Build options can’t be changed when triggered.
  • When running a Bash script, only the result of the last command is checked.
  • Since most configuration work is done in the web interface, how fast you can work depends on the speed of that interface, which is rather slow.
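The point about Bash scripts matters in practice: if only the last command’s exit status counts, an early failure (say, a failed data download) can go unnoticed while the build still shows green. In a shell script the usual guard is `set -e`. The sketch below shows the same fail-fast idea in Python; `run_steps` is a hypothetical helper, not part of TeamCity.

```python
import subprocess
from typing import List, Sequence


def run_steps(steps: Sequence[List[str]]) -> int:
    """Run commands in order; return the first non-zero exit code, else 0."""
    for command in steps:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            # Stop immediately so later steps cannot mask the failure.
            print(f"step failed ({completed.returncode}): {' '.join(command)}")
            return completed.returncode
    return 0
```

A build step would end with `sys.exit(run_steps([...]))`, so the CI server sees the failing exit code regardless of which step produced it.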

Summary:

  • Operating system: Linux, macOS, Windows
  • Container support: Yes
  • Documentation: Very good
  • Pricing: from $45 to $1999 and more (for enterprises)

6. CircleCI

CircleCI is a cloud-based system that doesn’t require you to set up a separate server and doesn’t need administration. CircleCI is a turnkey solution with minimal configuration. However, there’s also an on-premises version that you can deploy on a private cloud. Even for commercial use, it’s possible to use CircleCI for free. 

Using the REST API, you can access projects, builds, and artifacts. The result of a build is an artifact or a group of artifacts. An artifact can be a compiled application or executable file (for example, an APK for Android) or metadata (for example, information about a successful test).

CircleCI caches third-party dependencies to avoid constantly installing required environments. 

CircleCI can be tuned to run very complex pipelines efficiently with sophisticated caching, Docker layer caching, resource classes for running on faster machines, and performance evaluations. As a developer, you can use SSH to perform any debugging task, or set up parallelism in the .circleci/config.yml file to speed up jobs. 

If you operate or administer CircleCI on your own servers, you will like that CircleCI monitors and analyzes your builds and uses a Nomad cluster for scheduling.
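When a job runs on several containers in parallel, each container has to pick its own share of the work. CircleCI exposes the container’s position through the `CIRCLE_NODE_INDEX` and `CIRCLE_NODE_TOTAL` environment variables. The sketch below uses them to split a list of test files; the helper names are hypothetical, and the simple round-robin split is an assumption made for illustration, not how CircleCI’s own test splitting works.

```python
import os
from typing import List, Sequence


def shard(items: Sequence[str], node_index: int, node_total: int) -> List[str]:
    """Return the slice of items that one parallel container should run."""
    if node_total < 1 or not 0 <= node_index < node_total:
        raise ValueError("node_index must be in [0, node_total)")
    ordered = sorted(items)  # identical order on every container
    return ordered[node_index::node_total]


def shard_from_env(items: Sequence[str]) -> List[str]:
    """Read the container's position from the environment (one node by default)."""
    index = int(os.environ.get("CIRCLE_NODE_INDEX", "0"))
    total = int(os.environ.get("CIRCLE_NODE_TOTAL", "1"))
    return shard(items, index, total)
```

Because every container sorts the list the same way, the shards never overlap and together cover every file. Outside CI the defaults apply, so the same script runs the full list locally.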

Advantages:

  • Custom build environments and language support. CircleCI can be configured to simulate the target deployment environment by setting up the pipeline runtime using a Docker image or a Linux / Windows / macOS virtual machine (VM). You can use the pre-built Docker image provided by CircleCI or choose your own Docker Hub image as the runtime.
  • CircleCI pre-installs the Android SDK, NDK, and other dependencies, and allows you to build and test Android apps in a Linux virtual machine.
  • It supports a wide variety of languages including PHP, Python, Java, JavaScript, React Native, Ruby on Rails, Elixir, Scala, etc.
  • CircleCI provides application-level security and sandboxed runtime security. CircleCI is SOC 2 and FedRAMP certified.
  • Free version for commercial use. Even small teams can benefit from the extensive functionality of CircleCI. You don’t even need to set up a dedicated CircleCI server.
  • Flexible configurations. CircleCI can be configured to deploy code to a variety of environments including AWS CodeDeploy, AWS EC2 Container Service (ECS), AWS S3, Google Kubernetes Engine (GKE), Microsoft Azure, and Heroku. Deployments to other cloud services are easy to script using SSH or by installing the service’s API client in your job configuration.

Disadvantages:

  • CircleCI in the free version only supports Ubuntu 12.04 and 14.04. You have to pay to use macOS.
  • Despite the fact that CircleCI can work with any programming language, only Go (Golang), Haskell, Java, PHP, Python, Ruby/Rails, and Scala are supported out of the box.

Summary:

  • Operating system: Ubuntu, macOS, Linux, Android, and Windows
  • Container support: Yes
  • Documentation: Reasonable
  • Pricing: Free/paid options up to $3150 and more

7. Travis CI

Travis CI is a continuous integration platform that’s free for all open-source projects hosted on GitHub. With only a file named .travis.yml containing some information about your project, you can trigger automatic builds whenever your codebase changes in the main branch, other branches, or even on a pull request. Don’t confuse travis-ci.org and travis-ci.com. The first is a free service for open-source projects, the second is a paid CI system. Travis CI is very similar to CircleCI.

Both systems:

  • use configuration files in YAML format;
  • are deployed in the cloud;
  • support Docker for running tests.

However, Travis CI has some extra things that CircleCI doesn’t have:

  • Running tests simultaneously under Linux and Mac OS X.
  • Support for more languages out of the box:
    • Android, C, C#, C++, Clojure, Crystal, D, Dart, Erlang, Elixir, F#, Go, Groovy, Haskell, Haxe, Java, JavaScript (with Node.js), Julia, Objective-C, Perl, Perl6, PHP, Python, R, Ruby, Rust, Scala, Smalltalk, Visual Basic.
  • Build matrix support.
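A build matrix means the CI system runs one job for every combination of the listed settings, for example each operating system with each language version. The sketch below reproduces that expansion in Python to show how quickly the number of jobs grows. The `expand_matrix` helper and the axis names are hypothetical and only loosely modeled on how a matrix is declared in `.travis.yml`.

```python
from itertools import product
from typing import Dict, List, Sequence


def expand_matrix(axes: Dict[str, Sequence[str]],
                  exclude: Sequence[Dict[str, str]] = ()) -> List[Dict[str, str]]:
    """Expand axes into one job per combination, dropping excluded ones."""
    names = list(axes)
    jobs = []
    for values in product(*(axes[name] for name in names)):
        job = dict(zip(names, values))
        # A rule excludes a job when every key in the rule matches the job.
        if any(all(job.get(k) == v for k, v in rule.items()) for rule in exclude):
            continue
        jobs.append(job)
    return jobs
```

Two operating systems and two Python versions give four jobs; each extra axis multiplies the total, which is why exclusion rules are worth having for combinations you don’t need.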

Advantages:

  • Easy and quick start of work. Travis CI has seamless onboarding and can run both on premise and in the cloud.
  • Travis CI can be used to organize a CI/CD pipeline either in the Travis CI cloud, or by running Travis CI on your own infrastructure. The latter gives you full control over the functions and maintenance work.
  • You can integrate Travis CI with the cloud ecosystem, including AWS, Azure, DeployHub, Cloud Foundry, Google Cloud, OpenShift, Serverless, and a range of other environments/platforms, to deploy your application.

Disadvantages:

  • Prices are higher compared to CircleCI, and there’s no free version for commercial use.
  • It has limited customization options (some things may require third-party software).

Summary:

  • Operating system: Ubuntu, macOS
  • Container support: Yes
  • Documentation: Good
  • Pricing: Free for open-source/ $69-$479 for commercial use

Bonus: Docker/Docker Hub

We’ve mentioned Docker a lot in this article, and there’s a reason for that. Docker is a tool that, thanks to containerization technology, makes it easy to distribute applications, and to deploy and run them in an identical environment no matter what kind of machine the Docker platform itself is running on. 

Many developers consider Docker to be the most efficient container technology. It can be tricky to work through the instructions for installing Docker but, once you have installed it, you won’t regret it. 

Docker Hub is to container images roughly what GitHub is to Git repositories, or the npm registry is to JavaScript packages. Working with containers has the following benefits:

  • Continuous development and zero-downtime deployment. Imagine that you have written a script that requires libraries of a certain version, as well as some pre-installed software to run, and you want to share the result of your work with your colleagues. Your colleagues either have to install all the necessary libraries themselves, or each of them has to configure virtualenv on their own. This whole dreary process can be avoided by supplying your script as a container. Everything needed to run the application ships with it, and your colleagues only have to execute one simple command to start it.
  • Ease of migration. Suppose you have a web application with a deployed web server and a server application. During migration, you could redo the permissions and configs and set up the environment again, but it’s easier just to start the prepared container on a new instance than to reconfigure everything.
  • Running unsafe code. Docker allows you to run any software, including graphical software, isolated in a container. That makes it a practical way to run untrusted or simply unsafe code, although container isolation should not be treated as a complete security boundary.

Conclusion

Out-of-the-box CI/CD tools are necessary to monitor changes to the code repo and run unit and integration tests at any stage of the ML project. Since data is an important business resource, just like developer time, automation and optimization tools should become your best friends. 

Machine learning models go through many transformations every day. Data changes frequently and moves from one state to another, and this pushes managers to try software management practices such as CI/CD, which let you work with complex machine learning models without losing the quality of data management.

