According to a McKinsey survey, 56% of organizations today use machine learning in at least one business function, so the need for efficient and effective MLOps and CI/CD practices is becoming increasingly vital.
This article is a real-life study of building a CI/CD MLOps pipeline. We’ll delve into the MLOps practices and strategies we tried and implemented across some of our projects. This includes the tools and techniques we used to streamline the ML model development and deployment processes, as well as the measures taken to monitor and maintain models in a production environment.
CI/CD pipeline: key thoughts and considerations
Continuous integration and continuous deployment (CI/CD) are crucial in ML model deployments because they allow faster and more efficient model updates and enhancements. CI/CD ensures that models are thoroughly tested and validated before they are deployed to a production environment. This minimizes the risk of errors and bugs in the deployed models, which can otherwise lead to costly downtime and damage to the organization’s reputation.
Additionally, CI/CD provides the organization with a clear and transparent audit trail of the changes made to the model, which can be useful for troubleshooting and compliance purposes.
The points elaborated below were some of the key considerations that went into our MLOps system design.
We broke the scope of MLOps practices into several key areas, including model development, deployment, and monitoring.
In addition to these key areas, the MLOps system was also planned to address aspects such as
- Version Control: Keeping track of the different versions of the model and code.
- Automation: Automating as many tasks as possible to reduce human error and increase efficiency.
- Collaboration: Ensuring that all teams involved in the project, including data scientists, engineers, and operations teams, are working together effectively.
- Security: Implementing security measures such as access control.
- Costs: Cost is often the most important aspect of any ML model deployment.
Technology landscape of CI/CD MLOps system
The technology landscape of an ML model deployment is mostly shaped by the infrastructure the client provides, and that was certainly the case for us.
- 1 Serverless services that can be triggered on demand are a great help for anyone looking for efficient ML model deployments. AWS provides several tools to create and manage ML model deployments. Some of these services aren’t specifically meant for ML models, but we managed to repurpose them for our model deployment.
- 2 If you are somewhat familiar with AWS ML tools, the first thing that comes to mind is SageMaker. AWS SageMaker is indeed a great tool for machine learning operations (MLOps), automating and standardizing processes across the ML lifecycle. But we chose not to use it in our deployment for a couple of reasons, which I will discuss in the subsequent sections.
Cost and resource requirements
There are several cost-related constraints we had to consider when we ventured into the ML model deployment journey:
- Data storage costs: Storing the data used to train and test the model, as well as any new data used for prediction, adds to the cost of deployment. In the case of our CI/CD MLOps system, we stored the model versions and metadata in the data storage services offered by AWS, i.e., S3 buckets.
- Licensing costs: Oftentimes, we need third-party software libraries to power our solutions. Even though we mostly used the AWS suite of services, some of the offerings turned out to be too costly to use on a continuous basis. An example would be Amazon Rekognition.
- Cloud service costs: If the model is deployed on a cloud platform, usage and service costs can vary depending on the provider and usage. Since we relied on AWS-managed services, these costs added up on the total bill. We need to be cognizant of these additions while using cloud services and, where possible, opt for serverless services that are triggered only on demand. Using AWS Lambda to host our code was a very efficient way of saving cloud costs. AWS also has a blog post on optimizing cloud service costs. Always budget the cost of your system carefully!
- Human resources cost: If a team is needed to develop, deploy, and maintain the model, the cost of human resources can be a constraint. Usually, machine learning engineers who specialize in operationalizing/productionizing models are required to deploy and maintain an MLOps system.
Accessibility & governance
- Access controls: Implement access controls to ensure that only authorized users have access to the deployed model and its predictions. This can include authentication and authorization mechanisms such as user roles and permissions.
- Auditing: Keep track of who is accessing the deployed model and for what purpose. This can help with compliance and security and in troubleshooting any issues that may arise.
- Monitoring: Continuously monitor the performance of the deployed model to ensure it is working as expected and to detect any issues or errors.
- Code Versioning: By keeping all the previous versions of the deployed model, deployed code can be easily rolled back to a previous version if necessary.
- Data governance: Ensure that the data used to train and test the model, as well as any new data used for prediction, is properly governed. For small-scale/low-value deployments, there might not be many items to focus on, but as the scale and reach of deployment go up, data governance becomes crucial. This includes data quality, privacy, and compliance.
- ML model explainability: Make sure the ML model is interpretable and understandable by the developers as well as other stakeholders and that the value addition provided can be easily quantified.
- Documentation: Keep detailed documentation of the deployed model, including its architecture, training data, and performance metrics, so that it can be understood and managed effectively.
- Security: Implement security measures to protect the model and its predictions from unauthorized access and malicious attacks.
Before discussing how we implemented our pipeline, let’s get a brief background on the project itself.
Building a CI/CD MLOps pipeline: project background
The problem statement
The deployment was to detect and manage claims fraud for a major insurer. Traditional methods of detecting insurance fraud rely on manual review and investigation of claims, which can be time-consuming, costly, and prone to human error. To address this problem, an automated fraud detection and alerting system was developed using insurance claims data. The system used advanced analytics and mostly classic machine learning algorithms to identify patterns and anomalies in claims data that may indicate fraudulent activity.
The primary goal of this system was to detect potential fraud early and accurately, reducing the financial impact of fraudulent claims on the insurance company and its customers. Our activities mostly revolved around:
- 1 Identifying data sources
- 2 Collecting & Integrating data
- 3 Developing Analytical/ML models
- 4 Integrating the above into a cloud environment
- 5 Leveraging the cloud to automate the above processes
- 6 Making the deployment robust & scalable
Who was involved in the project?
It was a relatively small team of six people.
- Two Data Scientists: Responsible for setting up the ML models training and experimentation pipelines.
- One Data Engineer: Cloud database integration with our cloud expert.
- One cloud expert (AWS): Setting up the cloud-based systems.
- One Business Analyst: To capture all of the client requirements.
And me as the project lead (basically a senior data scientist), who worked on a model development track as well as the overall project management. You might be wondering if the team was really that small, but this was indeed the case. We also developed several APIs to integrate well into the backend application.
In the problem statement section, I mentioned a series of activities we performed to bring the ML model deployment to fruition. Let’s discuss those steps one by one.
Sourcing and preparing the data
There is a popular saying in the data industry that goes like this: “Garbage in, garbage out.” Hence the very first thing to do is to make sure that the data being used is of high quality and that any errors or anomalies are detected and corrected before proceeding with ETL and data sourcing.
If you aren’t aware already, let’s introduce the concept of ETL.
ETL stands for “Extract, Transform, and Load,” and it refers to a process in data warehousing. The idea behind ETL is to get data from multiple sources, transform it into a format that can be loaded into a target data store (such as a data warehouse), and then load the transformed data for downstream activities such as model building, inference, streaming, etc.
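To make the idea concrete, here is a minimal, self-contained ETL sketch using only the Python standard library. The data and field names ("claim_id", "amount") are hypothetical, and SQLite stands in for the real target store; in practice each stage would be an AWS-managed service, but the shape of the work is the same.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: claim records, one of which has a missing amount.
RAW_CSV = """claim_id,amount
C-001,1200.50
C-002,
C-003,860.00
"""

def extract(source: str):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(source)))

def transform(rows):
    """Transform: drop rows with missing amounts and cast types."""
    return [
        (row["claim_id"], float(row["amount"]))
        for row in rows
        if row["amount"]
    ]

def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS claims (claim_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO claims VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0])  # 2 rows survive
```

The downstream consumers (model building, inference, etc.) then only ever see the cleaned, typed records in the target store.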
Sourcing the data
In our case, the data was provided by our client, which was a product-based organization. It included data stored in relational databases, simple storage locations, and so on.
As a first step, we built ETL pipelines to transform and store the data in our preferred location. We primarily used ETL services offered by AWS. This data was later consumed by processes like model building, testing validation, and so on.
We had several sources of data, including S3, RDS, streaming data, and so on. But based on how the downstream processes were going to consume it, the data was again stored in S3, AWS RDS for PostgreSQL, and so on. There are specific AWS-managed services to perform these actions, i.e., AWS Data Pipeline, Kinesis Data Firehose, etc. For more information, please refer to this video.
The data pipelines can be triggered by events or run at specific intervals the users choose. Below are some pictorial representations of simple ETL operations we used for data transformation.
In order to give you more context on building complex & serverless ETL pipelines, here is an image from AWS documentation that shows the orchestration of an ETL pipeline with validation, transformation, and so on.
As you can observe, the pipeline is orchestrated by AWS Step Functions and includes error handling, automated retry, and user notification features, even though every process is serverless.
There is a huge advantage to the services being serverless: they are triggered on demand and charged only for the data volume and compute time for which they are engaged. Hence they are ideal for streamed data or for cases where the data needs to be refreshed periodically. You would also see AWS Glue used in the extraction architecture (AWS Glue is discussed in more detail below).
Now we had the data in simple storage locations like AWS S3. The next job was to ingest and transform it into AWS RDS (MySQL/PostgreSQL), MongoDB, etc. For that, we used another pipeline based on AWS Glue.
AWS Glue consists of a metadata repository known as the Glue Data Catalog and an engine that generates the Scala or Python code for the ETL job; it also handles job monitoring, scheduling, and so on. Glue supports only services that run on AWS, such as Aurora, RDS (MySQL, PostgreSQL, etc.), Redshift, S3, and so on.
The key steps in creating an ETL pipeline in AWS Glue involved:
- Creating the crawler: First, Glue has to crawl the file in order to discover the data schema, so we created a crawler and gave it a name.
- Viewing the table: Once the crawler is in place, select it and run it. It should return with something similar to what is shown below
- Configuring the job: In our case, an RDS for PostgreSQL instance (of an appropriate size) had already been created to store the structured data. This step essentially involves setting up the connection between S3, from where the Glue job crawls the data, and the database into which it dumps the results.
For a detailed view of the process, please refer to this article.
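The console steps above can also be scripted. Here is a hedged sketch of creating and starting a crawler with boto3; the crawler name, IAM role ARN, S3 path, and catalog database are hypothetical placeholders, and the last function is not called because it needs live AWS credentials.

```python
def build_crawler_config(name: str, role_arn: str, s3_path: str, database: str) -> dict:
    """Assemble the kwargs expected by glue.create_crawler()."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

# All identifiers below are illustrative placeholders.
config = build_crawler_config(
    name="claims-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    s3_path="s3://example-claims-bucket/raw/",
    database="claims_catalog",
)

def create_and_run_crawler(config: dict) -> None:
    """Create the crawler and kick off a run (requires AWS credentials)."""
    import boto3

    glue = boto3.client("glue")
    glue.create_crawler(**config)            # register the crawler
    glue.start_crawler(Name=config["Name"])  # run it to discover the schema
```

Once the crawler finishes, the discovered tables appear in the Glue Data Catalog, exactly as they would after running the crawler from the console.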
Once the job is saved, you can see the flow diagram of the job and a provision to edit the generated script. Oftentimes, the data lies in multiple tables with complex relationships governing them. It’s also possible to use some of the built-in transform features provided by AWS Glue.
For more information, refer to the article about ETL Data Pipeline In AWS.
Apart from the methods discussed above, other pipelines were used as well, based on the data source and the transformations required.
Coming to how we addressed these, it was a combination of the above approaches and a few more. Many times, we had to build multiple independent data pipelines (based on multiple sources) feeding into the same target (say, a data warehouse). Sometimes the data originally came from a streaming source; in our case, these were text/CSV files being dumped to an S3 storage location at regular intervals.
Another type of data was images with specific event IDs getting dumped to an S3 location. For this, instead of a Glue job, we used Python-based Lambdas, triggered on demand, to process and resize the images, which were then converted into a byte representation and passed to the RDS for PostgreSQL DB. We even had structured data from on-prem servers, which was pulled at regular intervals, processed, and stored in our RDS databases.
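An image-processing Lambda of this kind might be sketched as below. The bucket/key layout, table name, database host, and target size are all hypothetical, and boto3, Pillow, and psycopg2 would have to be packaged with the Lambda; treat this as an illustration of the flow, not our exact code.

```python
def event_id_from_key(key: str) -> str:
    """Extract the event ID from a hypothetical key like 'images/EVT-1234/photo.jpg'."""
    return key.split("/")[1]

def handler(event, context):
    # Imports kept inside the handler so the module is importable without the deps.
    import io
    import boto3
    from PIL import Image
    import psycopg2

    s3 = boto3.client("s3")
    record = event["Records"][0]["s3"]  # standard S3 put-event payload
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    # Download and resize the image entirely in memory.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    img = Image.open(io.BytesIO(body))
    img.thumbnail((256, 256))
    buf = io.BytesIO()
    img.save(buf, format="JPEG")

    # Store the byte representation against the event ID (placeholder connection).
    conn = psycopg2.connect(host="example-rds-host", dbname="claims")
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO claim_images (event_id, image_bytes) VALUES (%s, %s)",
            (event_id_from_key(key), buf.getvalue()),
        )
```

Wiring the S3 bucket's put events to this handler makes the whole path on-demand: the Lambda only runs (and only costs money) when a new image lands.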
Collaborative development and version control
Version control is crucial in modern software development: it helps teams keep track of changes to their code over time. Git is a distributed version control system for software development. Since its inception, it has become one of the most widely used version control systems in the software industry.
Since our project was mostly powered by AWS infrastructure, naturally, we used AWS CodeCommit, which is a fully-managed version control service provided by Amazon Web Services (AWS). It uses the Git version control system to store and manage source code and provides a centralized repository for developers to collaborate and track changes to their code.
The image below shows a typical code commit usage.
Moreover, AWS CodeCommit easily integrates with other AWS services, such as AWS CodePipeline and AWS CodeBuild, to provide a complete continuous integration and continuous delivery (CI/CD) pipeline.
In the previous sections, I introduced you to the tools we used and the step-by-step process. Now we reached a point where there was a need for a computing environment (preferably not serverless).
In our case, the computing environment could be set up on-premises as well as in the cloud. We went with an AWS EC2 instance, considering the deployment could be more robust while also ensuring high availability.
There were other considerations, such as the total cost of running an EC2 instance and configuring its usage so that overall costs were kept to a minimum. We went with On-Demand EC2 Instances: with this option, we paid only by the hour or second, with no long-term commitments or upfront costs. This option is best suited for applications with short-term, irregular workloads or for users who want to try out EC2 without making a long-term commitment.
This configuration turned out to be quite ideal because the whole experimentation/training pipeline with recently available data took only around 3-4 hours (considering we were using a high-capacity EC2 instance without a GPU), and there wasn’t any significant usage until the next retraining exercise.
Working with an AWS EC2 instance isn’t much different from using any other computing instance (such as your local machine). The key steps to operationalize it were:
- Launch an EC2 instance: i.e., by going to the EC2 console and launching an EC2 instance. You will need to select the instance type (e.g., t2.micro, m5.large, etc.), choose an Amazon Machine Image (AMI), and configure any necessary settings, such as security groups and key pairs.
- Connecting to the instance: Once the instance is running, you can connect to it using a remote desktop protocol (RDP) client or Secure Shell (SSH). Windows instances can be accessed using RDP, while Linux instances can be accessed using SSH.
- Install and configure required software/packages: Once you are connected to the instance, you can install and configure any necessary software to run your workload. In our case, this was setting up the ML modeling environment.
Leveraging MLflow for model experimentation and tracking
At this point, we had our computing instance ready. The next major step was about creating an environment where we could experiment and build machine learning models. Considering we had to do a lot of experimentation with modeling and tuning hyperparameters and so on, using AWS Sagemaker seemed to be a logical choice at that point. But we had to weigh in several aspects which led to the usage of MLflow as a framework for ML model experimentation and tracking.
- MLflow is vendor-agnostic: It can be used with a variety of machine learning frameworks, cloud providers, and deployment platforms, giving users greater flexibility and control over their machine learning workflows.
- It’s open source: Users have access to the underlying code and can customize it.
- Experiment tracking: MLflow provides robust tools for tracking experiments, allowing users to keep track of model performance and compare different versions of models.
Setting up an MLflow tracking server on EC2 can be as easy as running “pip install mlflow” in your terminal. But the work doesn’t really stop there. You need to create a framework or write custom code for the training/retraining pipeline on top of the experiment-tracking facilities provided by MLflow. For an experienced data scientist/ML engineer, that shouldn’t be much of a problem.
This is an excellent article you can refer to while setting up an MLflow server in your computing environment. You need either cloud or local databases set up beforehand to support the tracking features in MLflow.
The MLflow features we specifically leveraged were:
- MLflow tracking API to log experiments, metadata, artifacts, code versions, and results/inferences of our machine learning experiments.
- The MLflow Model Registry, which we leveraged as a centralized repository for storing and sharing models.
- Packaging code into reproducible runs.
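Together, the tracking API and a small amount of glue code make a training pipeline like ours possible. The sketch below is illustrative, not our exact code: metric names and parameters are hypothetical, `train_and_score()` stands in for the real training routine, and `log_experiment` is not executed here because it needs a running tracking server.

```python
def select_best_run(runs, metric="recall"):
    """Pick the run with the highest value for the given metric."""
    return max(runs, key=lambda run: run["metrics"][metric])

def log_experiment(params, train_and_score):
    """Train one candidate and log params/metrics/model to MLflow."""
    import mlflow  # assumes MLFLOW_TRACKING_URI points at the tracking server

    with mlflow.start_run():
        mlflow.log_params(params)                  # hyperparameters for this run
        model, metrics = train_and_score(params)   # placeholder training routine
        mlflow.log_metrics(metrics)                # e.g. {"recall": 0.86}
        mlflow.sklearn.log_model(model, "model")   # store the model artifact

# Choosing the best of the logged runs (synthetic run records for illustration):
runs = [
    {"run_id": "a", "metrics": {"recall": 0.81}},
    {"run_id": "b", "metrics": {"recall": 0.86}},
]
print(select_best_run(runs)["run_id"])  # "b"
```

In a real pipeline, the run records would come from querying the tracking server rather than a hard-coded list, and the selected model would be promoted in the Model Registry.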
May be useful
If you’d like to avoid setting up and maintaining MLflow yourself, you can check out neptune.ai. It’s an out-of-the-box experiment tracker and model registry. It also offers user access management, so it may be a good alternative if you work in a highly collaborative environment.
Getting our CI/CD pipeline ready!
Alright! Now we had the data ready, the computing environment ready, and the model experimentation and tracking server ready too. The next steps were about setting up the MLOps pipeline. At this point, we were already done with data collection and sourcing and with model building and experimentation. The steps that logically remained were model evaluation, deployment, monitoring, version control, the CI/CD pipelines, and so on. The steps mentioned in this section broadly address these points.
Before jumping into those steps, I would like to briefly introduce you to the several tools (mostly serverless) provided by AWS to enable us to build the CI/CD MLOps pipeline.
- CodeCommit: A fully-managed version control service that provides a centralized repository for storing and managing source code. We have already covered this service in the previous section. CodeCommit acted as our remote project repository. There is nothing wrong with going for other remote repository services like GitHub, Bitbucket, etc.
- AWS CodeDeploy: Another fully-managed deployment service that can be used to deploy code changes to various environments, such as test, staging, and production. In our use case, CodeDeploy automated the entire deployment process, from packaging the code into a deployment package to deploying it to the target environment. This really helped us reduce human error and ensured consistency across deployments.
The following steps are a high-level view of how these services were used to implement a CI/CD pipeline:
- Install Git and configure it on the local system: This is done by going to git-scm.com, downloading Git, and installing it. Every developer in our project had to do the same.
- Create a CodeCommit repository: We created a new repository for the project. Clone the repo to your local system using SSH/HTTPS.
- Part of the build and testing activities was done locally; the changes were then committed to the CodeCommit repository.
- Front end/UI development: Another crucial part of the application was a robust UI, built using React JS. We always used to test the UI/APIs locally before they were pushed into the remote repository.
- Pull requests: Used to merge changes made in one branch of a Git repository into another branch. When a developer creates a pull request, they propose a set of changes that they have made to the code, typically in a separate branch of the repository.
- The process of merging involves taking the changes made in one branch and incorporating them into another. Often one of the senior developers or I acted as the code reviewer before approving any changes to be merged into the master branch.
- Setting up MLflow for experimentation & Tracking: Already discussed in the previous section.
- The master repository in CodeCommit is cloned to the EC2 instance environment, where we execute the Python code hosted for the ML application backend.
- The code hosted on EC2, where MLflow is running, is triggered after a commit. This, in turn, triggers a series of ML model experiments, and the best model is selected from these and staged for production. This part of the code was completely built within the team.
- After this step, CodeBuild is triggered and then compiles, tests, and packages the code.
- If the build is successful, CodeDeploy is triggered and deploys the code changes to the desired environment. Oftentimes these environments are AWS EC2, ECR (Elastic Container Registry), and so on. We were using EC2 for deployment.
- CodePipeline, another AWS service, was leveraged for a visual interface for managing the pipeline and for visibility into the status of each stage of the pipeline.
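As a side note on the CodeBuild step above: CodeBuild reads its commands from a buildspec file at the repository root. A minimal, hypothetical buildspec for a Python project like ours might look like this (the phase commands depend entirely on the project layout):

```yaml
# buildspec.yml -- illustrative only; commands depend on the project
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.9
    commands:
      - pip install -r requirements.txt
  build:
    commands:
      - python -m pytest tests/        # run unit tests
      - python setup.py sdist          # package the code
artifacts:
  files:
    - dist/**/*
    - appspec.yml                      # consumed by CodeDeploy in the next stage
```

The `artifacts` section is what hands the packaged output, together with the CodeDeploy appspec, to the following pipeline stage.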
A high-level representation of AWS CodePipeline based production deployment is shown here.
CodePipeline was in fact used to orchestrate each step in the release process. As part of our setup, we plugged other AWS services into CodePipeline to complete the software delivery pipeline. The image below shows a typical CodePipeline-based deployment from the AWS documentation. For more information and a hands-on walkthrough, I would suggest going through this video tutorial.
The subsequent steps, i.e., model monitoring, retraining, etc., are discussed in detail in the upcoming section.
Why didn’t we go with AWS Sagemaker for code deployment?
Another popular way of deploying ML models is using AWS SageMaker. There were a few reasons why we didn’t consider the use of AWS SageMaker.
- The usage cost turned out to be higher. The cost of running a SageMaker instance varies based on the instance type you choose. There are costs associated with EC2 instances used, the size of the deployed model, data transfer etc.
- While SageMaker is a powerful platform, it requires a very good understanding of machine learning concepts and AWS services to use effectively. This can make it challenging for beginners.
- It was limiting our ability to customize the environment. For example, we couldn’t access the underlying operating system to be able to install specific software packages.
- While AWS SageMaker offers many pre-built machine learning algorithms and frameworks, it didn’t support certain tasks or specific custom models that we wanted to train and deploy.
Model testing and monitoring
Metrics for model testing
As I mentioned while explaining the problem statement, we were building a fraud detection and management solution. Two of the crucial metrics for our use case (based on our observations & discussions with the clients) were recall and model lift.
A lift chart is a visualization tool that helps evaluate the performance of a classification model by comparing the model’s predictions with the actual outcomes. Please read through this article to get a better grasp of the model lift concept.
The steps we carried out to prepare the model lift as a monitoring metric were:
- 1 Train the model on the training set and evaluate its performance on the test set.
- 2 Sort the test set by the model’s predicted probabilities of fraud, from highest to lowest.
- 3 Divide the sorted test set into equal-sized bins or deciles, for example, 10% of the data in each bin is a good practice.
- 4 For each bin, calculate the predicted vs actual fraudulent transactions in that bin. This is the actual lift for that bin.
- 5 Calculate the average percentage of fraudulent transactions across all bins.
- 6 Plot the lift chart, with the x-axis showing the percentage of the dataset (from 0% to 100%) and the y-axis showing the actual lift or the lift for each bin.
The overall model lift can be considered as the predicted-vs-actual fraud ratio of the top one or two deciles. The image shown below is a representative lift chart.
In the image shown above, the model lift can be taken as 3.29 (considering only the top decile’s performance).
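The steps above can be sketched in a few lines of plain Python. The data here is synthetic (a toy model that scores all frauds highly); in our pipeline the scores came from the staged fraud model.

```python
def decile_lift(scores, labels, n_bins=10):
    """Per-bin lift: each bin's fraud rate divided by the overall fraud rate."""
    # Sort cases by predicted fraud probability, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    overall_rate = sum(labels) / len(labels)
    bin_size = len(ranked) // n_bins
    lifts = []
    for i in range(n_bins):
        bin_labels = [label for _, label in ranked[i * bin_size:(i + 1) * bin_size]]
        lifts.append((sum(bin_labels) / bin_size) / overall_rate)
    return lifts

# Synthetic example: 100 cases, 10 frauds, frauds scored far above non-frauds.
labels = [1] * 10 + [0] * 90
scores = [0.9] * 10 + [0.1] * 90
lifts = decile_lift(scores, labels)
print(lifts[0])  # top decile captures all frauds -> lift of 10.0
```

A lift of 10 in the top decile means the model concentrates frauds ten times more densely there than random selection would; a lift near 1 everywhere would mean the model adds no value.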
Mitigating the problem of data drift
Another of our concerns was data drift, which occurs when the data seen in production slowly drifts in some aspects over time from the data used to train the model. We addressed data drift through some of the measures below:
- Data quality: Data quality validation ensures data is structured as expected and falls within the ranges the ML models were exposed to during training. We also wanted to ensure the data doesn’t contain any empty or NaN values, as the model will not be expecting them.
- Model performance monitoring: In this step, we compared actual values with predictions. For example, if you are deploying a forecasting model, you can compare the forecast with the actual data after, say, a week.
- Drift evaluation and feedback: Here we put mechanisms in place to evaluate the metrics and trigger subsequent actions. AWS CloudWatch is an excellent tool we used to log these events and send notifications.
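The data-quality part of this can be illustrated with a minimal validator: check each incoming record against training-time bounds and flag missing/NaN values. The column names and bounds below are hypothetical; the real checks were, of course, derived from our training data.

```python
import math

# Hypothetical per-column bounds observed in the training data.
TRAINING_BOUNDS = {"claim_amount": (0.0, 50_000.0), "customer_age": (18, 100)}

def validate_record(record: dict) -> list:
    """Return a list of human-readable issues found in one incoming record."""
    issues = []
    for column, (low, high) in TRAINING_BOUNDS.items():
        value = record.get(column)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            issues.append(f"{column}: missing or NaN")
        elif not (low <= value <= high):
            issues.append(f"{column}: {value} outside training range [{low}, {high}]")
    return issues

print(validate_record({"claim_amount": 1200.0, "customer_age": 34}))  # []
print(validate_record({"claim_amount": float("nan"), "customer_age": 150}))
```

In production, the returned issues would be logged to CloudWatch so that an alert fires when the violation rate climbs, which is exactly the kind of feedback loop described above.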
Measures to keep the model robust
Apart from the steps mentioned directly above, a few more checks/tests were put in place to make the deployment more robust. They covered:
- Analyzing the model errors and understanding their patterns. This was, in turn, used to improve the models.
- Comparing the performance of the production model to a benchmark model using A/B testing. This was used to evaluate the effectiveness of changes made to the model during retraining etc.
- Real-time monitoring to detect any issues as soon as they occur. Any errors generated, evaluation time taken by the model, etc, were monitored as well.
- Bias detection test: This test checks for bias in the model predictions, which can occur if the model is trained on a biased dataset or even if the data used for testing the model is biased.
- Feature importance test: This test helped identify the most important features used by the model for making predictions. If the importance of a feature changes significantly over time, it may indicate a change in the underlying relationship between the variables. In our use case, bivariate analysis (p-value, correlation, etc.) of each feature w.r.t. the target variable was monitored over time.
All of these tests, monitoring, etc., were packaged into AWS Lambdas, which were triggered on demand or through scheduling. For example, to test for data drift at a periodic rate of, say, one week, we set up a Lambda with the rule type ‘Schedule Expression’ and a schedule frequency of seven days. This assumes the processed as well as streamed data is already available, preprocessed, in AWS RDS tables. For more details, you can read through this AWS blog.
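Scripting such a schedule rule with boto3 is straightforward. In this sketch, the rule name and Lambda ARN are placeholders, and the function that actually talks to AWS is defined but not called, since it needs live credentials.

```python
def build_schedule_rule(name: str, days: int) -> dict:
    """Kwargs for events.put_rule() with a rate-based schedule expression."""
    return {
        "Name": name,
        "ScheduleExpression": f"rate({days} days)",  # EventBridge rate syntax
        "State": "ENABLED",
    }

rule = build_schedule_rule("weekly-drift-check", days=7)

def create_rule_and_attach_lambda(rule: dict, function_arn: str) -> None:
    """Create the schedule rule and point it at the drift-check Lambda."""
    import boto3

    events = boto3.client("events")
    events.put_rule(**rule)
    events.put_targets(
        Rule=rule["Name"],
        Targets=[{"Id": "drift-check", "Arn": function_arn}],
    )

print(rule["ScheduleExpression"])  # rate(7 days)
```

The Lambda also needs a resource-based permission allowing EventBridge to invoke it (via `lambda.add_permission`), which is omitted here for brevity.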
Other aspects of our CI/CD pipeline development
In the above sections, we discussed the usage of AWS-managed services in building a CI/CD pipeline. But most of the code used for model building was written locally. For that, we used PyCharm.
Code building IDEs
The reasons for which PyCharm was selected were:
- PyCharm has quite a clean and intuitive interface, which makes it easy for users to navigate and access different features.
- PyCharm supports code highlighting, completion, and refactoring and has built-in tools for debugging and testing code.
- It is easy to manage projects with PyCharm thanks to features such as version control integration, virtual environment management, and so on.
- PyCharm is quite customizable, with the ability to install additional plugins and customize settings to suit our individual development needs.
Cloud9 (AWS managed service)
AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. It includes a code editor, debugger, and terminal. Once you have the code in the cloud (i.e., AWS CodeCommit), it’s easier to make quick edits through the browser and commit right there, rather than relying on local IDEs. But I would recommend making major code changes through local environments.
Some of the other aspects Cloud9 helped us with were:
- Easy setup: Eliminated the need for local IDE installation and configuration by providing a fully managed environment that can be accessed through a web browser.
- Collaboration: Allowed multiple users to work on the same codebase simultaneously.
- Integrated tools: Code completion, debugging, and version control. These were extremely useful and helped to reduce the time and effort required to write high-quality code.
- Scalability: This eliminated our worry about hardware limitations or infrastructure maintenance, as it could scale with the project requirements.
Creating a CI/CD MLOps pipeline using primarily AWS services, MLflow, and other open-source tools can significantly improve the efficiency and reliability of machine learning deployments. With AWS services like CodePipeline, CodeBuild, and CodeDeploy, combined with MLflow, developers can create an efficient pipeline that automates the building, testing, and production deployment of their models, making the process faster and more consistent.
By using open-source tools like MLflow, developers can take advantage of a robust and flexible platform for managing the entire model development lifecycle while ensuring high levels of customizability. This also allows users to easily track experiments, share models, and reproduce results, ensuring that models are reliable in production.
But there is definitely room for improvement in our deployment as well.
- 1 Developers could consider implementing additional monitoring and automation tools to detect issues and improve performance in real-time.
- 2 Additionally, they could explore Amazon SageMaker, which we avoided in this deployment for the reasons discussed earlier. I am still recommending it because it is a platform that can manage machine learning model deployments at scale.
Overall, with the right tools and strategies in place, organizations can use AWS services and other open-source tools to create a streamlined, efficient, and reliable CI/CD MLOps pipeline that improves their machine learning deployment process and delivers greater value to their customers.
- AWS CodePipeline
- What is ETL – AWS
- AWS Glue for loading data from a file to the database (Extract, Transform, Load)
- AWS CodeCommit
- Using AWS Lambda with scheduled events
- A better way to explain your classification model