
Building MLOps Pipeline for NLP: Machine Translation Task [Tutorial]

12 min
7th August, 2023

Machine learning operations, popularly known as MLOps, enable us to create an end-to-end machine learning pipeline, from designing the experiment, building the ML model, and training and testing it, to deploying and monitoring it; in other words, the whole machine learning lifecycle. MLOps is similar to DevOps but specifically tailored for machine learning projects. 

Being a relatively new field, MLOps, like machine learning, has gained a lot of traction, and AI-powered software is becoming prevalent in all industries. It is essential that we have dedicated operations for such a process. MLOps enables us to build AI-powered software with its two main components: continuous integration and continuous deployment. We can create a seamless pipeline from inception to deployment and modify builds on the go. 

In this article, we are going to discuss in detail how to build an MLOps pipeline for machine translation using various technologies. Some of the key technologies that we will be using are:

  • TensorFlow,
  • GitHub Actions,
  • Docker,
  • Kubernetes,
  • and Google Cloud Build.

This tutorial aims to provide you with a complete understanding of how to logically implement MLOps for your own machine learning or data science project. 

What is an MLOps pipeline?

MLOps can be described as the lifecycle of a machine learning or data science project. The lifecycle itself consists of three main sections:

  • 1 Design
  • 2 Model development
  • 3 Operations

By combining these three sections we can build an integrated system that can leverage the power of both machine learning and software applications. This pipeline automates the process of data gathering, data preprocessing, training and testing, deploying, and monitoring. Apart from that, it is also capable of detecting any new changes in the build and simultaneously updating the new changes globally. 


Building MLOps pipeline for machine translation: where to start?

In order to make a sleek MLOps lifecycle pipeline, one has to consider the following steps.


Designing is basically a process of understanding the business problem that usually deals with the targeted audience. 

Designing an experiment also includes researching the available resources like knowledge-gathering, type of data that can be used, suitable architectures, financial resources, computation resources, and so forth. 

Generally, in this process, data scientists and machine learning engineers try to look for available solutions and modify them according to the requirements in order to save time.   

In essence, designing sets goals through predicated solutions. 

Problem statement 

As a part of this article, let us consider that we need to build an app that translates Portuguese to English. This problem falls into the category of natural language processing and more specifically machine translation. Now, as a data scientist or an ML engineer, you need to consider the following:

  • 1 What programming language and related library should be used?
  • 2 Where can we get the data from?
  • 3 What must be the core architecture of the model?
  • 4 What should be the training objectives and the output, along with accuracy and loss metrics, optimization techniques, and so forth?
  • 5 What are the deadlines and budget?



Research is the part where we explore every possible solution to build the product. For instance, when it comes to choosing a language for building a deep learning model, Python is the way to go. But if you are an iOS developer then Swift is the preferred language, and some companies like Tesla do consider C and C++ along with Python. For this article, let us stick to Python as it is the most widely used language for building deep learning and machine learning models.  

Now, for building a deep learning model, one of two Python libraries can be used: TensorFlow and PyTorch. Both are extremely popular, versatile, and have large community support. At this stage, it all boils down to preferences and the advantages one has over the other. In our case, we will use TensorFlow because it has a very well-structured API, i.e. Keras, and implementing a model with it usually requires fewer lines of code than with PyTorch. 

When the language and the core library are set, we can research the architecture that can be used to implement machine translation. At present, most SOTA language models rely heavily on Transformers because of their self-attention mechanism, so we will do the same. 

When it comes to data, we can download language translation data from almost anywhere, but the best practice is to download curated data from reputable sources like Kaggle. In our case, we will use the TensorFlow Datasets API to download the data. 

Now let’s understand the directory structure. 

Directory structure

In all projects, the key thing is the directory structure. A well-structured project is beneficial for the reader to follow and collaborate efficiently. In terms of MLOps, it plays an important role since we will be using different technologies to access endpoints from building to deploying. 

A general structure of the MLOps projects looks something like this:

├── kube
├── metadata
├── notebook
├── requirements.txt
└── source

This is the main directory structure, with its sub-directories and the requirements.txt file. As we move on, we will keep adding more files to the directories. 
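As a small illustration, the skeleton above could be created programmatically. This is only a sketch: the helper name `create_skeleton` is made up for this example, and the demo writes into a temporary directory rather than a real project.

```python
from pathlib import Path
import tempfile

# Top-level layout taken from the listing above.
PROJECT_DIRS = ["kube", "metadata", "notebook", "source"]
PROJECT_FILES = ["requirements.txt"]


def create_skeleton(root: Path) -> None:
    """Create the empty directories and files of the project layout."""
    for name in PROJECT_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in PROJECT_FILES:
        (root / name).touch(exist_ok=True)


# Demo in a throwaway directory so nothing is written to a real project.
with tempfile.TemporaryDirectory() as tmp:
    create_skeleton(Path(tmp))
    created = sorted(p.name for p in Path(tmp).iterdir())
    print(created)
```

Running the helper twice is harmless, since existing directories and files are left untouched.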

MLOps pipeline for machine translation: model development 

For the sake of this article, we will be using the notebook provided on the TensorFlow website. The notebook is very informative and gives a thorough idea of how to write and train a machine translation model. 

We will make a few modifications to the notebook and integrate the Neptune client to monitor the model during training. Now let’s briefly explore and modify the notebook. 


To begin with, we must install three libraries: TensorFlow Datasets to download the data, TensorFlow for deep learning, and the Neptune client for monitoring and saving the metadata. 

!pip install tensorflow_datasets
!pip install -U 'tensorflow-text==2.8.*'

!pip install neptune

Once the libraries are installed, we can import all of them into the notebook. 

Downloading the dataset

The dataset that we will be using, a Portuguese-to-English translation corpus, can be directly downloaded from the TensorFlow Datasets library. Once the dataset is downloaded, we can split it into training and validation datasets. 

examples, metadata = tfds.load('ted_hrlr_translate/pt_to_en', with_info=True,
                               as_supervised=True)
train_examples, val_examples = examples['train'], examples['validation']

Creating requirements.txt

The requirements.txt file is important because it lists all the libraries as a bundle. This allows new contributors to quickly install all the libraries or dependencies in their working environment. 

To create a requirements.txt file, all we need to do is run: 

!pip freeze > requirements.txt 

This can be done right after we install and import all the libraries, or once you are done training the model and running inference with it. The latter is preferable. 

Here is what the requirements.txt file should look like:


Keeping track of model metadata

Logging metadata to the Neptune dashboard is quite simple. First, we create a class that stores all the hyperparameters. This approach is really handy when creating separate Python modules (we will see that later). 

class config():
 BUFFER_SIZE = 20000
 TRAIN_LOSS = 'train_loss'
 TRAIN_ACCURACY = 'train_accuracy'
 BETA_1 = 0.9
 BETA_2 = 0.98
 EPSILON = 1e-9
 D_MODEL = 128
 DFF = 512
 DROP_OUT = 0.1

Then we can create a dictionary that stores all the hyperparameters. 

params = {
   "BATCH_SIZE" : config.BATCH_SIZE,
   "MAX_TOKENS" : config.MAX_TOKENS,
   "MAX_EPOCHS" : config.MAX_EPOCHS,
   "TRAIN_LOSS" : config.TRAIN_LOSS,
   "BETA_1" : config.BETA_1,
   "BETA_2" : config.BETA_2,
   "EPSILON" : config.EPSILON,
   "NUM_LAYER" : config.NUM_LAYER,
   "D_MODEL" : config.D_MODEL,
   "DFF" : config.DFF,
   "NUM_HEAD" : config.NUM_HEAD,
   "DROP_OUT" : config.DROP_OUT,
}

Once the dictionary is created we can then initialize the Neptune client using the API token and pass the parameters as a dictionary. 

run = neptune.init_run(

run["parameters"] = params

Once executed this is how it will look in the dashboard. 

Parameters logged in Neptune | Source

As you can see all the hyperparameters are logged. 
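Writing the dictionary by hand means it can fall out of sync with the class. One possible alternative, sketched here with a stand-in `Config` class holding only a few of the values shown earlier, is to collect the upper-case attributes automatically. The helper name `config_to_params` is an assumption made for this example.

```python
class Config:
    # A small subset of the hyperparameters, for illustration only.
    BUFFER_SIZE = 20000
    BETA_1 = 0.9
    D_MODEL = 128

    def helper(self):  # methods are skipped by the filter below
        return None


def config_to_params(cfg) -> dict:
    """Collect upper-case, non-callable class attributes into a dict."""
    return {
        name: value
        for name, value in vars(cfg).items()
        if name.isupper() and not callable(value)
    }


params = config_to_params(Config)
print(params)
```

The resulting dictionary can be logged in the same way as the hand-written one.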

Model training

Once all the components of the model like encoder, decoder, self-attention mechanism et cetera are ready, the model can be trained. But once again we must make sure that we integrate Neptune-client to monitor the model during the training to see how it performs. 

To do that, we just need to define the accuracy and loss metrics and update them in the training loop. 

train_loss = tf.keras.metrics.Mean(name=config.TRAIN_LOSS)
train_accuracy = tf.keras.metrics.Mean(name=config.TRAIN_ACCURACY)
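Both metrics are streaming means: each training step feeds in one per-batch value, and `result()` returns the average of everything seen so far. The plain-Python class below is only a stand-in written for this explanation, not the Keras implementation, and the per-batch losses are made up.

```python
class RunningMean:
    """Plain-Python stand-in that mimics a streaming mean metric."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        self.total += value
        self.count += 1

    def result(self) -> float:
        return self.total / self.count if self.count else 0.0

    def reset(self) -> None:
        self.total, self.count = 0.0, 0


running_loss = RunningMean()
for batch_loss in [4.0, 2.0, 3.0]:  # made-up per-batch losses
    running_loss.update(batch_loss)
print(running_loss.result())
```

Because the value is an average over all batches so far, the curve logged to the dashboard is smoother than the raw per-batch loss.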

In the training loop we will use the same method as we did earlier to log the accuracy and loss.

run['Training Accuracy'].log(train_accuracy.result())
run['Training Loss'].log(train_loss.result())

Let’s integrate them in the training loop. 

for epoch in range(config.MAX_EPOCHS):
 start = time.time()


 # inp -> portuguese, tar -> english
 for (batch, (inp, tar)) in enumerate(train_batches):
   train_step(inp, tar)
   run['Training Accuracy'].log(train_accuracy.result())
   run['Training Loss'].log(train_loss.result())

   if batch % 50 == 0:
     print(f'Epoch {epoch + 1} Batch {batch} Loss {train_loss.result():.4f} Accuracy {train_accuracy.result():.4f}')

 if (epoch + 1) % 5 == 0:
   # ckpt_manager is the tf.train.CheckpointManager defined in the notebook
   ckpt_save_path = ckpt_manager.save()
   print(f'Saving checkpoint for epoch {epoch+1} at {ckpt_save_path}')

 print(f'Epoch {epoch + 1} Loss {train_loss.result():.4f} Accuracy {train_accuracy.result():.4f}')

 print(f'Time taken for 1 epoch: {time.time() - start:.2f} secs\n')


This is what the dashboard will look like during training. 

Training accuracy and loss logged in Neptune | Source

One key point to remember is to stop the run, by calling run.stop(), after the experiment is completed or the training loop has fully executed. 

One good thing about Neptune-client API is that you can log almost anything. 

Validation and testing the model

Once the training is done, we can proceed to run an inference test on the model before creating an app. While creating a class object for inference, keep in mind that all the preprocessing steps must be included, because this same class will be used to create the endpoints. 

Here is an example:

class Translator(tf.Module):
 def __init__(self, tokenizers, transformer):
   self.tokenizers = tokenizers
   self.transformer = transformer

 def __call__(self, sentence, max_length=config.MAX_TOKENS):
   # input sentence is portuguese, hence adding the start and end token
   assert isinstance(sentence, tf.Tensor)
   if len(sentence.shape) == 0:
     sentence = sentence[tf.newaxis]

   sentence = self.tokenizers.pt.tokenize(sentence).to_tensor()

   encoder_input = sentence

   # As the output language is english, initialize the output with the
   # english start token.
   start_end = self.tokenizers.en.tokenize([''])[0]
   start = start_end[0][tf.newaxis]
   end = start_end[1][tf.newaxis]

   # `tf.TensorArray` is required here (instead of a python list) so that the
   # dynamic-loop can be traced by `tf.function`.
   output_array = tf.TensorArray(dtype=tf.int64, size=0, dynamic_size=True)
   output_array = output_array.write(0, start)

   for i in tf.range(max_length):
     output = tf.transpose(output_array.stack())
     predictions, _ = self.transformer([encoder_input, output], training=False)

     # select the last token from the seq_len dimension
     predictions = predictions[:, -1:, :]  # (batch_size, 1, vocab_size)

     predicted_id = tf.argmax(predictions, axis=-1)

     # concatenate the predicted_id to the output which is given to the decoder
     # as its input.
     output_array = output_array.write(i+1, predicted_id[0])

     if predicted_id == end:
       break

   output = tf.transpose(output_array.stack())
   # output.shape (1, tokens)
   text = tokenizers.en.detokenize(output)[0]  # shape: ()

   tokens = tokenizers.en.lookup(output)[0]

   # `tf.function` prevents us from using the attention_weights that were
   # calculated on the last iteration of the loop. So recalculate them outside
   # the loop.
   _, attention_weights = self.transformer([encoder_input, output[:,:-1]], training=False)

   return text, tokens, attention_weights

As you can see the steps required for preprocessing and prediction are included in the same class object. Now we test how our model performs on unseen data. 
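The loop inside the class is a greedy decoder: it starts from the start token, asks the model for the most likely next token, appends it, and stops at the end token or at the length limit. The sketch below shows only that control flow in plain Python. The token ids and the `toy_model` function are invented for the example and stand in for the tokenizer and the Transformer.

```python
from typing import Callable, List

START, END = 0, 1  # hypothetical ids for the start and end tokens


def greedy_decode(next_token: Callable[[List[int]], int], max_length: int) -> List[int]:
    """Grow the output one token at a time, always taking the top prediction."""
    output = [START]
    for _ in range(max_length):
        predicted_id = next_token(output)  # stand-in for transformer + argmax
        output.append(predicted_id)
        if predicted_id == END:  # stop as soon as the end token is produced
            break
    return output


def toy_model(prefix: List[int]) -> int:
    """Made-up model: emits 5, 6, 7 and then the end token."""
    script = [5, 6, 7, END]
    return script[min(len(prefix) - 1, len(script) - 1)]


full = greedy_decode(toy_model, max_length=10)
truncated = greedy_decode(toy_model, max_length=2)
print(full)
print(truncated)
```

The second call shows why the length limit matters: without an end token, decoding stops only when `max_length` is reached.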

def print_translation(sentence, tokens, ground_truth):
 print(f'{"Input:":15s}: {sentence}')
 print(f'{"Prediction":15s}: {tokens.numpy().decode("utf-8")}')
 print(f'{"Ground truth":15s}: {ground_truth}')

sentence = 'este é um problema que temos que resolver.'
ground_truth = 'this is a problem we have to solve .'

translator = Translator(tokenizers, transformer)
translated_text, translated_tokens, attention_weights = translator(
    tf.constant(sentence))
print_translation(sentence, translated_text, ground_truth)


Input:         : este é um problema que temos que resolver.
Prediction     : this is a problem that we have to solve .
Ground truth   : this is a problem we have to solve .

Downloading the metadata and notebook

Once the training and inference are completed, we can move ahead and download the metadata and the notebook itself from Google Colab into our local directory, followed by creating separate Python modules for each of the class objects. 

For example, you can see that all the directories have been filled with their respective files and metadata. 

├── metadata
│   ├── checkpoints
│   │   └── train
│   │       ├── checkpoint
│   │       ├──
│   │       └── ckpt-1.index
│   ├── ted_hrlr_translate_pt_en_converter
│   │   ├── assets
│   │   │   ├── en_vocab.txt
│   │   │   └── pt_vocab.txt
│   │   ├── saved_model.pb
│   │   └── variables
│   │       ├──
│   │       └── variables.index
│   ├──
│   └── translator
│       ├── assets
│       │   ├── en_vocab.txt
│       │   └── pt_vocab.txt
│       ├── saved_model.pb
│       └── variables
│           ├──
│           └── variables.index
├── notebook
├── requirements.txt
└── source

In the example above, the source directory consists of Python modules that contain functions and class objects taken directly from the notebook.

The next step is to create an app that will leverage the Flask API to serve the model. To serve the model we need to:

  • 1 Import all the functions from the preprocessing and transformer modules.
  • 2 Load the weights which are saved in the translator directory.
  • 3 Define an endpoint for getting predictions.

import logging
import time

import flask
from flask import Flask, request  # request is needed to read the query string
import numpy as np
import matplotlib.pyplot as plt

import tensorflow_datasets as tfds
import tensorflow as tf
import tensorflow_text  # registers the text ops used by the saved model

import source.config as config
from source.preprocessing import *
from source.transformer import *

app = Flask(__name__)

# "/predict" is an assumed route path
@app.route("/predict")
def predict():
   sentence = request.args.get("sentence")
   # .numpy() returns bytes, which must be decoded before JSON serialization
   response = translator(sentence).numpy().decode("utf-8")
   return flask.jsonify(response)

if __name__ == "__main__":
   # load the exported translator before starting the server
   translator = tf.saved_model.load('../translator')
   app.run(host="", port="9999")
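Since the endpoint reads the text from a `sentence` query parameter, a client has to percent-encode it; Portuguese input contains spaces and accented characters. The sketch below builds such a URL with the standard library only. The host and the `/predict` route are assumptions made for this example; only the parameter name and port 9999 come from the app above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical values: the host and the "/predict" route are assumptions,
# only the "sentence" parameter and port 9999 come from the app above.
BASE_URL = "http://localhost:9999/predict"


def build_request_url(sentence: str, base_url: str = BASE_URL) -> str:
    """Return the GET URL with the sentence percent-encoded as UTF-8."""
    return f"{base_url}?{urlencode({'sentence': sentence})}"


original = "este é um problema que temos que resolver."
url = build_request_url(original)
print(url)

# Decoding the query string gives back the original sentence.
decoded = parse_qs(urlsplit(url).query)["sentence"][0]
print(decoded)
```

The URL can then be fetched with any HTTP client once the app is running.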

MLOps pipeline for machine translation: operations (CI/CD)

Once the codebase is ready, we can move ahead with our third and final phase, i.e. operations. This is where we will implement continuous integration and continuous deployment, and where things get a bit tricky. But fret not, I have divided the whole of operations into different sections so that each and every module will be clear and easy to follow. These are the steps that we will follow:

  • 1 Creating a GitHub repo.
  • 2 Creating an image using a Dockerfile.
  • 3 Pushing the image to Google Cloud Build.
  • 4 Deploying.


Creating a GitHub repo

The first step is to push all your code from the local directory to your GitHub account. This will help us to connect the repo to Google Cloud Development. From there, we can create Kubernetes pods and deploy the app in them.  


The Dockerfile enables us to create a container, a unit of software that wraps up an application together with its libraries and dependencies. It also provides a fixed environment, which enables the application to run the same way anywhere. 

Docker, on the other hand, is the software that builds and manages containers. And we can create as many containers as we want. 

In order to create a container for our app, we need to create a Dockerfile. This file, from which the image is built, must store all the instructions that need to be followed to build the container. Some of the general instructions are:

  • 1 Installing the programming language, in our case Python.
  • 2 Creating environment.
  • 3 Installing the libraries and dependencies.
  • 4 Copying the models, API, and other utility files for proper execution.

In the example below you will see how I have structured the Docker configuration. Quite simple and minimalist. 

FROM python:3.7-slim
RUN apt-get update

COPY . ./

RUN ls -la $APP_HOME/

RUN pip install -r requirements.txt

CMD ["python3","" ]

Google Cloud development

After configuring the Dockerfile we will then use Google Cloud Development or GCD to automate the CI/CD pipeline. Of course, you can use any other service like Azure, AWS, and so on but I found that GCD is simple and easy to use. I will explain this section in detail so that it will be easy for you to grasp the full concept and understand the process. 

The first step will be to log in to your GCD account. If you are a new user, you will get a free $300 credit that you can use within 90 days. 

Google Cloud development: Cloud Shell

Once you log into your GCD, create a new project. After that, click on the Cloud Shell icon in the top right corner of your screen to activate Cloud Shell. 

One point to note is that, for the most part, we will be using the built-in Cloud Shell to execute the processes. The first of these is to clone the repository from GitHub. You can use the same code in the given link to follow along. 

Google Cloud development: Cloud Shell | Source: Author

Once the repo has been cloned, cd into it. 

Google Cloud development: Kubernetes

Now we will enable, set up, and launch the Kubernetes engine. The Kubernetes engine will allow us to manage our dockerized application. To configure Kubernetes we need to follow 4 steps:

  • 1 Create a deployment.yaml file.
  • 2 Create a service.yaml file.
  • 3 Enable Kubernetes engine in GCD.
  • 4 Launch the Kubernetes engine through Cloud Shell.

Create a deployment.yaml

The purpose of deployment.yaml is to configure the deployment settings. It consists of two parts:

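  • API version and the kind of operation:

apiVersion: apps/v1
kind: Deployment

  • Metadata and the specification:

metadata:
  name: translation
spec:
  replicas: 2 # number of pods
  selector:
    matchLabels:
      app: translation-app # app name
  template:
    metadata:
      labels:
        app: translation-app # app name
    spec:
      containers:
      - name: translation-app
        image: "" # container location, broken down below
        ports:
        - containerPort: 9999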

Here’s what’s in the configuration that I have specified in metadata: 

  • The number of replicas or pods that I will be using, in this case, it is 2.  
  • The container location. The location can be broken down into four parts:
    • URL: “”. This is the container address which is common.  
    • Project ID: “tensor-machine-translation”
    • App name: translation. You can give any name but it has to be the same across all configuration files. 
    • Version: v1. This is a tag. 
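The four parts combine into a single image reference of the form host/project/app:tag. The helper below is a minimal sketch of that composition. The function name is invented here, and `registry.example` is a placeholder host, not the address used in this project.

```python
def image_reference(registry_host: str, project_id: str, app_name: str, tag: str) -> str:
    """Join the four parts into a single container image reference."""
    parts = (registry_host, project_id, app_name, tag)
    if not all(parts):
        raise ValueError("every part of the image reference must be non-empty")
    return f"{registry_host}/{project_id}/{app_name}:{tag}"


# "registry.example" is a placeholder host, not the address used in the article.
ref = image_reference("registry.example", "tensor-machine-translation", "translation", "v1")
print(ref)
```

Building the string in one place makes it easier to keep the same reference in the Kubernetes and Cloud Build configuration files.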

Create a service.yaml

The service.yaml exposes the whole application to a network. It is like deployment.yaml, but the main difference lies in the kind of operation that we want to implement. It also consists of two parts:

  • API version and the kind of operation:

apiVersion: v1
kind: Service

  • Metadata and the specification:

metadata:
  name: machinetranslation
spec:
  type: LoadBalancer
  selector:
    app: machinetranslation # must match the app label on the pods
  ports:
  - port: 80
    targetPort: 9999

After configuring deployment and service files let’s move to Kubernetes. 
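Because a Service finds its pods through labels, the selector in service.yaml has to match the pod labels in deployment.yaml, and the targetPort has to be a port the container exposes. The check below is a rough sketch of that rule. It assumes the two files have already been parsed into dictionaries and that ports are numeric; the manifests shown are hypothetical examples, and `check_manifests` is a name invented here.

```python
def check_manifests(deployment: dict, service: dict) -> list:
    """Return a list of human-readable mismatches between the two manifests."""
    problems = []
    template = deployment["spec"]["template"]
    pod_labels = template["metadata"]["labels"]
    selector = service["spec"]["selector"]

    # Every key/value in the Service selector must appear in the pod labels.
    for key, value in selector.items():
        if pod_labels.get(key) != value:
            problems.append(f"selector {key}={value} does not match pod labels {pod_labels}")

    # The Service targetPort must be a port that some container exposes.
    container_ports = {
        port["containerPort"]
        for container in template["spec"]["containers"]
        for port in container.get("ports", [])
    }
    for port in service["spec"]["ports"]:
        if port["targetPort"] not in container_ports:
            problems.append(f"targetPort {port['targetPort']} is not exposed by any container")
    return problems


# Hypothetical manifests, already parsed into dictionaries.
deployment = {"spec": {"template": {
    "metadata": {"labels": {"app": "translation-app"}},
    "spec": {"containers": [{"name": "translation-app", "ports": [{"containerPort": 9999}]}]},
}}}
service = {"spec": {"selector": {"app": "translation-app"},
                    "ports": [{"port": 80, "targetPort": 9999}]}}

problems = check_manifests(deployment, service)
print(problems)
```

A check like this could run before deployment, so that a mismatched label is caught before the Service ends up with no pods behind it.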

Enable Kubernetes engine in GCD

Once our configuration files are ready, we can go ahead and enable the Kubernetes API. Kubernetes is an open-source system that is used to manage Dockerized applications. To enable it, just type ‘GKE’ in the GCD search bar. You will be navigated to the Kubernetes Engine API page; once there, just click on ‘Enable’. 

Building MLOps pipeline with Kubernetes | Source: Author

After enabling the API you need to create the Kubernetes cluster. There are two ways in which you can create the cluster:

  • 1 Simply click the ‘Create’ button on the screen.
  • 2 Use Google Cloud Shell.

Kubernetes clusters | Source: Author

In our example, we will be using Google Cloud Shell because you can be more specific about what type of cluster you want. 

Launching the Kubernetes engine

To launch the Kubernetes engine, run the following commands in your Cloud Shell. 

gcloud config set project tensor-machine-translation
gcloud config set compute/zone us-central1
gcloud container clusters create tensorflow-machine-translation --num-nodes=2

Launching the Kubernetes engine | Source: Author

On completion, you can check the K8s cluster on your dashboard. 

Launching the Kubernetes engine | Source: Author

Now that our Kubernetes configuration files and the engine are ready, we can move on to configuring our cloudbuild.yaml file and then launch the whole app. 

Google Cloud development: Cloud Build 

The cloudbuild.yaml file syncs all the processes together. It is simple to understand. The configuration file usually contains the following steps:

  • 1 Building a container image from the Dockerfile.
  • 2 Pushing the container image to Google Container Registry (GCR).
  • 3 Configuring the entry point.
  • 4 Deploying the whole application in the Kubernetes engine.

steps:
- name: ''
  args: ['build', '-t', '', '.']
  timeout: 180s
- name: ''
  args: ['push', '']
- name: ''
  entrypoint: "bash"
  args:
  - "-c"
  - |
    echo "Docker Container Built"
    ls -la
    ls -al metadata/
- name: ""
  args:
  - run
  - --filename=kube/
  - --location=us-west1-b
  - --cluster=tensor-machine-translation

After configuring the cloudbuild.yaml file, you can go back to Google Cloud Shell and run the following command:

gcloud builds submit --config cloudbuild.yaml

Google Cloud development: Cloud Build | Source: Author

Once it is deployed you will get the link to the application. 

GitHub Actions

We have now completed all the important steps, and our application is up and running. Next, we will see how to create triggers using GitHub Actions, which will enable us to automatically update the build whenever we modify the code and push it to the GitHub repo. Let’s see how we can do that. 

Go to the GitHub Marketplace and search for Google Cloud Build.

GitHub Marketplace | Source: Author

Click on Google Cloud Build. 

Selecting Google Cloud Build | Source: Author

Click on Set up a plan.

Setting up a plan with Google Cloud Build | Source: Author

Click on Configure.

Google Cloud Build configuration | Source: Author

Select the repository.

Selecting Google Cloud Build repository | Source: Author

Select the project; in our case, it is Tensor-machine-translation.

Selecting “Tensor-machine-translation” | Source: Author

Click on Create a trigger.

Creating a trigger | Source: Author

Provide a “name” for the trigger and leave all the other settings as they are. 

Naming the trigger | Source: Author

Click on Create.

Creating a trigger | Source: Author

Once created, you will be directed to the following page. 

Trigger’s view in Google Cloud Build | Source: Author

Now, the interesting part is that whenever you make any changes to the GitHub repo, Cloud Build will automatically detect them and update the deployed build. 

Monitoring the model in production

So far we have seen how we can build an automated MLOps pipeline. Now we will explore the last step, which is how to monitor the app that is deployed on the cloud. GCD provides us with a dashboard that enables us to monitor the app from any device: phones, tablets, or computers. 

Below is an image of the Google Cloud console as shown in the iOS app. 

Google Cloud console | Source: Author

In order to monitor the app, you can simply type monitor in the GCD search bar, and it will navigate you to the respective page. 

Monitoring in Google Cloud | Source: Author

You will see a bunch of options to choose from. I recommend you choose the overview first and then explore all the possible options. For instance, if you select GKE, you will see all of the information regarding Kubernetes (pods, nodes, clusters, et cetera) for that respective project. 

Monitoring in Google Cloud | Source: Author

Similarly, you can create an alert policy, among many other things. 

Monitoring in Google Cloud | Source: Author

One of the important concepts that you must be familiar with during the monitoring phase is model drift. It occurs when the predictive capabilities of the model degrade over time. 

Model drift can be classified into two types:

Data drift

Data drift is generally a variation in the production data from the training/testing data. It usually occurs when there is a time-lag between the training and deployment. One such example would be any time-series data like COVID-19 data or stock market data. In such cases, factors such as variability can be introduced every hour, in other words, the data keeps evolving over time. This evolution of the data can yield errors in prediction in the production phase. 

In our case, we don’t have to worry much, since language data remains stable for the most part, and thus we can use the same model for a longer period of time. 

You can find more about data drift detection in the following blog: Why data drift detection is important and how do you automate it in 5 simple steps

Concept drift

Concept drift occurs when the target variable changes over time, or in other words, when the statistical properties of the output change over time. Under concept drift, the model is unable to leverage the patterns that it extracted during training. For example, let’s say that spam messages have evolved since the model was trained. The model will now find it difficult to detect spam using the patterns that it extracted 3 weeks ago during training. In order to tackle this, the model parameters have to be adjusted. 

Concept drift can happen if the business model of the company is changing or the dataset used does not represent the entire population. 

How to monitor a model?

  • Methods like sequential analysis and time distribution methods can help in identifying data drift. 
  • On the other hand, continuously monitoring incoming data and observing its statistical properties can help to overcome concept drift. 
  • Apart from that, techniques like ADWIN, the chi-squared test, histogram intersection, and the Kolmogorov-Smirnov statistic can also be helpful. 
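As one concrete example of the techniques listed above, the two-sample Kolmogorov-Smirnov statistic measures the largest gap between the empirical distributions of a feature in the training data and in production. The sketch below computes it with the standard library only. The choice of sentence length as the monitored feature, and all the sample values, are made up for illustration.

```python
from bisect import bisect_right


def ks_statistic(reference, production):
    """Largest gap between the empirical CDFs of the two samples."""
    ref, prod = sorted(reference), sorted(production)
    gap = 0.0
    for x in ref + prod:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_prod = bisect_right(prod, x) / len(prod)
        gap = max(gap, abs(cdf_ref - cdf_prod))
    return gap


# Made-up feature values, e.g. sentence lengths seen in training vs. production.
train_lengths = [5, 7, 8, 9, 10, 12]
same_lengths = [5, 7, 8, 9, 10, 12]
shifted_lengths = [20, 22, 25, 27, 30, 31]

no_drift = ks_statistic(train_lengths, same_lengths)
strong_drift = ks_statistic(train_lengths, shifted_lengths)
print(no_drift, strong_drift)
```

The statistic ranges from 0 (identical samples) to 1 (no overlap at all). Where to set an alert threshold in between is a judgment call that depends on the data and the sample sizes.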

How to overcome model drift?

Retraining the model is the most direct way to address model drift, including data drift, concept drift, and model degradation. You can refer to these strategies for retraining:

  • Scheduled retraining – periodically: weekly, monthly, et cetera.
  • Data/event driven – whenever the new data is available.
  • Model/ metric driven – whenever the accuracy is lower than a threshold.
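The three strategies can be combined into a single check that runs periodically. The function below is only a sketch: the thresholds (30 days, 80% accuracy, 10,000 new samples) are arbitrary defaults chosen for this example, and `should_retrain` is a hypothetical helper rather than part of any library.

```python
from datetime import datetime, timedelta


def should_retrain(last_trained, now, recent_accuracy, new_samples,
                   max_age=timedelta(days=30), min_accuracy=0.80, min_new_samples=10_000):
    """Return the reasons (possibly none) why a retraining run is due."""
    reasons = []
    if now - last_trained >= max_age:
        reasons.append("scheduled")        # periodic retraining
    if new_samples >= min_new_samples:
        reasons.append("data")             # enough new data has arrived
    if recent_accuracy < min_accuracy:
        reasons.append("metric")           # quality dropped below the threshold
    return reasons


now = datetime(2023, 8, 7)
metric_only = should_retrain(datetime(2023, 8, 1), now, recent_accuracy=0.75, new_samples=500)
schedule_and_data = should_retrain(datetime(2023, 6, 1), now, recent_accuracy=0.90, new_samples=20_000)
print(metric_only)
print(schedule_and_data)
```

Returning the reasons rather than a plain boolean makes it easier to log why a retraining run was started.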


In this tutorial, we explored how various technologies can be used to seamlessly deploy a machine translation app. This is the power of MLOps. Of course, you can add a lot of things to the app, like UI/UX, and create a whole new webpage altogether. 

To summarise, we saw:

  • 1 How to design an experiment?
  • 2 Training and testing the model.
  • 3 Saving weights and metadata of the model.
  • 4 Structuring the directory.
  • 5 Separating functions and creating python modules.
  • 6 Creating a Flask app.
  • 7 Dockerizing the app.
  • 8 Creating and configuring Google Kubernetes engine.
  • 9 Deploying the app in Google Kubernetes engine.
  • 10 Finally, automating the whole process using GitHub Actions.

If you are creating any project that leverages machine learning or deep learning algorithms, then instead of stopping at model training you should go beyond it and create an MLOps pipeline. I hope that this tutorial has helped you to understand how you can implement MLOps in your own project. 

Please make sure to give it a try, because it is a reliable way to deploy your ML and DL apps.


You can find the full repository here. Also, it is worth mentioning that the notebook for machine translation was taken from the official TensorFlow website.


  1. Neural machine translation with attention
  2. Machine Learning Operations
  3. How to Build MLOps Pipelines with GitHub Actions [Step by Step Guide]
  4. Using GitHub Actions for MLOps
  5. Deploy Machine Learning Pipeline on Google Kubernetes Engine
  6. Why data drift detection is important and how do you automate it in 5 simple steps
