MLOps Blog

Kubernetes vs Docker: What You Should Know as a Machine Learning Engineer

17 min
25th August, 2023

Earlier this year (2020), I decided to move fully from data science into the engineering side of machine learning. I wanted a more efficient and scalable way of deploying machine learning models, decoupling my models from my app, and versioning them properly.

Conventionally, after training a model, I import it into my Flask app and perform inference whenever the model’s API endpoint is called. I do use Docker to package my app and deploy it to Google Cloud or other platforms, but there is more to this (I think).



I started diving deep into TensorFlow Serving, TensorFlow Extended, and Kubeflow (Kubernetes made easier for machine learning projects). Along the way, I discovered that I needed to know more (maybe just a little) about Kubernetes, which is used for deploying, orchestrating, and scaling machine learning apps.

This journey and curiosity led to this article. So if you are like me, ready to up your game and add these tools to become a Unicorn data scientist, as described by Elle O’Brien in this article, then this article is for you.

“…so hard, the rare data scientist who can also develop quality software and play engineer is called a unicorn!”- Elle O’Brien

In this article, we will follow a project-based approach, so you can port the ideas and code shown directly into your own machine learning project.

In summary, we will see how to eliminate some difficulties that arise when following the conventional approach, such as:

  • Not being able to separate model serving from the app
  • Difficulties in rolling back updates
  • Difficulties in pushing out a new update easily
  • Difficulties in scaling the app when user traffic increases
  • Difficulties in versioning your model and app.

To eliminate these difficulties, here are the objectives we need to achieve:

  • Integrating the TensorFlow Serving model into a web app
  • Managing the web app and TensorFlow Serving using docker-compose
  • Building and pushing Docker images to Docker Hub
  • Introducing Kubernetes
  • Serving the TensorFlow web app with Kubernetes.


To follow along, you should be familiar with:

  • Training models in TensorFlow
  • Docker, at least at a basic level
  • TensorFlow Serving (if not, here is a quick introduction to TensorFlow Serving)

NOTE: You can obtain the code used for the article here.

Let’s get started.

Building ML models

In this section, we will create a simple ML model, which we will use to solidify the concepts introduced later.

The model is an AND logic gate model. Since the main focus of the article is neither how to create a model nor how to train it, this part is explained only briefly.

Let’s create a Python file for the model and input the code below:

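import tensorflow as tf

# Assumed training inputs: each label is 1 only when both inputs are non-zero
data = tf.constant([[1,1],[0,0],[2,0],[2,1],[3,0],[0,4],[5,6],[0,10]], dtype=tf.float32)
label = tf.constant([1,0,0,1,0,0,1,0])

# Layer sizes are assumptions; the two softmax outputs are the probabilities of 0 and 1
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])
model.fit(data, label, batch_size=2, epochs=5)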

After creating and training the model, we need to save it in a format that TensorFlow Serving can load, so we won’t be saving just the model weights.

import time

saved_time = int(time.time())  #1
path = f'./saved_models/{saved_time}'  #2
model.save(path, save_format='tf')  #3

In the code above, we introduce versioning using the time module. The timestamp at the moment of saving (#1) is used to create an inner folder in saved_models/ (#2), and the model is then saved into that folder (#3). `model.save(path, save_format='tf')` writes the model in the SavedModel format, creating the files TensorFlow Serving needs. Because the folder name is an integer, TensorFlow Serving can also treat it as the model’s version number.

Now our model is ready and servable.

Docker essentials

In this section, we will discuss the essential Docker commands needed to take our machine learning project to production, and see how to orchestrate our app with docker-compose.

Incorporating the web app with the TensorFlow Serving image

This section shows how to integrate TensorFlow Serving into a Flask web app, and how to call a TensorFlow Serving API endpoint from Flask.

First, let’s serve our AND logic gate model using the TensorFlow Serving Docker image. The first step is to pull the tensorflow/serving image from Docker Hub.

NOTE: There is a separate article that explains TensorFlow Serving in full detail.

docker pull tensorflow/serving

Now let’s run the tensorflow/serving image:

docker run -p 8501:8501 --mount type=bind,source=path/to/directory/saved_models,target=/saved_models/1602624873 -e MODEL_NAME=1602624873 -e MODEL_BASE_PATH=/saved_models -t tensorflow/serving

The command above starts a tensorflow/serving container, first mounting the model from our local directory to a path inside the container with this option:

--mount type=bind,source=path/to/directory/saved_models,target=/saved_models/1602624873

Hence the local saved_models directory created for the AND logic gate model is bound to /saved_models/1602624873 inside the container. Since MODEL_BASE_PATH is /saved_models and MODEL_NAME is 1602624873, TensorFlow Serving looks for the model’s version folders under /saved_models/1602624873.
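To make the lookup concrete, here is a minimal Python sketch of how that folder layout is resolved, assuming TensorFlow Serving’s default version policy, which serves the highest numeric version folder found under MODEL_BASE_PATH/MODEL_NAME. The helper `resolve_served_version` is hypothetical and only mimics that lookup on the local filesystem; it is not part of TensorFlow Serving.

```python
import os
import tempfile


def resolve_served_version(model_base_path, model_name):
    """Return the version folder TensorFlow Serving's default policy would load.

    Hypothetical helper mirroring the layout MODEL_BASE_PATH/MODEL_NAME/<version>,
    where only purely numeric sub-folders count as versions and the highest one wins.
    """
    model_dir = os.path.join(model_base_path, model_name)
    versions = [
        entry
        for entry in os.listdir(model_dir)
        if entry.isascii() and entry.isdigit()
        and os.path.isdir(os.path.join(model_dir, entry))
    ]
    if not versions:
        raise FileNotFoundError(f"no numeric version folders under {model_dir}")
    # Compare numerically: time.time()-based names grow, so the largest is the newest save
    return os.path.join(model_dir, max(versions, key=int))


# Simulate /saved_models/1602624873 holding two timestamped saves
with tempfile.TemporaryDirectory() as base:
    for version in ("1602624873", "1602700000"):
        os.makedirs(os.path.join(base, "1602624873", version))
    print(os.path.basename(resolve_served_version(base, "1602624873")))  # 1602700000
```

Comparing with `key=int` rather than as strings matters: as strings, a folder named `5` would sort after `12`.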

When the container starts, two endpoints are created, as shown in the image below: one for gRPC and one for the REST API. We will focus on the REST API endpoint.

Docker Kubernetes endpoint

In order to access the REST API endpoint outside the docker environment, we expose the port by using -p 8501:8501 in the above command.

Let’s test the endpoint to see how it works. We will use Postman to test the REST API first.

Docker Kubernetes rest API

The input is sent to the served model through the REST API. In Postman, we send a POST request to http://localhost:8501/v1/models/1602624873:predict with the JSON body {"instances": [[1,0]]}, and we receive the model output as a JSON response.
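Because both bodies are plain JSON, building the request and interpreting the reply takes only a few lines of Python. This sketch assumes the model ends in a two-way softmax, so each entry in `predictions` holds two probabilities (for output 0 and output 1); that is an assumption about this AND gate model, and `build_predict_body` and `gate_outputs` are hypothetical helper names.

```python
import json


def build_predict_body(*rows):
    """Wrap input rows in the {"instances": [...]} body the REST predict API accepts."""
    return json.dumps({"instances": [list(row) for row in rows]})


def gate_outputs(response_text):
    """Turn a {"predictions": [[p0, p1], ...]} response into 0/1 gate outputs.

    Assumes each prediction lists class probabilities in order (index 0 -> output 0,
    index 1 -> output 1), so the predicted output is the index of the highest score.
    """
    predictions = json.loads(response_text)["predictions"]
    return [max(range(len(scores)), key=scores.__getitem__) for scores in predictions]


print(build_predict_body([1, 0]))  # {"instances": [[1, 0]]}
# A made-up response body, used only to show the parsing step
print(gate_outputs('{"predictions": [[0.93, 0.07]]}'))  # [0]
```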

This shows that our served model is working properly. It’s now time to integrate the TensorFlow Serving API endpoint with our web app.

Before we go into that, note that the container is still running. If we want to stop it, we first list the running containers:

docker ps

The command above lists the running containers.

Copy the ID of the container you want to stop and pass it to docker stop:

docker stop e74fe1336768

This stops the tensorflow/serving container.

Now that our model is servable with TensorFlow Serving, let’s create the web interface and the server that renders it.

The model’s web interface is a form with two inputs and a submit button, as shown in the image below:

gate test

Here is the code for the UI at index.html:

<!DOCTYPE html>
<html>
    <head>
        <link rel="stylesheet" type="text/css" href="../static/css/bootstrap-theme.min.css" />
        <link rel="stylesheet" type="text/css" href="../static/css/bootstrap.min.css" />
        <link rel="stylesheet" type="text/css" href="../static/css/responsive.bootstrap.min.css" />
        <link rel="stylesheet" type="text/css" href="../static/css/style.css" />
    </head>
    <body>
        <div class="container">
            {% include 'includes/_messages.html' %}
            <div class="row">
                <h1 class="text-center">AND GATE TEST</h1>
                <form action="{{ url_for('home') }}" method="post">
                    <div class="form-row">
                        <div class="col">
                            <input type="text" class="form-control" name="inp1" placeholder="input 1">
                        </div>
                        <div class="col">
                            <input type="text" class="form-control" name="inp2" placeholder="input 2">
                        </div>
                        <div class="col">
                            <button type="submit" class="btn btn-primary ml-4">Submit</button>
                        </div>
                    </div>
                </form>
            </div>
        </div>
    </body>
</html>


Now that we’ve created the user interface, let’s create the Flask app that renders it and handles requests to the TensorFlow Serving API.

Create a Python file for the Flask app and input the code below:

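from flask import Flask, render_template, flash, request
import requests
from os import environ

app = Flask(__name__)
# flash() stores messages in the session, which requires a secret key;
# SECRET_KEY is an assumed environment variable with a development-only fallback
app.secret_key = environ.get("SECRET_KEY", "dev-only-secret")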

The code above imports the necessary modules (Flask, requests, and environ from the os module) and initializes the Flask app.

Next, we add the function that calls the TensorFlow Serving API:

def tfserving_request(req_input, model_name): #1
    url = f"http://localhost:8501/v1/models/{model_name}:predict" #2
    input_request = {"instances": [req_input]} #3
    response = requests.post(url, json=input_request) #4
    return response

Based on the comment numbering:

  • #1 tfserving_request takes two inputs: the request input, req_input, and the name of the model
  • #2 builds the URL of the model’s predict endpoint
  • #3 wraps the input in the format accepted by the TensorFlow Serving API endpoint
  • #4 makes a POST request to the endpoint, passing the request input as JSON
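To exercise this request flow without Docker, the sketch below runs a tiny stand-in for the TensorFlow Serving REST endpoint using only Python’s standard library, then calls it with the same URL pattern and {"instances": [...]} body as tfserving_request. It uses `urllib.request` instead of `requests` so nothing extra needs installing. `FakeServingHandler` and `predict` are hypothetical names, and the constant prediction the stub returns is fake, not output from the AND gate model.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeServingHandler(BaseHTTPRequestHandler):
    """Stand-in for POST /v1/models/<name>:predict that returns a constant fake score."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        model_name = self.path.split("/v1/models/")[-1].split(":")[0]
        if not self.path.endswith(":predict") or model_name != "1602624873":
            self.send_error(404)
            return
        # One fake [p0, p1] pair per instance; a real server would run the model here
        scores = [[0.9, 0.1] for _ in body.get("instances", [])]
        payload = json.dumps({"predictions": scores}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def predict(base_url, req_input, model_name):
    """Same request shape as tfserving_request, sent with urllib instead of requests."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    data = json.dumps({"instances": [req_input]}).encode()
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())


server = HTTPServer(("127.0.0.1", 0), FakeServingHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
print(predict(f"http://127.0.0.1:{server.server_address[1]}", [1, 0], "1602624873"))
# {'predictions': [[0.9, 0.1]]}
server.shutdown()
server.server_close()
```

A wrong model name yields HTTP 404, which `urlopen` raises as an `HTTPError`; that is a quick way to spot a mismatched MODEL_NAME.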

The next step is to add the route which will be used for rendering the Web interface in the browser:

@app.route("/home",methods=["GET","POST"]) #1
def home():

    if request.method == "POST": #2

        inp1 = int(request.form["inp1"]) #3
        inp2 = int(request.form["inp2"])

        response = tfserving_request([inp1,inp2], "1602624873") #4

        resp = response.json() #5
        flash(f"obtained {inp1} and {inp2} have a prediction of {resp['predictions']}", 'success') #6

    return render_template("index.html") #7
Based on the comment numbering:

  • #1 we define the route that renders the HTML as `/home`, and define the request methods it accepts as `GET` and `POST`
  • #2 check whether the request is a POST request
  • #3 if it is, we obtain the form inputs from the HTML using `request.form["inp1"]`; remember that `inp1` is the input’s name
  • #4 call the `tfserving_request` function, passing in the form inputs and the model name
  • #5 parse the response returned by TensorFlow Serving as JSON
  • #6 the prediction is obtained using `resp['predictions']`, which contains the probability scores; the `flash` function displays the result as a message
  • #7 render the UI from `index.html`
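One weakness of the route above is that `int(request.form["inp1"])` raises a `ValueError` when a field is empty or not a whole number, which Flask turns into a server error page. A minimal sketch of a stricter parser is below, assuming the gate should only accept 0 or 1. `parse_gate_inputs` is a hypothetical helper that `home` could call before `tfserving_request`, flashing the returned message instead of failing.

```python
def parse_gate_inputs(form, names=("inp1", "inp2"), allowed=(0, 1)):
    """Validate the AND-gate form fields.

    Returns (values, None) on success or (None, error_message) on failure, so a
    Flask route can flash the message instead of raising a ValueError.
    """
    values = []
    for name in names:
        raw = (form.get(name) or "").strip()
        try:
            value = int(raw)
        except ValueError:
            return None, f"{name} must be a whole number, got {raw!r}"
        if value not in allowed:
            return None, f"{name} must be one of {allowed}, got {value}"
        values.append(value)
    return values, None


# request.form supports .get() like a dict, so plain dicts are enough to try it out
print(parse_gate_inputs({"inp1": "1", "inp2": "0"}))  # ([1, 0], None)
print(parse_gate_inputs({"inp1": "1", "inp2": "x"}))  # (None, "inp2 must be a whole number, got 'x'")
```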

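Finally, let’s add the code that starts the Flask server:

if __name__ == "__main__":
    # 0.0.0.0 listens on all interfaces, so the server is also reachable from outside a container
    app.run(host='0.0.0.0', port=int(environ.get('PORT', 8080)))

When we run the script, this starts the server on port 8080, or on the port given by the PORT environment variable.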

Let’s run the app using:

python run

This starts the server, as shown below:


Now that the server is started, we can view the web app via;

If we visit the link, we will see the web interface:

gate test 2

If we type values into the text boxes of the rendered page and click the “submit” button while the TensorFlow Serving container is stopped, we get an error page. Hence we need to start the TensorFlow Serving container again:

docker run -p 8501:8501 --mount type=bind,source=path/to/directory/saved_models,target=/saved_models/1602624873 -e MODEL_NAME=1602624873 -e MODEL_BASE_PATH=/saved_models -t tensorflow/serving

Once the command above is running, we can go back to the web interface and type in our input. When we enter `1` and `0` in the input fields and press the submit button, we get the response shown in the image below:

gate test

The response is displayed on top of the page showing the input received by the server and the prediction output.

Since this is working, let’s create a Docker image for the Flask app. In the directory containing the Flask app, create a Dockerfile:

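# 1: lightweight Python base image
FROM python:3.8-slim
# 2: unbuffered output, so log messages appear immediately
ENV PYTHONUNBUFFERED True
# 3: copy the package list into the image
ADD requirements.txt requirements.txt
# 4: install the packages
RUN pip install -r requirements.txt
# 5: define the app directory
ENV APP_HOME /app
# 6: make it the working directory
WORKDIR $APP_HOME
# 7: copy the app files
COPY . ./
# 8: start the Flask app
CMD ["python",""]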

Code explanation based on the comment numbering:

  1. Use a lightweight Python image, version 3.8-slim
  2. Make Python’s output unbuffered, so the app does not crash without printing a relevant message
  3. Copy the requirements.txt file, which lists the packages to install, into the image under the same name
  4. Install all the packages listed in requirements.txt
  5. Define the environment variable APP_HOME with the value /app
  6. Set the working directory to the directory defined in 5 (WORKDIR creates it if it doesn’t exist)
  7. Copy all the files in the Flask app directory into the working directory
  8. Command to run the Flask app when a container starts from the image

Before we build the image, let’s create requirements.txt. One easy way to do that is with the command below:

pip freeze > requirements.txt

This creates requirements.txt listing every package installed in the current Python environment.

For this project, however, the only packages needed are Flask and requests.
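Since `pip freeze` lists every package in the environment, one way to end up with just Flask and requests is to filter its output. The sketch below is one hypothetical way to do that in Python (`filter_freeze` is not a pip feature); it compares names using PEP 503 normalization, so `Flask`, `flask` and `FLASK` all match. pip will still install the packages that Flask and requests themselves depend on when the image is built.

```python
import re


def _normalize(name):
    # PEP 503 normalization: case-insensitive, runs of "-", "_" and "." treated as equal
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_freeze(freeze_output, keep=("flask", "requests")):
    """Keep only the lines of `pip freeze` output for the listed top-level packages."""
    wanted = {_normalize(name) for name in keep}
    kept = []
    for line in freeze_output.splitlines():
        line = line.strip()
        # The package name is whatever comes before "==" (pinned) or "@" (direct URL)
        name = re.split(r"\s*(?:==|@)\s*", line, maxsplit=1)[0]
        if line and not line.startswith(("#", "-")) and _normalize(name) in wanted:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Illustrative pip freeze output; the versions are made up for the example
sample = "click==7.1.2\nFlask==1.1.2\nnumpy==1.19.2\nrequests==2.24.0\n"
print(filter_freeze(sample), end="")
```

The filtered text could then be written to requirements.txt in place of the full freeze output.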

Once the requirements.txt is created, let’s create an image for the flask app:

docker build -t flaskweb .

If the image is built successfully, we should see output like the following:

docker output

Now that the image is successfully created, let’s run the image:

docker run -p 8080:8080 -e PORT=8080  -t flaskweb

This starts the flask server as described before.

flask server

Let’s visit the same link as before to view the web interface, type in the previous input, and see what is returned:

web interface

An error is returned. This happens because, inside the flaskweb container, localhost refers to the container itself, so the app cannot reach TensorFlow Serving at localhost:8501. This brings us to Docker Compose as a way to solve the issue.

Using Docker compose to manage services

Docker Compose lets us define the two Docker services (TensorFlow Serving and flaskweb) in a single file, start them with a single command, and manage both services.

NOTE: To install docker-compose on different operating systems, visit this link.

To use docker-compose, place the Flask app directory and the saved_models folder created for our model in the same parent directory. Then, in the Flask app directory, create a new Dockerfile and copy all the content of the existing Dockerfile into it.

After copying, the new file will look like this:

FROM python:3.8-slim
ENV PYTHONUNBUFFERED True
ADD requirements.txt requirements.txt
RUN pip install -r requirements.txt
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./

CMD ["python",""]

Make sure your directory structure looks like this:


Once this is created, let’s create a YAML file named docker-compose.yml to define TensorFlow serving and flask web services.

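version: "3.8"
services:
  server:
    image: tensorflow/serving
    volumes:
      - ./saved_models:/saved_models/1602624873
    ports:
      - '8501:8501'
    environment:
      MODEL_NAME: "1602624873"
      MODEL_BASE_PATH: /saved_models/

  web:
    image: flaskweb
    build:
      context: ./flask_app
      # dockerfile: set this to the name of the Dockerfile copy created above
    ports:
      - '8080:8080'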

In docker-compose.yml, we specify the Compose file format version using `version` and define the services inside the `services` object. We name the two services server and web.

In the server service object, we specify the image to pull. We then define the volumes, mounting the models in ./saved_models to /saved_models/1602624873 inside the container, and we specify the port just as we did when running the Docker image directly. The required environment variables are set inside the environment object.

As you can see, this mirrors the way we ran the Docker image in the previous section.

For the web service object, we specify the name of the image, and we add a build object that defines how the image should be built. We set the context to the flask_app directory.

We also tell docker-compose which Dockerfile in that directory to use, and we define the port.

To start the service, we run the following command in the directory containing the docker-compose.yml.

docker-compose up

This starts both services as shown below:


The TensorFlow Serving and flaskweb services are now running. If we visit http://localhost:8080/home, the web interface loads, but if we type in our input and click submit, we still get the same error.

To resolve the error, we replace localhost in the TensorFlow Serving endpoint URL defined in the tfserving_request function with the name of the TensorFlow Serving service, server. Docker Compose places both services on a shared network, where each service is reachable by its service name:

# in

def tfserving_request(req_input, model_name):
    # "server" is the Compose service name of the TensorFlow Serving container
    url = f"http://server:8501/v1/models/{model_name}:predict"
    input_request = {"instances": [req_input]}
    response = requests.post(url, json=input_request)
    return response
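Hard-coding `server` means the same code no longer works when the app runs directly on the host. One common alternative, shown here as a sketch, is to read the host from an environment variable and fall back to `localhost`. The variable names `TF_SERVING_HOST` and `TF_SERVING_PORT` and the helper `serving_url` are hypothetical; they are not something TensorFlow Serving or Compose define.

```python
import os


def serving_url(model_name, env=None):
    """Build the TensorFlow Serving predict URL from (hypothetical) environment variables.

    TF_SERVING_HOST defaults to "localhost" for running the app directly on the host;
    inside Docker Compose it would be set to the service name, "server".
    """
    env = os.environ if env is None else env
    host = env.get("TF_SERVING_HOST", "localhost")
    port = env.get("TF_SERVING_PORT", "8501")
    return f"http://{host}:{port}/v1/models/{model_name}:predict"


print(serving_url("1602624873", env={}))
# http://localhost:8501/v1/models/1602624873:predict
print(serving_url("1602624873", env={"TF_SERVING_HOST": "server"}))
# http://server:8501/v1/models/1602624873:predict
```

With this in place, `tfserving_request` could build its URL with `serving_url(model_name)`, and the web service in docker-compose.yml would set `TF_SERVING_HOST: server` under an `environment` key.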

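To apply the change, we first stop the running services:

docker-compose stop

Once both services are stopped, we start them again. The --build flag makes docker-compose rebuild the web image so it picks up the edited code; without it, the previously built flaskweb image is reused:

docker-compose up --build

Once this is done, we can go to the web interface, type in the input, and click submit; the app now works as expected.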

flask web

To learn more about docker-compose, kindly visit this link.

Building and pushing docker images to the docker hub

In order to integrate our docker images with Kubernetes (which will be discussed in the next section), we need to push our images to the docker hub.

To build and push our images to Docker Hub, first visit Docker Hub and create an account. Once the account has been created, we need to log in to Docker Hub from the terminal.

For security purposes, store your Docker password in a text file with any name (I named mine `my_password.txt`), then run the command below:

$ cat ~/my_password.txt | docker login --username steveoni --password-stdin

In the command above, I use `~/` because `my_password.txt` is in my home directory. `cat` pipes the file’s contents into `docker login`, which reads the password from stdin because of `--password-stdin`, so the password is never typed on the command line.

If the login is successful, you will see a “Login Succeeded” message.

Let’s create an image for our flask app and then push it to the docker hub:

$ docker build -t steveoni/tfweb:1.0 .

This creates an image with the tag 1.0, which also serves as the image’s version. In `steveoni/tfweb`, `steveoni` is the Docker Hub username and `tfweb` is the image name.
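The `username/image:tag` naming can be illustrated with a simplified parser. This is a sketch only: `split_image_ref` is a hypothetical helper that handles the Docker Hub style names used in this article and ignores digests (`@sha256:...`), while still not mistaking a registry port (as in `localhost:5000/tfweb`) for a tag. The default tag `latest` matches what Docker uses when no tag is given.

```python
def split_image_ref(ref):
    """Split 'user/name:tag' into (repository, tag), defaulting the tag to 'latest'.

    Simplified: a ':' only counts as the tag separator when it comes after the last '/',
    so 'localhost:5000/tfweb' keeps its registry port. Digests are not handled.
    """
    slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > slash:
        return ref[:colon], ref[colon + 1:]
    return ref, "latest"


print(split_image_ref("steveoni/tfweb:1.0"))  # ('steveoni/tfweb', '1.0')
print(split_image_ref("tensorflow/serving"))  # ('tensorflow/serving', 'latest')
```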

Once the image is ready, we can now push it to the Docker hub:

$ docker push steveoni/tfweb:1.0

This pushes the image to the docker hub, as seen below:

docker hub

We are done with the Flask app; now let’s do the same for TensorFlow Serving. Remember, we did not create a Dockerfile for TensorFlow Serving; we used the tensorflow/serving image directly.

We need to build on top of the tensorflow/serving image. Let’s create a Dockerfile in the same directory where saved_models is located.

FROM tensorflow/serving

ENV APP_HOME /saved_models/1602624873
WORKDIR $APP_HOME
# Bake the versioned model folders into the image instead of bind-mounting them
COPY ./saved_models ./

The approach used in the docker file above is the same as that used in creating the previous docker images.

Let’s build the image for local testing before we build and push it to the docker hub.

$ docker build -t tfs .

Let’s test to see if the image is working properly:

$ docker run -p 8501:8501 -e MODEL_NAME=1602624873 -e MODEL_BASE_PATH=/saved_models -t tfs

This starts the TensorFlow serving server.

Now we can build the image with its Docker Hub tag and push it:

$ docker build -t steveoni/tfupdate:1.1 .
$ docker push steveoni/tfupdate:1.1

The image below shows that the image has been pushed successfully, without our having to check Docker Hub:

docker hub

What is Kubernetes – introduction

Why Kubernetes? Imagine you have deployed your Docker app to a cloud service, and everything is working properly. After some time, your application has thousands of users making requests per second.

Unfortunately, because of the number of requests per second, your app keeps crashing, you can’t prevent the crashes, and users keep complaining.

To solve this, you can run multiple replicas of the app so it stays available all the time (if one goes down, another can take over). But that raises more questions: if all the replicas go down, how do you bring them back up? How do you set up the network endpoint? Who checks their state over time? How do you manage the replicas so that they can communicate with each other?

These questions are what Kubernetes is designed to answer.

 “It’s a container orchestration platform that consists of several components and it works tirelessly to keep your servers in the state that you desire.”
– by Farhan Hasin Chowdhury.

Kubernetes Cluster

For this article, we will run Kubernetes on our local machine instead of a cloud service. To get Kubernetes running on our system, we need to install two programs.

First, we need to install Minikube, which allows us to run a single-node Kubernetes cluster on our local computer. And then, we install the Kubernetes command-line tool called Kubectl.

To install the two programs, visit the links below:

NOTE: In this article, I will be summarising some of the ideas pertaining to the project used in this article. To get an overview and a practical knowledge of what Kubernetes is, visit this article by Farhan Hasin Chowdhury. I will be using some of his illustrations to introduce Kubernetes.

Once the installation is complete, you can test the programs out with the command below:

$ minikube version
$ kubectl version

Before we start using minikube, let’s choose the driver it uses to create the cluster. Minikube can run the cluster in a virtual machine through a hypervisor, or in a container through Docker; in this article, we will use the Docker driver.

NOTE: A hypervisor is an abstraction layer that separates a virtual machine from the system hardware.

The command below sets the driver for minikube:

$ minikube config set driver docker

Once this is done, we can go ahead to start minikube:

$ minikube start

In the terminal, we will see the following output after running the command above, though the whole process can sometimes take a while to finish.


Before we use the minikube cluster we just started, let’s get an overview of some Kubernetes terms and concepts.

Kubernetes contains what we call nodes. A node can either be a virtual or physical machine assigned a particular task. A set of those machines communicating with one another over a shared network is called a cluster.

Since we are using Minikube in this project, we only have access to a single virtual machine that serves as our server. Hence we call it a single-node Kubernetes cluster.

Think of it this way: instead of having multiple computers that can act as servers, you only have one, your personal computer, which serves as the server hosting your application.

The image below shows an overview of minikube.

Source: Kubernetes Handbook

Kubernetes is usually made up of two groups of components:

  • The Control Plane Components
  • The Node Plane Components

The control plane components are responsible for assigning and scheduling tasks to nodes based on the available resources. They are also responsible for keeping track of the state of the nodes and validating requests made to them. Remember that a node is a virtual machine.

The node plane components are responsible for maintaining network rules on each node server. They also provide a gateway between the control plane and each node in the cluster.

Each node contains what we call Pods. According to the Kubernetes documentation:

“Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.”

Source: Kubernetes Handbook

The image shows the overview of a cluster. The cluster above is a single-node cluster since we are using minikube. 

A Pod houses our app container. Even though a Pod can contain more than one container, it is advisable to run one container per Pod. It is also advisable to manage Pods using higher-level objects, which can create and delete Pods at any time and therefore help manage them. We discuss these higher-level objects later in the section.

A node can contain more than one Pod, with each Pod performing the same function. With what we call a Service, we can group these Pods into a single entity, and the Service lets us define how the Pods are accessed.

With this little knowledge about Kubernetes, we go ahead with the project at hand and alongside explain some other concepts needed.

Serving ML-powered web app with Kubernetes

This section shows how to use Kubernetes to orchestrate your app. It shows different approaches to creating Pods and LoadBalancer services, and introduces the concepts of `Deployment` and `ClusterIP`.

Explicit method of creating pod and load balancer

First, let’s test the ideas of Services, Pods, and the like using the TensorFlow Serving image we created in the previous sections.

Remember, we’ve started the minikube, now let’s create our first pod using the tensorflow serving image.

$ kubectl run tf-kube --image=steveoni/tfupdate:1.1 --port=8501 --env="MODEL_NAME=1602624873" --env="MODEL_BASE_PATH=/saved_models/"

The command above is similar to the one we used to run the Docker image in the previous section. tf-kube is the name of the Pod we are creating, and the environment variables are defined using --env.

We then get a message that the Pod has been created. To see the list of Pods that have been created, we can use the get pods command as shown below:

$ kubectl get pods

This lists the created Pods:


You can see the list of Pods I have created; some are three days old because I forgot to delete them. Among them is the Pod we just created, and its STATUS is Running.

To delete any of the pods we just need to run the command below:

$ kubectl delete pod <pod-name>

Our tf-kube Pod is running, but we can’t access it from outside the cluster. To do that, let’s create a Service of type LoadBalancer, which exposes the Pod outside the cluster.

$ kubectl expose pod tf-kube --type=LoadBalancer --port=8501

The command above creates the LoadBalancer service named tf-kube:

kubectl 2

Once the LoadBalancer service is ready, we can open the `tf-kube` service using minikube:

$ minikube service tf-kube

The command above opens the service, as shown in the image above, and maps the `TARGET PORT` to the `URL` shown in the image. Hence, instead of visiting `http://localhost:8501` to reach our TensorFlow Serving API endpoint, we visit the URL listed in that output.

Let’s test the TensorFlow Serving API endpoint using Postman, as shown in the TensorFlow Serving section.

API endpoint

We can see that the API endpoint is working properly. Now we’ve been able to create our first pod and our first service.

To delete the service created:

$ kubectl delete service tf-kube

The approach we took to create our first Pod and Service is not ideal: the configuration only exists in the commands we typed, which makes it hard to reproduce.

In the next subsection, we show how to create Pods in a more reproducible and manageable way.

Declarative method of creating pod and load balancer

Let’s take a more declarative approach, just as we did in the docker-compose section. This makes it easier for us to configure our Kubernetes objects, and easier for others to set them up.

First, let’s create a YAML file called `tf-kube-pod.yml` and input the code below:

apiVersion: v1
kind: Pod
metadata:
  name: tf-kube-pod
  labels:
    component: server
spec:
  containers:
    - name: tf-kube
      image: steveoni/tfupdate:1.1
      ports:
        - containerPort: 8501
      env:
        - name: MODEL_NAME
          value: "1602624873"
        - name: MODEL_BASE_PATH
          value: /saved_models/

In the YAML file, we define the following:

  • The apiVersion, which is the version of the Kubernetes API we want to use
  • The kind of object we want to create, which is Pod
  • The metadata, in which we give the Pod a name and a label by assigning the value server to the component property
  • The spec, which contains the state we desire for the Pod. We define the container image to use, which is steveoni/tfupdate:1.1, and the container port, 8501
  • Also in spec.containers, we specify the environment variables in env, each with a name and a value

Now let’s create a pod using the `tf-kube-pod.yml` file:

$ kubectl apply -f tf-kube-pod.yml

This creates the pod, and you can use kubectl get pods to see the pod created.

Now let’s create the LoadBalancer service configuration file for the Pod we just created. Create a file named `tf-kube-load-balancer.yml` and input the following:

apiVersion: v1
kind: Service
metadata:
  name: tf-kube-load-balancer-service
spec:
  type: LoadBalancer
  ports:
    - port: 8501
      targetPort: 8501
  selector:
    component: server

As in the previous file, we specify the kind of object, this time Service, and in spec we set the type to LoadBalancer. In spec.ports, we define port (the port the Service exposes) and targetPort (the port of the tf-kube-pod container). In spec.selector, we point the LoadBalancer to Pods labeled component: server, which is the label of the tf-kube-pod Pod created above.
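The way `spec.selector` picks Pods can be modeled in a few lines. The sketch below is a simplification of Kubernetes’ equality-based label selectors, the form Services use: a Pod matches when it carries every key/value pair in the selector. The Pod dictionaries and the helper names are illustrative; the real selection is done by Kubernetes itself.

```python
def matches(selector, labels):
    """Equality-based selector: every key/value pair in the selector must be on the Pod."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_pods(selector, pods):
    """Return the names of the Pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if matches(selector, pod["labels"])]


# Illustrative Pods; only the first carries the component: server label
pods = [
    {"name": "tf-kube-pod", "labels": {"component": "server"}},
    {"name": "other-web-pod", "labels": {"component": "web"}},
    {"name": "unlabeled-pod", "labels": {}},
]
print(select_pods({"component": "server"}, pods))  # ['tf-kube-pod']
```

Extra labels on a Pod do not prevent a match; a missing or different value does.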

 Then we create the service from `tf-kube-load-balancer.yml`, with the command below:

$ kubectl apply -f tf-kube-load-balancer.yml

We can check the list of Services created using kubectl get services. To open the service, run:

$ minikube service tf-kube-load-balancer-service

This starts the service and opens the web browser.

The whole process works as expected. But remember that, when describing Pods at the beginning of the section, we mentioned that it is good practice to manage Pods with higher-level objects that can create and delete them. We cover that in the next section.

Working with multi-container Applications

We’ve been able to create a Pod for TensorFlow Serving, which is just a single container. But our main goal is to integrate TensorFlow Serving with a web app.

As we saw in the docker-compose section, we were able to use a single configuration to manage both the TensorFlow Serving service and the Flask web app service. We will do the same thing in this section.

In this section, we will use a higher-level object called a Deployment, and we will also be introduced to ClusterIP.

Deployment: A Deployment is a controller. It lets us easily create multiple replicas of a Pod, and roll updates out and back.

“In Kubernetes, controllers are control loops that watch the state of your cluster, then make or request changes where needed. Each controller tries to move the current cluster state closer to the desired state. A control loop is a non-terminating loop that regulates the state of a system.”
– Kubernetes documentation
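The control-loop idea in the quote can be illustrated with a toy model of what a Deployment does with `replicas: 3`: compare the desired count with the Pods actually running, then create or delete Pods to close the gap. This is a deliberately simplified sketch in plain Python, not Kubernetes code; `reconcile` and the `tfweb-pod-N` naming scheme are hypothetical, and a real controller loops continuously and acts through the Kubernetes API.

```python
import itertools

_pod_ids = itertools.count(1)


def reconcile(desired, running):
    """One pass of a toy control loop: return a Pod list moved to the desired size.

    Hypothetical model of a Deployment-style controller; real controllers watch the
    cluster continuously and call the Kubernetes API instead of editing a list.
    """
    pods = list(running)
    while len(pods) < desired:  # too few Pods: create replacements with fresh names
        pods.append(f"tfweb-pod-{next(_pod_ids)}")
    while len(pods) > desired:  # too many Pods: delete the surplus
        pods.pop()
    return pods


pods = reconcile(3, [])      # start-up: three Pods are created
pods = pods[1:]              # simulate one Pod crashing
pods = reconcile(3, pods)    # the next pass replaces it
print(len(pods), pods[-1])   # 3 tfweb-pod-4
```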

ClusterIP: ClusterIP is another type of Service, just like LoadBalancer. Unlike a LoadBalancer service, ClusterIP only exposes an application within our cluster; that is, the application cannot be accessed from outside the cluster.

We will be using the two terms just defined in deploying our Web app in Kubernetes.

App architecture

The above image shows what the app architecture for deployment in Kubernetes will look like. We will create three replicas each of our Flask web app and TensorFlow Serving.

We don’t want to expose TensorFlow Serving outside the cluster as we did in the previous section while creating Pods, so we will create a ClusterIP service for it.

A LoadBalancer is created for the Flask web app, since the web app is the only part we want to expose to users.

To implement this architecture, let’s go ahead to create a Deployment configuration file and a Load Balancer configuration file for the Flask Web app. While creating this, we will be using the flask web app image deployed to docker hub; `steveoni/tfweb:1.0`

Let’s create a folder named `k8s` (you can give it any name). Inside the folder, we will create all the configuration files needed.

In the folder, create a file named `tf-web-dev.yml` and input the text below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tfweb-dev
spec:
  replicas: 3
  selector:
    matchLabels: {component: web}
  template:
    metadata:
      labels: {component: web}
    spec:
      containers:
        - name: web
          image: steveoni/tfweb:1.0
          ports:
            - containerPort: 8080

As with the other YAML files we've created:

  • we specify the `kind` of object to be created from this file as `Deployment`;
  • in `spec`, we set the number of `replicas` to 3;
  • in `spec.selector.matchLabels`, we tell the Deployment to manage Pods labeled `component: web`, and `spec.template.metadata.labels` gives the Pods it creates that same label;
  • we define the image to be used as `steveoni/tfweb:1.0`;
  • and we specify the port to expose with `containerPort`.
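Because `kubectl apply` also accepts JSON, the same Deployment can be expressed as a Python dict. The sketch below uses two hypothetical helpers, `make_deployment` and `selector_matches_template`, to build the manifest from the file above and check the rule that is easiest to get wrong: for `apps/v1` Deployments, `spec.selector.matchLabels` must match the labels in `spec.template.metadata.labels`, otherwise the API server rejects the object.

```python
import json


def make_deployment(name, component, image, port, replicas=3):
    """Build a Deployment manifest as a dict, mirroring tf-web-dev.yml."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector tells the Deployment which Pods it owns...
            "selector": {"matchLabels": {"component": component}},
            # ...so the Pod template must carry the same label
            "template": {
                "metadata": {"labels": {"component": component}},
                "spec": {
                    "containers": [
                        {
                            # Container named after the component, as in the YAML files
                            "name": component,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


def selector_matches_template(deployment):
    """Check the rule apps/v1 enforces: matchLabels must match the template labels."""
    spec = deployment["spec"]
    wanted = spec["selector"]["matchLabels"]
    labels = spec["template"]["metadata"]["labels"]
    return all(labels.get(key) == value for key, value in wanted.items())


web = make_deployment("tfweb-dev", "web", "steveoni/tfweb:1.0", 8080)
print(selector_matches_template(web))  # True
print(json.dumps(web, indent=2))
```

Generating manifests from code like this is one option; the hand-written YAML files used in this section are simpler to read for a small app.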

We could create the Deployment object right away with `kubectl apply -f tf-web-dev.yml`. However, the main reason we created the `k8s` folder (or whatever you named it) is to be able to create all the objects needed for the app deployment at once, with a single command.
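When `-f` points at a directory, kubectl reads the files in it whose extensions are `.json`, `.yaml`, or `.yml`, and it only descends into subdirectories when `-R`/`--recursive` is given. The hypothetical helper `manifests_in` below approximates that selection so you can check which files in `k8s` would be applied; it is a sketch of the behavior, not kubectl's own code.

```python
import tempfile
from pathlib import Path

# File extensions kubectl accepts when -f points at a directory
MANIFEST_EXTENSIONS = {".json", ".yaml", ".yml"}


def manifests_in(directory, recursive=False):
    """Approximate which files `kubectl apply -f <directory>` would read.

    Without -R/--recursive, only the top level of the directory is scanned.
    """
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(
        p.relative_to(root).as_posix()
        for p in candidates
        if p.is_file() and p.suffix in MANIFEST_EXTENSIONS
    )


# Demo on a throwaway folder mirroring this section's k8s directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ["tf-web-dev.yml", "tfweb-load-balancer-service.yml",
                 "tf-kube-dev.yml", "tf-cluster-ip-service.yml", "notes.txt"]:
        Path(tmp, name).write_text("")
    found = manifests_in(tmp)

print(found)  # the four .yml files; notes.txt is skipped
```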

Next, let's create the LoadBalancer service for the Deployment object (our Flask web app) defined above. Create a file named `tfweb-load-balancer-service.yml`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tfweb-load-balancer-service
spec:
  type: LoadBalancer
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    component: web
```

This LoadBalancer is the same as the one created in the previous section, except that this time it points to the Flask web app Pods through the selector `spec.selector.component: web`.
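A Service finds its Pods by label: it sends traffic to every Pod whose labels include all the key/value pairs in its `selector`. The toy function below illustrates that equality-based matching. `service_selects` is a hypothetical helper, and the pod names are made up (real Pod names carry generated suffixes).

```python
def service_selects(selector, pod_labels):
    """True if a Service with `selector` would route traffic to a pod with `pod_labels`."""
    if not selector:
        # A Service without a selector does not pick up Pods automatically
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Hypothetical pods, labeled the way the two Deployments in this section label them
pods = {
    "tfweb-dev-pod-a": {"component": "web"},
    "tf-kube-dev-pod-b": {"component": "server"},
}
lb_selector = {"component": "web"}  # spec.selector in tfweb-load-balancer-service.yml

targets = [name for name, labels in pods.items() if service_selects(lb_selector, labels)]
print(targets)  # ['tfweb-dev-pod-a']
```

Extra labels on a Pod do not prevent a match; only the keys listed in the selector are compared.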

Now the Flask web app objects are ready. Next, let's create the TensorFlow Serving server. Create a file called `tf-kube-dev.yml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-kube-dev
spec:
  replicas: 3
  selector:
    matchLabels:
      component: server
  template:
    metadata:
      labels:
        component: server
    spec:
      containers:
        - name: server
          image: steveoni/tfupdate:1.1
          ports:
            - containerPort: 8501
          env:
            - name: MODEL_NAME
              value: "1602624873"
            - name: MODEL_BASE_PATH
              value: /saved_models/
```

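The configuration above is similar to the one created for the Flask web app, but the label is set to `component: server`, the container exposes port 8501 (TensorFlow Serving's REST API port), and two environment variables, `MODEL_NAME` and `MODEL_BASE_PATH`, tell TensorFlow Serving which model to load and where to find it.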
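Why do `MODEL_NAME` and `MODEL_BASE_PATH` matter? Assuming `steveoni/tfupdate:1.1` is built on the official `tensorflow/serving` image and keeps its default entrypoint (the article does not state this explicitly), that entrypoint starts `tensorflow_model_server` with flags built from these two variables. The sketch below mimics that translation in Python; `model_server_args` is a hypothetical helper, and the defaults shown are, to our understanding, the ones the official image sets.

```python
import posixpath


def model_server_args(env):
    """Approximate the command the tensorflow/serving entrypoint builds from env vars.

    Assumption: the image keeps the official entrypoint, which appends
    "$MODEL_BASE_PATH/$MODEL_NAME" as the model base path.
    """
    model_name = env.get("MODEL_NAME", "model")        # image default: "model"
    base_path = env.get("MODEL_BASE_PATH", "/models")  # image default: "/models"
    return [
        "tensorflow_model_server",
        "--port=8500",           # gRPC API
        "--rest_api_port=8501",  # REST API, the containerPort in tf-kube-dev.yml
        f"--model_name={model_name}",
        # normpath only collapses the "//" produced by the trailing "/" in base_path
        f"--model_base_path={posixpath.normpath(base_path + '/' + model_name)}",
    ]


env = {"MODEL_NAME": "1602624873", "MODEL_BASE_PATH": "/saved_models/"}
print(model_server_args(env))
```

Under these assumptions, the model is served under the name `1602624873`, so REST prediction requests to this server go to a path of the form `/v1/models/1602624873:predict`.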

Let's create a ClusterIP service for the TensorFlow Serving object. Create a file named `tf-cluster-ip-service.yml`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tf-cluster-ip-service
spec:
  type: ClusterIP
  ports:
    - port: 8501
      targetPort: 8501
  selector:
    component: server
```

The file above has the same structure as the one created for the `LoadBalancer` service, except that `spec.type` is set to `ClusterIP`, the port is 8501, and the selector points to `component: server`.
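The claim that the two Service files differ only in a few fields can be checked mechanically. In this sketch, `make_service` (a hypothetical helper) builds both manifests as dicts, and a small recursive `changed_paths` function lists the dotted paths whose values differ. If `spec.type` were omitted entirely, Kubernetes would default it to `ClusterIP`.

```python
def make_service(name, service_type, port, component):
    """Build a Service manifest as a dict with the same shape as the two YAML files."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            "ports": [{"port": port, "targetPort": port}],
            "selector": {"component": component},
        },
    }


def changed_paths(a, b, path=""):
    """Recursively list dotted paths whose values differ between two dicts."""
    if isinstance(a, dict) and isinstance(b, dict):
        paths = []
        for key in sorted(a.keys() | b.keys()):
            child = f"{path}.{key}" if path else key
            paths += changed_paths(a.get(key), b.get(key), child)
        return paths
    return [] if a == b else [path]


lb = make_service("tfweb-load-balancer-service", "LoadBalancer", 8080, "web")
cip = make_service("tf-cluster-ip-service", "ClusterIP", 8501, "server")
print(changed_paths(lb, cip))
# ['metadata.name', 'spec.ports', 'spec.selector.component', 'spec.type']
```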

The app architecture is now ready for deployment on Kubernetes. The command below creates the Flask web app objects (web) and the TensorFlow Serving objects (server) at the same time:

$ kubectl apply -f k8s

The command above works if your working directory contains the `k8s` folder. If your working directory is the `k8s` folder itself, use the command below instead:

$ kubectl apply -f .

This creates the Deployments and Services defined by the files in the `k8s` directory, as shown below:

From the image, we can see that the Deployment objects and Services were created.

Let’s check if the pods are running:

$ kubectl get pods

Remember that we created three replicas for each Deployment object, so six pods in total should be running, which matches the image above.
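Instead of counting pods by eye, you can count them from the JSON that `kubectl get pods -o json` prints: a list object whose `items` each carry `metadata.labels` and `status.phase`. The function below is a hypothetical helper, and the sample input is a made-up, heavily trimmed stand-in for real output.

```python
import json
from collections import Counter


def running_pods_by_component(pods_json):
    """Count Running pods per `component` label in `kubectl get pods -o json` output."""
    items = json.loads(pods_json)["items"]
    return Counter(
        pod["metadata"].get("labels", {}).get("component", "<none>")
        for pod in items
        if pod.get("status", {}).get("phase") == "Running"
    )


# Made-up, heavily trimmed sample shaped like the command's output
web_pod = {"metadata": {"labels": {"component": "web"}}, "status": {"phase": "Running"}}
server_pod = {"metadata": {"labels": {"component": "server"}}, "status": {"phase": "Running"}}
sample = json.dumps({"kind": "List", "items": [web_pod] * 3 + [server_pod] * 3})

counts = running_pods_by_component(sample)
print(counts["web"], counts["server"])  # 3 3
print(sum(counts.values()))             # 6
```

On a live cluster, you could save the output with `kubectl get pods -o json > pods.json` and pass the file's contents to the function.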

To see the Deployment objects:

$ kubectl get deployments

Both Deployment objects have been created successfully.

Let’s check the services created for both objects:

$ kubectl get services

The `tf-cluster-ip-service` (`ClusterIP`) and `tfweb-load-balancer-service` (`LoadBalancer`) services have been created successfully.

Now that everything is set up, let's access the app through the `LoadBalancer` service:

$ minikube service tfweb-load-balancer-service

This opens the web browser at URL: ``. To see the app, go to the route `/home`, which renders the web interface.

When we entered values and submitted the form, the server responded with an error:


Remember, we faced this type of error before, when trying to make our Flask web app container communicate with the TensorFlow Serving container. We solved it by replacing `localhost` with `server`, the name of the TensorFlow Serving service created by docker-compose.

To fix this, we need to replace the `server` host used in `tfserving_request` in our `` with the name of the ClusterIP service in front of TensorFlow Serving. Hence, we replace `server` with `tf-cluster-ip-service` in the request URL:

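```python
import requests

def tfserving_request(req_input, model_name):
    # TF Serving's REST API expects inputs wrapped in an "instances" list
    input_request = {"instances": [req_input]}
    # URL elided here; its host is now `tf-cluster-ip-service`, not `server`
    response = requests.post(..., json=input_request)
    return response
```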
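Because the request URL is elided above, here is a sketch of how such a URL could be built, under two assumptions: the Flask app calls TensorFlow Serving's REST predict endpoint (`/v1/models/<name>:predict`), and both apps run in the same namespace, where the bare Service name resolves through the cluster's DNS. `tfserving_predict_url` is a hypothetical helper, and the fully qualified form assumes the default cluster domain `cluster.local`.

```python
def tfserving_predict_url(model_name, service="tf-cluster-ip-service", port=8501,
                          namespace=None):
    """Build a TF Serving REST predict URL that is reachable from inside the cluster.

    Within the same namespace the bare Service name resolves through cluster DNS;
    from another namespace, use the fully qualified name (assumes the default
    cluster domain, cluster.local).
    """
    host = f"{service}.{namespace}.svc.cluster.local" if namespace else service
    return f"http://{host}:{port}/v1/models/{model_name}:predict"


# "1602624873" is the MODEL_NAME set in tf-kube-dev.yml
print(tfserving_predict_url("1602624873"))
# http://tf-cluster-ip-service:8501/v1/models/1602624873:predict
```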

Once this is done, we rebuild the `tfweb` image, tag it as version `1.2`, and push it to Docker Hub:

$ docker build -t steveoni/tfweb:1.2 .
$ docker push steveoni/tfweb:1.2

Next, we need to change the `image` in `tf-web-dev.yml` to `steveoni/tfweb:1.2`.
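As an aside, deleting and recreating everything works, but a Deployment can also update in place: changing anything under `spec.template`, such as the image tag, and re-running `kubectl apply -f k8s` should make the Deployment replace its pods gradually using its default `RollingUpdate` strategy, and `kubectl rollout undo deployment/tfweb-dev` would roll the change back. The sketch below shows only the manifest side of that change; `with_image` is a hypothetical helper operating on a trimmed dict, not on the YAML file itself.

```python
import copy


def with_image(deployment, container_name, new_image):
    """Return a copy of a Deployment manifest with one container's image swapped.

    Changing spec.template (here, the image tag) is what triggers a new rollout
    when the manifest is re-applied.
    """
    updated = copy.deepcopy(deployment)  # leave the original manifest untouched
    for container in updated["spec"]["template"]["spec"]["containers"]:
        if container["name"] == container_name:
            container["image"] = new_image
            return updated
    raise KeyError(f"no container named {container_name!r}")


# Trimmed, hypothetical stand-in for the manifest in tf-web-dev.yml
web = {"spec": {"template": {"spec": {"containers": [
    {"name": "web", "image": "steveoni/tfweb:1.0"}]}}}}

web_v12 = with_image(web, "web", "steveoni/tfweb:1.2")
print(web_v12["spec"]["template"]["spec"]["containers"][0]["image"])  # steveoni/tfweb:1.2
```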

Let's delete the previously created Deployment objects and Services:

$ kubectl delete deployments --all
$ kubectl delete services --all

We then create the Deployment objects and Services again:

$ kubectl apply -f k8s

Then we open the LoadBalancer service:

$ minikube service tfweb-load-balancer-service

The web page loads automatically. Let's go to the route `/home` to test the web app:


Now the app is working properly. 

If you apply this approach to another project and your pods fail to start, you can always see the full details of the pods by running the command below:

$ kubectl describe pods

This shows the details of every pod created. To get the details of a specific pod, run `kubectl get pods` to obtain the pod's name, and then run:

$ kubectl describe pod pod-name

To see the logs from any of the pods, use the command below:

$ kubectl logs pod-name


In this article, we've seen how to migrate away from the conventional method of building machine learning apps. I believe this article has shown you how to build efficient and scalable machine learning apps.

You have also added to your toolkit for becoming a Unicorn data scientist. That said, there are other tools and concepts to learn in order to build ML products and take them to production efficiently, such as:

  • CI/CD: how to add unit tests for your model so that a single push to GitHub runs a series of tests and then deploys your app to production.
  • TFX/Kubeflow: tools that make orchestrating and deploying your app with Kubernetes easier.

