Kubernetes vs Docker: What You Should Know as a Machine Learning Engineer
Earlier this year (2020), I decided to move fully from data science into the engineering side of machine learning. I wanted a more efficient and scalable way of deploying machine learning models, decoupling my models from my app, and versioning them properly.
Conventionally, what I mostly do after training a model is import it into my Flask app and perform inference whenever the API endpoint for the model is called. I use Docker when packaging the app and deploying it to Google Cloud or any other platform, but there is more to it than that (I think).
I started diving deep into TensorFlow Serving, TensorFlow Extended, and Kubeflow (Kubernetes made easier for machine learning projects). Along the way, I discovered I needed to know more (maybe just a little) about the Kubernetes concepts needed for deploying, orchestrating, and scaling machine learning apps.
That journey and curiosity led to this article. So if you are just like me, ready to up your game and add one more tool toward becoming a Unicorn data scientist, as described by Elle O’Brien in this article, then this article is for you.
“…so hard, the rare data scientist who can also develop quality software and play engineer is called a unicorn!”- Elle O’Brien
In this article, we will also follow a project-based method, which will make it possible for you to just port the ideas and code shown directly into your machine learning project.
In summary, we will see how to eradicate some difficulties that arise when following the conventional methods, such as:
- Not being able to serve the model separately from the app
- Difficulty rolling back updates
- Difficulty pushing out new updates easily
- Difficulty scaling the app when user traffic increases
- Difficulty versioning your model and app.
To eradicate the above-listed difficulties, below are some of the objectives we need to achieve:
- Integrating a TensorFlow Serving model in a web app
- Managing the web app and TensorFlow Serving using docker-compose
- Building and pushing Docker images to Docker Hub
- Introducing Kubernetes
- Serving the ML-powered web app with Kubernetes.
Pre-requisites
- Training models in Tensorflow
- Docker, at least basic level
- TensorFlow serving (if not, here is a quick introduction to TensorFlow serving)
NOTE: You can obtain the code used for the article here.
Let’s get started.
Building ML models
In this section of the article, we will create a simple ML model, which we will use to solidify the concepts to be introduced.
The model is an AND logic gate model, and since the main focus of the article is neither how to create a model nor how to train it, this part will only be explained briefly.
Let’s create a file named model.py and input the code below:
import tensorflow as tf

data = tf.constant([[1,1],[0,0],[2,0],[2,1],[3,0],[0,4],[5,6],[0,10]])
label = tf.constant([1,0,0,1,0,0,1,0])

model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(20, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax")
    ]
)

print(model.summary())

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

model.fit(data, label, batch_size=2, epochs=5)
After creating and training the model, we need to save it in a way that makes it servable with TensorFlow Serving; hence we won’t be saving just the model weights.
import time

save_time = int(time.time()) #1
path = f'./saved_models/{save_time}' #2
model.save(path, save_format='tf') #3
In the code above, we infuse the idea of versioning using the time module. The timestamp at the moment of saving the model is used to create an inner folder in saved_models/, and the model is then saved into that folder.
model.save(path, save_format='tf') creates the files needed by TensorFlow Serving.
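The resulting directory should look roughly like this (your version folder will be named after whatever timestamp was generated on your run):

saved_models/
└── 1602624873/
    ├── saved_model.pb
    ├── assets/
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index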

Now our model is ready and servable.
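As an optional sanity check (a sketch, not required for serving), the SavedModel can be reloaded and queried before moving on:

# optional: reload the SavedModel and run a quick prediction
loaded = tf.keras.models.load_model(path)
print(loaded.predict(tf.constant([[1.0, 1.0]])))  # prints two probabilities, one per class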
Docker essentials
In this section, we will discuss the most essential Docker commands needed to take our machine learning project to production, and we will also see how to orchestrate our app with docker-compose.
Incorporating web app with Tensorflow serving image
This section shows how to infuse TensorFlow serving into a flask web app. It shows how to call a TensorFlow serving endpoint API in Flask.
First, let’s serve our AND logic gate model using Tensorflow serving docker image. The first step is to pull the TensorFlow serving image from docker-hub.
NOTE: There is an article on Neptune.ai which explains Tensorflow serving in full detail.
docker pull tensorflow/serving
Now let’s run the tensorflow/serving image:
docker run -p 8501:8501 --mount type=bind,source=path/to/directory/saved_models,target=/saved_models/1602624873 -e MODEL_NAME=1602624873 -e MODEL_BASE_PATH=/saved_models -t tensorflow/serving
The above command starts the tensorflow/serving image by first mounting the model from our local directory to a file path in the Docker container using the command:
--mount type=bind,source=path/to/directory/saved_models,target=/saved_models/1602624873
Hence the source path, where the saved_models folder for the AND logic gate model was created, is bound to a target path of the same name inside the Docker container.
On running the image, two endpoints are created, as shown in the image below. One of them is a gRPC endpoint, but we will be focusing on the second endpoint, which is the REST API endpoint.

In order to access the REST API endpoint outside the Docker environment, we expose the port using -p 8501:8501 in the command above.
Let’s test the endpoint to see how it works. We will be using POSTMAN to test the REST API first.

The input is passed to the served model via the REST API. In Postman, we specify the JSON input using the format {"instances": [[1,0]]}, and we obtain a response in JSON format as well, which is the model output.
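If you prefer testing from code rather than Postman, the same check can be done with the requests library (assuming the serving container above is still running on localhost:8501):

import requests

payload = {"instances": [[1, 0]]}
resp = requests.post("http://localhost:8501/v1/models/1602624873:predict", json=payload)
print(resp.json())  # a JSON body of the form {"predictions": [[..., ...]]}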
This shows that our served model is working properly. It’s now time to integrate the Tensorflow serving API endpoint with our web app.
But before we go into that, remember that the image is still running. In case we want to stop it, here is how to do that:
docker ps
The command above lists the running containers.

Copy the container ID of the container you want to stop:
docker stop e74fe1336768
This command stops the running tensorflow/serving container.
Let’s now create the Web interface and server to render the page since our model is now servable with TensorFlow serving.
The model web interface is going to be a form with two inputs and a submit button, just as shown in the image below:

Here is the code for the UI at index.html:
<html>
  <head>
    <link rel="stylesheet" type="text/css" href="../static/css/bootstrap-theme.min.css" />
    <link rel="stylesheet" type="text/css" href="../static/css/bootstrap.min.css" />
    <link rel="stylesheet" type="text/css" href="../static/css/responsive.bootstrap.min.css" />
    <link rel="stylesheet" type="text/css" href="../static/css/style.css" />
  </head>
  <body>
    <div class="container">
      {%include 'includes/_messages.html' %}
      <div class="row">
        <h1 class="text-center">AND GATE TEST</h1>
      </div>
      <form action="{{url_for('home')}}" method="post">
        <div class="form-row">
          <div class="col">
            <input type="text" class="form-control" name="inp1" placeholder="input 1">
          </div>
          <div class="col">
            <input type="text" class="form-control" name="inp2" placeholder="input 2">
          </div>
          <div class="col">
            <button type="submit" class="btn btn-primary ml-4">Submit</button>
          </div>
        </div>
      </form>
    </div>
  </body>
</html>
Now that we’ve created the user interface, let’s create the Flask app to render it and handle the requests to the TensorFlow Serving API.
Create a file named app.py and input the code below:
from flask import Flask, render_template, flash, request
import requests
from os import environ
app = Flask(__name__)
The code above imports the necessary modules such as Flask, request, and the os module. Also, the code above initializes the flask app.
The next line of code to be added to the app.py is the code that manages and makes a call to the TensorFlow serving API.
def tfserving_request(req_input, model_name): #1
    url = f"http://localhost:8501/v1/models/{model_name}:predict" #2
    input_request = {"instances": [req_input]} #3
    response = requests.post(url=url, json=input_request) #4
    return response
Based on the comment numbering:
- #1 tfserving_request takes in two inputs: a request input named req_input and the name of the model
- #2 Defines the API endpoint
- #3 Structures the input into the format accepted by the TensorFlow Serving API endpoint
- #4 Makes a request to the API endpoint, passing in the request input
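As a quick sanity check (assuming the TensorFlow Serving container from earlier is still running), the function can be tried from a Python shell:

response = tfserving_request([1, 0], "1602624873")
print(response.json())  # expected shape: {"predictions": [[prob_class_0, prob_class_1]]}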
The next step is to add the route which will be used for rendering the Web interface in the browser:
@app.route("/home", methods=["GET","POST"]) #1
def home():
    if request.method == "POST": #2
        inp1 = int(request.form["inp1"]) #3
        inp2 = int(request.form["inp2"])
        response = tfserving_request([inp1,inp2], "1602624873") #4
        resp = response.json() #5
        flash(f"obtained {inp1} and {inp2} have a prediction of {resp['predictions']}", 'success') #6
    return render_template("index.html") #7
- #1 We define the route that renders the HTML as `/home`, and we also define the request methods accepted by the route as `GET` and `POST`
- #2 Check if the request made is a POST request
- #3 If it is a POST request, we obtain the form input from the HTML using `request.form["inp1"]`; remember that `inp1` is the input name
- #4 Make a call to the `tfserving_request` function, passing in the form inputs alongside the model name
- #5 The response returned from TensorFlow Serving is converted to JSON
- #6 The prediction is obtained using `resp['predictions']`, which contains the probability scores. The `flash` function is used to display the result as a message
- #7 Renders the UI from `index.html`
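If you would rather display a class label than the raw probability scores, the last two lines of the route could be adapted as follows (a small sketch, not part of the original app):

probs = resp["predictions"][0]             # e.g. [0.31, 0.69], one score per class
predicted_class = probs.index(max(probs))  # index of the highest score
flash(f"obtained {inp1} and {inp2} have a predicted class of {predicted_class}", 'success')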
Finally, let’s add the line of code to enable the starting of the flask server:
if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=int(environ.get('PORT', 8080)))
When we run the script, the code above hosts the server on port 8080.
Let’s run app.py using:
python app.py
This will start the server like this:

Now that the server is started, we can view the web app via http://0.0.0.0:8080/home
If we visit the link, we will see the web interface:

If we type input into the text boxes of the rendered page and click the “submit” button, we get an error page whenever the TensorFlow Serving Docker image is down. Hence we need to start the TensorFlow Serving image:
docker run -p 8501:8501 --mount type=bind,source=path/to/directory/saved_models,target=/saved_models/1602624873 -e MODEL_NAME=1602624873 -e MODEL_BASE_PATH=/saved_models -t tensorflow/serving
Once we run the command above, we can go back to the web interface and type in our input. If we enter `1` and `0` into the input fields and press the submit button, we get the response shown in the image below:

The response is displayed on top of the page showing the input received by the server and the prediction output.
Since this is working perfectly, let’s create a Docker image to manage the Flask app for us. In the same directory containing the Flask app, let’s create a Dockerfile:
# 1
FROM python:3.8-slim
# 2
ENV PYTHONUNBUFFERED True
# 3
ADD requirements.txt requirements.txt
# 4
RUN pip install -r requirements.txt
# 5
ENV APP_HOME /app
# 6
WORKDIR $APP_HOME
# 7
COPY . ./
# 8
CMD ["python","app.py"]
Code explanation based on the comment numbering:
- #1 Obtain a light Python image, version 3.8-slim
- #2 Prevent the app from crashing without printing a relevant message
- #3 Add the requirements.txt file, containing the list of packages to install, to a file of the same name in the image
- #4 Install all the packages in requirements.txt
- #5 Create a directory in the Docker image and assign it to an environment variable
- #6 Specify the working directory based on the directory created in #5
- #7 Copy all the files in the Flask app directory into the working directory
- #8 The command that runs the Flask app once a container is started
Before we run the Dockerfile, let’s create the requirements.txt; one easy way of doing that is via the command below:
pip freeze > requirements.txt
This creates requirements.txt and fills it with the packages installed in the current Python environment. For this project, though, the only packages needed are Flask and requests.
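A minimal requirements.txt for this project could therefore look like this (versions left unpinned here; in practice you may want to pin them):

flask
requests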
Once the requirements.txt is created, let’s create an image for the flask app:
docker build -t flaskweb .
If we run the above code, we should obtain the following output if the image is successfully created.

Now that the image is successfully created, let’s run the image:
docker run -p 8080:8080 -e PORT=8080 -t flaskweb
This starts the flask server as described before.

Let’s visit the same link as before to view the web interface. Type in the previous input and see what is output:

An error is output. This is because the flaskweb container cannot reach the TensorFlow Serving container via localhost (inside the container, localhost refers to the container itself, not the host machine). This brings us to the idea of Docker Compose to solve the issue.
Using Docker compose to manage services
Docker-compose gives us the opportunity to create the two Docker services (TensorFlow Serving and flaskweb) from a single file with a single command, and it also gives us the ability to manage both services.
NOTE: To install docker-compose on different operating systems, visit this link.
To enable docker-compose, place the flask app directory and the saved_models folder created for our model in the same directory. Then, in the flask app directory, create a file named Dockerfile.dev and copy all the content of the Dockerfile into it.
After copying, Dockerfile.dev will look like this:
FROM python:3.8-slim
ENV PYTHONUNBUFFERED True
ADD requirements.txt requirements.txt
RUN pip install -r requirements.txt
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
CMD ["python","app.py"]
Make sure your directory structure looks like this:
/main-directory
-/flask_app
-/saved_models
Once this is done, let’s create a YAML file named docker-compose.yml to define the TensorFlow Serving and Flask web services.
version: "3.8"
services:
  server:
    image: tensorflow/serving
    volumes:
      - ./saved_models:/saved_models/1602624873
    ports:
      - '8501:8501'
    environment:
      MODEL_NAME: 1602624873
      MODEL_BASE_PATH: /saved_models/
  web:
    image: flaskweb
    build:
      context: ./flask_app
      dockerfile: Dockerfile.dev
    ports:
      - '8080:8080'
In the docker-compose.yml, we specify the docker-compose version using `version`, and we define the two services inside the `services` object, naming them `server` and `web`.
In the `server` service object, we specify the image to pull. We then define the `volumes`, mounting the models from `./saved_models` into a directory in the Docker container named `/saved_models/1602624873`. We specify the port for the service just as we did when starting the plain Docker image, and the environment variables needed are specified inside the `environment` object.
As you can see, the process is similar to the way we ran the Docker image in the previous section.
For the `web` service object, we also specify the name of the image. We then create a `build` object in which we define how the image should be built: we set the context to point to the `flask_app` directory, tell docker-compose to use the Dockerfile in that directory named `Dockerfile.dev`, and define the port.
To start the services, we run the following command in the directory containing the docker-compose.yml:
docker-compose up
This starts both services as shown below:

The TensorFlow Serving and flaskweb services are now running. If we visit the URL http://localhost:8080/home, it loads the web interface, but if we type in our input and click submit, we still get the same error.
To resolve the error, instead of using localhost in the TensorFlow Serving API endpoint defined in the tfserving_request function in app.py, we use the name of the TensorFlow Serving service, server:
# in app.py
def tfserving_request(req_input, model_name):
    url = f"http://server:8501/v1/models/{model_name}:predict"
    input_request = {"instances": [req_input]}
    response = requests.post(url=url, json=input_request)
    return response
To see the changes, we need to stop the services running by using:
docker-compose stop
Once both services are stopped, we start them again:
docker-compose up
Once this is done, we can go back to the web interface, type in the input, and click submit; the app now works perfectly.

To learn more about docker-compose, kindly visit this link.
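A few other standard docker-compose commands are handy while iterating: ps lists the running services, logs shows the output of a single service (for example web), and down stops and removes the containers:

docker-compose ps
docker-compose logs web
docker-compose down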
Building and pushing docker images to the docker hub
In order to integrate our docker images with Kubernetes (which will be discussed in the next section), we need to push our images to the docker hub.
To build and push our images to Docker Hub, first visit Docker Hub and create an account. After the account has been created, we need to log in to Docker Hub from our system terminal.
For security purposes, store your Docker password in a text file and give it any name; I will name mine `my_password.txt`. Then run the command below:
$ cat ~/my_password.txt | docker login --username steveoni --password-stdin
In the command above, I use `~/` because `my_password.txt` is in my home directory; docker login then obtains the password from the .txt file via stdin.
If the login is successful, you will see a message saying the login succeeded.
Let’s create an image for our flask app and then push it to the docker hub:
$ docker build -t steveoni/tfweb:1.0 .
This creates an image with the tag 1.0, which also acts as a version for the image. The name `steveoni/tfweb` follows the `username/image-name` pattern.
Once the image is ready, we can now push it to the Docker hub:
$ docker push steveoni/tfweb:1.0
This pushes the image to the docker hub, as seen below:

We are done with the Flask app part; now let’s do the same for TensorFlow Serving. Remember, we did not create a Dockerfile for TensorFlow Serving; we are making use of the tensorflow/serving image.
We will need to build on top of the tensorflow/serving image. Let’s create a Dockerfile in the same directory where /saved_models is located.
FROM tensorflow/serving
ENV APP_HOME /saved_models/1602624873
WORKDIR $APP_HOME
COPY ./saved_models ./
The approach used in the docker file above is the same as that used in creating the previous docker images.
Let’s build the image for local testing before we build and push it to the docker hub.
$ docker build -t tfs .
Let’s test to see if the image is working properly:
$ docker run -p 8501:8501 -e MODEL_NAME=1602624873 -e MODEL_BASE_PATH=/saved_models -t tfs
This starts the TensorFlow serving server.
Now we can go ahead to create the image officially and then push:
$ docker build -t steveoni/tfupdate:1.1 .
$ docker push steveoni/tfupdate:1.1
The image below shows that it has been pushed successfully, without even needing to check Docker Hub:

What is Kubernetes – introduction
Why Kubernetes? Imagine you have deployed your Docker app to a cloud service, and everything is working fine and running properly. But after some time, your application has thousands of users making requests per second.
Unfortunately, under that number of requests per second, your app keeps crashing, you can’t prevent the crashes, and users keep complaining.
To solve this, you can create multiple replicas of the app so that it is available all the time (if one goes down, another can take over). But more questions follow: if all the replicas go down, how do you scale back up? How do you set up the network endpoint? Who checks the state of the replicas over time? How do you manage the replicas in a way that lets them communicate with each other?
These questions are exactly what Kubernetes was designed to answer.
“It’s a container orchestration platform that consists of several components and it works tirelessly to keep your servers in the state that you desire.”
– by Farhan Hasin Chowdhury.
Kubernetes Cluster
For this article, we will be running Kubernetes on our local machine instead of a cloud service. To get the Kubernetes running on our system, we need to install two sets of programs.
First, we need to install Minikube, which allows us to run a single-node Kubernetes cluster on our local computer. And then, we install the Kubernetes command-line tool called Kubectl.
To install the two programs, see the official Minikube and kubectl installation guides.
NOTE: In this article, I will be summarising some of the ideas pertaining to the project used in this article. To get an overview and a practical knowledge of what Kubernetes is, visit this article by Farhan Hasin Chowdhury. I will be using some of his illustrations to introduce Kubernetes.
Once the installation is complete, you can test the programs out with the command below:
$ minikube version
$ kubectl version
Before we start using Minikube, let’s set up a hypervisor driver for it. In this article, we will be using Docker as the driver.
NOTE: Hypervisor is used as an abstraction layer to separate the virtual machine from the system hardware.
The command below helps set the hypervisor for minikube:
$ minikube config set driver docker
Once this is done, we can go ahead to start minikube:
$ minikube start
After running the command above, we will see the following output in the terminal. The whole loading process can sometimes take a while to finish.

Before we go into using the minikube program we just started, let’s have an overview of some Kubernetes terms and the concept in general.
Kubernetes contains what we call nodes. A node can either be a virtual or physical machine assigned a particular task. A set of those machines communicating with one another over a shared network is called a cluster.
And since in this project we are making use of Minikube. We only have access to a single virtual machine that will serve as our server. Hence we call it a single-node Kubernetes cluster.
Think of it this way: instead of having access to multiple computers that can be used as servers, you only have access to one, your personal computer, which serves as the server hosting your application.
The image below shows an overview of minikube.

Kubernetes generally contains two sets of components:
- The Control Plane Components
- The Node Plane Components
The control plane components are responsible for assigning and scheduling tasks to a node based on the available resources. They are also responsible for keeping the state of the node and also validating requests made to the node. Remember that a node is a virtual machine.
The node plane components are responsible for maintaining network rules on each node server and for keeping the workloads on each node running. They also provide a gateway between the control plane and each node in the cluster.
Each Node contains what we call Pods, and according to Kubernetes documentation
“Pods are the smallest deployable units of computing that you can create and manage in Kubernetes”.

The image shows the overview of a cluster. The cluster above is a single-node cluster since we are using minikube.
A Pod houses our app container. Even though a Pod can contain more than one container, it is advisable to run a single container per Pod. It is also advisable to manage Pods using higher-level objects, which can create and delete Pods at any time and hence help manage them. We will discuss these higher-level objects later in the article.
A Node can contain more than one Pod, with each Pod performing the same function. With the help of what we call a Service, we can group all these Pods on a node into a single entity, and the Service lets us define how the Pods are accessed.
With this basic knowledge of Kubernetes, let’s go ahead with the project at hand, explaining any other concepts we need along the way.
Serving ML-powered web app with Kubernetes
This section shows how to use Kubernetes to orchestrate your app. It shows different approaches to creating Pods and LoadBalancers, and it also introduces the concepts of `Deployment` and `ClusterIP`.
Explicit method of creating pod and load balancer
First, let’s test these ideas of services, pods, and the like using the TensorFlow Serving image we created in the previous sections.
Remember, we’ve already started Minikube; now let’s create our first pod using the TensorFlow Serving image.
$ kubectl run tf-kube --image=steveoni/tfupdate:1.1 --port=8501 --env="MODEL_NAME=1602624873" --env="MODEL_BASE_PATH=/saved_models/"
The above command is similar to the one used to run the Docker image in the previous section. tf-kube is the name of the pod we are creating, and the environment variables are defined using --env.
We then get a message that the pod has been created. To see the list of pods that have been created, we can use the get pods command as shown below:
$ kubectl get pods
This lists the pods created:

You can see the list of pods I have created; some are from three days ago (I forgot to delete them). Among them is the pod we just created, with a STATUS of Running.
To delete any of the pods, we just run the command below:
$ kubectl delete pod pod-name
Our tf-kube pod is running, but we can’t access it from outside the cluster. To access the pod from outside the cluster, let’s create a Service of type LoadBalancer. This service helps expose a pod outside the cluster.
$ kubectl expose pod tf-kube --type=LoadBalancer --port=8501
The command above creates the LoadBalancer service named tf-kube:

Once the LoadBalancer service is ready, we can go ahead and start the `tf-kube` LoadBalancer service using minikube.
$ minikube service tf-kube

The command above starts the service as shown in the image above. It also maps the `TARGET PORT` to the `URL` shown in the image. Hence, instead of visiting `http://localhost:8501` to reach our TensorFlow Serving API endpoint, we visit the URL `http://172.17.0.2:30116`.
Let’s test the TensorFlow serving API endpoint using Postman as shown in the Tensorflow Serving section.

We can see that the API endpoint is working properly. Now we’ve been able to create our first pod and our first service.
To delete the service created:
$ kubectl delete service tf-kube
The approach we took to creating our first pod and service is not the ideal way of creating pods and services.
In the next subsection, we show how to create pods in a more reproducible and manageable way.
Declarative method of creating pod and load balancer
Let’s take a more declarative method, just as we did during the docker-compose section. This method makes it easier for us to configure our Kubernetes. And also make it easier for others to set up.
First, let’s create a YAML file called `tf-kube-pod.yml` and input the code below:
apiVersion: v1
kind: Pod
metadata:
  name: tf-kube-pod
  labels:
    component: server
spec:
  containers:
    - name: tf-kube
      image: steveoni/tfupdate:1.1
      ports:
        - containerPort: 8501
      env:
        - name: MODEL_NAME
          value: "1602624873"
        - name: MODEL_BASE_PATH
          value: /saved_models/
In the YAML file, we:
- Define the `apiVersion`, which is the version of the Kubernetes API we want to use
- Specify the `kind` of object we want to create, which is Pod
- Define the `metadata`, in which we give the pod a name and a `label` tag by assigning the value `server` to the `component` property
- Define the `spec`, which contains the state we desire for the pod: the container image to use, which is `steveoni/tfupdate:1.1`, and the container port, `8501`
- Also, in `spec.containers`, we specify the environment variables in `env`, each with a `name` and a `value`.
Now let’s create a pod using the `tf-kube-pod.yml` file:
$ kubectl apply -f tf-kube-pod.yml
This creates the pod, and you can use kubectl get pods to see the pod created.
Now let’s create the LoadBalancer service configuration file for the pod we just created. Create a file named `tf-kube-load-balancer.yml` and input the following:
apiVersion: v1
kind: Service
metadata:
  name: tf-kube-load-balancer-service
spec:
  type: LoadBalancer
  ports:
    - port: 8501
      targetPort: 8501
  selector:
    component: server
As in the previous file, we specify the `kind` of object, which is `Service` this time around, and in the `spec` we define the type to be `LoadBalancer`. In `spec.ports` we define the `port` on which to expose the service and the `targetPort`, which is the port of the `tf-kube-pod` pod. And in `spec.selector` we point the LoadBalancer to the `server` component, which is the label of the `tf-kube-pod` pod created above.
Then we create the service from `tf-kube-load-balancer.yml`, with the command below:
$ kubectl apply -f tf-kube-load-balancer.yml
We can also check the list of services created using kubectl get services. To start the service, run:
$ minikube service tf-kube-load-balancer-service
This starts the service and opens the web browser.
The whole process works as expected. But don’t forget: while describing Pods at the beginning of the section, we mentioned that it is good to manage Pods with higher-level objects, which are capable of creating and deleting Pods. We will cover that in the next section.
Working with multi-container Applications
We’ve been able to create a Pod for our TensorFlow Serving image, which is just a single container. Don’t forget that our main goal is to combine TensorFlow Serving with a web app.
As we saw in the docker-compose section, we were able to manage both the TensorFlow Serving service and the Flask web app service together. We will be doing the same thing in this section.
In this section, we will be using higher objects called Deployment, and we will also be introduced to ClusterIP.
Deployment: A Deployment is a controller; it gives us the ability to easily create multiple replicas of a Pod, and to easily roll out and roll back updates.
“In Kubernetes, controllers are control loops that watch the state of your cluster, then make or request changes where needed. Each controller tries to move the current cluster state closer to the desired state. A control loop is a non-terminating loop that regulates the state of a system.”
– Kubernetes documentation
ClusterIP: ClusterIP is another type of service, just like LoadBalancer. In contrast to the LoadBalancer service, a ClusterIP only exposes an application within the cluster; that is, it prevents the application from being accessed from outside the cluster.
We will be using the two terms just defined in deploying our Web app in Kubernetes.

The image above shows what the app architecture for the deployment in Kubernetes will look like. We will be creating three replicas each of our Flask web app and of TensorFlow Serving.
We don’t want to expose TensorFlow Serving outside the cluster, as we did in the previous section when creating pods, hence we will be creating a ClusterIP service for TensorFlow Serving.
A load balancer is created for the Flask web app since we would only like to expose the app to the users.
To implement this architecture, let’s go ahead to create a Deployment configuration file and a Load Balancer configuration file for the Flask Web app. While creating this, we will be using the flask web app image deployed to docker hub; `steveoni/tfweb:1.0`
Let’s create a folder named `k8s` (you can give it any name). Inside this folder, we will create all the configuration files needed.
In the folder, create a file named `tf-web-dev.yml`, and input the text below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tfweb-dev
spec:
  replicas: 3
  selector:
    matchLabels:
      component: web
  template:
    metadata:
      labels:
        component: web
    spec:
      containers:
        - name: web
          image: steveoni/tfweb:1.0
          ports:
            - containerPort: 8080
Like all the other yml files we’ve created:
- We specify the `kind` of object to be created by this file as a `Deployment` object.
- In the `spec` we specify the number of `replicas` as 3.
- In `spec.selector` we give the object the label tag `web` using the `component` property.
- We also define the image to be used as `steveoni/tfweb:1.0`,
- and the port to be exposed is specified using `containerPort`.
We could create the Deployment object immediately using `kubectl apply -f tf-web-dev.yml`, but the main reason we created the `k8s` folder (or whatever name you gave it) is to be able to create all the objects needed for the app deployment at once, with a single command.
Hence, let’s create the LoadBalancer service for the Deployment object (our flask web app) defined above. Create a file and name it `tfweb-load-balancer-service.yml`.
apiVersion: v1
kind: Service
metadata:
  name: tfweb-load-balancer-service
spec:
  type: LoadBalancer
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    component: web
The load balancer is the same as the Load balancer created in the previous section, just that this time around it is pointing to the flask web app via the tag name `spec.selector.component:web` .
Now the flask web object is ready. Let’s then create the Tensorflow serving server. Create a file called `tf-kube-dev.yml`:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-kube-dev
spec:
  replicas: 3
  selector:
    matchLabels:
      component: server
  template:
    metadata:
      labels:
        component: server
    spec:
      containers:
        - name: server
          image: steveoni/tfupdate:1.1
          ports:
            - containerPort: 8501
          env:
            - name: MODEL_NAME
              value: "1602624873"
            - name: MODEL_BASE_PATH
              value: /saved_models/
The configuration above is similar to the one created for the Flask web app, but the label tag is set to `component: server`.
Let’s create a ClusterIP service for the TensorFlow Serving object. Create a file and name it `tf-cluster-ip-service.yml`:
apiVersion: v1
kind: Service
metadata:
  name: tf-cluster-ip-service
spec:
  type: ClusterIP
  ports:
    - port: 8501
      targetPort: 8501
  selector:
    component: server
The file above is the same as that created for the `LoadBalancer` service, just that the `spec.type` is assigned the value `ClusterIP`.
The app architecture is set and ready for deployment on Kubernetes. The command below initializes the creation of the Flask web app service (web) and The Tensorflow serving service (server) at the same time.
$ kubectl apply -f k8s
The above command will work if you are in a directory containing the `k8s` folder. But if your working directory is in `k8s` itself, use the command below:
$ kubectl apply -f .
This will create the services needed based on the files in the `k8s` directory, as shown below:

From the image, we can see that the objects and services are created
Let’s check if the pods are running:
$ kubectl get pods

Remember that we created 3 replicas for each of the Deployment objects. Hence the total number of pods running should be six, which is correct, as seen from the image above.
To see the Deployment objects:
$ kubectl get deployments

The deployment objects are properly created.
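Since these are Deployment objects, they also give us the scaling and rollback abilities mentioned earlier. For example (standard kubectl commands, shown here only as a sketch, not as steps required for this project):

$ kubectl scale deployment tfweb-dev --replicas=5
$ kubectl rollout status deployment tfweb-dev
$ kubectl rollout undo deployment tfweb-dev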
Let’s check the services created for both objects:
$ kubectl get services

The `tf-cluster-ip-service` (`ClusterIP`) and `tfweb-load-balancer-service` (`LoadBalancer`) are properly created.
Now that we have everything set, let’s start the `LoadBalancer` service:
$ minikube service tfweb-load-balancer-service

This opens the web browser at the URL `http://172.17.0.2:32640/`. To see the app, let’s go to the route `/home`, which renders the web interface.
When values were inputted, and the form was submitted, the server responded with an error:

Remember, we faced this type of error before when trying to make our Flask web app container communicate with the TensorFlow Serving container. We solved it by replacing `localhost` with `server`, the name of the TensorFlow Serving service created by docker-compose.
To solve it here, we need to replace the `server` host used in `tfserving_request` in our `app.py` with the name of the ClusterIP service hosting TensorFlow Serving. Hence we replace `server` with `tf-cluster-ip-service` in app.py:
def tfserving_request(req_input, model_name):
    url = "http://tf-cluster-ip-service:8501/v1/models/{}:predict".format(model_name)
    input_request = {"instances": [req_input]}
    response = requests.post(url=url, json=input_request)
    return response
Once this is done, we rebuild the `tfweb` image and push it to Docker Hub. The new image is given the version `1.2`:
$ docker build -t steveoni/tfweb:1.2 .
$ docker push steveoni/tfweb:1.2
With this, we need to change the `image` we are pulling from in `tf-web-dev.yml` to `steveoni/tfweb:1.2`.
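As an aside, rebuilding the image just to change a hostname can be avoided by reading the host from an environment variable instead of hard-coding it. The sketch below is not part of the original app, and the variable name TF_SERVING_HOST is my own choice; it would then be set in docker-compose.yml or in the Deployment spec:

# optional refinement (not in the original app.py)
import os

def tfserving_request(req_input, model_name):
    # default to "localhost" for local runs; set TF_SERVING_HOST to "server" (Compose)
    # or "tf-cluster-ip-service" (Kubernetes) without rebuilding the image
    host = os.environ.get("TF_SERVING_HOST", "localhost")
    url = f"http://{host}:8501/v1/models/{model_name}:predict"
    input_request = {"instances": [req_input]}
    return requests.post(url=url, json=input_request)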
Let’s delete the former deployment objects and services created:
$ kubectl delete deployments --all
$ kubectl delete services --all
We then create the new Deployment objects and Services again:
$ kubectl apply -f k8s
Then we start the LoadBalancer service:
$ minikube service tfweb-load-balancer-service
The web page is loaded automatically, let’s go over to the route `/home` to test the web app:

Now the app is working properly.
If you apply this approach to another project and your pods refuse to start, you can always see the full details of the pods by running the command below:
$ kubectl describe pods
This gives the full details of all the pods created; to get the details of a specific pod, run `kubectl get pods`, obtain the name of the pod, and then run:
$ kubectl describe pod pod-name
And to see the logs from any of the pods, we use the command below:
$ kubectl logs pod-name
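Finally, when you are done experimenting, you can tear everything down (assuming you are in the directory containing the k8s folder):

$ kubectl delete -f k8s
$ minikube stop
$ minikube delete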
Conclusion
In this article, we’ve seen how to migrate away from the conventional method of building machine learning apps. I believe this article has shown you how to build efficient and scalable machine learning apps.
Also, you have added to your toolbox for becoming a Unicorn data scientist. There are still other tools and concepts to learn in order to build and take ML products to production efficiently, such as:
- CI/CD: How to add unit tests for your model so that a single push to GitHub runs a series of tests and then pushes to production
- TFX/Kubeflow: Custom tools to make the orchestration of your app and deployment using Kubernetes easier.
Reference
- The Kubernetes handbook: https://www.freecodecamp.org/news/the-kubernetes-handbook/#introduction-to-container-orchestration-and-kubernetes
- Building Machine Learning Pipeline: https://www.oreilly.com/library/view/building-machine-learning/9781492053187/
- The Docker handbook: https://www.freecodecamp.org/news/the-docker-handbook/