When building data science and machine learning powered products, the research-development-production workflow is not linear the way it is in traditional software development, where the specs are known and the problems are (mostly) understood beforehand.
There is a lot of trial and error involved: testing and adopting new algorithms, trying out new data versions (and managing them), packaging the product for production, gathering end users' views and perspectives, handling feedback loops, and more. All of this makes managing such projects a challenge.
Isolating the development environment from the production systems is a must if you want to ensure that your application will actually work. Putting your ML model development work inside a (Docker) container can really help with:
- managing the product development,
- keeping your environments clean (and making them easy to reset),
- most importantly, moving things from development to production becomes easier.
In this article, we will be discussing the development of Machine Learning (ML) powered products, along with best practices for using containers. We’ll cover the following:
- Machine learning iterative processes and dependency
- Version control at all stages
- MLOps vs DevOps
- Need for identical dev and prod environment
- Essentials of Containers (meaning, scope, Dockerfile, docker-compose, etc.)
- Jupyter notebook in containers
- Application development with TensorFlow in containers as a microservice
- GPU & Docker
What you need to know
In order to fully understand the implementation of machine learning projects in containers, you should:
- Have a basic understanding of software development with Docker,
- Be able to program in Python,
- Be able to build basic machine learning and deep learning models with TensorFlow or Keras,
- Have deployed at least one machine learning model.
The following links might be useful to get you started if you don’t know Docker, Python or TensorFlow:
Machine learning iterative processes and dependency
Learning is an iterative process. When a child learns to walk, it goes through a repetitive process of walking, falling, standing, walking, and so on – until it “clicks” and it can confidently walk.
The same concept applies to machine learning, and it’s necessary to ensure that the ML model is capturing the right patterns, characteristics and inter-dependencies from given data.
When you are building an ML-powered product or application, you need to be prepared for this iterative process.
This iterative process is not limited to product design alone, but it covers the entire cycle of product development using machine learning.
The right patterns that the algorithm needs in order to make the right business decisions are always hidden in the data. Data scientists and MLOps teams need to put in a lot of effort to build robust ML systems capable of performing this task.
Iterative processes can be confusing. As a rule of thumb, a typical machine learning workflow should consist of at least the following stages:
- Data collection or data engineering
- EDA (Exploratory Data Analysis)
- Data pre-processing
- Feature engineering
- Model training
- Model evaluation
- Model tuning and debugging
- Deployment
For each stage, there is a direct or indirect dependency on other stages.
Here is how I like to view the entire workflow based on levels of system design:
- The Model Level (fitting parameters): assuming that the data has been collected and EDA and basic pre-processing are done, the iterative process begins when you have to select the model that fits the problem you are trying to solve. There is no shortcut; you need to iterate through several models to see which works best on your data (a small sketch of this loop is shown after this list).
- The Micro Level (tuning hyperparameters): once you select a model (or set of models), you begin another iterative process at the micro level, with the aim to get the best model hyperparameters.
- The Macro Level (solving your problem): the first model you build for a problem will rarely be the best possible one, even if you tune it perfectly with cross-validation. That's because fitting model parameters and tuning hyperparameters are only two parts of the entire machine learning problem-solving workflow. At this stage, you need to iterate through techniques for improving the model on the problem you are solving, such as trying other models or ensembling.
- The Meta Level (improving your data): while improving your model (or training the baseline), you may find that the data you are using is of poor quality (for example, mislabeled), or that you need more observations of a certain type (for example, images taken at night). In those situations, improving your datasets and/or getting more data becomes very important. You should always keep the dataset as relevant as possible to the problem you are solving.
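To make the Model and Micro levels concrete, here is a minimal, hypothetical Keras sketch of that loop: it tries a few candidate layer sizes and learning rates on synthetic stand-in data, and keeps whichever configuration scores best on a validation set (all names and values here are illustrative assumptions, not part of the project built later in this article):

import numpy as np
import tensorflow as tf

# Synthetic stand-in data so the sketch runs end to end
x_train, y_train = np.random.rand(512, 20), np.random.randint(0, 2, 512)
x_val, y_val = np.random.rand(128, 20), np.random.randint(0, 2, 128)

def build_model(num_units, learning_rate):
    # A deliberately simple candidate architecture for illustration
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(num_units, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

best_score, best_config = 0.0, None
for num_units in (32, 64, 128):            # model-level choice
    for learning_rate in (1e-2, 1e-3):     # micro-level (hyperparameter) choice
        model = build_model(num_units, learning_rate)
        history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                            epochs=5, verbose=0)
        score = max(history.history["val_accuracy"])
        if score > best_score:
            best_score, best_config = score, (num_units, learning_rate)

print("Best validation accuracy:", best_score, "with (units, lr) =", best_config)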
These iterations will always lead to lots of changes in your system, so version control becomes important for efficient workflow and reproducibility.
Version control at all stages
Version control is a system that records changes to a file or set of files over time, so that you can recall specific versions later. Because of the iterative processes involved in developing an ML-powered product, versioning has become crucial to the success of the product and to its future maintenance or optimization.
Files in your ML workflow, such as notebooks, datasets, and scripts, all need versioning.
There are many tools and best practices for versioning these files, depending on your team's preferences. I'll share what works best for me.
Generally, you will use version control systems such as Git, Apache Subversion (SVN), or Concurrent Versions System (CVS). But using only one of these systems might not be the best fit for machine learning projects, because of the kinds of files used in the ML workflow. It's best to add other tools for efficient versioning of each type of file.
Data Versioning: most companies store data in a database or in cloud storage buckets, like Amazon S3 or Google Cloud Storage, from which data can be pulled when needed.
Pulling a sample to best represent the problem you are trying to solve might be iterative, and it becomes important to version the data used to train a machine learning model.
There is a limit to the size of the files you can push to a version control platform, and the data you work with often comes in gigabytes, so pushing raw data to a system like Git is not the best approach.
With tools like DVC and Neptune, data versioning becomes easier. Some useful links to get you started with data version control are included below.
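As a quick, hedged illustration (the remote URL and file names are placeholders, and the S3 remote assumes the dvc[s3] extra is installed), tracking a dataset with DVC inside an existing Git repository looks roughly like this:

dvc init                                            # set up DVC in the Git repo
dvc remote add -d storage s3://my-bucket/ml-data    # placeholder remote bucket
dvc add data/train.csv                              # creates data/train.csv.dvc
git add data/train.csv.dvc data/.gitignore
git commit -m "Track training data v1 with DVC"
dvc push                                            # upload the data to the remote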
Notebook Versioning: Jupyter and Colab notebooks generate files that may contain metadata, source code, formatted text, and rich media.
Unfortunately, this makes these files poor candidates for conventional version control solutions, which work best with plain text. Under the hood, notebooks are verbose, human-readable JSON .ipynb files. It is uncommon to edit the JSON source directly because the format is so verbose; it's easy to forget required punctuation, unbalance brackets like {} and [], and corrupt the file.
More troublesome, Jupyter source code is often littered with cell output stored as binary blobs. Little changes in the notebook, such as rerunning with new data, will look like a significant change in the version control commit logs.
Some built-in solutions for keeping track of these files convert the notebook to HTML or to a Python script. External tools that you can use for this include nbdime, ReviewNB, Jupytext, and Neptune, to mention a few.
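For example, the built-in nbconvert tool, and Jupytext, can produce plain-text versions of a notebook that diff much more cleanly (the notebook name below is a placeholder):

jupyter nbconvert --to html notebook.ipynb      # static HTML snapshot
jupyter nbconvert --to script notebook.ipynb    # plain .py version for clean diffs
jupytext --to py:percent notebook.ipynb         # paired .py representation via Jupytext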
My choice is Neptune, because it integrates with Jupyter and JupyterLab as an extension. Version control is just one of Neptune's features; the team, project, and user management features make it more than a version control tool, while its lightweight footprint still makes it a compelling option.
Your entire project can be versioned using version control systems, and this becomes even easier with containers, which we’ll soon discuss.
MLOps vs DevOps
Before we dive into containers for machine learning with TensorFlow, let’s quickly go through the similarities and differences between MLOps and DevOps.
MLOps (Machine Learning Operations) aims to manage the deployment of all types of machine learning models (deep learning, federated learning, etc.) in large-scale production environments.
DevOps (Development and Operations) is a set of practices that combines software development and IT operations at large scale. It aims to make development cycles shorter, increase deployment velocity, and create dependable releases.
DevOps principles also apply to MLOps, but there are some aspects of machine learning workloads that require a different focus or implementation.
Keeping in mind the basic ML workflow we discussed earlier, we can pinpoint the following differences between MLOps and DevOps:
- Team Skills: an MLOps team has research scientists, data scientists, and machine learning engineers who serve the same role as a software engineer in a DevOps team. The ML engineers have the essential skills of a software engineer, combined with data science expertise.
- Development: DevOps is mostly linear, while MLOps is more experimental in nature. The team needs to be able to manipulate model parameters and data features, and to retrain models frequently as the data changes. This requires more complex feedback loops. The team also needs to be able to track operations for reproducibility without impeding workflow reusability.
- Testing: in MLOps, testing requires additional methods beyond what is normally done in DevOps. For example, MLOps requires tests for data validation, model validation, testing of model quality, model integration and differential tests.
- Deployment: the deployment process in MLOps is similar to DevOps, but it depends on the type of ML system you're deploying. It becomes easier if the ML system is designed to be decoupled from the rest of the product cycle, acting as an external unit of the software.
- Production: a machine learning model in production runs continuously, and can be more challenging to maintain than traditional software in production. Its intelligence can degrade over time as user data changes. MLOps needs model monitoring and auditing to avoid unpleasant surprises.
Need for identical development and production environment
In software engineering, there are typically two stages of product development: development and production. This can be reduced to one when a cloud-native setup is used for both, but the majority of ML apps are still developed on local PCs before being pushed to the cloud.
The production environment is a reproduction of the development environment, with a focus on the key dependencies needed for the product to run smoothly.
Reproducing the environment in MLOps, or manually keeping track of these dependencies can be challenging because of the iterative processes involved in the workflow.
For Python developers, tools such as pip and Pipenv are often used to bridge this gap, but containers are a cleaner way to keep the two environments in sync.
Essentials of Containers in MLOps
A container is a standard unit of software that packages code and all its dependencies, so the application runs quickly and reliably from one computing environment to another.
With containers, there is no need for cloud-specific or environment-specific configuration for production, because they can run almost anywhere.
Using containers to separate project environments makes it flexible for ML teams to test-run new packages, modules and framework versions, without breaking the entire system, or having to install every tool on a local host.
Think of a container as something like a virtual machine. They have a lot in common, but they function differently, because containers virtualize the operating system instead of the hardware. Containers are more portable and efficient.

Containers are an abstraction at the app layer that packages code and dependencies together. Multiple containers can run on the same machine and share the OS kernel, each running as an isolated process in user space.
Containers take up less space than VMs because virtual machines are an abstraction of physical hardware, turning one server into many. Each VM includes a full copy of an operating system, the application, and the necessary binaries and libraries.
While there are many container tools and runtimes, we'll focus on Docker.
A container is a great way to do research and experimentation, with the flexibility to add data analytics and machine learning tools (like Jupyter Notebook and JupyterLab). Docker containers on development hosts are a great fit for model development, as trained models can be saved, turned into self-contained images, and used as a microservice.
This eliminates the risk of having to delete an entire virtual environment, or reinstall the operating system, if it gets broken by bad packages or frameworks. With containers, all you need to do is delete or rebuild the image.
One key thing worth noting is that a container's file system goes away when the container is removed.
Follow the guides below to get Docker installed on your local machine if you don't have it. Depending on your host operating system, you can download it via these links:
Once installed, you can type “docker” in your terminal to check if your host machine recognises the command.
The output should look similar to this:

Jupyter notebook in containers
Being able to run a Jupyter notebook in Docker is great for data scientists, because you can do research and experimentation without directly touching the host machine's environment.
Jupyter is designed to be accessed through a web browser interface, powered by a built-in web server. It is best run from the official images, which include IPython and conda, the standard Python libraries, all the required Jupyter software and modules, and additional data science and machine learning libraries. The development environment can be set up by simply pulling an official image.
To get started, let's pull the official Jupyter TensorFlow image, which we'll use as our development environment in this project. You can look through the list of images and their contents on the official Jupyter site.
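Pulled and run directly from the terminal, that might look like the following one-liner (a sketch; the local notebooks path is an assumption, and port 8888 is Jupyter's default):

docker run -p 8888:8888 -v "$(pwd)/notebooks:/home/jovyan/projectDir" jupyter/tensorflow-notebook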
When we first run this command, the image gets pulled; it is then cached and reused on subsequent runs. We could keep running it directly in the terminal, but I prefer to run it via Docker Compose, since we'll be running a multi-container application whose services work together. This also helps avoid typos.
Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.
In the project compose file, we define two services: tensorflow and tf_model_serving. Focusing first on the tensorflow service, which uses the jupyter/tensorflow-notebook image, let's walk through each of its tags (a sketch of this part of the compose file is shown after the list below):
- Image: this is used to specify the Docker image for the service. In our case, we have specified the official TensorFlow Jupyter notebook image.
- Ports: we use this to map a host machine port to the container port. Since Jupyter runs on port 8888 by default, we have mapped it to port 8888 on the local host, but feel free to change the host port if 8888 is already used by another service.
- Volumes: with this, we can bind-mount a local host directory to our working directory in the container. This is very useful for saving intermediate files, such as model artifacts, to the bound host directory, since container files are removed once the container stops running. In this case, we have bind-mounted the host notebooks folder to projectDir (the project directory in the running container).
- Environment: the official Jupyter image can be run as a classic Jupyter notebook or as JupyterLab. With the environment tag, we can specify our preferred choice; setting "JUPYTER_ENABLE_LAB" to yes indicates that we want to run JupyterLab rather than the classic notebook.
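Putting these tags together, the tensorflow service section of docker-compose.yml might look roughly like this (a sketch based on the descriptions above; the local notebooks path is an assumption, and the tf_model_serving service is added later in this article):

services:
  tensorflow:
    image: jupyter/tensorflow-notebook        # official Jupyter TensorFlow image
    ports:
      - "8888:8888"                           # host:container, Jupyter's default port
    volumes:
      - ./notebooks:/home/jovyan/projectDir   # bind-mount the host notebooks folder
    environment:
      - JUPYTER_ENABLE_LAB=yes                # run JupyterLab instead of the classic notebook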
To run a service with Docker Compose, we run "docker-compose up <service name>". For the TensorFlow Jupyter service, that is "docker-compose up tensorflow". This command pulls the image from Docker Hub the first time:

Subsequent runs of the command use the cached image directly, without needing to pull it again, as shown below.

We can access the notebook via the web URL displayed in the terminal, where intermediate logs are also shown.

Now that the notebook is up and running, we can carry out research and experimentation on the ML-powered product we're working on. Any package that is not included by default can be installed by running "!pip install <package name>" in a code cell. To check the installed packages, run "!pip freeze".
Application development with TensorFlow in containers
For this project, we will develop an automated image classification solution for photographs of marine invertebrates taken by researchers in South Africa, and serve the model with TensorFlow Serving. You can read more about the problem and the provided dataset on ZindiAfrica.
I won't go deep into how to build or optimize the model, but I'll walk you through how to save it in the TensorFlow Serving format and serve it in a container as a microservice.
You can work through the notebook (available via the GitHub link at the end of this article) for details on how the model was built with TensorFlow. Some of the images from the 137 classes are shown below:

Once the model is built and trained on these images with satisfactory evaluation results, we can load the .h5 Keras model saved during training and re-save it in the format required by TensorFlow Serving, as shown below:

import time
from tensorflow.keras.models import load_model

# Use a timestamp as the model version directory, as TensorFlow Serving expects
ts = int(time.time())

# Load the Keras model saved during training and export it in the SavedModel format
loadmodel = load_model('model.h5')
loadmodel.save(filepath=f'/home/jovyan/projectDir/classifier/my_model/{ts}', save_format='tf')
This will create the artifacts needed for serving, as shown in the figure below:

TensorFlow Serving makes it easy and efficient to expose a trained model via a model server. It provides flexible APIs that can be easily integrated with an existing system.
This is similar to how you would use a framework like Flask or Django to expose the saved model, but TensorFlow Serving is more powerful and a better choice for MLOps, since it handles model versioning, code separation, and efficient model serving.
To learn more about why I chose it over more traditional frameworks, check out the "Building Machine Learning Pipelines" book.
Model Serving Architecture
The goal is to build and deploy this as a microservice in containers, exposing a REST API which can be consumed by a bigger service, like the company website. With TensorFlow Serving, there are two options for API endpoints: REST and gRPC.
- REST: REpresentational State Transfer is an architectural style for providing standards between computer systems on the web, making it easier for systems to communicate with each other. It defines a communication style on how clients communicate with web services. Clients using REST communicate with the server using standard HTTP methods like GET, POST, and DELETE. The payloads of the requests are mostly encoded in JSON.
- gRPC: an open-source remote procedure call system, initially developed at Google in 2015. It is preferred when working with extremely large files during inference, because it provides low-latency communication and smaller payloads than REST.
Model Serving Architecture

While the two APIs (REST and gRPC) are available for consumption, the focus is on the REST API.
I simulate the client using code in a containerized Jupyter notebook. To achieve this, we embed our saved model in a custom Docker image built on top of the official TensorFlow Serving image.
Client requests can be load-balanced, i.e. evenly distributed, across a number of TensorFlow Serving replicas (4 in our case), using a single-node load balancer with Docker Swarm or Kubernetes.
I will use Docker Swarm to orchestrate both our client notebook and the custom serving image, since it comes as part of my Docker installation.
In the docker-compose.yml file, we need to add the TensorFlow Serving service; its tags are described below.

Let's quickly walk through the tags for the tf_model_serving service:
- Image: our custom serving docker image – tf_serving – will be built and tagged classifier_model.
- Build: the tf_model_serving service has a build option which defines the context and the name of the Dockerfile to use when building the image. In this case, the file is named Dockerfile and contains the commands explained below (a sketch of the file, together with this service's compose section, is shown after this list):

This file is used by Docker Compose to build the custom serving image.
FROM: used to specify the base image. It will be pulled from Docker Hub the first time you use it.
COPY: tells Docker what to copy from the host machine into the image being built. In our case, we copy the saved model from ./notebooks/classifier into the /models/ directory of the custom TensorFlow Serving image.
ENV MODEL_NAME=my_model: tells TensorFlow Serving which model to look for when handling requests.
- Deploy: with this tag, we specify the number of replicas for load balancing (4 in our case). Setting endpoint_mode to vip makes the service reachable in service discovery via a Virtual IP (VIP), routed through the Docker Swarm ingress overlay network; the alternative is DNS round robin (dnsrr).
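Based on the descriptions above, a minimal sketch of the Dockerfile might look like this (the source path matches the bind-mounted notebooks folder used earlier and is an assumption about the project layout):

FROM tensorflow/serving
COPY ./notebooks/classifier /models/
ENV MODEL_NAME=my_model

And the corresponding tf_model_serving section of docker-compose.yml, nested under services: next to the tensorflow service, might look roughly like this:

  tf_model_serving:
    image: classifier_model        # tag given to the custom serving image
    build:
      context: .
      dockerfile: Dockerfile
    deploy:
      replicas: 4                  # number of serving replicas for load balancing
      endpoint_mode: vip           # expose the service via a Virtual IP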
To build the custom serving image, run "docker-compose build <service name>" (in our case, "docker-compose build tf_model_serving") in your terminal, as shown below:

After the custom image has been built, we can use docker swarm to start up the services listed in the docker compose file using the commands below:
“docker swarm init”
"docker stack deploy -c <docker compose file> <stack name>". In our case:
“docker stack deploy -c docker-compose.yml tf”
With this, Docker creates a virtual network which allows all the containers to communicate with each other by name.

To check the logs of each container, we can run the command below:
"docker service logs <service name>".
Running “docker service logs tf_tf_model_serving” will show the logs:

Now that the server is up, we can simulate how it will be consumed as a microservice, using the client code notebook in the running Jupyter instance, as shown in the model serving architecture.
To get the notebook's web URL, we can check its service logs:
“docker service logs tf_tensorflow”

Opening the URL in a browser should give something similar to this:

I kept 10 random images from the dataset in a folder to test the API from the notebook. These images are shown below:

Each image has been resized to 224*224, just as was done while training the model. Before sending a request to the API, let's quickly construct the API endpoint, as shown in the code snippet below:
tf_service_host = 'tf_model_serving'
model_name = 'my_model'
REST_API_port = '8501'
model_predict_url = 'http://'+tf_service_host+':'+REST_API_port+'/v1/models/'+model_name+':predict'
You will notice that this is generic; the general format looks like this: http://{HOST}:{PORT}/v1/models/{MODEL_NAME}:{VERB}
- HOST: the domain name or IP address of your model server, or its service name. In our case, we have declared the service name as tf_service_host, which serves as our host.
- PORT: the server port for the URL. For the REST API, the port is 8501 by default, as seen in the architecture above.
- MODEL_NAME: the name of the model we are serving. We set this to "my_model" during configuration.
- VERB: this can be classify, regress, or predict, based on the model signature. In our case, we use "predict".
We can write a prediction function that pre-processes input images from clients into the required JSON format before sending them to the API:
import json
import requests
import numpy as np

def model_predict(url, image):
    # Build the JSON payload expected by TensorFlow Serving's REST API
    request_json = json.dumps({"signature_name": "serving_default", "instances": image.tolist()})
    request_headers = {"content-type": "application/json"}
    # Send the POST request to the prediction endpoint
    response_json = requests.post(url, data=request_json, headers=request_headers)
    # Extract the prediction scores and pick the most probable class
    prediction = json.loads(response_json.text)['predictions']
    pred_class = np.argmax(prediction)
    confidence_level = prediction[0][pred_class]
    return (pred_class, confidence_level)
In the code snippet above, json.dumps is used to build the JSON data payload, which is the format required by the API. The instances parameter is set to the image we want to classify. We then send a POST request to the server, passing the URL, the JSON payload, and the headers.
Next, we extract the prediction from the returned JSON using the "predictions" key. Since we have 137 classes in the dataset, we get the predicted class using NumPy's argmax function, along with the model's confidence for that prediction. These two values are returned as a Python tuple.
Invoking this function on the 10 test images with a for loop, as shown below:
predicted_classes = []
for img in test_data:
    predicted_classes.append(model_predict(url=model_predict_url, image=np.expand_dims(img, 0)))
This will return [(0, 0.75897634),
(85, 0.798368514),
(77, 0.995417),
(120, 0.997971237),
(125, 0.906099916),
(66, 0.996572495),
(79, 0.977153897),
(106, 0.864411),
(57, 0.952410817),
(90, 0.99959296)]
We can structure the result as shown below:
for pred_class, confidence_level in predicted_classes:
    print(f'predicted class= {Class_Name[pred_class]} with confidence level of {confidence_level}')
With the output:
predicted class= Actiniaria with confidence level of 0.75897634
predicted class= Ophiothrix_fragilis with confidence level of 0.798368514
predicted class= Nassarius speciosus with confidence level of 0.995417
predicted class= Salpa_spp_ with confidence level of 0.997971237
predicted class= Solenocera_africana with confidence level of 0.906099916
predicted class= Lithodes_ferox with confidence level of 0.996572495
predicted class= Neolithodes_asperrimus with confidence level of 0.977153897
predicted class= Prawns with confidence level of 0.864411
predicted class= Hippasteria_phrygiana with confidence level of 0.952410817
predicted class= Parapagurus_bouvieri with confidence level of 0.99959296
GPU and Docker
Docker is a great tool for creating containerized machine learning and data science environments for research and experimentation, but it would be even better if we could leverage GPU acceleration (when available on the host machine) to speed things up, especially for deep learning.
GPU-accelerated computing works by assigning compute-intensive portions of an application to the GPU, providing a supercomputing level of parallelism that bypasses costly, low-level operations employed by mainstream analytics systems.
Using the host's GPU for data science projects depends on two things:
- A GPU-capable host machine
- GPU-enabled packages and software
Since Docker isolates containers from the host to a large extent, giving containers access to GPU-accelerated cards is not a trivial task.
At the time of writing this article, the Docker community officially supports GPU acceleration only for containers running on Linux hosts. While there are workarounds for Windows and macOS hosts, getting them to work can be very difficult.
One way to check whether your running TensorFlow Jupyter container has access to a GPU is with the code snippet below:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
print("Num GPUs Available: ", len(gpus))
Even though my host machine has a GPU, I have not been able to leverage it here, because I'm running this on macOS.
Nevertheless, Docker is the easiest way to run TensorFlow with GPU support. See the official TensorFlow Docker documentation for how to set up a TensorFlow Docker image with GPU support on a Linux host.
Conclusion
Here is the GitHub link to the project we worked on, as promised. Check it out!
Thank you for reading this tutorial, I hope it was helpful.