
Applications of AI in Drone Technology: Building Machine Learning Models That Work on Drones (With TensorFlow/Keras)

24th August, 2023

Welcome back to the second part of Building a Facemask Surveillance System with Drone Technology and Deep Learning. In the first part, we covered an introduction to drone technology, its various classifications, the architecture of the drone used for this project, and setting up the environment for programming the drone with Python.

If you are not familiar with Part 1, you can find it on our blog.

In this part, we are going to download and pre-process the face mask datasets, build the face mask detection model using TensorFlow/Keras, run the training, and save the deep learning model for further implementation.


To fully understand this tutorial, it is assumed that you:

Below are some links to help you get started:


With TensorFlow
With Keras

Now, let’s continue to build our surveillance system. In this section we build the deep learning model to detect two classes, namely:

  1. People with mask
  2. People without mask
Figure: Face mask classification

To perform this classification, we will use a class of deep neural networks called convolutional neural networks (CNNs), which are commonly applied to analyzing visual imagery.

Downloading and pre-processing the datasets

Data is the core of any ML/AI algorithm. For this project, the dataset was downloaded from Kaggle and the RMFD dataset. It consists of 3835 images belonging to two classes:

  1. with_mask: 1916 images
  2. without_mask: 1919 images.

To start building these models, we need to import the necessary libraries which include modules for preprocessing, model building, model evaluation, visualization, and file management.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import os

To aid TensorBoard logging, Neptune AI provides a neat integration with TensorBoard. To use it, we import the following libraries:

import random
import psutil
import neptune
import neptune_tensorboard as neptune_tb

Following that, we’ll load our Neptune API credentials from the .env file using the python-dotenv module. This API token authorizes communication between your training scripts and Neptune. Because the token works like a password to our application, we keep it secret by storing it in an environment variable rather than in the code.

from dotenv import load_dotenv

# read the variables defined in .env into the process environment
load_dotenv()
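
To make the idea concrete, here is a minimal sketch of what loading a .env file amounts to. It is a simplified stand-in written with the standard library only, not python-dotenv's actual implementation, and the variable name NEPTUNE_API_TOKEN and the dummy value are assumptions made for illustration.

```python
import os
import tempfile

def load_env_file(path, environ=os.environ):
    """Simplified stand-in for load_dotenv: handles plain KEY=VALUE lines only."""
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            # skip blank lines, comments, and anything that is not KEY=VALUE
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # keep variables that are already set, as load_dotenv does by default
            environ.setdefault(key.strip(), value.strip().strip("'\""))

# Demo with a throwaway file and a plain dict instead of the real environment.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write("# secrets stay out of version control\n")
    tmp.write('NEPTUNE_API_TOKEN="dummy-token"\n')  # hypothetical name and value

demo_env = {}
load_env_file(tmp.name, demo_env)
os.remove(tmp.name)

api_token = demo_env.get("NEPTUNE_API_TOKEN")
```

The point of the pattern is that the token lives in a file that is never committed, while the training script only ever refers to the variable name.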


Next, we will initialize the project and automatically log TensorBoard metrics:

# the environment variable name NEPTUNE_API_TOKEN is assumed here
neptune.init(project_qualified_name='codebrain/Drone',
             api_token=os.getenv('NEPTUNE_API_TOKEN'))
neptune_tb.integrate_with_tensorflow()

With these in place, we can start setting up the environment for training the face mask detection model. We initialize the initial learning rate, the number of epochs to train for, and the batch size, and set the experiment log directory.

PARAMS = {
	'EPOCHS': 20,
	'BS': 32,
	'INIT_LR': 1e-4,
}
RUN_NAME = 'run_{}'.format(random.getrandbits(64))
EXPERIMENT_LOG_DIR = 'logs/{}'.format(RUN_NAME)

Next, we construct an argument parser, which makes it easy to write a user-friendly command-line interface for specifying the dataset, plot, and model paths.

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
            	help="path to input dataset")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
            	help="path to output loss/accuracy plot")
ap.add_argument("-m", "--model", type=str,
            	help="path to output face mask detector model")
args = vars(ap.parse_args())
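
As a quick illustration of how these arguments resolve, the sketch below rebuilds the same parser and feeds it a simulated command line. The paths "dataset" and "mask_detector.h5" are hypothetical placeholders, not values from the project.

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
                help="path to input dataset")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
                help="path to output loss/accuracy plot")
ap.add_argument("-m", "--model", type=str,
                help="path to output face mask detector model")

# Passing a list to parse_args() simulates what would be typed on the command line.
args = vars(ap.parse_args(["--dataset", "dataset", "--model", "mask_detector.h5"]))
print(args)
```

Because --plot is not supplied, it falls back to its default, while --dataset is mandatory and the parser exits with a usage message if it is missing.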

Next, we grab the list of images from our dataset directory and then initialize the lists of data and classes/labels.

print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []

We loop over the image paths, extract the class label from the name of each image’s parent directory, and load each image resized to 224 by 224 pixels, the input size fed to the neural network. We then convert each image to an array and pass it to the preprocess_input function, which adapts the image to the format the model requires (you must make sure that the images you load are compatible with preprocess_input). Finally, we convert the data and labels to NumPy arrays for further processing, and perform one-hot encoding on the labels to convert the categorical data to numerical data.
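
For intuition about what preprocess_input does for MobileNetV2, the following NumPy sketch reproduces the scaling step on a random stand-in image. It assumes the MobileNetV2 convention of mapping pixel values from [0, 255] to [-1, 1]; the helper name scale_like_mobilenet_v2 is hypothetical and is not part of Keras.

```python
import numpy as np

def scale_like_mobilenet_v2(pixels):
    """Map pixel values in [0, 255] to the range [-1, 1]."""
    return pixels.astype("float32") / 127.5 - 1.0

# random stand-in for a loaded 224x224 RGB image
fake_image = np.random.default_rng(0).integers(0, 256, size=(224, 224, 3))
scaled = scale_like_mobilenet_v2(fake_image)

print(scaled.shape, scaled.dtype)
```

Feeding the network unscaled pixels would not raise an error, but the pre-trained weights expect inputs in this range, so predictions would degrade silently.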

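for imagePath in imagePaths:
	# extract the class label from the parent directory name
	label = imagePath.split(os.path.sep)[-2]

	# load the image at 224x224, convert it to an array, and preprocess it
	image = load_img(imagePath, target_size=(224, 224))
	image = img_to_array(image)
	image = preprocess_input(image)

	# update the data and labels lists, respectively
	data.append(image)
	labels.append(label)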

# convert the data and labels to NumPy arrays
data = np.array(data, dtype="float32")
labels = np.array(labels)

# perform one-hot encoding on the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)
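
The label handling can be sketched without scikit-learn or Keras. With exactly two classes, LabelBinarizer yields a single 0/1 column, which is why to_categorical is applied afterwards to obtain one column per class. The NumPy stand-in below produces the same kind of two-column encoding; the file paths are hypothetical.

```python
import os
import numpy as np

# hypothetical image paths laid out as <dataset>/<class name>/<file>
image_paths = [
    os.path.join("dataset", "with_mask", "a.png"),
    os.path.join("dataset", "without_mask", "b.png"),
    os.path.join("dataset", "with_mask", "c.png"),
]

# the class label is the name of the parent directory
labels = [p.split(os.path.sep)[-2] for p in image_paths]

# map each label to an integer index (classes are sorted alphabetically)
classes, indices = np.unique(labels, return_inverse=True)

# one row per image, one column per class
one_hot = np.eye(len(classes), dtype="float32")[indices]
print(list(classes))
print(one_hot)
```

Note the column order: with_mask sorts before without_mask, so column 0 stands for "mask" and column 1 for "no mask".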

Next, we tell Neptune to create an experiment, giving it a name and logging the hyperparameters. It is recommended to keep everything inside the with statement where possible, so that cleanup happens automatically once the experiment is complete. Before training the model, we split the data, using 80% for training and the remaining 20% for testing.

with neptune.create_experiment(name=RUN_NAME, params=PARAMS):

	# partition the data into training and testing splits using 80% of
	# the data for training and the remaining 20% for testing
	(trainX, testX, trainY, testY) = train_test_split(data, labels,
                                                  	test_size=0.20, stratify=labels, random_state=42)

Next, we construct the training image generator, which artificially expands the training dataset by creating modified versions of the images. This data augmentation helps the model generalize well.

aug = ImageDataGenerator(
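
To illustrate what "modified versions of the images" means, here is a small NumPy sketch of two common augmentations, a random horizontal flip and a random horizontal shift. It is not the ImageDataGenerator configuration used in this project; the function name and the max_shift value are assumptions chosen for the example.

```python
import numpy as np

def random_flip_and_shift(image, rng, max_shift=0.2):
    """Return a randomly mirrored and horizontally shifted copy of an HxWxC image."""
    out = image[:, ::-1, :] if rng.random() < 0.5 else image
    shift = int(rng.uniform(-max_shift, max_shift) * image.shape[1])
    # np.roll wraps pixels around the edge; Keras fills the gap according to fill_mode
    return np.roll(out, shift, axis=1)

rng = np.random.default_rng(42)
image = np.arange(4 * 6 * 3, dtype="float32").reshape(4, 6, 3)  # tiny stand-in image

# eight augmented variants of the same source image
batch = np.stack([random_flip_and_shift(image, rng) for _ in range(8)])
print(batch.shape)
```

Each variant keeps the same label as its source image, so the model sees more varied examples of each class without any new data being collected.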

Model building

After preprocessing and labeling the dataset, the next step is to train a model to classify the images accurately. There are two ways to go about this: build a classifier from scratch or use a pre-trained model. I chose the latter and adapted MobileNetV2, a convolutional neural network that is 53 layers deep.

Figure: MobileNet V2 architecture

Note: When using a pre-trained model, it is important to read up on the model and check that it can be adapted to the problem at hand and that it works with your preprocessed dataset. For this project, MobileNetV2 was chosen for its state-of-the-art performance in object detection, its reduced complexity, and its low demands on computation, graphics processing, and storage.

Adapting the model involved loading MobileNetV2 with weights pre-trained on ImageNet and adding a new head on top of it. The head applies average pooling (to reduce the feature map), flattens the result, and passes it through a fully connected layer with a ReLU activation (to add non-linearity). Dropout is added to reduce overfitting, followed by a final fully connected softmax layer with one output per class. Finally, we compile the model with a loss function, an optimizer, and metrics. The loss function measures the error between predictions and labels during learning, and Keras requires one at compile time. The optimizer updates the weights to reduce that loss, and the metrics are used to evaluate the performance of the model. The trained model is then serialized and saved to local disk.

Note: The headModel is placed on top of the baseModel; together they become the actual model we will train.
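
The shapes flowing through this head can be checked by hand. The sketch below assumes the standard 7x7x1280 feature map that MobileNetV2 produces for a 224x224x3 input when include_top=False, and counts the parameters of the head layers described above; it is plain arithmetic, not output from Keras.

```python
# feature map produced by MobileNetV2 (include_top=False) for a 224x224x3 input
feat_h, feat_w, feat_c = 7, 7, 1280

pooled = (feat_h // 7, feat_w // 7, feat_c)        # AveragePooling2D(pool_size=(7, 7))
flat = pooled[0] * pooled[1] * pooled[2]           # Flatten
dense1_params = flat * 128 + 128                   # Dense(128): weights + biases
dense2_params = 128 * 2 + 2                        # Dense(2): weights + biases
trainable_params = dense1_params + dense2_params   # Dropout adds no parameters

print(pooled, flat, trainable_params)
```

With the base model frozen, only these head parameters are updated during training, which is what keeps training fast on a dataset of a few thousand images.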

baseModel = MobileNetV2(weights="imagenet", include_top=False,
                    	input_tensor=Input(shape=(224, 224, 3)))

headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
	layer.trainable = False

# compile our model
print("[INFO] compiling model...")
opt = Adam(learning_rate=PARAMS['INIT_LR'],
           decay=PARAMS['INIT_LR'] / PARAMS['EPOCHS'])
model.compile(loss="binary_crossentropy", optimizer=opt,
              metrics=["accuracy"])

# train the head of the network
print("[INFO] training head...")
H = model.fit(
    	aug.flow(trainX, trainY, batch_size=PARAMS['BS']),
    	steps_per_epoch=len(trainX) // PARAMS['BS'],
    	validation_data=(testX, testY),
    	validation_steps=len(testX) // PARAMS['BS'],
    	epochs=PARAMS['EPOCHS'])
The next step is to evaluate the model’s performance by predicting the test data labels.

print("[INFO] evaluating network...")
predIdxs = model.predict(testX, batch_size=PARAMS['BS'])

# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)

# show a nicely formatted classification report
print(classification_report(testY.argmax(axis=1), predIdxs,
                            target_names=lb.classes_))

# serialize the model to disk
print("[INFO] saving mask detector model...")
model.save(args["model"], save_format="h5")
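
The argmax step in the evaluation can be illustrated on a toy example. The probabilities and labels below are made up for the demonstration and are not results from the trained model.

```python
import numpy as np

# made-up softmax outputs for four test images (columns: with_mask, without_mask)
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
# made-up one-hot ground-truth labels for the same images
test_y = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

# index of the most probable class per image, and of the true class
pred_idxs = np.argmax(probs, axis=1)
true_idxs = test_y.argmax(axis=1)

accuracy = float((pred_idxs == true_idxs).mean())
print(pred_idxs, true_idxs, accuracy)
```

classification_report performs the same comparison but breaks it down into precision, recall, and F1-score per class.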

Finally, we can visualize the training loss and accuracy from our Neptune experiment dashboard.

Neptune drone exp dashboard

The dashboard can be seen here.

We can monitor the hardware consumption from the Neptune experiment dashboard.

The dashboard can be seen here.

Implementing the model on the video stream 

This is a new script that will load the saved model from the previous session. To begin implementation, we need to import some necessary libraries.

from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
from imutils.video import VideoStream
import numpy as np
import argparse
import imutils
import time
import cv2
import os

Next, we will be creating two functions get_facenet_masknet and detect_and_predict_mask. The get_facenet_masknet function will read in the previously serialized trained model and corresponding weights.

def get_facenet_masknet():
	# construct the argument parser and parse the arguments
	ap = argparse.ArgumentParser()
	ap.add_argument("-f", "--face", type=str,
                	help="path to face detector model directory")
	ap.add_argument("-m", "--model", type=str,
                	help="path to trained face mask detector model")
	ap.add_argument("-c", "--confidence", type=float, default=0.5,
                	help="minimum probability to filter weak detections")
	args = vars(ap.parse_args())

	# load our serialized face detector model from disk
	print("[INFO] loading face detector model...")
	# both --face and --model must be supplied on the command line
	prototxtPath = os.path.sep.join([args["face"], "deploy.prototxt"])
	weightsPath = os.path.sep.join([args["face"],
		"res10_300x300_ssd_iter_140000.caffemodel"])
	faceNet = cv2.dnn.readNet(prototxtPath, weightsPath)

	# load the face mask detector model from disk
	print("[INFO] loading face mask detector model...")
	maskNet = load_model(args["model"])
	return (faceNet, maskNet, args)

With that in place, detect_and_predict_mask grabs the dimensions of the frame and constructs a blob from it. The blob is then passed through the network to detect faces.

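def detect_and_predict_mask(frame, faceNet, maskNet, args):
	# grab the dimensions of the frame and construct a blob from it
	(h, w) = frame.shape[:2]
	blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300),
		(104.0, 177.0, 123.0))

	# pass the blob through the network to obtain the face detections
	faceNet.setInput(blob)
	detections = faceNet.forward()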
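
A blob is simply the frame rearranged into the layout the network expects. The NumPy sketch below mimics the mean subtraction and channel reordering that cv2.dnn.blobFromImage performs with the arguments above; it leaves out the resizing step, and the helper name to_blob is hypothetical.

```python
import numpy as np

def to_blob(frame_bgr, mean=(104.0, 177.0, 123.0), scale=1.0):
    """Mean-subtract a BGR frame and reorder it into a 1xCxHxW float32 array."""
    x = (frame_bgr.astype("float32") - np.array(mean, dtype="float32")) * scale
    # HxWxC -> CxHxW, then add the leading batch dimension
    return x.transpose(2, 0, 1)[np.newaxis, ...]

# dummy grey frame that is already 300x300, so no resize is needed here
frame = np.full((300, 300, 3), 128, dtype="uint8")
blob = to_blob(frame)

print(blob.shape)
```

The three mean values are subtracted per channel in BGR order, which is why the frame is passed to the face detector before any conversion to RGB.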

Following this, we will initialize our list of faces and their corresponding locations and also the list of predictions from our face mask network.

	faces = []
	locs = []
	preds = []

We will loop over the detections to extract the confidence (i.e., probability) associated with each detection.

	for i in range(0, detections.shape[2]):
		confidence = detections[0, 0, i, 2]

Next we will filter out weak detections by ensuring the confidence is greater than the minimum confidence.

		if confidence > args["confidence"]:
			# compute the (x, y)-coordinates of the bounding box for
			# the object
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")

We must also ensure that the bounding boxes fall within the dimensions of the frame.

			(startX, startY) = (max(0, startX), max(0, startY))
			(endX, endY) = (min(w - 1, endX), min(h - 1, endY))
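
The scaling and clipping of a bounding box can be tried in isolation. The sketch below assumes, as the code above does, that the detector returns box corners normalized to the range 0 to 1; the helper name to_pixel_box and the sample values are made up for the example.

```python
import numpy as np

def to_pixel_box(norm_box, w, h):
    """Scale a normalized (startX, startY, endX, endY) box to pixels and clip it."""
    box = np.asarray(norm_box) * np.array([w, h, w, h])
    start_x, start_y, end_x, end_y = box.astype("int")
    # keep the box inside the frame
    start_x, start_y = max(0, start_x), max(0, start_y)
    end_x, end_y = min(w - 1, end_x), min(h - 1, end_y)
    return int(start_x), int(start_y), int(end_x), int(end_y)

# a detection that spills past the left and right edges of a 640x480 frame
pixel_box = to_pixel_box([-0.05, 0.25, 1.10, 0.75], w=640, h=480)
print(pixel_box)
```

Without the clipping, slicing the frame with a negative start index would silently select the wrong region instead of raising an error.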

Next come some pre-processing steps: we extract the face ROI, convert it from BGR to RGB channel ordering, resize it to 224 by 224 pixels, convert it to an array, and apply preprocess_input.

			face = frame[startY:endY, startX:endX]
			face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
			face = cv2.resize(face, (224, 224))
			face = img_to_array(face)
			face = preprocess_input(face)

We must ensure that we add the face and bounding boxes to their respective lists.

			faces.append(face)
			locs.append((startX, startY, endX, endY))

If at least one face was detected, we make batch predictions on all faces at once rather than predicting one face at a time.

	if len(faces) > 0:
		faces = np.array(faces, dtype="float32")
		preds = maskNet.predict(faces, batch_size=32)

Finally, we will return a 2-tuple of the face locations and their corresponding predictions.

	return (locs, preds)
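
One way a caller might consume the returned tuple is sketched below. The boxes and scores are made up, and the column order (mask first, no mask second) is an assumption that follows from the alphabetical ordering of the class names with_mask and without_mask.

```python
import numpy as np

locs = [(10, 20, 110, 140), (200, 40, 290, 150)]  # made-up bounding boxes
preds = np.array([[0.97, 0.03], [0.15, 0.85]])    # made-up (mask, no-mask) scores

results = []
for box, (mask, without_mask) in zip(locs, preds):
    # pick the more probable class and keep its score
    label = "Mask" if mask > without_mask else "No Mask"
    results.append((box, label, float(max(mask, without_mask))))

for box, label, score in results:
    print(box, label, round(score, 2))
```

Pairing by position works because locs and preds were built in the same loop, so the i-th prediction always belongs to the i-th box.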


In this tutorial, you’ve seen how to load and preprocess the face mask dataset, as well as train the face mask detection model using TensorFlow and MobileNetV2 with Python. The trained model was further adapted for video stream implementation. Finally, the model was serialized for the next phase of the project, which involves deploying this surveillance system in a Flask application.

Armed with the knowledge gained from parts 1 and 2, we have the framework of our surveillance system fully built. The next part covers the deployment of this system. Happy reading and stay tuned.
