Hello and welcome to this Kaggle tutorial on how to build a model for face mask detection using Python and Machine Learning.

As the project name suggests, the objective of this tutorial is simple: given an input image, our face mask detection model should be able to tell, with reasonable accuracy, whether a person is wearing a face mask or not.
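To successfully complete this project, there are three major parts we need to think about:
1. Creating a training dataset
2. Training an image classification model
3. Detecting face masks in a new image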
So, with all of that in mind, let us first start by getting the dataset and downloading the codebase.
The dataset used for this tutorial is publicly available on Kaggle and you can download the dataset from here: https://www.kaggle.com/wobotintelligence/face-mask-detection-dataset
Please make sure to keep the data files in the data/ directory of this project. The dataset contains multiple files, but for this project we only need /train.csv and the /Medical mask/Medical mask/Medical Mask/images/ folder.
Thanks to Ayushi Mishra for publishing the original version of this notebook. You can view the original notebook here: https://www.kaggle.com/ayushimishra2809/face-mask-detection
Let us start by importing the necessary libraries used in this face mask detection project.
# Common Python libraries
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import matplotlib.patches as patches
# For reading in images and image manipulation
import cv2
# For label encoding the target variable
from sklearn.preprocessing import LabelEncoder
# For tensor based operations
from tensorflow.keras.utils import to_categorical, normalize
# For Machine Learning
from tensorflow.keras.layers import Flatten, Dense, Conv2D, MaxPooling2D, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
# For face detection
from mtcnn.mtcnn import MTCNN
In this part, we will be creating a training dataset for training an image classification model.
The train.csv file contains information about images such as the image name, coordinates for bounding boxes of faces as well as the classname for each bounding box.
# Reading in the csv file
train = pd.read_csv("data/train.csv")
# Displaying the first five rows
train.head()

The data dictionary for the dataset is as follows,
name: Image filename
x1, x2, y1, y2: Bounding box coordinates
classname: Bounding box label
We can see that the image name has been repeated in multiple rows. This is because a single image can contain multiple bounding boxes with different classnames. Let us have a look at how many total unique image filenames are present in the dataset,
# Total number of unique images
len(train["name"].unique())
4326
Since classifying these two labels is our primary objective for this project, we keep only the rows whose classname is either face_with_mask or face_no_mask.
# classnames to select
options = ["face_with_mask", "face_no_mask"]
# Select rows that have the classname as either "face_with_mask" or "face_no_mask"
train = train[train["classname"].isin(options)].reset_index(drop=True)
train.sort_values("name", axis=0, inplace=True)
Let us also look at the distribution of these labels,
# Plotting a bar plot of the label counts
label_counts = train["classname"].value_counts()
# Taking the bar labels from the index of value_counts() so each bar is labelled with its own class
plt.bar(label_counts.index, label_counts.values)

Now, let's learn how to fetch actual images from the folder /Medical mask/Medical mask/Medical Mask/images/.
Printing the filenames of some images in the folder.
# Contains images of medical masks
images_file_path = "data/Medical mask/Medical mask/Medical Mask/images/"
# Fetching all the file names in the image directory
image_filenames = os.listdir(images_file_path)
# Printing out the first five image names
print(image_filenames[:5])
['0001.jpg', '0002.png', '0003.jpg', '0004.jpg', '0005.jpg']
We will not be using all 6024 images in the given folder, since some of them do not carry any of the classnames we filtered the train dataframe for. We will use only the images whose names appear in the train dataframe.
Let's plot a sample image from the filtered train dataset,
# Getting the full image filepath
sample_image_name = train.iloc[0]["name"]
sample_image_file_path = images_file_path + sample_image_name
# Select rows with the same image name as in the "name" column of the train dataframe
sel_df = train[train["name"] == sample_image_name]
# Convert all of the available "bbox" values into a list
bboxes = sel_df[["x1", "x2", "y1", "y2"]].values.tolist()
# Creating a figure and a sub-plot
fig, ax = plt.subplots()
# Reading in the image as an array
img = plt.imread(sample_image_file_path)
# Showing the image
ax.imshow(img)
# Plotting the bounding boxes
for box in bboxes:
    x1, x2, y1, y2 = box
    # x and y co-ordinates
    xy = (x1, x2)
    # Width of box
    width = y1 - x1
    # Height of box
    height = y2 - x2
    rect = patches.Rectangle(
        xy,
        width,
        height,
        linewidth=2,
        edgecolor="r",
        facecolor="none",
    )
    ax.add_patch(rect)
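The column names in train.csv are easy to misread: judging from the plotting code above, (x1, x2) is the top-left corner of a box and (y1, y2) is its bottom-right corner. The following is a minimal sketch of that conversion; the helper name corners_to_rect and the sample coordinates are made up for illustration.

```python
def corners_to_rect(x1, x2, y1, y2):
    """Convert the dataset's corner columns into ((left, top), width, height).

    Assumes (x1, x2) is the top-left corner and (y1, y2) the bottom-right
    corner, which is how the plotting code above treats them.
    """
    top_left = (x1, x2)
    width = y1 - x1
    height = y2 - x2
    return top_left, width, height


# Made-up box: top-left at (10, 20), bottom-right at (110, 70)
top_left, width, height = corners_to_rect(10, 20, 110, 70)
print(top_left, width, height)
```

The three returned values are the first three arguments that patches.Rectangle expects.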

Before we move forward with creating a training dataset, there are some things to consider,
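Since this is just a practice project and we want to build an average model quickly, we will apply the following image manipulations to decrease the number of features: read each image in grayscale, crop it to the face bounding box, and resize the crop to 50 x 50 pixels.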
Creating an array of image arrays and their labels,
img_size = 50
data = []
for index, row in train.iterrows():
    # Single row
    name, x1, x2, y1, y2, classname = row.values
    # Full file path
    full_file_path = images_file_path + name
    # Reading in the image array as a grayscale image
    img_array = cv2.imread(full_file_path, cv2.IMREAD_GRAYSCALE)
    # Selecting the portion covered by the bounding box
    crop_image = img_array[x2:y2, x1:y1]
    # Resizing the image
    new_img_array = cv2.resize(crop_image, (img_size, img_size))
    # Appending the image array to the data variable along with its class label
    data.append([new_img_array, classname])
# Plotting one of the images after pre-processing
plt.imshow(data[0][0], cmap="gray")
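The crop img_array[x2:y2, x1:y1] may look backwards at first. An image array is indexed as [row, column], so the vertical range comes first and the horizontal range second. Here is a small sketch on a synthetic array; the array and the box values are invented for illustration.

```python
import numpy as np

# Synthetic "image" with 6 rows (height) and 8 columns (width)
img = np.arange(48).reshape(6, 8)

# Box in the dataset's naming: (x1, x2) = top-left, (y1, y2) = bottom-right
x1, x2, y1, y2 = 2, 1, 7, 4

# Rows are selected by the vertical range, columns by the horizontal range
crop = img[x2:y2, x1:y1]

print(crop.shape)  # (height, width) = (3, 5)
```

The cropped array has y2 - x2 rows and y1 - x1 columns, matching the box height and width.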

Let's separate out the independent variables x with the dependent variable y,
# Initializing an empty list for features (independent variables)
x = []
# Initializing an empty list for labels (dependent variable)
y = []
for features, labels in data:
    x.append(features)
    y.append(labels)
Next, performing some data pre-processing,
# Reshaping the feature array (Number of images, IMG_SIZE, IMG_SIZE, Color depth)
x = np.array(x).reshape(-1, 50, 50, 1)
# Normalizing
x = normalize(x, axis=1)
# Label encoding y
lbl = LabelEncoder()
y = lbl.fit_transform(y)
# Converting it into a categorical variable
y = to_categorical(y)
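To see what the label steps produce, here is a NumPy-only sketch that mimics label encoding followed by one-hot encoding. It is not the scikit-learn or Keras implementation; it only relies on the fact that LabelEncoder numbers the classes in sorted order. The three sample labels are made up.

```python
import numpy as np

# Made-up labels for three cropped faces
labels = ["face_with_mask", "face_no_mask", "face_with_mask"]

# np.unique returns the sorted class names and, for each label, its class index
classes, encoded = np.unique(labels, return_inverse=True)

# One-hot encoding: row i of the identity matrix represents class i
one_hot = np.eye(len(classes))[encoded]

print(classes)
print(encoded)
print(one_hot)
```

Because the classes are sorted alphabetically, face_no_mask becomes index 0 and face_with_mask becomes index 1. This ordering matters later when reading the model's output.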
In this part, we will be building and training an image classification model and more specifically, a convolutional neural network for face mask detection.
The architecture of the Convolutional Neural Network we will be building is as follows,

Before we build and train the model, let us select only the height, width, and color depth for our input layer,
input_img_shape = x.shape[1:]
print(input_img_shape)
(50, 50, 1)
Next, creating the CNN architecture using the Sequential model from TensorFlow,
# Initializing a sequential keras model
model = Sequential()
# Adding a 2D convolution layer
model.add(
    Conv2D(
        filters=100,
        kernel_size=(3, 3),
        use_bias=True,
        input_shape=input_img_shape,
        activation="relu",
        strides=2,
    )
)
# Adding a max-pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))
# Adding a 2D convolution layer - Output Shape = 10 x 10 x 64
model.add(Conv2D(filters=64, kernel_size=(3, 3), use_bias=True, activation="relu"))
# Adding a max-pooling layer - Output Shape = 5 x 5 x 64
model.add(MaxPooling2D(pool_size=(2, 2)))
# Adding a flatten layer - Output Shape = 5 x 5 x 64 = 1600
model.add(Flatten())
# Adding a dense layer - Output Shape = 50
model.add(Dense(50, activation="relu"))
# Adding a dropout
model.add(Dropout(0.2))
# Adding a dense layer with softmax activation
model.add(Dense(2, activation="softmax"))
# Printing the model summary
model.summary()
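The output shapes noted in the comments above can be checked by hand. The sketch below assumes the Keras defaults used in this model: no padding for the convolutions, and a pooling stride equal to the pool size. The helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial output size of a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool):
    """Spatial output size of max-pooling with stride equal to the pool size."""
    return size // pool


size = 50                                      # input image is 50 x 50 x 1
conv1 = conv_output_size(size, 3, stride=2)    # 24 -> 24 x 24 x 100
pool1 = pool_output_size(conv1, 2)             # 12 -> 12 x 12 x 100
conv2 = conv_output_size(pool1, 3)             # 10 -> 10 x 10 x 64
pool2 = pool_output_size(conv2, 2)             # 5  -> 5 x 5 x 64
flat = pool2 * pool2 * 64                      # 1600 values after flattening

print(conv1, pool1, conv2, pool2, flat)
```

These numbers should match the Output Shape column printed by model.summary().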

We've successfully built our model architecture. Let's move on to training the model with the configuration given below,
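# Initializing an Adam optimizer
# (recent Keras versions expect `learning_rate` instead of `lr`; `decay` is only
# accepted by older or legacy optimizers, so drop it if your version rejects it)
opt = Adam(learning_rate=1e-3, decay=1e-5)
# Configuring the model for training
model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"])
# Training the model
model.fit(x, y, epochs=30, batch_size=5)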
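The decay argument slowly lowers the learning rate during training. As a rough sketch, assuming the time-based schedule used by the older Keras optimizers (learning rate divided by 1 + decay * step, where step counts batch updates), the effective learning rate can be worked out as follows. The function name is hypothetical and the step values are arbitrary.

```python
def decayed_learning_rate(initial_lr, decay, step):
    """Time-based decay: the rate shrinks as the number of batch updates grows."""
    return initial_lr / (1.0 + decay * step)


initial_lr = 1e-3
decay = 1e-5

# Arbitrary checkpoints, counted in batch updates rather than epochs
for step in (0, 10_000, 100_000):
    print(step, decayed_learning_rate(initial_lr, decay, step))
```

With these settings the learning rate is roughly halved only after about 100,000 batch updates, so the decay has a mild effect over a short training run.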

We now have a trained image classification model ready!
In this part, we will be trying to detect if a person in an image is wearing a face mask or not.
Let us start by reading in a sample image that is not part of the training images,
# Image file path for sample image
test_image_file_path = "sample_test_images/0001.jpg"
# Loading in the image
img = plt.imread(test_image_file_path)
# Showing the image
plt.imshow(img)
Now that we've read in the image, we must first detect the face(s) in the image and perform the necessary image pre-processing steps.
Thus, for face detection we will be using MTCNN.
Multi-task Cascaded Convolutional Networks (MTCNN) is a framework developed as a solution for both face detection and face alignment. You can learn more about it from this helpful Medium post: https://medium.com/@iselagradilla94/multi-task-cascaded-convolutional-networks-mtcnn-for-face-detection-and-facial-landmark-alignment-7c21e8007923
# Initializing the detector
detector = MTCNN()
# Detecting the faces in the image
faces = detector.detect_faces(img)
print(faces)
[{'box': [300, 137, 326, 399], 'confidence': 0.998160183429718, 'keypoints': {'left_eye': (398, 307), 'right_eye': (535, 297), 'nose': (470, 369), 'mouth_left': (421, 435), 'mouth_right': (544, 424)}}]
Next, performing image pre-processing,
# Reading in the image as a grayscale image
img_array = cv2.imread(test_image_file_path, cv2.IMREAD_GRAYSCALE)
# Initializing the detector
detector = MTCNN()
# Detecting the faces in the image
faces = detector.detect_faces(img)
# Getting the values for bounding box
x1, x2, width, height = faces[0]["box"]
# Selecting the portion covered by the bounding box
crop_image = img_array[x2 : x2 + height, x1 : x1 + width]
# Resizing the image
new_img_array = cv2.resize(crop_image, (img_size, img_size))
# Plotting the image
plt.imshow(new_img_array, cmap="gray")
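A face detector can sometimes return a box that extends slightly past the image edge, for example with a negative left or top coordinate, and a negative index would silently select the wrong region when slicing. The sketch below shows one way to clamp a box of the form [x, y, width, height] to the image bounds before cropping. The helper name and the sample values are made up for illustration.

```python
def clamp_box(x, y, width, height, img_width, img_height):
    """Return (left, top, right, bottom) limited to the image area."""
    left = max(0, x)
    top = max(0, y)
    right = min(img_width, x + width)
    bottom = min(img_height, y + height)
    return left, top, right, bottom


# Made-up box that starts 5 pixels left of a 40 x 40 image and runs past its bottom
left, top, right, bottom = clamp_box(-5, 10, 50, 60, img_width=40, img_height=40)
print(left, top, right, bottom)
```

The clamped values can then be used for the crop as img_array[top:bottom, left:right].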
Some more pre-processing,
# Reshaping the image
x = new_img_array.reshape(-1, 50, 50, 1)
# Normalizing
x = normalize(x, axis=1)
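It helps to know what the normalize call does, since it is not a plain division by 255. With its default settings it scales the values along the chosen axis to unit L2 norm. The NumPy sketch below imitates that behaviour under this assumption; the function name and the tiny 2 x 2 "image" are invented for illustration.

```python
import numpy as np


def l2_normalize(x, axis=1):
    """Divide x by its L2 norm along `axis`; all-zero slices stay zero."""
    x = np.asarray(x, dtype="float64")
    norms = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero
    return x / norms


# One 2 x 2 single-channel image: first column holds (3, 4), second column is empty
batch = np.array([[3.0, 0.0], [4.0, 0.0]]).reshape(1, 2, 2, 1)
normalized = l2_normalize(batch, axis=1)

print(normalized[0, :, 0, 0])  # first column after normalization
print(normalized[0, :, 1, 0])  # second column stays zero
```

Whatever scaling is chosen, the same one must be applied at training time and at prediction time, which is why the tutorial calls normalize with axis=1 in both places.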
Finally, let us make a prediction.
prediction = model.predict(x)
print(prediction)
[[0.01 0.99]]
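Interpreting these predictions: LabelEncoder assigns indices to the class names in alphabetical order, so index 0 holds the probability of face_no_mask and index 1 holds the probability of face_with_mask. The model is therefore about 99% confident that the person in the sample image is wearing a face mask.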
We can also use the np.argmax() method to find the index with the highest probability value,
# Returns the index of the maximum value
np.argmax(prediction)
1
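The index on its own is not very readable, so it can be mapped back to a class name. The sketch below assumes the class order produced by label encoding during training (alphabetical, so face_no_mask comes first) and reuses the probabilities printed above as a hard-coded array. With the fitted encoder at hand, lbl.inverse_transform would serve the same purpose.

```python
import numpy as np

# Class names in the order assumed for the label encoding (alphabetical)
class_names = ["face_no_mask", "face_with_mask"]

# Probabilities copied from the sample prediction shown earlier
prediction = np.array([[0.01, 0.99]])

# Index of the most probable class for the first (and only) image
predicted_index = int(np.argmax(prediction, axis=1)[0])
predicted_label = class_names[predicted_index]

print(predicted_label)
```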
Since you now know how to build a binary image classifier, you can now perform the following tasks on your own by extending the concepts you've learned: