Face Mask Detection using Python and ML - Kaggle Tutorials

Hello and welcome to this Kaggle tutorial on how to build a model for face mask detection using Python and Machine Learning.


As the project name suggests, the objective of this tutorial is simple: given an input image, our face mask detection model should detect whether a person is wearing a face mask, with reasonable accuracy.

To successfully complete this project, there are three major parts we need to think about:

  1. Part 1: Create a training dataset - We should be able to create a training dataset of face images with proper bounding boxes of human faces and annotations indicating whether the person is wearing a face mask or not.
  2. Part 2: Train an image classification model - We should be able to build an image classification model, such as a Convolutional Neural Network, for face mask detection. The accuracy of detection depends heavily on the type and quality of the model we build.
  3. Part 3: Make predictions - We should be able to detect faces on images and make predictions on whether or not the person is wearing a face mask using our trained image classification model.

With all of that in mind, let us start by getting the dataset and downloading the codebase.

Getting Started with Face Mask Detection

The dataset used for this tutorial is publicly available on Kaggle and you can download the dataset from here: https://www.kaggle.com/wobotintelligence/face-mask-detection-dataset

Please keep the data files in the data/ directory of this project. The dataset contains multiple files; however, we only need train.csv and the images in the Medical mask/Medical mask/Medical Mask/images/ folder for this project.

Thanks to Ayushi Mishra for publishing the original version of this notebook. You can view the original notebook here: https://www.kaggle.com/ayushimishra2809/face-mask-detection

Importing necessary libraries for Face Mask Detection

Let us start by importing the necessary libraries used in this face mask detection project.

# Common Python libraries
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# For reading in images and image manipulation
import cv2

# For label encoding the target variable
from sklearn.preprocessing import LabelEncoder

# For one-hot encoding and normalization
from tensorflow.keras.utils import to_categorical, normalize

# For Machine Learning
from tensorflow.keras.layers import Flatten, Dense, Conv2D, MaxPooling2D, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# For face detection
from mtcnn.mtcnn import MTCNN

Part 1: Create a training dataset

In this part, we will be creating a training dataset for training an image classification model.

The train.csv file contains information about images such as the image name, coordinates for bounding boxes of faces as well as the classname for each bounding box.

# Reading in the csv file
train = pd.read_csv("data/train.csv")

# Displaying the first five rows
train.head()

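The data dictionary for the dataset is as follows:

  • name: Image filename
  • x1, x2, y1, y2: Bounding box coordinates. Despite the names, the cropping and plotting code in this tutorial treats x1 as the left edge, x2 as the top edge, y1 as the right edge and y2 as the bottom edge.
  • classname: Bounding box label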
Example image with bounding box - Face Mask Detection using Python
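Because the column names do not match what the columns hold, it can help to convert each row into explicitly named edges once. The sketch below assumes, as this tutorial's cropping and plotting code does, that x1 is the left edge, x2 the top edge, y1 the right edge and y2 the bottom edge; the `Box` class, the `row_to_box` helper and the sample values are hypothetical, not part of the dataset or any library.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Face bounding box with explicitly named edges (in pixels)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.bottom - self.top


def row_to_box(x1: int, x2: int, y1: int, y2: int) -> Box:
    """Hypothetical helper mapping the CSV columns to named edges.

    Assumption (matches how this tutorial crops and plots the boxes):
    x1 = left, x2 = top, y1 = right, y2 = bottom.
    """
    return Box(left=x1, top=x2, right=y1, bottom=y2)


# Made-up row values, for illustration only
box = row_to_box(x1=451, x2=186, y1=895, y2=697)
print(box, box.width, box.height)
```

With named edges, a NumPy crop reads img[box.top:box.bottom, box.left:box.right], and a Matplotlib rectangle starts at (box.left, box.top) with the box's width and height.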

We can see that the same image name appears in multiple rows. This is because a single image can contain multiple bounding boxes with different classnames. Let us check how many unique image filenames the dataset contains:

# Total number of unique images
train["name"].nunique()

Next, we keep only the rows whose classname is either face_with_mask or face_no_mask, since classifying these two labels is the primary objective of this project.

# classnames to select
options = ["face_with_mask", "face_no_mask"]

# Select rows that have the classname as either "face_with_mask" or "face_no_mask"
train = train[train["classname"].isin(options)].reset_index(drop=True)
train.sort_values("name", axis=0, inplace=True)

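Let us also look at the distribution of these labels:

# Counting the bounding boxes per label; value_counts() sorts by frequency,
# so the bar labels are taken from its index to keep each bar matched to its count
label_counts = train["classname"].value_counts()

# Plotting a bar plot
plt.bar(label_counts.index, label_counts.values)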
Distribution of labels - Face Mask Detection using Python
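The counting and filtering steps above can be mirrored in plain Python, which makes the one-row-per-box layout explicit. This is a sketch on made-up rows, not real values from train.csv; in the tutorial itself, pandas does the same work with nunique(), isin() and value_counts(). Counting by key, rather than pairing labels and counts by position, keeps every count attached to its own label.

```python
from collections import Counter

# Made-up (name, classname) pairs: one row per bounding box, so an
# image name repeats once for every box drawn on that image
rows = [
    ("0001.jpg", "face_with_mask"),
    ("0001.jpg", "face_no_mask"),
    ("0001.jpg", "hat"),
    ("0002.png", "face_with_mask"),
    ("0003.jpg", "sunglasses"),
]

# Number of unique images, as train["name"].nunique() would report
unique_images = len({name for name, _ in rows})

# Keep only the two classnames this project classifies
options = {"face_with_mask", "face_no_mask"}
filtered = [(name, label) for name, label in rows if label in options]

# Count boxes per label; each count stays attached to its own label
label_counts = Counter(label for _, label in filtered)

print(unique_images)
print(label_counts)
```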

Now, let's learn how to fetch actual images from the folder /Medical mask/Medical mask/Medical Mask/images/.

Printing the filenames of some images in the folder.

# Contains images of medical masks
images_file_path = "data/Medical mask/Medical mask/Medical Mask/images/"

# Fetching all the file names in the image directory
image_filenames = os.listdir(images_file_path)

# Printing out the first five image names
print(image_filenames[:5])
['0001.jpg', '0002.png', '0003.jpg', '0004.jpg', '0005.jpg']

We will not use all 6024 images in the folder, since some of them contain none of the classnames we filtered the train dataframe for. Instead, we use only the images whose names appear in the filtered train dataframe.
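One way to check which files are actually used is to intersect the directory listing with the names left in the filtered dataframe. A minimal sketch, with made-up filename lists standing in for os.listdir(images_file_path) and train["name"]:

```python
# Stand-ins for os.listdir(images_file_path) and train["name"] (made up)
image_filenames = ["0001.jpg", "0002.png", "0003.jpg", "0004.jpg", "0005.jpg"]
filtered_names = ["0003.jpg", "0001.jpg", "0003.jpg"]  # repeats: one row per box

# Images referenced by at least one face_with_mask / face_no_mask box
used_images = sorted(set(image_filenames) & set(filtered_names))

# Images in the folder that the filtered dataframe never refers to
unused_images = sorted(set(image_filenames) - set(filtered_names))

print(used_images)    # ['0001.jpg', '0003.jpg']
print(unused_images)  # ['0002.png', '0004.jpg', '0005.jpg']
```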

Let's plot a sample image from the filtered train dataset:

# Getting the full image filepath
sample_image_name = train.iloc[0]["name"]
sample_image_file_path = images_file_path + sample_image_name

# Select rows with the same image name as in the "name" column of the train dataframe
sel_df = train[train["name"] == sample_image_name]

# Convert all of the available "bbox" values into a list
bboxes = sel_df[["x1", "x2", "y1", "y2"]].values.tolist()

# Creating a figure and a sub-plot
fig, ax = plt.subplots()

# Reading in the image as an array
img = plt.imread(sample_image_file_path)

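# Showing the image
ax.imshow(img)

# Plotting the bounding boxes
for box in bboxes:

    x1, x2, y1, y2 = box

    # Top-left corner of the box: x1 is the left edge and x2 the top edge
    xy = (x1, x2)

    # Width of box (right edge y1 minus left edge x1)
    width = y1 - x1

    # Height of box (bottom edge y2 minus top edge x2)
    height = y2 - x2

    # Drawing an unfilled rectangle on top of the image
    rect = patches.Rectangle(xy, width, height, linewidth=2, edgecolor="red", facecolor="none")
    ax.add_patch(rect)

plt.show()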

Image with bounding box - Face Mask Detection using Python

Before we move forward with creating a training dataset, there are some things to consider:

  • The resolution (width x height) of the images in the training dataset is relatively high. For example, for an image of size (1280, 720), we would be feeding 1280 x 720 = 921,600 pixels into the Convolutional Neural Network. Training a model on this many pixels takes a lot of time.
  • The images have a color depth of 3, that is, Red, Green and Blue channels. So, the total number of input values would be 1280 x 720 x 3 = 2,764,800.

Since this is just a practice project and we want to build a reasonable model quickly, we will apply the following image manipulations to reduce the number of features:

  • Convert the color depth of the image to 1 by reading in the image as a grayscale image
  • Crop out the region covered by the bounding boxes in each image
  • Resize the image to be of a size 50 x 50
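The effect of these three steps on the size of each input is simple arithmetic. The sketch below assumes a 1280 x 720 RGB source image (the example size used above) and the 50 x 50 grayscale crops we will produce; `num_features` is a hypothetical helper.

```python
def num_features(width: int, height: int, channels: int) -> int:
    """Number of input values a network sees for one image."""
    return width * height * channels


# Full-resolution RGB image, as in the example above
full_rgb = num_features(1280, 720, 3)

# Same image in grayscale
full_gray = num_features(1280, 720, 1)

# Grayscale face crop resized to 50 x 50
crop_gray = num_features(50, 50, 1)

print(full_rgb, full_gray, crop_gray)        # 2764800 921600 2500
print(f"{full_rgb / crop_gray:.0f}x fewer")  # 1106x fewer
```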

Creating a list of the processed image arrays and their labels:

img_size = 50
data = []

for index, row in train.iterrows():

    # Single row
    name, x1, x2, y1, y2, classname = row.values

    # Full file path
    full_file_path = images_file_path + name

    # Reading in the image array as a grayscale image
    img_array = cv2.imread(full_file_path, cv2.IMREAD_GRAYSCALE)

    # Selecting the portion covered by the bounding box
    crop_image = img_array[x2:y2, x1:y1]

    # Resizing the image
    new_img_array = cv2.resize(crop_image, (img_size, img_size))

    # Appending the processed image array to the data list along with its classname
    data.append([new_img_array, classname])

# Plotting one of the images after pre-processing
plt.imshow(data[0][0], cmap="gray")
Cropped Grayscale image - Face Mask Detection using Python
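The crop-and-resize step boils down to array slicing followed by resampling. The sketch below uses a synthetic NumPy array instead of a real image, and a hand-written nearest-neighbour resize as a stand-in for cv2.resize (which uses bilinear interpolation by default, so its pixel values would differ). The box edges follow the same column interpretation as the loop above: rows run from x2 to y2 and columns from x1 to y1. The helper names and box values are hypothetical.

```python
import numpy as np


def crop_box(img: np.ndarray, x1: int, x2: int, y1: int, y2: int) -> np.ndarray:
    """Crop using the CSV columns as the loop above does: rows x2:y2, cols x1:y1."""
    return img[x2:y2, x1:y1]


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize to size x size (stand-in for cv2.resize)."""
    rows = (np.arange(size) * img.shape[0] / size).astype(int)
    cols = (np.arange(size) * img.shape[1] / size).astype(int)
    return img[rows[:, None], cols[None, :]]


# Synthetic 720 x 1280 "grayscale image" with a distinct value per pixel
img_array = np.arange(720 * 1280, dtype=np.int64).reshape(720, 1280)

# Made-up box: left=400, top=100, right=600, bottom=400
crop = crop_box(img_array, x1=400, x2=100, y1=600, y2=400)
small = resize_nearest(crop, 50)

print(crop.shape, small.shape)  # (300, 200) (50, 50)
```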

Let's separate the independent variables x from the dependent variable y:

# Initializing an empty list for features (independent variables)
x = []

# Initializing an empty list for labels (dependent variable)
y = []

for features, labels in data:
    x.append(features)
    y.append(labels)

Next, we perform some data pre-processing:

# Reshaping the feature array (Number of images, IMG_SIZE, IMG_SIZE, Color depth)
x = np.array(x).reshape(-1, 50, 50, 1)

# Normalizing
x = normalize(x, axis=1)

# Label encoding y
lbl = LabelEncoder()
y = lbl.fit_transform(y)

# Converting it into a categorical variable
y = to_categorical(y)
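It is worth knowing what these three calls compute. keras.utils.normalize(x, axis, order=2) divides by the L2 norm along the given axis, so with axis=1 on an array of shape (images, 50, 50, 1), each column of 50 pixel values in every image is scaled to unit length. LabelEncoder assigns integer codes in sorted (alphabetical) order of the classnames, and to_categorical one-hot encodes those codes. A NumPy-only sketch of the same operations on made-up toy data:

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int) -> np.ndarray:
    """L2-normalize along `axis`, leaving all-zero slices untouched."""
    norms = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    norms[norms == 0] = 1  # avoid division by zero
    return x / norms


# Toy batch: 2 "images" of 3 x 3 pixels with 1 channel
x = np.arange(18, dtype=np.float64).reshape(2, 3, 3, 1)
x_norm = l2_normalize(x, axis=1)

# Every column (axis 1) of every image now has unit length
col_lengths = np.linalg.norm(x_norm, axis=1)

# Label encoding: sorted unique classnames -> integer codes (as LabelEncoder does)
labels = ["face_with_mask", "face_no_mask", "face_with_mask"]
classes, codes = np.unique(labels, return_inverse=True)

# One-hot encoding of the integer codes (as to_categorical does)
one_hot = np.eye(len(classes))[codes]

print(classes)  # ['face_no_mask' 'face_with_mask']
print(codes)    # [1 0 1]
print(one_hot)
```

The alphabetical ordering is why "face_no_mask" ends up at index 0 and "face_with_mask" at index 1 of the model's output.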

Part 2: Training an Image Classification Model

In this part, we build and train an image classification model, more specifically a Convolutional Neural Network (CNN), for face mask detection.

The architecture of the Convolutional Neural Network we will be building is as follows:

Convolutional Neural Network (CNN) Architecture - Face Mask Detection

Before we build and train the model, let us take the height, width, and color depth of the feature array (dropping the number of images) as the shape of our input layer:

input_img_shape = x.shape[1:]
(50, 50, 1)

Next, we create the CNN architecture using the Sequential model from TensorFlow's Keras API:

# Initializing a sequential keras model
model = Sequential()

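# Adding a 2D convolution layer - Output Shape = 24 x 24 x 100
# (filters=100 is an assumed value; strides=(2, 2) is what lets the
# later layers reach the stated 10 x 10 output shape)
model.add(
    Conv2D(
        filters=100,
        kernel_size=(3, 3),
        strides=(2, 2),
        use_bias=True,
        input_shape=input_img_shape,
        activation="relu",
    )
)

# Adding a max-pooling layer - Output Shape = 12 x 12 x 100
model.add(MaxPooling2D(pool_size=(2, 2)))

# Adding a 2D convolution layer - Output Shape = 10 x 10 x 64
model.add(Conv2D(filters=64, kernel_size=(3, 3), use_bias=True, activation="relu"))

# Adding a max-pooling layer - Output Shape = 5 x 5 x 64
model.add(MaxPooling2D(pool_size=(2, 2)))

# Adding a flatten layer - Output Shape = 5 x 5 x 64 = 1600
model.add(Flatten())

# Adding a dense layer - Output Shape = 50
model.add(Dense(50, activation="relu"))

# Adding a dropout (rate=0.5 is an assumed value)
model.add(Dropout(0.5))

# Adding a dense layer with softmax activation
model.add(Dense(2, activation="softmax"))

# Printing the model summary
model.summary()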
Convolutional Neural Network (CNN) Model Summary - Face Mask Detection using Python
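The output shapes in the layer comments follow from the usual formula for "valid" (unpadded) convolution and pooling windows: out = floor((in - kernel) / stride) + 1, and Keras' MaxPooling2D uses the pool size as its stride by default. The pure-Python check below traces the layer stack, assuming the first convolution uses strides of 2 (an assumption inferred from the stated 10 x 10 shape; with stride 1, the second convolution would output 22 x 22 instead). The function names are hypothetical.

```python
def valid_out(size: int, kernel: int, stride: int) -> int:
    """Output length of a 'valid' (no padding) convolution or pooling window."""
    return (size - kernel) // stride + 1


def trace_shapes(input_size: int, first_stride: int) -> list[int]:
    """Spatial size after each layer of the CNN above (square inputs)."""
    s1 = valid_out(input_size, kernel=3, stride=first_stride)  # Conv2D 3x3
    p1 = valid_out(s1, kernel=2, stride=2)                     # MaxPooling2D 2x2
    s2 = valid_out(p1, kernel=3, stride=1)                     # Conv2D 3x3, 64 filters
    p2 = valid_out(s2, kernel=2, stride=2)                     # MaxPooling2D 2x2
    return [s1, p1, s2, p2]


with_stride_2 = trace_shapes(50, first_stride=2)
with_stride_1 = trace_shapes(50, first_stride=1)

# Flatten output: height * width * 64 filters of the second convolution
flat = with_stride_2[-1] ** 2 * 64

print(with_stride_2, flat)  # [24, 12, 10, 5] 1600
print(with_stride_1)        # [48, 24, 22, 11]
```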

We've successfully built our model architecture. Let's move on to training the model with the configuration below:

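# Initializing an Adam optimizer with a learning rate of 1e-3 and a decay of 1e-5.
# Recent TensorFlow releases no longer accept the legacy `lr`/`decay` arguments;
# an InverseTimeDecay schedule with decay_steps=1 applies the same per-batch decay.
from tensorflow.keras.optimizers.schedules import InverseTimeDecay

lr_schedule = InverseTimeDecay(initial_learning_rate=1e-3, decay_steps=1, decay_rate=1e-5)
opt = Adam(learning_rate=lr_schedule)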

# Configuring the model for training
model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"])

# Training the model
model.fit(x, y, epochs=30, batch_size=5)
Model Training Results - Face Mask Detection using Python
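The learning-rate decay of 1e-5 in the optimizer configuration follows the legacy Keras time-based formula lr / (1 + decay * iterations), applied once per batch; an InverseTimeDecay schedule with decay_steps=1 computes the same value. A quick sketch of how slowly the rate shrinks; the step counts are illustrative only, since the real number of batches is 30 epochs times ceil(len(x) / 5).

```python
def decayed_lr(initial_lr: float, decay: float, iterations: int) -> float:
    """Legacy Keras time-based decay: lr / (1 + decay * iterations)."""
    return initial_lr / (1 + decay * iterations)


# Illustrative batch counts, not taken from the actual training run
for step in (0, 1_000, 10_000, 100_000):
    print(step, decayed_lr(1e-3, 1e-5, step))
```

The rate only halves after 100,000 batches, so with this setting the decay has a mild effect on a run of this size.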

We now have a trained image classification model ready!

Part 3: Making a Prediction

In this part, we will try to detect whether a person in an image is wearing a face mask.

Let us start by reading in a sample image that is not part of the training images:

# Image file path for sample image
test_image_file_path = "sample_test_images/0001.jpg"

# Loading in the image
img = plt.imread(test_image_file_path)

# Showing the image
plt.imshow(img)

Now that we've read in the image, we must first detect the face(s) in the image and perform the necessary image pre-processing steps.

For face detection, we will be using MTCNN.

Multi-task Cascaded Convolutional Networks (MTCNN) is a framework developed as a solution for both face detection and face alignment. You can learn more about it from this helpful Medium post: https://medium.com/@iselagradilla94/multi-task-cascaded-convolutional-networks-mtcnn-for-face-detection-and-facial-landmark-alignment-7c21e8007923

# Initializing the detector
detector = MTCNN()

# Detecting the faces in the image
faces = detector.detect_faces(img)

[{'box': [300, 137, 326, 399], 'confidence': 0.998160183429718, 'keypoints': {'left_eye': (398, 307), 'right_eye': (535, 297), 'nose': (470, 369), 'mouth_left': (421, 435), 'mouth_right': (544, 424)}}]

Next, we perform the image pre-processing:

# Reading in the image as a grayscale image
img_array = cv2.imread(test_image_file_path, cv2.IMREAD_GRAYSCALE)

# Getting the bounding box of the first detected face; MTCNN returns
# [x, y, width, height], with (x, y) as the top-left corner
x1, y1, width, height = faces[0]["box"]

# MTCNN can return slightly negative coordinates for faces at the image border
x1, y1 = max(x1, 0), max(y1, 0)

# Selecting the portion covered by the bounding box (rows first, then columns)
crop_image = img_array[y1 : y1 + height, x1 : x1 + width]

# Resizing the image
new_img_array = cv2.resize(crop_image, (img_size, img_size))

# Plotting the image
plt.imshow(new_img_array, cmap="gray")
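MTCNN reports each face as box = [x, y, width, height], with (x, y) the top-left corner, while NumPy images are indexed [row, column], i.e. [y, x]. The sketch below is a small helper (hypothetical, not part of the mtcnn package) that turns a detection into crop slices clamped to the image bounds, using the box printed earlier; the image size passed in is an assumption.

```python
def box_to_slices(box, img_height: int, img_width: int):
    """Convert an MTCNN [x, y, width, height] box into (row_slice, col_slice).

    Coordinates are clamped to the image, since detections near the border
    can start at negative values or extend past the edge.
    """
    x, y, width, height = box
    top, left = max(y, 0), max(x, 0)
    bottom, right = min(y + height, img_height), min(x + width, img_width)
    return slice(top, bottom), slice(left, right)


# Box from the detection output shown earlier; the image size is assumed
rows, cols = box_to_slices([300, 137, 326, 399], img_height=720, img_width=1280)
print(rows, cols)  # slice(137, 536, None) slice(300, 626, None)

# Usage with a NumPy image array: crop_image = img_array[rows, cols]
```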

Next, we apply the same reshaping and normalization that we used for the training data:

# Reshaping the image
x = new_img_array.reshape(-1, 50, 50, 1)

# Normalizing
x = normalize(x, axis=1)

Finally, let us make a prediction.

prediction = model.predict(x)
[[0.01 0.99]]

Interpreting these predictions:

  • If the probability value at index 0 is greater than the probability value at index 1, the classification is "face_no_mask", since LabelEncoder sorts the classnames alphabetically and therefore encoded "face_no_mask" as 0, i.e. [1., 0.] after one-hot encoding.
  • If the probability value at index 1 is greater than the probability value at index 0, the classification is "face_with_mask", since it was encoded as 1, i.e. [0., 1.] after one-hot encoding.

We can also use np.argmax() to find the index with the highest probability value and map it back to its classname:

# Returns the index of the maximum value for each face
predicted_index = np.argmax(prediction, axis=1)

# Mapping the index back to its classname with the fitted LabelEncoder
lbl.inverse_transform(predicted_index)
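Putting the interpretation together: since LabelEncoder sorted the two classnames alphabetically, index 0 is face_no_mask and index 1 is face_with_mask. The NumPy sketch below hard-codes softmax outputs instead of calling the model; the first row is the example output shown above and the second row is made up, to show that the same code handles several detected faces at once.

```python
import numpy as np

# Class order produced by LabelEncoder (alphabetical)
class_names = np.array(["face_no_mask", "face_with_mask"])

# Softmax outputs, one row per face (second row is made up)
prediction = np.array([[0.01, 0.99], [0.87, 0.13]])

# Index of the most probable class for each face, and its probability
predicted_index = np.argmax(prediction, axis=1)
confidence = prediction.max(axis=1)

for name, p in zip(class_names[predicted_index], confidence):
    print(f"{name}: {p:.2f}")
```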

Coding exercise for you

Now that you know how to build a binary image classifier, try the following tasks on your own by extending the concepts you've learned:

  • Create a multi-class classification model using the above dataset by keeping all classnames, that is, ['face_with_mask', 'mask_colorful', 'face_no_mask', 'face_with_mask_incorrect', 'mask_surgical', 'face_other_covering', 'scarf_bandana', 'eyeglasses', 'helmet', 'face_shield', 'sunglasses', 'hood', 'hat', 'goggles', 'hair_net', 'hijab_niqab', 'other', 'gas_mask', 'balaclava_ski_mask', 'turban'].
  • Analyze the predicted probabilities for images that contain objects from multiple classes.


