Hello and welcome to this Kaggle tutorial on how to build a model for face mask detection using Python and Machine Learning.
As the project name suggests, the objective of this tutorial is simple: given an input image, our face mask detection model should detect, with reasonable accuracy, whether a person is wearing a face mask or not.
To successfully complete this project, there are three major parts we need to think about:
- Part 1: Create a training dataset – We should be able to create a training dataset of face images with proper bounding boxes of human faces and annotations indicating whether the person is wearing a face mask or not.
- Part 2: Train an image classification model – We should be able to create an image classification model like a Convolutional Neural Network for face mask detection. The accuracy of detection heavily relies on the type and quality of the model we will be building.
- Part 3: Make predictions – We should be able to detect faces on images and make predictions on whether or not the person is wearing a face mask using our trained image classification model.
So, with all of that in mind, let us first start by getting the dataset and downloading the codebase.
Getting Started with Face Mask Detection
The dataset used for this tutorial is publicly available on Kaggle and you can download the dataset from here: https://www.kaggle.com/wobotintelligence/face-mask-detection-dataset
Please make sure to keep the data files in the data/ directory of this project. The dataset contains multiple files; however, we only need the /train.csv file and the /Medical mask/Medical mask/Medical Mask/images/ folder for this project.
Thanks to Ayushi Mishra for publishing the original version of this notebook. You can view the original notebook here: https://www.kaggle.com/ayushimishra2809/face-mask-detection
Importing necessary libraries for Face Mask Detection
Let us start by importing the necessary libraries used in this face mask detection project.
# Common Python libraries
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# For reading in images and image manipulation
import cv2

# For label encoding the target variable
from sklearn.preprocessing import LabelEncoder

# For tensor based operations
from tensorflow.keras.utils import to_categorical, normalize

# For Machine Learning
from tensorflow.keras.layers import Flatten, Dense, Conv2D, MaxPooling2D, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# For face detection
from mtcnn.mtcnn import MTCNN
Part 1: Create a training dataset
In this part, we will be creating a training dataset for training an image classification model.
The train.csv file contains information about the images, such as the image name, the coordinates of the bounding boxes of faces, and the classname for each bounding box.
# Reading in the csv file
train = pd.read_csv("data/train.csv")

# Displaying the first five rows
train.head()
The data dictionary for the dataset is as follows:
- name: Image filename
- x1, x2, y1, y2: Bounding box coordinates
- classname: Bounding box label
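The column names can be misleading: judging from how the plotting code later in this tutorial uses them, (x1, x2) is the top-left corner of a box and (y1, y2) is its bottom-right corner. Under that assumption, a minimal sketch of converting one row into the more familiar (x, y, width, height) form could look like this. The helper name `to_xywh` and the sample coordinates are hypothetical and only serve as an illustration.

```python
def to_xywh(x1, x2, y1, y2):
    """Convert the dataset's corner format to (x, y, width, height).

    Assumes (x1, x2) is the top-left corner and (y1, y2) is the
    bottom-right corner, which is how the plotting code treats them.
    """
    width = y1 - x1
    height = y2 - x2
    return x1, x2, width, height


# Made-up coordinates, for illustration only
print(to_xywh(100, 50, 180, 170))  # (100, 50, 80, 120)
```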
We can see that the image name has been repeated in multiple rows. This is because a single image can contain multiple bounding boxes with different classnames. Let us have a look at how many total unique image filenames are present in the dataset,
# Total number of unique images
len(train["name"].unique())
4326
Next, we keep only the rows whose classname is either face_with_mask or face_no_mask, since classifying these two labels is our primary objective for this project.
# classnames to select
options = ["face_with_mask", "face_no_mask"]

# Select rows that have the classname as either "face_with_mask" or "face_no_mask"
train = train[train["classname"].isin(options)].reset_index(drop=True)
train.sort_values("name", axis=0, inplace=True)
Let us also look at the distribution of these labels,
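# Counting the rows for each classname
class_counts = train["classname"].value_counts()

# Plotting a bar plot, taking the labels from the counts themselves so that
# each bar is guaranteed to match its classname
plt.bar(class_counts.index, class_counts.values)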
Now, let’s learn how to fetch actual images from the folder /Medical mask/Medical mask/Medical Mask/images/.
Printing the filenames of some images in the folder.
# Contains images of medical masks
images_file_path = "data/Medical mask/Medical mask/Medical Mask/images/"

# Fetching all the file names in the image directory
image_filenames = os.listdir(images_file_path)

# Printing out the first five image names
print(image_filenames[:5])
['0001.jpg', '0002.png', '0003.jpg', '0004.jpg', '0005.jpg']
We will not be using all 6024 images in the given folder, since some of them do not have any of the classnames we filtered the train dataframe for. So, we will use only the images whose name is included in the train dataframe.
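The selection described above can be sketched with a set lookup. The two lists below are hypothetical stand-ins for the names returned by os.listdir() and for the name column of the train dataframe.

```python
# Hypothetical inputs: names found on disk and names listed in the dataframe
filenames_on_disk = ["0001.jpg", "0002.png", "0003.jpg", "0004.jpg"]
names_in_train = ["0003.jpg", "0001.jpg", "0003.jpg"]  # repeats are expected

# A set gives fast membership checks and removes the repeats
wanted = set(names_in_train)

# Keep the on-disk order, dropping files that have no matching annotation
usable = [f for f in filenames_on_disk if f in wanted]
print(usable)  # ['0001.jpg', '0003.jpg']
```

With the real data, the set would be built from the dataframe column, for example set(train["name"]).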
Let’s plot a sample image from the filtered train dataset,
# Getting the full image filepath
sample_image_name = train.iloc[0]["name"]
sample_image_file_path = images_file_path + sample_image_name

# Select rows with the same image name as in the "name" column of the train dataframe
sel_df = train[train["name"] == sample_image_name]

# Convert all of the available bounding box values into a list
bboxes = sel_df[["x1", "x2", "y1", "y2"]].values.tolist()

# Creating a figure and a sub-plot
fig, ax = plt.subplots()

# Reading in the image as an array
img = plt.imread(sample_image_file_path)

# Showing the image
ax.imshow(img)

# Plotting the bounding boxes
for box in bboxes:
    x1, x2, y1, y2 = box

    # x and y co-ordinates of the top-left corner
    xy = (x1, x2)

    # Width of box
    width = y1 - x1

    # Height of box
    height = y2 - x2

    rect = patches.Rectangle(
        xy,
        width,
        height,
        linewidth=2,
        edgecolor="r",
        facecolor="none",
    )
    ax.add_patch(rect)
Before we move forward with creating a training dataset, there are some things to consider,
- The resolution (width x height) of images in the training dataset is relatively high. For example, if an image is of size (1280, 720), the number of pixels we would be feeding into the Convolutional Neural Network would be 1280 x 720 = 921,600. Training a model on this many pixels will take a lot of time.
- The images have a color depth of 3, that is, three channels: Red, Green and Blue. So, the total number of input values would be 1280 x 720 x 3 = 2,764,800.
Since this is just a practice project and we would want to build an average model quickly, we will do the following image manipulations to decrease our number of features:
- Convert the color depth of the image to 1 by reading in the image as a grayscale image
- Crop out the region covered by the bounding boxes in each image
- Resize the image to be of a size 50 x 50
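To see how much these three steps shrink the input, the arithmetic above can be repeated for the reduced image. This is only a back-of-the-envelope sketch; the helper name `num_features` is hypothetical.

```python
def num_features(width, height, channels):
    """Number of input values a model receives for one image."""
    return width * height * channels


original = num_features(1280, 720, 3)  # full-resolution color image
reduced = num_features(50, 50, 1)      # cropped, grayscale, resized face

print(original)            # 2764800
print(reduced)             # 2500
print(original / reduced)  # 1105.92
```

Each face then contributes 2,500 values instead of roughly 2.7 million, which is what makes quick training feasible here.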
Creating an array of image arrays and their labels,
img_size = 50
data = []

for index, row in train.iterrows():
    # Single row
    name, x1, x2, y1, y2, classname = row.values

    # Full file path
    full_file_path = images_file_path + name

    # Reading in the image array as a grayscale image
    img_array = cv2.imread(full_file_path, cv2.IMREAD_GRAYSCALE)

    # Selecting the portion covered by the bounding box (rows x2 to y2, columns x1 to y1)
    crop_image = img_array[x2:y2, x1:y1]

    # Resizing the image
    new_img_array = cv2.resize(crop_image, (img_size, img_size))

    # Appending the image array into the data variable along with its classname
    data.append([new_img_array, classname])

# Plotting one of the images after pre-processing
plt.imshow(data[0][0], cmap="gray")
Let’s separate the independent variables x from the dependent variable y,
# Initializing an empty list for features (independent variables)
x = []

# Initializing an empty list for labels (dependent variable)
y = []

for features, labels in data:
    x.append(features)
    y.append(labels)
Next, performing some data pre-processing,
# Reshaping the feature array (Number of images, IMG_SIZE, IMG_SIZE, Color depth)
x = np.array(x).reshape(-1, 50, 50, 1)

# Normalizing
x = normalize(x, axis=1)

# Label encoding y
lbl = LabelEncoder()
y = lbl.fit_transform(y)

# Converting it into a categorical variable
y = to_categorical(y)
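The label handling in this step can be illustrated without scikit-learn or TensorFlow. The sketch below assumes, as is the case for LabelEncoder, that integer codes are assigned in sorted order of the class names, and it mimics the one-hot rows that to_categorical produces for two classes. The three sample labels are made up.

```python
labels = ["face_with_mask", "face_no_mask", "face_with_mask"]

# Integer codes follow the sorted order of the class names
classes = sorted(set(labels))
codes = [classes.index(label) for label in labels]

# One-hot rows: a 1.0 at the position of the class code, 0.0 elsewhere
one_hot = [
    [1.0 if i == code else 0.0 for i in range(len(classes))]
    for code in codes
]

print(classes)  # ['face_no_mask', 'face_with_mask']
print(codes)    # [1, 0, 1]
print(one_hot)  # [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```

This ordering is why index 0 of a prediction corresponds to "face_no_mask" and index 1 to "face_with_mask".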
Part 2: Training an Image Classification Model
In this part, we will be building and training an image classification model and more specifically, a convolutional neural network for face mask detection.
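The Convolutional Neural Network we will be building consists of two convolution and max-pooling stages, followed by a flatten layer, a dense layer with 50 units, a dropout layer, and a final dense layer with 2 units and softmax activation.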
Before we build and train the model, let us select only the height, width, and color depth for our input layer,
input_img_shape = x.shape[1:]
print(input_img_shape)
(50, 50, 1)
Next, creating the CNN architecture using the Sequential model from TensorFlow,
# Initializing a sequential keras model
model = Sequential()

# Adding a 2D convolution layer - Output Shape = 24 x 24 x 100
model.add(
    Conv2D(
        filters=100,
        kernel_size=(3, 3),
        use_bias=True,
        input_shape=input_img_shape,
        activation="relu",
        strides=2,
    )
)

# Adding a max-pooling layer - Output Shape = 12 x 12 x 100
model.add(MaxPooling2D(pool_size=(2, 2)))

# Adding a 2D convolution layer - Output Shape = 10 x 10 x 64
model.add(Conv2D(filters=64, kernel_size=(3, 3), use_bias=True, activation="relu"))

# Adding a max-pooling layer - Output Shape = 5 x 5 x 64
model.add(MaxPooling2D(pool_size=(2, 2)))

# Adding a flatten layer - Output Shape = 5 x 5 x 64 = 1600
model.add(Flatten())

# Adding a dense layer - Output Shape = 50
model.add(Dense(50, activation="relu"))

# Adding a dropout
model.add(Dropout(0.2))

# Adding a dense layer with softmax activation
model.add(Dense(2, activation="softmax"))

# Printing the model summary
model.summary()
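The output shapes noted in the comments can be checked by hand. The sketch below assumes no padding (the Keras default, "valid") and a pooling stride equal to the pool size; the helper name `output_size` is hypothetical.

```python
def output_size(size, kernel, stride):
    """Output length along one dimension when no padding is used."""
    return (size - kernel) // stride + 1


sizes = [50]                                             # input height/width
sizes.append(output_size(sizes[-1], kernel=3, stride=2))  # first Conv2D
sizes.append(output_size(sizes[-1], kernel=2, stride=2))  # first MaxPooling2D
sizes.append(output_size(sizes[-1], kernel=3, stride=1))  # second Conv2D
sizes.append(output_size(sizes[-1], kernel=2, stride=2))  # second MaxPooling2D

print(sizes)               # [50, 24, 12, 10, 5]
print(sizes[-1] ** 2 * 64)  # 1600 values after flattening
```

The final 5 x 5 feature map with 64 filters is what yields the 1600 inputs of the first dense layer.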
We’ve successfully built our model architecture. Let’s move on to training the model with the configuration given below,
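# Initializing an Adam optimizer. Recent TensorFlow releases expect "learning_rate"
# instead of "lr" and no longer accept the "decay=1e-5" argument that this notebook
# was originally written with, so it is left out here.
opt = Adam(learning_rate=1e-3)

# Configuring the model for training
model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"])

# Training the model
model.fit(x, y, epochs=30, batch_size=5)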
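For reference, the decay=1e-5 value that this notebook was originally written with corresponds, in older Keras optimizers and to the best of our knowledge, to inverse time decay applied after every batch update. A minimal sketch of that formula follows; the function name is hypothetical and the step counts are arbitrary examples.

```python
def decayed_learning_rate(initial_lr, decay, step):
    """Inverse time decay: the rate shrinks slowly as update steps accumulate."""
    return initial_lr / (1.0 + decay * step)


print(decayed_learning_rate(1e-3, 1e-5, 0))        # 0.001 at the start
print(decayed_learning_rate(1e-3, 1e-5, 100_000))  # roughly half the initial rate
```

With such a small decay value, the learning rate changes very little over a short training run.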
We now have a trained image classification model ready!
Part 3: Making a Prediction
In this part, we will be trying to detect if a person in an image is wearing a face mask or not.
Let us start by reading in a sample image that was not part of the training images,
# Image file path for sample image
test_image_file_path = "sample_test_images/0001.jpg"

# Loading in the image
img = plt.imread(test_image_file_path)

# Showing the image
plt.imshow(img)
Now that we’ve read in the image, we must first detect the face(s) in the image and perform the necessary image pre-processing steps.
For face detection, we will be using MTCNN.
Multi-task Cascaded Convolutional Networks (MTCNN) is a framework developed as a solution for both face detection and face alignment. You can learn more about it from this helpful Medium post: https://medium.com/@iselagradilla94/multi-task-cascaded-convolutional-networks-mtcnn-for-face-detection-and-facial-landmark-alignment-7c21e8007923
# Initializing the detector
detector = MTCNN()

# Detecting the faces in the image
faces = detector.detect_faces(img)
print(faces)
[{'box': [300, 137, 326, 399], 'confidence': 0.998160183429718, 'keypoints': {'left_eye': (398, 307), 'right_eye': (535, 297), 'nose': (470, 369), 'mouth_left': (421, 435), 'mouth_right': (544, 424)}}]
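In this output, the box entry is read as [x, y, width, height], which is also how the pre-processing code in this tutorial unpacks it. A small sketch of turning it into the corner coordinates needed for cropping is shown below, using the values printed above. Clamping the top-left corner at zero is an added precaution, not something the output above requires.

```python
# The detection printed above, trimmed to the fields used here
face = {"box": [300, 137, 326, 399], "confidence": 0.998160183429718}

x, y, width, height = face["box"]

# Bottom-right corner from the top-left corner plus the box size
right, bottom = x + width, y + height

# Clamp at 0 so a box starting slightly outside the image cannot
# produce a negative slice index
left, top = max(0, x), max(0, y)

print((left, top, right, bottom))  # (300, 137, 626, 536)
```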
Next, performing image pre-processing,
# Reading in the image as a grayscale image
img_array = cv2.imread(test_image_file_path, cv2.IMREAD_GRAYSCALE)

# Getting the bounding box of the first face detected above: [x, y, width, height]
left, top, width, height = faces[0]["box"]
right, bottom = left + width, top + height

# Clamping at 0 so that a box starting slightly outside the image
# does not produce a negative index
left, top = max(0, left), max(0, top)

# Selecting the portion covered by the bounding box
crop_image = img_array[top:bottom, left:right]

# Resizing the image
new_img_array = cv2.resize(crop_image, (img_size, img_size))

# Plotting the image
plt.imshow(new_img_array, cmap="gray")
Some more pre-processing,
# Reshaping the image
x = new_img_array.reshape(-1, 50, 50, 1)

# Normalizing
x = normalize(x, axis=1)
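It is worth knowing what the normalizing step does. To the best of our understanding, normalize from tensorflow.keras.utils performs L2 normalization along the given axis rather than dividing pixel values by 255. The NumPy sketch below reproduces that behavior on a tiny made-up image; the function name `l2_normalize` is hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=1):
    """L2-normalize `x` along `axis`, leaving all-zero slices unchanged."""
    norms = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero
    return x / norms


# One tiny 2 x 2 single-channel "image" in (batch, height, width, channels) layout
x = np.array([[3.0, 0.0], [4.0, 0.0]]).reshape(1, 2, 2, 1)
x_norm = l2_normalize(x, axis=1)

# The first column (3, 4) has length 5, so it becomes (0.6, 0.8)
print(x_norm[0, :, :, 0])
```

With axis=1 on a (batch, height, width, channels) array, each pixel column of an image is scaled to unit length, so the same call must be applied to training and prediction inputs alike.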
Finally, let us make a prediction.
prediction = model.predict(x)
print(prediction)
[[0.01 0.99]]
Interpreting these predictions,
- If the probability value at index 0 is greater than the probability value at index 1, the classification is “face_no_mask”, since “face_no_mask” was encoded as [1., 0.] during training.
- If the probability value at index 1 is greater than the probability value at index 0, the classification is “face_with_mask”, since “face_with_mask” was encoded as [0., 1.] during training.
We can also use the np.argmax() function to find the index with the highest probability value,
# Returns the index of the maximum value
np.argmax(prediction)
1
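The index can then be mapped back to a readable label. The sketch below assumes the class order produced by the label encoder, which sorts class names alphabetically, and uses the prediction shown above as plain Python lists.

```python
# Class order follows the label encoder, which sorts class names alphabetically
class_names = ["face_no_mask", "face_with_mask"]

prediction = [[0.01, 0.99]]  # the output shown above

probabilities = prediction[0]

# Index of the largest probability, equivalent to np.argmax for one row
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)

print(class_names[best_index], probabilities[best_index])  # face_with_mask 0.99
```

With the fitted encoder at hand, lbl.inverse_transform([np.argmax(prediction)]) should give the same label without hard-coding the class order.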
Coding exercise for you
Now that you know how to build a binary image classifier, you can perform the following tasks on your own by extending the concepts you’ve learned:
- Create a multi-class classification model using the above dataset by keeping all classnames, that is, ['face_with_mask', 'mask_colorful', 'face_no_mask', 'face_with_mask_incorrect', 'mask_surgical', 'face_other_covering', 'scarf_bandana', 'eyeglasses', 'helmet', 'face_shield', 'sunglasses', 'hood', 'hat', 'goggles', 'hair_net', 'hijab_niqab', 'other', 'gas_mask', 'balaclava_ski_mask', 'turban'].
- Analyze the probabilities of predictions by using images that fall in multiple classes.