The Convolution Operation

The convolution operation is the fundamental algorithmic backbone of a Convolutional Neural Network (CNN).

The convolution operation takes two tensors of the same size as input and outputs the sum of their element-wise products. The following example makes this concrete:

$$ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} * \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix} = \sum_{i=1}^{3}\sum_{j=1}^{3}a_{ij}b_{ij} $$

Here, both input tensors have a dimension of 3x3, so the convolution is possible. Each element of the first tensor ($a_{ij}$) is multiplied by the corresponding element of the second tensor ($b_{ij}$), and the nine products are added together to give a single output value.
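The definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a library implementation; the function name `convolve_same_size` and the example values are assumptions chosen only for this sketch.

```python
import numpy as np

# Two example 3x3 tensors (values chosen only for illustration).
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = np.array([[1, 0, -1],
              [1, 0, -1],
              [1, 0, -1]])

def convolve_same_size(a, b):
    """Return the sum of the element-wise products of two equally sized tensors."""
    if a.shape != b.shape:
        raise ValueError("both tensors must have the same shape")
    # a * b multiplies matching elements (a_ij * b_ij); np.sum adds the products.
    return np.sum(a * b)

result = convolve_same_size(a, b)
print(result)  # -6
```

Note that `a * b` is an element-wise product, not a matrix product, which matches the double sum over $a_{ij}b_{ij}$ in the equation above.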

How is the convolution operation implemented on an image in a Convolutional Neural Network?

In a Convolutional Neural Network, the convolution operation is usually performed between an image tensor with fairly large dimensions (e.g. 256x256, 312x312, etc.) and a 'filter' or 'kernel' with much smaller dimensions (e.g. 3x3, 5x5, etc.).

This is done in order to capture the importance of individual pixel values in relation to their neighbouring pixels. For example, placing a 3x3 filter ('kernel') on the top-left part of a 512x512 image allows the neural network to extract the features of the top-left 3x3 region of the image.

The filter can then be shifted repeatedly to different parts of the image to cover its different regions. What the network 'understands' about a region depends on the actual elements of the 3x3 filter, and the output of the convolution operation is in turn used to adjust those filter values.

So, moving forward in this chapter, we will be focusing on understanding how a convolution operation can be carried out between a small-sized kernel and a large-sized image tensor.

Consider an image tensor $A$ with a dimension of 4x4 and a kernel/filter $K$ with a dimension of 3x3. Let the elements of these two tensors be as follows:

$$ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}$$

and,

$$ K = \begin{pmatrix} k_{11} & k_{12} & k_{13} \\ k_{21} & k_{22} & k_{23} \\ k_{31} & k_{32} & k_{33} \end{pmatrix} $$

For the convolution operation to be possible between these two tensors, we need two equal-sized tensors as input. Thus, we will repeatedly select a subset tensor from $A$ such that the dimension of each subset is equal to the dimension of the kernel.

Generally, the subset selection is done starting from the top-left position of the tensor and ends in the bottom-right position. The convolution operation is then performed between these subsets and the kernel.
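As a rough sketch of this selection order, the snippet below lists every kernel-sized subset of a 4x4 tensor, moving left to right and then top to bottom. The helper name `iter_windows` and the sample values 1 to 16 are hypothetical and used only for illustration; the window is moved one position at a time.

```python
import numpy as np

def iter_windows(image, kernel_size):
    """Yield (row, col, window) for each kernel-sized subset of a 2D tensor.

    Windows are visited left to right, then top to bottom, moving one
    position at a time.
    """
    rows, cols = image.shape
    for r in range(rows - kernel_size + 1):
        for c in range(cols - kernel_size + 1):
            yield r, c, image[r:r + kernel_size, c:c + kernel_size]

# A 4x4 example tensor holding the values 1..16 (illustrative only).
A = np.arange(1, 17).reshape(4, 4)

windows = list(iter_windows(A, kernel_size=3))
for r, c, window in windows:
    print(f"top-left corner at row {r}, column {c}:")
    print(window)
```

For a 4x4 tensor and a 3x3 kernel this yields four subsets, starting at the top-left corner and ending at the bottom-right one.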

Let us see this in action. Selecting the first subset tensor having a dimension of 3x3,

$$ A_1 = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

Now, we have two 3x3 tensors ($A_1$ and $K$) and the convolution operation can be performed. Let the output of the convolution operation be assigned as $o_1$.

$$ o_1 = A_1 * K = a_{11}k_{11}\ +\ a_{12}k_{12}\ +\ a_{13}k_{13}\ +\ a_{21}k_{21}\ +\ a_{22}k_{22}\ +\ a_{23}k_{23}\ +\ a_{31}k_{31}\ +\ a_{32}k_{32}\ +\ a_{33}k_{33} $$

Next, we repeat the same process for a second subset of the tensor $A$, obtained by shifting one pixel to the right, i.e., with a stride of 1.

$$ A_2 = \begin{pmatrix} a_{12} & a_{13} & a_{14} \\ a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \end{pmatrix}$$

We have again obtained a 3x3 subset of the tensor $A$. Applying the convolution operation with kernel $K$,

$$ o_2 = A_2 * K = a_{12}k_{11}\ +\ a_{13}k_{12}\ +\ a_{14}k_{13}\ +\ a_{22}k_{21}\ +\ a_{23}k_{22}\ +\ a_{24}k_{23}\ +\ a_{32}k_{31}\ +\ a_{33}k_{32}\ +\ a_{34}k_{33} $$

The elements in the bottom row of $A$ have not been covered yet. So, we shift down by one row (again a stride of 1) and start again from the left-hand side.

$$ A_3 = \begin{pmatrix} a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{pmatrix}$$

$$ o_3 = A_3 * K = a_{21}k_{11}\ +\ a_{22}k_{12}\ +\ a_{23}k_{13}\ +\ a_{31}k_{21}\ +\ a_{32}k_{22}\ +\ a_{33}k_{23}\ +\ a_{41}k_{31}\ +\ a_{42}k_{32}\ +\ a_{43}k_{33} $$

And, we again shift to the right with a stride of 1.

$$ A_4 = \begin{pmatrix} a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \\ a_{42} & a_{43} & a_{44} \end{pmatrix}$$

$$ o_4 = A_4 * K = a_{22}k_{11}\ +\ a_{23}k_{12}\ +\ a_{24}k_{13}\ +\ a_{32}k_{21}\ +\ a_{33}k_{22}\ +\ a_{34}k_{23}\ +\ a_{42}k_{31}\ +\ a_{43}k_{32}\ +\ a_{44}k_{33} $$

We've successfully performed the convolution operation over the entire image tensor $A$! Now, the final step is to collect all of these outputs into a single tensor as follows:

$$ O = \begin{pmatrix} o_1 & o_2 \\ o_3 & o_4 \end{pmatrix} $$
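To make the four steps concrete, here is a small numeric sketch that computes $o_1$ to $o_4$ with NumPy slicing and assembles them into $O$. The values of `A` (1 to 16) and the diagonal kernel `K` are assumptions picked only so that the arithmetic is easy to check by hand.

```python
import numpy as np

# Example 4x4 image tensor: a_11..a_44 take the values 1..16 (illustrative).
A = np.arange(1, 17).reshape(4, 4)

# Example 3x3 kernel that picks out the main diagonal of each subset.
K = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

# Each output is the sum of element-wise products of one 3x3 subset with K.
o1 = np.sum(A[0:3, 0:3] * K)  # A_1: top-left subset      -> 1 + 6 + 11
o2 = np.sum(A[0:3, 1:4] * K)  # A_2: shifted right by one -> 2 + 7 + 12
o3 = np.sum(A[1:4, 0:3] * K)  # A_3: shifted down by one  -> 5 + 10 + 15
o4 = np.sum(A[1:4, 1:4] * K)  # A_4: bottom-right subset  -> 6 + 11 + 16

# Collect the outputs in the same spatial arrangement as the subsets.
O = np.array([[o1, o2],
              [o3, o4]])
print(O)
# [[18 21]
#  [30 33]]
```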

The following GIF will give you a better intuition of how the convolution operation above is performed. The left-hand side shows the convolution kernel sliding over the image, whereas the right-hand side shows the result of the convolution.

Thus, by following the processes shown in this chapter, we can easily apply the convolution operation between any kernel $K$ and any image tensor $A$.
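The whole procedure can be generalised into one function. The following is a minimal sketch, assuming a 2D image, a 2D kernel no larger than the image, and the same stride in both directions; the name `convolve2d_valid` is hypothetical, and deep learning libraries provide their own, far more optimised implementations.

```python
import numpy as np

def convolve2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` and return the tensor of outputs.

    At each position, the output is the sum of the element-wise products
    of the kernel and the image subset it currently covers.
    """
    image = np.asarray(image)
    kernel = np.asarray(kernel)
    ih, iw = image.shape
    kh, kw = kernel.shape
    if kh > ih or kw > iw:
        raise ValueError("kernel must not be larger than the image")

    # Number of positions the kernel can take along each axis.
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    output = np.zeros((out_h, out_w), dtype=np.result_type(image, kernel))

    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride          # top-left corner of the subset
            subset = image[r:r + kh, c:c + kw]     # same shape as the kernel
            output[i, j] = np.sum(subset * kernel)
    return output

# Usage: a 6x6 example image (values 0..35) and a 3x3 kernel of ones.
image = np.arange(36).reshape(6, 6)
kernel = np.ones((3, 3))
out_stride1 = convolve2d_valid(image, kernel)            # shape (4, 4)
out_stride3 = convolve2d_valid(image, kernel, stride=3)  # shape (2, 2)
print(out_stride1.shape, out_stride3.shape)
```

With a 4x4 image, a 3x3 kernel, and a stride of 1, this function produces a 2x2 output, which agrees with the tensor $O$ obtained in this chapter.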

Written by

The Click Reader
