The Convolution Operation
November 24, 2020 2020-12-03 21:17
The convolution operation is the fundamental algorithmic backbone of a Convolutional Neural Network (CNN).
The convolution operation takes two tensors of the same size as input and outputs the sum of their element-wise products. This can be better understood using the following notation-based example:
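In generic notation, assuming two 3×3 tensors called A and B (names chosen here purely for illustration, with entries a_ij and b_ij), the operation described above can be written as:

```latex
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix},
\qquad
B = \begin{bmatrix}
b_{11} & b_{12} & b_{13} \\
b_{21} & b_{22} & b_{23} \\
b_{31} & b_{32} & b_{33}
\end{bmatrix}

A \ast B = \sum_{i=1}^{3} \sum_{j=1}^{3} a_{ij}\, b_{ij}
         = a_{11} b_{11} + a_{12} b_{12} + \dots + a_{33} b_{33}
```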
Here, the dimension of both input tensors is 3×3, so convolution is possible. Each element of the first tensor is multiplied by the corresponding element of the second tensor, and the products are added together to give the final output, which is a single number.
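The same-size case can be sketched in a few lines of Python. This is a minimal illustration using plain nested lists; the function name `convolve_same_size` and the values of `a` and `b` are made up for the example.

```python
def convolve_same_size(a, b):
    """Sum of element-wise products of two equally sized 2-D tensors (nested lists)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("both tensors must have the same dimensions")
    # Pair up rows, then pair up elements within each row, multiply and sum.
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical 3x3 inputs, chosen only for illustration.
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [[1, 0, -1],
     [1, 0, -1],
     [1, 0, -1]]

result = convolve_same_size(a, b)
print(result)  # -6
```

The output is a single number, not a tensor: every product contributes to one running sum.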
How is the convolution operation implemented on an image in a Convolutional Neural Network?
In a Convolutional Neural Network, the convolution operation is usually performed between an image tensor with fairly large dimensions (e.g. 256×256, 312×312, etc.) and a ‘filter’ or ‘kernel’ with much smaller dimensions (e.g. 3×3, 5×5, etc.).
This is done to capture how each pixel value relates to its neighbouring pixels. For example, placing a 3×3 filter (‘kernel’) on the top-left corner of a 512×512 image lets the neural network pick up the features of the image in that top-left 3×3 region.
The filter can then be shifted repeatedly to other parts of the image to cover its different regions. What the network ‘understands’ depends on the actual values of the 3×3 filter, and during training the output of the convolution operation is what drives the changes to those filter values.
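To make the idea of focusing the filter on one region concrete, here is a small sketch that cuts a 3×3 region out of a larger image. The helper `extract_patch` and the synthetic image contents are assumptions made for illustration only.

```python
def extract_patch(image, top, left, size):
    """Return the size x size region of `image` whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


# Hypothetical 512x512 "image" whose pixel at (r, c) holds r * 512 + c,
# so each value reveals where it came from.
image = [[r * 512 + c for c in range(512)] for r in range(512)]

top_left = extract_patch(image, 0, 0, 3)
shifted = extract_patch(image, 0, 1, 3)  # same rows, one pixel to the right

print(top_left)  # [[0, 1, 2], [512, 513, 514], [1024, 1025, 1026]]
print(shifted)   # [[1, 2, 3], [513, 514, 515], [1025, 1026, 1027]]
```

Each extracted patch has the same dimensions as the filter, which is what makes the element-wise multiplication possible.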
For the rest of this chapter, we will focus on how a convolution operation can be carried out between a small kernel and a large image tensor.
Consider an image tensor with dimensions 4×4 and a kernel (filter) with dimensions 3×3. Let the elements of both these tensors be as follows:
For the convolution operation to be possible between these two tensors, both inputs must be the same size. Thus, we repeatedly select a subset of the image tensor such that the dimensions of each subset equal the dimensions of the kernel.
Generally, the subset selection starts at the top-left position of the image tensor and ends at the bottom-right position. The convolution operation is then performed between each of these subsets and the kernel.
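The order in which subsets are selected can be sketched as a list of top-left corners. This is an illustrative helper, not part of any library: `window_positions` is a made-up name, and it assumes a square image and a square kernel.

```python
def window_positions(image_size, kernel_size, stride=1):
    """List the (top, left) corner of every kernel-sized window, in row-major order."""
    last = image_size - kernel_size  # largest valid offset along one axis
    offsets = range(0, last + 1, stride)
    # Outer loop moves down the rows, inner loop moves left to right.
    return [(top, left) for top in offsets for left in offsets]


positions = window_positions(4, 3)
print(positions)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

For a 4×4 image and a 3×3 kernel with a stride of 1 there are exactly four windows, visited from the top-left to the bottom-right.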
Let us see this in action. We begin by selecting the first subset, the top-left 3×3 region of the image tensor.
Now we have two 3×3 tensors (the first subset and the kernel), so the convolution operation can be performed. Its output is a single value, which becomes the first element of the result.
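As a worked sketch of this first step, the snippet below uses made-up values: a 4×4 image holding the numbers 1 to 16 and a 3×3 kernel with ones on its diagonal. These values are assumptions for illustration and are not the ones from the original example.

```python
# Hypothetical 4x4 image and 3x3 kernel, chosen only for illustration.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]

# First subset: rows 0-2 and columns 0-2 of the image (the top-left 3x3 region).
first_subset = [row[0:3] for row in image[0:3]]

# Multiply matching elements and add everything up.
first_output = sum(first_subset[i][j] * kernel[i][j]
                   for i in range(3) for j in range(3))

print(first_subset)  # [[1, 2, 3], [5, 6, 7], [9, 10, 11]]
print(first_output)  # 18
```

With this kernel only the diagonal of the subset survives, so the output is 1 + 6 + 11 = 18.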
Next, we repeat the same process for a second subset of the image tensor by shifting to the right by one pixel, i.e., a stride of 1.
We have again obtained a 3×3 subset of the image tensor. Applying the convolution operation between this subset and the kernel gives the second output value.
There are still some elements in the bottom row which haven’t been convolved. So, we move down by a stride of 1 and start again from the left-hand side.
And, we again shift to the right with a stride of 1.
We’ve successfully applied the convolution operation over the entire image tensor! Now, the final step is to collect all of these outputs in a single 2×2 tensor, arranged in the same order in which the subsets were taken, as follows:
The following GIF will give you a better intuition for how the convolution operation is performed above. The left-hand side shows the convolution kernel sliding over the image, whereas the right-hand side shows the result of the convolution.
Thus, by following the process shown in this chapter, we can apply the convolution operation between any kernel and any image tensor, provided the kernel is no larger than the image.
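Putting the steps together, a minimal sketch of the whole procedure might look like this. The function name `convolve2d`, the square-kernel assumption, and the image and kernel values are all illustrative choices rather than anything taken from the original example.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over a 2-D image (nested lists) and collect the outputs.

    As described in the text, the kernel is applied as-is (it is not flipped).
    """
    k = len(kernel)
    out_rows = (len(image) - k) // stride + 1
    out_cols = (len(image[0]) - k) // stride + 1
    output = []
    for r in range(out_rows):          # move top to bottom
        row = []
        for c in range(out_cols):      # move left to right
            top, left = r * stride, c * stride
            row.append(sum(image[top + i][left + j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        output.append(row)
    return output


# Hypothetical 4x4 image and 3x3 kernel, chosen only for illustration.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]

output = convolve2d(image, kernel)
print(output)  # [[18, 21], [30, 33]]
```

The four values correspond to the four subsets in the order they were taken: top-left, top-right, bottom-left, bottom-right.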
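For reference, the sliding procedure can be summarised in one formula. The symbols here are assumptions introduced for this summary: I is an H×W image tensor, K is a k×k kernel, s is the stride, and O is the output tensor, all indexed from 1.

```latex
O_{m,n} = \sum_{i=1}^{k} \sum_{j=1}^{k} I_{(m-1)s + i,\; (n-1)s + j}\; K_{i,j},
\qquad
1 \le m \le \left\lfloor \frac{H - k}{s} \right\rfloor + 1,
\quad
1 \le n \le \left\lfloor \frac{W - k}{s} \right\rfloor + 1
```

With H = W = 4, k = 3 and s = 1, both indices run from 1 to 2, which gives the four outputs of the 2×2 result obtained in this chapter.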