In the last lesson, we discussed shifting the convolution filter (kernel) by one pixel at a time, i.e., by a stride of 1. Well, it is not necessary to move a convolution filter across an image by just taking a stride of 1.
Stride is the number of pixels by which the filter shifts over the input matrix at each step. The stride ($s$) used during a series of convolution operations can be changed to suit the problem. When $s = 1$, the filter is shifted by one column of pixel values to the right or one row of pixel values down. Similarly, when $s = 2$, the filter is shifted by two columns of pixel values to the right or two rows of pixel values down, and so on.
However, if you think about it for a moment, why would anyone want to take a larger stride, given that the filter will then skip over some positions in the image? There are multiple reasons, but here are some major ones:
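To make the shifting concrete, here is a minimal sketch in plain Python that lists the top-left corner of every region a filter visits for a given stride. The function name `window_positions` and the 5×5 input with a 3×3 filter are assumptions chosen for illustration; they do not come from the earlier lessons.

```python
def window_positions(n_rows, n_cols, k, s):
    """Return the top-left (row, col) of every region a k x k filter visits.

    n_rows, n_cols: size of the input; k: filter size; s: stride.
    (Hypothetical helper, for illustration only.)
    """
    positions = []
    # The filter must fit entirely inside the input, so the last valid
    # top-left index along each axis is (size - k).
    for row in range(0, n_rows - k + 1, s):
        for col in range(0, n_cols - k + 1, s):
            positions.append((row, col))
    return positions


# Stride 1: the filter moves one pixel at a time.
print(window_positions(5, 5, 3, 1))
# Stride 2: the filter jumps two pixels at a time.
print(window_positions(5, 5, 3, 2))
```

With a stride of 1 the filter visits 9 regions of the 5×5 input, while with a stride of 2 it visits only 4: those starting at (0, 0), (0, 2), (2, 0) and (2, 2).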
- Taking a larger stride allows a series of convolution operations to be computed faster for a large dimension image (say, 3000×3000 pixels).
- Less memory is needed to store the results of the convolution operation.
- The size of the output tensor can be reduced to make the input to the next layer of a Convolutional Neural Network smaller.
- Since neighboring regions overlap less when the stride is larger, the output carries less redundant information, which may help reduce overfitting.
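The speed and memory points above can be illustrated with a rough count of how many regions the filter has to visit. The sketch below assumes a square 3000×3000 image and a 3×3 filter (the filter size is an assumption, not something stated in this lesson), and the helper name `count_positions` is hypothetical.

```python
def count_positions(n, k, s):
    """Count the regions a k x k filter visits on an n x n input with stride s."""
    # Number of valid top-left indices along one axis.
    per_axis = len(range(0, n - k + 1, s))
    return per_axis * per_axis


stride_1 = count_positions(3000, 3, 1)
stride_2 = count_positions(3000, 3, 2)

print(stride_1)             # regions visited with a stride of 1
print(stride_2)             # regions visited with a stride of 2
print(stride_1 / stride_2)  # roughly how many times fewer regions
```

Under these assumptions, doubling the stride cuts the number of regions, and therefore the number of output values to compute and store, by a factor of about four.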
Finding the size of an output tensor after a series of convolution operations
Generally, in a Convolutional Neural Network, the input image undergoes multiple convolution operations, where each convolution operation might change the size of the input image. In this section, you will learn an easy way to find the size of an output tensor after a series of convolution operations.
If $n_{A1} \times n_{A2}$ is the size of the input image tensor, $n_K \times n_K$ is the size of the convolution filter, and $s$ is the stride, then the size of the resulting tensor, $n_{O1} \times n_{O2}$ (after a series of convolution operations), can be found using the following formula:
$$n_{O1} = \text{floor}\begin{pmatrix} \dfrac{n_{A1}-n_{K}}{s} + 1 \end{pmatrix}$$
and,
$$n_{O2} = \text{floor}\begin{pmatrix} \dfrac{n_{A2}-n_{K}}{s} + 1 \end{pmatrix}$$
where $\text{floor}()$ means that a non-integer result is rounded down to the nearest integer (for example, $\text{floor}(2.5) = 2$).
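The formula can be written as a small helper. This is a minimal sketch, assuming integer sizes and no padding (padding is not covered in this lesson); the function name `conv_output_size` is hypothetical. For integers, floor division followed by adding 1 gives the same result as taking the floor of the whole expression.

```python
def conv_output_size(n_a, n_k, s=1):
    """Output size along one axis: floor((n_a - n_k) / s + 1).

    n_a: input size, n_k: filter size, s: stride (no padding assumed).
    """
    if s < 1:
        raise ValueError("stride must be at least 1")
    if n_k > n_a:
        raise ValueError("filter cannot be larger than the input")
    # Integer floor division implements the floor() in the formula.
    return (n_a - n_k) // s + 1


print(conv_output_size(7, 3, 2))  # (7 - 3) / 2 + 1 = 3
print(conv_output_size(8, 3, 2))  # floor(2.5 + 1) = 3
```

For a non-square input, the helper would be called once per axis, with $n_{A1}$ and then with $n_{A2}$.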
In the first lesson, we started with an image tensor $A$ of size 4×4 and a stride $s$ of 1, and after performing a series of convolutions, the output tensor $O$ was reduced to a size of 2×2. Let us see whether the above formula gives the same result for a filter size of 3×3:
$$n_{O1} = \text{floor}\begin{pmatrix} \dfrac{4-3}{1} + 1 \end{pmatrix} = 2$$
and,
$$n_{O2} = \text{floor}\begin{pmatrix} \dfrac{4-3}{1} + 1 \end{pmatrix} = 2$$
Thus, the size of the output tensor is 2 x 2.
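The same result can be checked by actually sliding a filter over a 4×4 input. The following is a sketch in plain Python, not the implementation from the earlier lessons: the pixel values and the filter weights are made up for illustration, the function name `conv2d` is hypothetical, and the filter is applied without flipping, as is common in CNNs.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a 2D image (lists of lists), no padding."""
    n_k = len(kernel)
    output = []
    for r in range(0, len(image) - n_k + 1, stride):
        out_row = []
        for c in range(0, len(image[0]) - n_k + 1, stride):
            # Element-wise multiply the current region by the kernel and sum.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(n_k)
                for j in range(n_k)
            )
            out_row.append(total)
        output.append(out_row)
    return output


# Assumed example values, chosen only to make the arithmetic easy to follow.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

result = conv2d(image, kernel, stride=1)
print(result)                         # → [[18, 21], [30, 33]]
print(len(result), len(result[0]))    # → 2 2
print(conv2d(image, kernel, stride=2))  # → [[18]]
```

With a stride of 1 the output has 2 rows and 2 columns, matching the formula. With a stride of 2 the formula gives floor(1/2 + 1) = 1, and the sketch likewise produces a 1×1 output.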
With this, you now know about the Convolution Operation in CNNs. In the next chapter, you will be introduced to another important operation in CNNs, the Pooling Operation.