Padding an Image



A convolution or pooling operation cannot be performed on an image whose dimensions are smaller than those of the filter region. To fix this, we can pad the image with extra rows and columns of pixel values around its border, forming a larger image tensor. The padded pixels can be given different values, but the most common choices are 0 (zero padding) or the value of the closest pixel.

Here is a simple example of padding an image with zeros. Consider an image tensor $A$ of size 2×2, shown on the left side of the image below. Since this image has only two rows and two columns of pixel values, a 3×3 filter cannot be applied to it. We can pad the image with zeros to make a 3×3 convolution/pooling operation possible, as shown on the right side of the image below.

Figure: Padding a 2×2 image

Now we have a 4×4 image tensor, and the 3×3 convolution/pooling operation can be performed. Here, the padding value is 1, since we added one row or column of zeros on each of the top, bottom, left, and right sides of the image.
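The padding step can be sketched in a few lines of NumPy. This is only an illustration: the pixel values in `A` and the variable names are made up, and `np.pad` is just one convenient way to add the border. It shows both zero padding and padding with the value of the closest pixel.

```python
import numpy as np

# Hypothetical 2x2 image tensor A; the pixel values are made up for illustration.
A = np.array([[1, 2],
              [3, 4]])

# Zero padding with p = 1: one row/column of zeros on every side.
zero_padded = np.pad(A, pad_width=1, mode="constant", constant_values=0)

# Alternative: repeat the value of the closest pixel instead of using zeros.
edge_padded = np.pad(A, pad_width=1, mode="edge")

print(zero_padded.shape)  # (4, 4)
print(zero_padded)
print(edge_padded)
```

With a padding of 1, the 2×2 image becomes 4×4 in both cases; only the values placed in the border differ.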


Finding the size of an output tensor when padding is used

If $n_{A1}$ × $n_{A2}$ is the size of the input image tensor, $n_K$ × $n_K$ is the size of the convolution filter, $s$ is the stride, and $p$ is the amount of padding, then the size of the resulting tensor, $n_{O1}$ × $n_{O2}$ (after the filter has been applied across the whole image), can be found using the following formulas:

$$n_{O1} = \text{floor}\begin{pmatrix} \dfrac{n_{A1}+2p-n_{K}}{s} + 1 \end{pmatrix}$$

and,

$$n_{O2} = \text{floor}\begin{pmatrix} \dfrac{n_{A2}+2p-n_{K}}{s} + 1 \end{pmatrix}$$

As an example, let us calculate the output tensor size when a 3×3 filter, a stride of 1, and a padding of 1 are used on a 4×2 image:

$$n_{O1} = \text{floor}\begin{pmatrix} \dfrac{4+2-3}{1} + 1 \end{pmatrix} = 4$$

and,

$$n_{O2} = \text{floor}\begin{pmatrix} \dfrac{2+2-3}{1} + 1 \end{pmatrix} = 2$$

Thus, the size of the output tensor is 4×2, the same as the size of the input image.
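The formula can also be checked with a small helper function. The sketch below is not from any library: the function name `output_size` and its parameter names are chosen here for illustration, and integer floor division stands in for the floor in the formula.

```python
def output_size(n_a, n_k, s=1, p=0):
    """Output length along one dimension: floor((n_a + 2p - n_k) / s + 1)."""
    # Integer floor division gives the same result as taking the floor
    # of the fraction and then adding 1.
    return (n_a + 2 * p - n_k) // s + 1

# The worked example: 3x3 filter, stride 1, padding 1, on a 4x2 image.
n_o1 = output_size(4, n_k=3, s=1, p=1)
n_o2 = output_size(2, n_k=3, s=1, p=1)
print(n_o1, n_o2)  # 4 2

# Without padding, the second dimension has no valid filter position.
print(output_size(2, n_k=3, s=1, p=0))  # 0
```

The last call gives 0, which matches the earlier observation that a 3×3 filter cannot be used on an image with only two columns unless it is padded.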

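Note: A convolution/pooling operation generally decreases the size of the input tensor, but the original size can be retained if the right padding value is chosen. For example, with a stride of 1 and an odd filter size $n_K$, a padding of $p = (n_K - 1)/2$ keeps the output the same size as the input, as in the calculation above where $n_K = 3$ and $p = 1$.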
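To see size retention in practice, here is a minimal sketch of a naive 2D convolution with zero padding. It assumes a stride of 1 and an odd filter size, for which a padding of (filter size − 1) / 2 keeps the output the same size as the input. The function `conv2d`, the image values, and the all-ones filter are hypothetical and are written for clarity rather than speed.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2D convolution (cross-correlation, as used in CNNs) with zero padding."""
    padded = np.pad(image, pad_width=padding, mode="constant", constant_values=0)
    k_h, k_w = kernel.shape
    out_h = (padded.shape[0] - k_h) // stride + 1
    out_w = (padded.shape[1] - k_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Element-wise product of the filter and the region under it, summed.
            out[i, j] = np.sum(padded[r:r + k_h, c:c + k_w] * kernel)
    return out

image = np.arange(8, dtype=float).reshape(4, 2)  # hypothetical 4x2 image
kernel = np.ones((3, 3))                         # hypothetical 3x3 filter

# Padding of (3 - 1) // 2 = 1 with stride 1 retains the input size.
same = conv2d(image, kernel, stride=1, padding=(3 - 1) // 2)
print(same.shape)  # (4, 2)
```

The output shape matches the 4×2 input, in agreement with the formula for the output tensor size.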

With this, you now have all the fundamental knowledge required to build a Convolutional Neural Network. In the next chapter, we will tie together everything we have learned so far to build one.


