The Pooling Operation


The pooling operation is another fundamental operation of a Convolutional Neural Network. Thankfully, it can be understood much more quickly because we already have a sound knowledge of the convolution operation.

In practice, the two most commonly used pooling operations are max-pooling and average-pooling. We will illustrate both of them in the sections below.


Max-pooling

The max-pooling operation slides a window over an input tensor and, for each window position, outputs the maximum element inside that window. This can be better understood using the following notation-based example:

Consider an image tensor $A$ of size 4×4,

$$ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}$$

Applying max-pooling with a 2×2 window and a stride of 2, the output tensor can be obtained as follows:

$$o_1 = \text{max}\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\ ,\ o_2 = \text{max}\begin{pmatrix} a_{13} & a_{14} \\ a_{23} & a_{24} \end{pmatrix}\ ,\ o_3 = \text{max}\begin{pmatrix} a_{31} & a_{32} \\ a_{41} & a_{42} \end{pmatrix}\ ,\ o_4 = \text{max}\begin{pmatrix} a_{33} & a_{34} \\ a_{43} & a_{44} \end{pmatrix}$$

The final output tensor is then obtained as follows,

$$O = \begin{pmatrix} o_1 & o_2 \\ o_3 & o_4 \end{pmatrix} $$

Let us understand this more clearly with the help of a numerical example.

Consider an image tensor $A$ of size 4×4,

$$ A = \begin{pmatrix} 2 & 4 & 6 & 8 \\ 10 & 12 & 14 & 16 \\ 18 & 20 & 22 & 24 \\ 26 & 28 & 30 & 32 \end{pmatrix}$$

Applying max-pooling with a 2×2 window and a stride of 2, the output tensor can be obtained as follows,

$$o_1 = \text{max}\begin{pmatrix} 2 & 4 \\ 10 & 12 \end{pmatrix} = 12 \ ,\ o_2 = \text{max}\begin{pmatrix} 6 & 8 \\ 14 & 16 \end{pmatrix} = 16 \ ,\ o_3 = \text{max}\begin{pmatrix} 18 & 20 \\ 26 & 28 \end{pmatrix} = 28 \ ,\ o_4 = \text{max}\begin{pmatrix} 22 & 24 \\30 & 32 \end{pmatrix} = 32$$

The final output tensor is then obtained as follows,

$$O = \begin{pmatrix} 12 & 16 \\ 28 & 32 \end{pmatrix} $$
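The numerical example above can also be checked in code. The following is a minimal sketch using NumPy, not a reference implementation; the function name `max_pool2d` and its `size` and `stride` parameters are hypothetical names chosen for illustration, and the sketch assumes a 2-D input with no padding.

```python
import numpy as np

def max_pool2d(a, size=2, stride=2):
    """Hypothetical helper: keep the maximum of each size x size window."""
    # Number of window positions along each axis (no padding assumed).
    rows = (a.shape[0] - size) // stride + 1
    cols = (a.shape[1] - size) // stride + 1
    out = np.empty((rows, cols), dtype=a.dtype)
    for i in range(rows):
        for j in range(cols):
            # Extract the window whose top-left corner is (i*stride, j*stride).
            window = a[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

# The 4x4 image tensor A from the numerical example.
A = np.array([[2, 4, 6, 8],
              [10, 12, 14, 16],
              [18, 20, 22, 24],
              [26, 28, 30, 32]])

O = max_pool2d(A)
print(O)
# [[12 16]
#  [28 32]]
```

The explicit loops are kept for readability so that each window matches one of $o_1$ to $o_4$ above; deep learning libraries provide much faster built-in pooling layers.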


Average-pooling

The average-pooling operation slides a window over an input tensor and, for each window position, outputs the average of all the elements inside that window. This can be better understood using the following notation-based example:

Consider an image tensor $A$ of size 4×4,

$$ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}$$

Applying average-pooling with a 2×2 window and a stride of 2, the output tensor can be obtained as follows,

$$o_1 = \text{avg}\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\ ,\ o_2 = \text{avg}\begin{pmatrix} a_{13} & a_{14} \\ a_{23} & a_{24} \end{pmatrix}\ ,\ o_3 = \text{avg}\begin{pmatrix} a_{31} & a_{32} \\ a_{41} & a_{42} \end{pmatrix}\ ,\ o_4 = \text{avg}\begin{pmatrix} a_{33} & a_{34} \\ a_{43} & a_{44} \end{pmatrix}$$

The final output tensor is then obtained as follows,

$$O = \begin{pmatrix} o_1 & o_2 \\ o_3 & o_4 \end{pmatrix} $$

Let us understand this more clearly with the help of a numerical example.

Consider an image tensor $A$ of size 4×4,

$$ A = \begin{pmatrix} 2 & 4 & 6 & 8 \\ 10 & 12 & 14 & 16 \\ 18 & 20 & 22 & 24 \\ 26 & 28 & 30 & 32 \end{pmatrix}$$

Applying average-pooling with a 2×2 window and a stride of 2, the output tensor can be obtained as follows,

$$o_1 = \text{avg}\begin{pmatrix} 2 & 4 \\ 10 & 12 \end{pmatrix} = 7 \ ,\ o_2 = \text{avg}\begin{pmatrix} 6 & 8 \\ 14 & 16 \end{pmatrix} = 11 \ ,\ o_3 = \text{avg}\begin{pmatrix} 18 & 20 \\ 26 & 28 \end{pmatrix} = 23 \ ,\ o_4 = \text{avg}\begin{pmatrix} 22 & 24 \\30 & 32 \end{pmatrix} = 27$$

The final output tensor is then obtained as follows,

$$O = \begin{pmatrix} 7 & 11 \\ 23 & 27 \end{pmatrix} $$
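The same numerical example can be reproduced in code. This is a minimal NumPy sketch under the same assumptions as the example (2-D input, no padding); the function name `avg_pool2d` and its `size` and `stride` parameters are hypothetical names used only for illustration.

```python
import numpy as np

def avg_pool2d(a, size=2, stride=2):
    """Hypothetical helper: keep the mean of each size x size window."""
    # Number of window positions along each axis (no padding assumed).
    rows = (a.shape[0] - size) // stride + 1
    cols = (a.shape[1] - size) // stride + 1
    # Averages are generally not integers, so the output is stored as floats.
    out = np.empty((rows, cols), dtype=float)
    for i in range(rows):
        for j in range(cols):
            # Extract the window whose top-left corner is (i*stride, j*stride).
            window = a[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = window.mean()
    return out

# The 4x4 image tensor A from the numerical example.
A = np.array([[2, 4, 6, 8],
              [10, 12, 14, 16],
              [18, 20, 22, 24],
              [26, 28, 30, 32]])

O = avg_pool2d(A)
print(O)
# [[ 7. 11.]
#  [23. 27.]]
```

The only change from max-pooling is the reduction applied to each window: the mean replaces the maximum.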

The pooling operation is usually performed after the convolution operation. It further reduces the size of its input tensor by summarizing each window with a single value (the maximum or the average), so that the important features of an image are kept in a smaller tensor.
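The size reduction can be made concrete with the standard output-size relation for a window that slides without padding. The symbols $n$ (input side length), $k$ (window side length), $s$ (stride) and $n_{\text{out}}$ (output side length) are notation introduced here for illustration and do not appear in the examples above. For the 4×4 input with a 2×2 window and a stride of 2:

$$ n_{\text{out}} = \left\lfloor \frac{n - k}{s} \right\rfloor + 1 = \left\lfloor \frac{4 - 2}{2} \right\rfloor + 1 = 2 $$

This agrees with the 2×2 output tensor $O$ obtained in both the max-pooling and the average-pooling examples.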

