The Convolution/Pooling Operation for RGB images


[latexpage]

Until now, we have only discussed the convolution and pooling operations on single-channel images, i.e., grayscale images. However, color photos taken with digital cameras are usually stored as RGB images. Such images are formed by combining three color channels: Red, Green, and Blue.

Figure: RGB color channels

Mathematically, an RGB image $A$ is represented as a tensor of size $n_{A1} \times n_{A2} \times n_{c}$, where the first two dimensions ($n_{A1}$ and $n_{A2}$) are the number of rows and columns of pixels in the image and the last dimension ($n_c$) is the number of color channels. So, an RGB image with a resolution of 512×512 is actually stored as 512×512×3.
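To make the rows × columns × channels layout concrete, here is a minimal sketch in plain Python. The tiny 2×2 image and the helper names `shape` and `split_channels` are made up for illustration; in practice an array library would normally hold the image.

```python
# A tiny 2x2 RGB image stored as rows x columns x channels.
# The pixel values are made up for illustration.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]


def shape(img):
    """Return (rows, columns, channels) of a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


def split_channels(img):
    """Return one rows x columns matrix per colour channel."""
    n_channels = len(img[0][0])
    return [
        [[pixel[c] for pixel in row] for row in img]
        for c in range(n_channels)
    ]


red, green, blue = split_channels(image)
print(shape(image))  # (2, 2, 3)
print(red)           # [[255, 0], [0, 255]]
```

Each of `red`, `green` and `blue` is an ordinary single-channel matrix, which is the form the earlier grayscale operations work on.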

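In this case, the convolution operation is performed on all three colour channels (Red, Green and Blue) simultaneously, and a single output tensor is obtained by summing the per-channel results. Pooling is also applied to every colour channel, but in standard CNN layers each channel is pooled independently and the pooled channels are kept separate rather than summed.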

Let us understand this clearly with the following example of a convolution operation:

Consider an RGB image $A$ with a dimension of 3×3×3. Its red, green and blue channels are written with separate entries $r_{ij}$, $g_{ij}$ and $b_{ij}$, since the three channels generally hold different values,

$$ A_R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix}, A_G = \begin{pmatrix} g_{11} & g_{12} & g_{13} \\ g_{21} & g_{22} & g_{23} \\ g_{31} & g_{32} & g_{33} \end{pmatrix}, A_B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}$$

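Also, consider a kernel $K$ with a dimension of 3×3, shared across the three channels for simplicity (in most deep learning libraries the kernel has a separate 3×3 slice for each channel, i.e., a dimension of 3×3×3),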

$$ K = \begin{pmatrix} k_{11} & k_{12} & k_{13} \\ k_{21} & k_{22} & k_{23} \\ k_{31} & k_{32} & k_{33} \end{pmatrix} $$

The output tensor $O$ is obtained as follows,

$$ O = A_R * K + A_G * K + A_B * K $$
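The sum above can be sketched in a few lines of plain Python. This assumes that $*$ means the element-wise multiply-and-sum used in most CNN tutorials (no kernel flip). The channel and kernel values are made up, and `multiply_and_sum` is a hypothetical helper name. Because the image and the kernel are both 3×3, each term is a single number and so is $O$.

```python
def multiply_and_sum(channel, kernel):
    """Element-wise product of two equally sized matrices, summed to one number."""
    return sum(
        channel[i][j] * kernel[i][j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )


# Made-up 3x3 channel values, for illustration only.
A_R = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
A_G = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
A_B = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]

# A made-up 3x3 kernel shared by all three channels.
K = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

# One number per channel, then summed across channels.
per_channel = [multiply_and_sum(ch, K) for ch in (A_R, A_G, A_B)]
O = sum(per_channel)
print(per_channel)  # [-6, 3, -3]
print(O)            # -6
```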

The same process can be followed for an image that is larger than the kernel. The kernel slides over the image, and at each position it is convolved with the kernel-sized patch of each colour channel; the per-channel results are summed to give one entry of the output tensor.
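A minimal sketch of this sliding process, assuming a stride of 1, no padding, and the same kernel for every channel as in the example above. The function names `conv2d_valid` and `conv_rgb` and all the values are hypothetical.

```python
def conv2d_valid(channel, kernel):
    """Slide the kernel over one channel (stride 1, no padding)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(channel) - k_rows + 1
    out_cols = len(channel[0]) - k_cols + 1
    return [
        [
            sum(
                channel[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def conv_rgb(channels, kernel):
    """Convolve every channel with the kernel, then sum the results entry by entry."""
    per_channel = [conv2d_valid(ch, kernel) for ch in channels]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [
        [sum(out[r][c] for out in per_channel) for c in range(cols)]
        for r in range(rows)
    ]


# Made-up 3x3 channels and a 2x2 kernel.
red = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
green = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
blue = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
kernel = [[1, 0], [0, 1]]

print(conv_rgb([red, green, blue], kernel))  # [[8, 10], [14, 16]]
```

A 3×3 image with a 2×2 kernel gives a 2×2 output, and the three colour channels have been collapsed into one.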

The pooling operation extends to RGB images in a similar way: max-pooling or average-pooling is applied to each window-sized patch of each colour channel. Unlike convolution, the pooled channels are typically not summed, so the output tensor keeps the same number of channels as the input.
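As a rough illustration of pooling on an RGB image, the sketch below applies 2×2 max-pooling with a stride of 2 to each channel separately and keeps the three pooled channels side by side, which is the usual convention in CNN libraries. The `max_pool` helper and the channel values are made up for this example.

```python
def max_pool(channel, size=2, stride=2):
    """Max-pool a single channel with a square window."""
    out_rows = (len(channel) - size) // stride + 1
    out_cols = (len(channel[0]) - size) // stride + 1
    return [
        [
            max(
                channel[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Made-up 2x4 channels of a small RGB image.
red = [[1, 2, 3, 4], [5, 6, 7, 8]]
green = [[9, 1, 0, 2], [3, 4, 5, 1]]
blue = [[0, 0, 1, 1], [0, 0, 1, 1]]

# Each channel is pooled on its own; the result still has three channels.
pooled = [max_pool(ch) for ch in (red, green, blue)]
print(pooled)  # [[[6, 8]], [[9, 5]], [[0, 1]]]
```

Average-pooling would follow the same pattern with the window mean in place of `max`.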

