The Convolution/Pooling Operation for RGB images



Until now, we have only discussed the convolution and pooling operations on single-channel images, i.e., grayscale images. However, color photos taken with digital cameras are RGB images. Such images are formed by combining three color channels: Red, Green, and Blue, as shown in the image below.

Mathematically, an RGB image $A$ is represented as a tensor of size $n_{A1} \times n_{A2} \times n_{c}$, where the first two dimensions ($n_{A1}$ and $n_{A2}$) are the number of rows and columns of pixels in the image and the last dimension ($n_c$) is the number of color channels. So, an RGB image with a resolution of 512x512 is actually represented as 512x512x3.
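The rows x columns x channels layout can be made concrete with a very small image. The snippet below is only an illustrative sketch: the 2x2 pixel values and the helper names `shape` and `channel` are made up for this example, and plain Python lists are used so that the structure stays visible (in practice an array library would normally be used).

```python
# A hypothetical 2x2 RGB image stored as rows -> columns -> [R, G, B] values.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]


def shape(img):
    """Return (rows, columns, channels) of a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


def channel(img, c):
    """Extract channel c (0 = Red, 1 = Green, 2 = Blue) as a 2-D list."""
    return [[pixel[c] for pixel in row] for row in img]


print(shape(image))       # (2, 2, 3)
print(channel(image, 0))  # [[255, 0], [0, 255]]
```

Each call to `channel` returns one single-channel image, which is the kind of input the earlier grayscale discussion dealt with.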

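In this case, the convolution operation is performed on all three colour channels (Red, Green and Blue) simultaneously, and a single output tensor is obtained by summing the convolution results of the individual colour channels. Pooling is also applied to every colour channel, but in standard CNN layers each channel is pooled independently and the pooled channels are kept separate rather than summed.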

Let us understand this clearly with the following example of a convolution operation:

Consider an RGB image $A$ with a dimension of 3x3x3, whose Red, Green, and Blue channels are:

$$ A_R = \begin{pmatrix} a^{R}_{11} & a^{R}_{12} & a^{R}_{13} \\ a^{R}_{21} & a^{R}_{22} & a^{R}_{23} \\ a^{R}_{31} & a^{R}_{32} & a^{R}_{33} \end{pmatrix}, \quad A_G = \begin{pmatrix} a^{G}_{11} & a^{G}_{12} & a^{G}_{13} \\ a^{G}_{21} & a^{G}_{22} & a^{G}_{23} \\ a^{G}_{31} & a^{G}_{32} & a^{G}_{33} \end{pmatrix}, \quad A_B = \begin{pmatrix} a^{B}_{11} & a^{B}_{12} & a^{B}_{13} \\ a^{B}_{21} & a^{B}_{22} & a^{B}_{23} \\ a^{B}_{31} & a^{B}_{32} & a^{B}_{33} \end{pmatrix} $$

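Also, consider a kernel $K$ with a dimension of 3x3 that is applied to every channel. (In most CNN libraries the kernel has a separate 3x3 slice for each channel; a single shared $K$ is used here to keep the example simple.)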

$$ K = \begin{pmatrix} k_{11} & k_{12} & k_{13} \\ k_{21} & k_{22} & k_{23} \\ k_{31} & k_{32} & k_{33} \end{pmatrix} $$

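The output tensor $O$, which has a single element here because the image and the kernel have the same height and width and no padding is used, is obtained as follows: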

$$ O = A_R * K + A_G * K + A_B * K $$
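Written out element by element, the sum over channels looks as follows. This expansion assumes, as is usual in CNN texts, that $*$ multiplies matching entries and adds them up without flipping the kernel. The symbol $a^{c}_{ij}$ is notation used for this sketch and denotes the entry in row $i$ and column $j$ of channel $c$:

$$ O = \sum_{c \in \{R, G, B\}} \sum_{i=1}^{3} \sum_{j=1}^{3} a^{c}_{ij} \, k_{ij} $$

So each of the 27 pixel values is multiplied by the kernel entry at the same row and column, and all 27 products are added into one number.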

The same process applies when the image is larger than the kernel. The kernel slides over the image, and at each position it is convolved with the kernel-sized patch of every colour channel; the per-channel results are then summed to give one element of the resultant output tensor.
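The sliding procedure can be sketched in a few lines of plain Python. This is a minimal illustration and not an optimised implementation: the function name `conv2d_rgb` and the sample values are hypothetical, a stride of 1 with no padding is assumed, the same 2-D kernel is shared by all channels as in the example above, and the kernel is not flipped (the convention most CNN libraries follow).

```python
def conv2d_rgb(image, kernel):
    """Slide `kernel` over a rows x cols x channels image (stride 1, no padding).

    The same 2-D kernel is applied to every channel and the per-channel
    results are summed, mirroring O = A_R * K + A_G * K + A_B * K.
    """
    rows, cols, channels = len(image), len(image[0]), len(image[0][0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for top in range(rows - k_rows + 1):
        out_row = []
        for left in range(cols - k_cols + 1):
            total = 0
            for c in range(channels):          # sum over colour channels
                for i in range(k_rows):        # sum over the patch rows
                    for j in range(k_cols):    # sum over the patch columns
                        total += image[top + i][left + j][c] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 3x3 RGB image: the pixel at (row r, column c) is [r, c, 1].
image = [[[r, c, 1] for c in range(3)] for r in range(3)]
kernel = [[1, 0],
          [0, 1]]

print(conv2d_rgb(image, kernel))  # [[4, 6], [6, 8]]
```

A 3x3 image and a 2x2 kernel give a 2x2 output, since the kernel fits in two positions along each axis.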

The above concept extends to the pooling operation as well: max-pooling or average-pooling is applied separately to each colour channel of each patch of the image, so the resultant output tensor keeps one pooled map per channel.
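A per-channel max-pooling pass might look like the sketch below. The function name `max_pool_rgb`, the 2x2 window with a stride of 2, and the sample image are assumptions made for illustration. Following common CNN practice, each channel is pooled on its own, so the output keeps three channels.

```python
def max_pool_rgb(image, size=2, stride=2):
    """Max-pool each channel independently; the channel count is preserved."""
    rows, cols, channels = len(image), len(image[0]), len(image[0][0])
    output = []
    for top in range(0, rows - size + 1, stride):
        out_row = []
        for left in range(0, cols - size + 1, stride):
            # One maximum per channel over the size x size window.
            pooled = [
                max(image[top + i][left + j][c]
                    for i in range(size) for j in range(size))
                for c in range(channels)
            ]
            out_row.append(pooled)
        output.append(out_row)
    return output


# Hypothetical 4x4 RGB image: the pixel at (row r, column c) is [r, c, r * c].
image = [[[r, c, r * c] for c in range(4)] for r in range(4)]

print(max_pool_rgb(image))
# [[[1, 1, 1], [1, 3, 3]], [[3, 1, 3], [3, 3, 9]]]
```

Replacing `max` with an average over the same window would give average-pooling.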

Written by

The Click Reader

