Until now, we have only discussed the convolution and pooling operations on single-channel images, i.e., grayscale images. However, colour photos taken with digital cameras are RGB images. Such images are formed by combining three colour channels: Red, Green, and Blue, as shown in the image below.
Mathematically, an RGB image is represented as a tensor of shape rows × columns × channels, where the first two dimensions give the number of rows and columns of pixels in the image and the last dimension gives the number of colour channels. So an RGB image of 512×512 resolution is actually represented as a 512×512×3 tensor.
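To make the shape concrete, here is a minimal NumPy sketch. The all-zero image is a hypothetical placeholder, and the channel order (index 0 = Red, 1 = Green, 2 = Blue) is an assumption; some libraries store the channels in a different order.

```python
import numpy as np

# Hypothetical blank 512x512 RGB image: rows x columns x channels.
image = np.zeros((512, 512, 3), dtype=np.uint8)

rows, cols, channels = image.shape

# Assuming RGB channel order, each slice is a single-channel 512x512 image.
red = image[:, :, 0]
green = image[:, :, 1]
blue = image[:, :, 2]

print(image.shape)  # (512, 512, 3)
print(red.shape)    # (512, 512)
```

Each slice can be treated exactly like the grayscale images discussed so far.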
In this case, the convolution is performed on all three colour channels (Red, Green and Blue), and a single output tensor is obtained by summing the per-channel convolution results. Pooling is likewise applied to each colour channel, although standard pooling layers keep the pooled channels as separate outputs rather than summing them.
Let us understand this clearly with the following example of a convolution operation:
Consider an RGB image with dimensions 3×3×3,
Also, consider a kernel with dimensions 3×3,
The output tensor is obtained as follows,
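As a rough numerical sketch of this computation, the NumPy snippet below multiplies the kernel element-wise with each colour channel, sums each product, and then adds the three channel results. The pixel and kernel values are illustrative assumptions, not the values from the original figures, and the same 3×3 kernel is applied to every channel as described above. As is usual in CNNs, the kernel is not flipped.

```python
import numpy as np

# Illustrative values only (assumed for this sketch).
image = np.arange(27).reshape(3, 3, 3)   # rows x columns x channels
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Element-wise product of the kernel with each channel, summed per channel.
per_channel = [int(np.sum(image[:, :, c] * kernel)) for c in range(3)]

# The single output value is the sum over the three colour channels.
output = sum(per_channel)

print(per_channel)  # [-18, -18, -18]
print(output)       # -54
```

Because the image and the kernel have the same height and width, the kernel fits in only one position, so the output is a single value.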
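The same process can be followed for an image that is larger than the kernel. The kernel slides over the image, and at each position it is convolved with each colour channel of the kernel-sized patch (sub-tensor) under it. The per-channel results are summed to give one element of the resultant output tensor.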
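The sliding procedure can be sketched as follows. This is a minimal illustration that assumes a stride of 1 and no padding; `conv2d_rgb` is a hypothetical helper name, not a library function, and it reuses one 2-D kernel for every channel as in the example above.

```python
import numpy as np

def conv2d_rgb(image, kernel):
    """Slide one 2-D kernel over every channel (stride 1, no padding)
    and sum the per-channel results into a single 2-D output."""
    rows, cols, _ = image.shape
    k_rows, k_cols = kernel.shape
    out = np.zeros((rows - k_rows + 1, cols - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Kernel-sized patch covering all channels: k_rows x k_cols x channels.
            patch = image[i:i + k_rows, j:j + k_cols, :]
            # kernel[:, :, None] broadcasts the same kernel across the channels;
            # np.sum adds up every product, including across channels.
            out[i, j] = np.sum(patch * kernel[:, :, None])
    return out

image = np.ones((5, 5, 3))   # illustrative 5x5 RGB image of ones
kernel = np.ones((3, 3))     # illustrative 3x3 kernel of ones
result = conv2d_rgb(image, kernel)
print(result.shape)  # (3, 3)
print(result[0, 0])  # 27.0
```

A 5×5 image and a 3×3 kernel give a 3×3 output, since the kernel fits in three positions along each axis. In most deep learning frameworks the kernel instead has its own 2-D slice of weights for each input channel; the sliding and summing steps stay the same.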
The above concept extends to the pooling operation as well: max-pooling or average-pooling is applied to each colour channel of each patch (sub-tensor) of the image to get the resultant output tensor.
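A minimal sketch of per-channel max-pooling is given below. It assumes non-overlapping windows (stride equal to the window size) and follows the common convention that each colour channel is pooled independently and kept as its own output channel; `max_pool_rgb` is a hypothetical helper name.

```python
import numpy as np

def max_pool_rgb(image, size=2):
    """Non-overlapping max-pooling applied to each channel independently."""
    rows, cols, channels = image.shape
    out_rows, out_cols = rows // size, cols // size
    # Drop trailing rows/columns that do not fill a complete window.
    trimmed = image[:out_rows * size, :out_cols * size, :]
    # Split the row and column axes into (window index, position inside window).
    windows = trimmed.reshape(out_rows, size, out_cols, size, channels)
    # Take the maximum inside each window, separately for every channel.
    return windows.max(axis=(1, 3))

image = np.arange(48).reshape(4, 4, 3)   # illustrative 4x4 RGB image
pooled = max_pool_rgb(image, size=2)
print(pooled.shape)     # (2, 2, 3)
print(pooled[0, 0, 0])  # 15
```

Replacing `windows.max(axis=(1, 3))` with `windows.mean(axis=(1, 3))` gives average-pooling under the same assumptions.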