In my previous article, we introduced the key building block behind convolutional neural networks (CNNs), the convolutional layer.
Convolutional layers allow the neural network to learn the kernels best suited to extracting the features it needs to decipher or classify our input images.
If you are unfamiliar, a kernel is a small matrix that slides over the input image; at each step, we apply the convolution operation. Depending on its structure, a kernel will have a different effect on the input image: it can blur, sharpen, or even detect edges (as the Sobel operator does).
In CNNs, the output of a convolution operation is referred to as a feature map.
Below is an example diagram of a convolution where we blur the resultant image:
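To make the operation concrete, here is a minimal NumPy sketch of a "valid" 2D convolution applying a 3×3 box-blur kernel. (Strictly speaking, this slides the kernel without flipping it, i.e. cross-correlation, which is the convention CNNs use; for a symmetric kernel like a box blur the result is identical.) The image and kernel values here are illustrative, not from the diagram above.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image and take
    the element-wise product-sum (dot product) at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1  # output shrinks by kernel size - 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 box-blur kernel: each output pixel becomes the mean of its
# 3x3 neighbourhood, smoothing the image.
blur_kernel = np.ones((3, 3)) / 9.0

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
blurred = convolve2d(image, blur_kernel)
print(blurred.shape)  # (3, 3) -- a valid convolution shrinks the output
```

Swapping `blur_kernel` for a Sobel kernel in the same function would highlight edges instead of smoothing; only the kernel values change, not the sliding operation.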
If you want a full breakdown of how convolution works, check out my previous post on it here:
In convolutional layers, we have several kernels that the CNN optimizes using backpropagation. Each neuron in a convolutional layer is connected to only a handful of neurons in the previous layer. This allows the first few layers to recognize low-level features, with complexity building up as we propagate through the CNN.
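A convolutional layer can be sketched as a bank of kernels, each producing its own feature map. The kernel values below (a box blur and the two Sobel kernels) are hand-picked for illustration; in a real CNN they would be learnable parameters updated by backpropagation, not fixed.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2D convolution (no kernel flip, as in CNNs)."""
    kh, kw = kernel.shape
    return np.array([
        [np.sum(image[i:i + kh, j:j + kw] * kernel)
         for j in range(image.shape[1] - kw + 1)]
        for i in range(image.shape[0] - kh + 1)
    ])

# A toy kernel bank standing in for a layer's learned weights.
kernels = np.stack([
    np.ones((3, 3)) / 9.0,                               # box blur
    np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]),  # Sobel, vertical edges
    np.array([[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]]),  # Sobel, horizontal edges
])

def conv_layer(image, kernels):
    """Apply every kernel to the image: one feature map per kernel,
    stacked into shape (num_kernels, H', W')."""
    return np.stack([convolve2d(image, k) for k in kernels])

image = np.random.rand(8, 8)  # toy single-channel input
feature_maps = conv_layer(image, kernels)
print(feature_maps.shape)  # (3, 6, 6): three feature maps, one per kernel
```

Stacking the feature maps this way is why the output of a convolutional layer has one channel per kernel; the next layer then looks at all of these channels at once to build higher-level features.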
Convolutional layers are the first key part of a CNN; the second is pooling layers, which are what we will discuss in this article.