In my previous article, we dived into the key building block behind Convolutional Neural Networks (CNNs), the convolution mathematical operator. I highly recommend you check that out as it builds context and understanding for this article:
In a nutshell, convolution for image processing is where we apply a small matrix, called a kernel, over our input image to create an output image with some effect applied to it such as blurring or sharpening.
Mathematically, what we have is:
- f∗g: Convolution between functions, f and g.
- f: The input image
- g: The kernel matrix, also known as a filter
- t: The pixel where the convolution is being computed.
- f(τ): The pixel value of image f at pixel τ.
- g(t−τ): The pixel value of g shifted by τ and evaluated at t.
Below is an example of this process, where we apply a box blur to our input image:
The convolution is computed by multiplying each pixel of the input image with the corresponding element from the kernel and summing these products, then normalised by the number of elements.
This is an example of the middle pixel as depicted in the diagram above:
[30*1 + 30*1 + 30*1] +
[30*1 + 70*1 + 30*1] +
[30*1 + 30*1 + 30*1] = 30 + 30 + 30 + 30 + 70 + 30 + 30 + 30 + 30 = 310
pixel value = 310 / 9 ~ 34