Let's check how the convolution works in more detail here.

In [1]:
import matplotlib.pyplot as plt
import numpy as np

from scipy.datasets import ascent
In [2]:
# Load the ascent image from scipy

ascent_image = ascent()

plt.gray()
plt.imshow(ascent_image);
[Output: the original 512x512 ascent image displayed in grayscale]
In [3]:
# Creating a copy of the image

image_transformed = ascent_image.copy()

#Dimensions of the image
size_x = image_transformed.shape[0]
size_y = image_transformed.shape[1]

print(size_x, size_y)
512 512

Let's create a 3 by 3 filter now, similar to the ones we used in the previous lab's Conv2D layer, where we used 64 such different 3 by 3 filters. Let's create some filters and see how they work:

In [6]:
filter = [[1, 0, 0], [-2, 1, -2], [0, 1, 1]]
filter2 = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
filter3 = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# If the values in a filter don't add up to 0 or 1, set a weight to
# normalize the output (e.g. weight = 1/9 for an all-ones blur filter).
# All three filters above sum to 0, so a weight of 1 is fine here.

weight = 1
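As a quick sanity check, you can verify the filter sums with NumPy. This small sketch (using the three filters defined above) prints each sum:

```python
import numpy as np

# The three example filters defined above
filter = [[1, 0, 0], [-2, 1, -2], [0, 1, 1]]
filter2 = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
filter3 = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

for name, f in [("filter", filter), ("filter2", filter2), ("filter3", filter3)]:
    print(name, "sums to", np.sum(f))  # each of these sums to 0
```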

What the code below does:

  • Iterating over the image: the loops visit each pixel in the interior of the image (x and y cover every pixel except the borders). The edges are skipped because the filter needs to look at neighboring pixels, and border pixels have neighbors that fall outside the image.
  • Applying the filter: for each pixel, the code computes a value called convolution. Each neighboring pixel is multiplied by the corresponding entry in the 3x3 filter matrix, and the products are summed.
  • For example:
    • The pixel at position (x-1, y-1) is multiplied by the filter value at position [0][0].
    • The pixel at position (x-1, y) is multiplied by the filter value at [0][1], and so on for all nine positions.
    This process is repeated for every interior pixel, combining each pixel's neighborhood according to the filter.
  • Scaling the result: the weighted sum is then multiplied by the weight factor, which adjusts the overall strength of the filter.
  • Clamping the value: the new pixel value is clamped to the range 0 to 255, since grayscale pixel values are integers from 0 (black) to 255 (white); any result outside this range is adjusted to fit.
  • Storing the result: finally, the computed value is stored in image_transformed at the same (x, y) position, producing a new image with the convolution applied.
In [7]:
# Iterate over the image and add filters to it

for x in range(1, size_x-1):
    for y in range(1, size_y-1):
        convolution = 0.0
        convolution = convolution + (ascent_image[x-1, y-1] * filter[0][0])
        convolution = convolution + (ascent_image[x-1, y] * filter[0][1])
        convolution = convolution + (ascent_image[x-1, y+1] * filter[0][2])
        convolution = convolution + (ascent_image[x, y-1] * filter[1][0])
        convolution = convolution + (ascent_image[x, y] * filter[1][1])
        convolution = convolution + (ascent_image[x, y+1] * filter[1][2])
        convolution = convolution + (ascent_image[x+1, y-1] * filter[2][0])
        convolution = convolution + (ascent_image[x+1, y] * filter[2][1])
        convolution = convolution + (ascent_image[x+1, y+1] * filter[2][2])

        convolution = convolution * weight

        if (convolution < 0):
            convolution = 0
        if (convolution > 255):
            convolution = 255

        image_transformed[x,y] = convolution
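The nested loop above is essentially a cross-correlation with the 3x3 kernel, so the same result can be computed in one vectorized call. A sketch using scipy.signal.correlate2d (which slides the kernel without flipping it, unlike convolve2d), with a small random array standing in for ascent_image:

```python
import numpy as np
from scipy.signal import correlate2d

# Small random stand-in for ascent_image
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8)).astype(float)
kernel = np.array([[1, 0, 0], [-2, 1, -2], [0, 1, 1]], dtype=float)
weight = 1

# correlate2d slides the kernel without flipping it, matching the manual loop;
# mode='valid' produces exactly the interior pixels the loop computes
vectorized = np.clip(correlate2d(image, kernel, mode='valid') * weight, 0, 255)

# The manual per-pixel loop from the cell above, for comparison
manual = np.zeros_like(vectorized)
for x in range(1, image.shape[0] - 1):
    for y in range(1, image.shape[1] - 1):
        conv = sum(image[x - 1 + i, y - 1 + j] * kernel[i][j]
                   for i in range(3) for j in range(3))
        manual[x - 1, y - 1] = min(max(conv * weight, 0), 255)

print(np.allclose(vectorized, manual))  # → True
```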
In [8]:
plt.gray()
plt.imshow(image_transformed);
[Output: the convolved image, with the features picked out by the filter emphasized]
  • Here, the code sets the new dimensions of the image after pooling. Since we pool in 2x2 blocks, the new dimensions (new_x and new_y) are half the original size (size_x and size_y).
  • new_image is a new empty image initialized with zeros; its size is new_x by new_y (half the original dimensions).
  • The loops iterate over the original image in steps of 2 pixels; x and y are the coordinates of the top-left pixel of each 2x2 block.
  • For each 2x2 block, the code collects the 4 pixel values into a list called pixels: the top-left, top-right, bottom-left, and bottom-right corners of the block.
  • max(pixels) finds the maximum value in the block and stores it at the corresponding position in new_image. That position is (int(x/2), int(y/2)) because the new image is half the size of the original.
  • Finally, the pooled image (new_image) is displayed with imshow(); plt.gray() ensures it is shown in grayscale.
In [10]:
# Let's perform maxpooling on top of it
new_x = int(size_x/2)
new_y = int(size_y/2)

new_image = np.zeros((new_x, new_y))

for x in range(0, size_x, 2):
    for y in range(0, size_y, 2):
        pixels = []
        pixels.append(image_transformed[x,y])
        pixels.append(image_transformed[x+1,y])
        pixels.append(image_transformed[x,y+1])
        pixels.append(image_transformed[x+1,y+1])
        new_image[int(x/2), int(y/2)] = max(pixels)

plt.gray()
plt.imshow(new_image);
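For reference, the same 2x2 max pooling can be done without explicit loops by reshaping the array into blocks. A minimal sketch on a small example array (this assumes even dimensions, as with the 512x512 image):

```python
import numpy as np

# 4x4 example standing in for image_transformed (dimensions must be even)
a = np.arange(16, dtype=float).reshape(4, 4)

h, w = a.shape
# Split into 2x2 blocks, then take the max within each block
pooled = a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled)  # [[ 5.  7.]
               #  [13. 15.]]
```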
[Output: the max-pooled image at half the original size, with the filtered features preserved]
In [11]:
# Let's use the second filter now: reassign the name `filter` (which the
# loop below reads) to the horizontal-edge Sobel kernel

filter1 = [[1, 0, 0], [-2, 1, -2], [0, 1, 1]]
filter = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
filter3 = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]


# Iterate over the image and add filters to it

for x in range(1, size_x-1):
    for y in range(1, size_y-1):
        convolution = 0.0
        convolution = convolution + (ascent_image[x-1, y-1] * filter[0][0])
        convolution = convolution + (ascent_image[x-1, y] * filter[0][1])
        convolution = convolution + (ascent_image[x-1, y+1] * filter[0][2])
        convolution = convolution + (ascent_image[x, y-1] * filter[1][0])
        convolution = convolution + (ascent_image[x, y] * filter[1][1])
        convolution = convolution + (ascent_image[x, y+1] * filter[1][2])
        convolution = convolution + (ascent_image[x+1, y-1] * filter[2][0])
        convolution = convolution + (ascent_image[x+1, y] * filter[2][1])
        convolution = convolution + (ascent_image[x+1, y+1] * filter[2][2])

        convolution = convolution * weight

        if (convolution < 0):
            convolution = 0
        if (convolution > 255):
            convolution = 255

        image_transformed[x,y] = convolution

plt.gray()
plt.imshow(image_transformed);
[Output: the image convolved with the horizontal-edge filter, emphasizing horizontal lines]
In [12]:
# Let's perform maxpooling on top of it
new_x = int(size_x/2)
new_y = int(size_y/2)

new_image = np.zeros((new_x, new_y))

for x in range(0, size_x, 2):
    for y in range(0, size_y, 2):
        pixels = []
        pixels.append(image_transformed[x,y])
        pixels.append(image_transformed[x+1,y])
        pixels.append(image_transformed[x,y+1])
        pixels.append(image_transformed[x+1,y+1])
        new_image[int(x/2), int(y/2)] = max(pixels)

plt.gray()
plt.imshow(new_image);
[Output: the max-pooled result of the horizontal-edge convolution, at half size]

This is what happens in the convolution and max-pooling layers: each filter enhances one kind of feature, so when we use 64 filters, 64 different types of features are extracted from the image.
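To mimic what a Conv2D layer with many filters does, you can stack the three example filters into a small bank and produce one feature map per filter. A sketch (the random stand-in image is illustrative, not the ascent image):

```python
import numpy as np
from scipy.signal import correlate2d

# The three example filters stacked into a small "filter bank"
bank = np.array([
    [[1, 0, 0], [-2, 1, -2], [0, 1, 1]],
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
], dtype=float)

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(8, 8)).astype(float)

# One feature map per filter -- Conv2D(64, (3, 3)) does this with 64 kernels
feature_maps = np.stack(
    [np.clip(correlate2d(image, k, mode='valid'), 0, 255) for k in bank]
)
print(feature_maps.shape)  # (3, 6, 6)
```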

Try using filter3 and see what kind of information from the picture comes out most enhanced!
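filter3 is the classic Sobel kernel for vertical edges. A small synthetic check (a sketch, not part of the lab itself) shows it responding strongly where brightness changes from left to right:

```python
import numpy as np
from scipy.signal import correlate2d

filter3 = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

# Synthetic image: dark left half, bright right half -> one vertical edge
img = np.zeros((6, 6))
img[:, 3:] = 255.0

# The response saturates to 255 exactly on the edge columns, 0 elsewhere
response = np.clip(correlate2d(img, filter3, mode='valid'), 0, 255)
print(response[0])  # [  0. 255. 255.   0.]
```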
