Convolutional Neural Network#
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import convolve2d
This Jupyter Notebook showcases the implementation of a simple Convolutional Neural Network (CNN) for image classification. The notebook includes the definition of the CNN architecture.
We apply the network on the famous mnist dataset. The dataset contains images depicting handwritten numbers. The numbers range from zero to nine. The goal is to correctly predict the number shown on the image.
mnist = fetch_openml('mnist_784', as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(mnist.data, mnist.target, test_size=0.2, random_state=42)
X_train = X_train.reshape(-1, 28, 28) / 255
X_test = X_test.reshape(-1, 28, 28) / 255
plt.imshow(X_train[0], cmap='gray')
print('Shape:', X_train[0].shape)
print('Label:', y_train[0])
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/sklearn/utils/_optional_dependencies.py:42, in check_pandas_support(caller_name)
41 try:
---> 42 import pandas # noqa
44 return pandas
ModuleNotFoundError: No module named 'pandas'
The above exception was the direct cause of the following exception:
ImportError Traceback (most recent call last)
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/sklearn/datasets/_openml.py:1059, in fetch_openml(name, version, data_id, data_home, target_column, cache, return_X_y, as_frame, n_retries, delay, parser, read_csv_kwargs)
1058 try:
-> 1059 check_pandas_support("`fetch_openml`")
1060 except ImportError as exc:
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/sklearn/utils/_optional_dependencies.py:46, in check_pandas_support(caller_name)
45 except ImportError as e:
---> 46 raise ImportError("{} requires pandas.".format(caller_name)) from e
ImportError: `fetch_openml` requires pandas.
The above exception was the direct cause of the following exception:
ImportError Traceback (most recent call last)
Cell In[2], line 1
----> 1 mnist = fetch_openml('mnist_784', as_frame=False)
2 X_train, X_test, y_train, y_test = train_test_split(mnist.data, mnist.target, test_size=0.2, random_state=42)
3 X_train = X_train.reshape(-1, 28, 28) / 255
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/sklearn/utils/_param_validation.py:216, in validate_params.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
210 try:
211 with config_context(
212 skip_parameter_validation=(
213 prefer_skip_nested_validation or global_skip_validation
214 )
215 ):
--> 216 return func(*args, **kwargs)
217 except InvalidParameterError as e:
218 # When the function is just a wrapper around an estimator, we allow
219 # the function to delegate validation to the estimator, but we replace
220 # the name of the estimator by the name of the function in the error
221 # message to avoid confusion.
222 msg = re.sub(
223 r"parameter of \w+ must be",
224 f"parameter of {func.__qualname__} must be",
225 str(e),
226 )
File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/sklearn/datasets/_openml.py:1072, in fetch_openml(name, version, data_id, data_home, target_column, cache, return_X_y, as_frame, n_retries, delay, parser, read_csv_kwargs)
1067 else:
1068 err_msg = (
1069 f"Using `parser={parser!r}` wit dense data requires pandas to be "
1070 "installed. Alternatively, explicitly set `parser='liac-arff'`."
1071 )
-> 1072 raise ImportError(err_msg) from exc
1074 if return_sparse:
1075 if as_frame:
ImportError: Using `parser='auto'` wit dense data requires pandas to be installed. Alternatively, explicitly set `parser='liac-arff'`.
Convolution Refresher#
In the context of image processing, convolution is commonly used for tasks such as blurring, edge detection, and feature extraction. It involves sliding a small matrix called a kernel, filter or mask over an image and performing a mathematical operation at each position. The result of this operation is a new image that highlights certain features or characteristics of the original image. Let’s look at an example.
kernels = []
# Sharpening Filter
kernel = (-1 * np.ones((3,3)))
kernel[1,1] = 9
kernels.append(kernel.copy())
# Horizontal Edge Detector
kernel = np.zeros((3,3))
kernel[0, :] = 1
kernel[2, :] = -1
kernels.append(kernel.copy())
# Vertical Edge Detector
kernel = np.zeros((3,3))
kernel[:, 0] = 1
kernel[:, 2] = -1
kernels.append(kernel.copy())
sample = X_train[1]
filtered = [convolve2d(sample, kernel, mode='valid') for kernel in kernels]
_, axs = plt.subplots(1, len(filtered) + 1, figsize=(15, 10))
axs[0].imshow(sample, cmap='gray')
for i, img in enumerate(filtered):
axs[i+1].imshow(img, cmap='gray')
Edge Detector#
When the filter is applied to an image, it emphasizes the horizontal transitions from light to dark (or vice versa). When the filter slides over a horizontal edge in the image, the difference between the light and dark areas is emphasized by the positive and negative values in the filter, resulting in a high output value. In areas of the image without horizontal edges, the output values are close to zero. The second does the same but for a vertical edge.
[[ 1. 1. 1.]
[ 0. 0. 0.]
[-1. -1. -1.]]
[[ 1. 0. -1.]
[ 1. 0. -1.]
[ 1. 0. -1.]]
Architecture#
First, we start with layers, which you are already familiar with
The fully connected layer. It connects every neuron in the current layer to every neuron in the subsequent layer.
The softmax layer. It assigns probabilities to each class (numbers from zero to nine), indicating the likelihood of the input belonging to each class. The class with the highest probability is then predicted as the output class.
class FCLayer:
def __init__(self, input_size, output_size):
self.weights = np.random.randn(input_size, output_size) / np.sqrt(input_size)
self.weights_update = np.zeros(self.weights.shape)
def forward(self, input):
self.input = input.copy()
return np.dot(self.input, self.weights)
def backward(self, grad):
self.weights_update += np.outer(self.input, grad)
return np.dot(grad, self.weights.T)
class Softmax:
def forward(self, input):
self.result = np.exp(input)/np.sum(np.exp(input))
return self.result
def backward(self, targets):
return self.result - targets
In the next step, we introduce two new layers:
The convolution layer This layer creates num_filters different convolution masks/filters. Each filter is slid over the input image, multiplying the filter values with the corresponding pixel values in the image, and summing up the results. This process is repeated for every position of the filter on the image, resulting in a new output feature map.
The main advantage of convolutional layers is their ability to capture local patterns and spatial relationships within the input data. Unlike fully connected layers, which consider all input features equally, convolutional layers exploit the spatial structure of the data by sharing weights across different regions. This property makes convolutional layers particularly effective in tasks such as image classification, object detection, and image segmentation.
The maximum pooling layer A pooling window slides over the input feature map. At each position, the maximum value within the window is selected and becomes the output value for that region. The pooling window then moves to the next position, until the entire input is covered. The main purpose of maximum pooling is to downsample the input, reducing its spatial dimensions. This helps in reducing the computational complexity of subsequent layers and extracting the most salient features. By selecting the maximum value within each pooling window, maximum pooling retains the strongest feature activations, enhancing the robustness of the network to small spatial translations or distortions in the input.
In summary, convolutional and maximum pooling layers are essential building blocks of CNNs that enable the network to learn hierarchical representations of input data. By leveraging local patterns and spatial relationships, these layers extract meaningful features and contribute to the overall performance of the network in various computer vision tasks.
class ConvLayer:
def __init__(self, num_filters, filter_size):
self.num_filters = num_filters
self.filter_size = filter_size
self.filters = np.random.randn(num_filters, filter_size, filter_size) / (filter_size * filter_size)
self.filters_update = np.zeros(self.filters.shape)
def forward(self, input):
self.input = input.copy()
height, width = self.input.shape
output_height = height - self.filter_size + 1
output_width = width - self.filter_size + 1
self.output = np.zeros((self.num_filters, output_height, output_width))
for filter_idx in range(self.num_filters):
for i in range(output_height):
for j in range(output_width):
# The window is a local image region
window = self.input[i:i+self.filter_size, j:j+self.filter_size]
self.output[filter_idx, i, j] = np.sum(self.filters[filter_idx] * window)
return self.output
def backward(self, grad):
_, output_height, output_width = grad.shape
input_grad = np.zeros(self.input.shape)
filter_grad = np.zeros(self.filters.shape)
for filter_idx in range(self.num_filters):
for i in range(output_height):
for j in range(output_width):
window = self.input[i:i+self.filter_size, j:j+self.filter_size]
input_grad[i:i+self.filter_size, j:j+self.filter_size] += grad[filter_idx, i, j] * self.filters[filter_idx]
filter_grad[filter_idx] += grad[filter_idx, i, j] * window
self.filters_update += filter_grad
return input_grad
class MaxPoolLayer:
def __init__(self, pool_size):
self.pool_size = pool_size
def forward(self, input):
self.input = input.copy()
num_channels, height, width = self.input.shape
output_height = height // self.pool_size
output_width = width // self.pool_size
output = np.zeros((num_channels, output_height, output_width))
for filter_idx in range(num_channels):
for i in range(output_height):
for j in range(output_width):
# The window is a local image region
window = self.input[filter_idx, i*self.pool_size:(i+1)*self.pool_size, j*self.pool_size:(j+1)*self.pool_size]
output[filter_idx, i, j] = np.max(window)
return output
def backward(self, grad):
num_filters, height, width = self.input.shape
output_height = height // self.pool_size
output_width = width // self.pool_size
input_grad = np.zeros(self.input.shape)
for filter_idx in range(num_filters):
for i in range(output_height):
for j in range(output_width):
window = self.input[filter_idx, i*self.pool_size:(i+1)*self.pool_size, j*self.pool_size:(j+1)*self.pool_size]
max_val = np.max(window)
input_grad[filter_idx, i*self.pool_size:(i+1)*self.pool_size, j*self.pool_size:(j+1)*self.pool_size] = (window == max_val) * grad[filter_idx, i, j]
return input_grad
class CNN:
def __init__(self):
self.conv_layer = ConvLayer(num_filters=6, filter_size=3)
self.pool_layer = MaxPoolLayer(pool_size=2)
# We get 13, because 28 - 3 + 1 = 26, and 26 / 2 = 13
self.fc_layer = FCLayer(input_size=6*13*13, output_size=10)
self.softmax = Softmax()
def forward(self, x):
x = self.conv_layer.forward(x)
self.pool_output = self.pool_layer.forward(x)
x = self.pool_output.flatten()
x = self.fc_layer.forward(x)
return self.softmax.forward(x)
def backward(self, targets):
grad = self.softmax.backward(targets)
grad = self.fc_layer.backward(grad)
# The fully connected layer had received a flatten input, so we have to reshape it back to the pooled shape
grad = grad.reshape(self.pool_output.shape)
grad = self.pool_layer.backward(grad)
grad = self.conv_layer.backward(grad)
return grad
def update_weights(self, lr, batch_size):
self.fc_layer.weights -= lr * self.fc_layer.weights_update / batch_size
self.conv_layer.filters -= lr * self.conv_layer.filters_update / batch_size
self.conv_layer.filters_update = np.zeros(self.conv_layer.filters.shape)
self.fc_layer.weights_update = np.zeros(self.fc_layer.weights.shape)
cnn_model = CNN()
Train and Test#
To keep the training and testing time small for demonstration purpose, we are only using a few iterations. For a real training, we would consider the whole dataset instead of just a randomly sampled subset. By further training we could significantly increase the accuracy. But this is not point of this notebook.
iterations = 25
train_batch_size = 10
test_batch_size = 5
lr = 0.1
accuracies = []
for i in range(iterations):
# train
random_samples = np.random.choice(len(X_train), train_batch_size)
for idx in random_samples:
cnn_model.forward(X_train[idx])
cnn_model.backward(np.eye(10)[int(y_train[idx])])
cnn_model.update_weights(lr, train_batch_size)
# test
accuracy = 0
random_samples = np.random.choice(len(X_test), test_batch_size)
for idx in random_samples:
accuracy += np.argmax(cnn_model.forward(X_test[idx])) == int(y_test[idx])
accuracies.append(accuracy / test_batch_size)
plt.xlabel('Iterations')
plt.ylabel('Accuracy')
plt.plot(accuracies)
samples = np.random.choice(len(X_test), 5)
_, axs = plt.subplots(1,len(samples))
for i, sample in enumerate(samples):
x = X_test[sample]
axs[i].imshow(x, cmap='gray')
axs[i].set_title(np.argmax(cnn_model.forward(x)))
Further Analysis#
We have seen the accuracy increase. That is good! Now, let us try to see, if we can gain a little bit of insight in what into a CNN.
First, we just visualize all the filter (convolution masks), our neural network has learned.
num_filters = cnn_model.conv_layer.num_filters
filter = lambda i : cnn_model.conv_layer.filters[i]
_, axs = plt.subplots(1, num_filters, figsize=(15, 10))
for i in range(num_filters):
axs[i].imshow(filter(i), cmap='gray')
plt.show()
Depending on the outcome of your training, they might look random. Further training should improve that. Anyway, sometimes you can get already some idea of what features the filters might have learned. If not, we can just try to visualize them on our dataset.
Depending on the filter, we might find edge detectors, corner detectors, sharpening filters, etc.
def gain_insight(sample):
img = sample.reshape(28, 28)
convolve = lambda i : convolve2d(img, filter(i), mode='valid')
_, axs = plt.subplots(1, num_filters + 1, figsize=(15, 10))
axs[0].imshow(img, cmap='gray')
for i in range(num_filters):
axs[i+1].imshow(convolve(i), cmap='gray')
gain_insight(X_train[7])
gain_insight(X_train[12])
Have fun looking for features!