Convolutional Neural Networks: Street View Housing Number Digit Recognition¶
Welcome to the project on classification using Convolutional Neural Networks. We will continue to work with the Street View Housing Numbers (SVHN) image dataset for this project.
Context:¶
One of the most interesting tasks in deep learning is to recognize objects in natural scenes. The ability to process visual information using machine learning algorithms can be very useful as demonstrated in various applications.
The SVHN dataset contains over 600,000 labeled digits cropped from street-level photos. It is one of the most popular image recognition datasets. It has been used in neural networks created by Google to improve the map quality by automatically transcribing the address numbers from a patch of pixels. The transcribed number with a known street address helps pinpoint the location of the building it represents.
Objective:¶
To build a CNN model that can recognize the digits in the images.
Dataset¶
Here, we will use a subset of the original data to save some computation time. The dataset is provided as a .h5 file. The basic preprocessing steps have been applied on the dataset.
Mount the drive¶
Let us start by mounting the Google drive. You can run the below cell to mount the Google drive.
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
Importing the necessary libraries¶
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, BatchNormalization, Dropout, Flatten, LeakyReLU
from tensorflow.keras.utils import to_categorical
Let us check for the version of tensorflow.
print(tf.__version__)
2.8.0
Load the dataset¶
- Let us now load the dataset that is available as a .h5 file.
- Split the data into the train and the test dataset.
import h5py
# Open the file as read only
# User can make changes in the path as required
h5f = h5py.File('/content/drive/MyDrive/SVHN_single_grey1.h5', 'r')
# Load the training and the test set
X_train = h5f['X_train'][:]
y_train = h5f['y_train'][:]
X_test = h5f['X_test'][:]
y_test = h5f['y_test'][:]
# Close this file
h5f.close()
Let's check the number of images in the training and the testing dataset.
len(X_train), len(X_test)
(42000, 18000)
Observation:
- There are 42,000 images in the training data and 18,000 images in the testing data.
Visualizing images¶
- Use X_train to visualize the first 10 images.
- Use Y_train to print the first 10 labels.
# Visualizing the first 10 images in the dataset and their labels
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 1))
for i in range(10):
plt.subplot(1, 10, i+1)
plt.imshow(X_train[i], cmap="gray")
plt.axis('off')
plt.show()
print('label for each of the above image: %s' % (y_train[0:10]))
label for each of the above image: [2 6 7 4 4 0 3 0 7 3]
Data preparation¶
- Print the shape and the array of pixels for the first image in the training dataset.
- Reshape the train and the test dataset because we always have to give a 4D array as input to CNNs.
- Normalize the train and the test dataset by dividing by 255.
- Print the new shapes of the train and the test set.
- One-hot encode the target variable.
# Shape of the images and the first image
print("Shape:", X_train[0].shape)
print()
print("First image:\n", X_train[0])
Shape: (32, 32) First image: [[ 33.0704 30.2601 26.852 ... 71.4471 58.2204 42.9939] [ 25.2283 25.5533 29.9765 ... 113.0209 103.3639 84.2949] [ 26.2775 22.6137 40.4763 ... 113.3028 121.775 115.4228] ... [ 28.5502 36.212 45.0801 ... 24.1359 25.0927 26.0603] [ 38.4352 26.4733 23.2717 ... 28.1094 29.4683 30.0661] [ 50.2984 26.0773 24.0389 ... 49.6682 50.853 53.0377]]
# Reshaping the dataset to flatten them. Remember that we always have to give a 4D array as input to CNNs
X_train = X_train.reshape(X_train.shape[0], 32,32,1)
X_test = X_test.reshape(X_test.shape[0], 32,32,1)
# Normalize inputs from 0-255 to 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0
# New shape
print('Training set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)
Training set: (42000, 32, 32, 1) (42000,) Test set: (18000, 32, 32, 1) (18000,)
# One-hot encode output
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# Test labels
y_test
array([[0., 1., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 1., 0., 0.], [0., 0., 1., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 1., 0., 0.], [0., 0., 0., ..., 0., 0., 1.], [0., 0., 1., ..., 0., 0., 0.]], dtype=float32)
Observation:
- Notice that each entry of the target variable is a one-hot encoded vector instead of a single label.
Model Building¶
Now, we have done data preprocessing, let's build a CNN model.
# Fixing the seed for random number generators
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
Model Architecture¶
- Let's build a model with the following architecture,
- First Convolutional layer with 16 filters and a kernel size of 3x3. Use the 'same' padding and provide the input shape = (32, 32, 1)
- Add a LeakyRelu layer with the slope equal to 0.1
- Second Convolutional layer with 32 filters and a kernel size of 3x3 with 'same' padding
- Another LeakyRelu with the slope equal to 0.1
- A max-pooling layer with a pool size of 2x2
- Flatten the output from the previous layer
- Add a dense layer with 32 nodes
- Add a LeakyRelu layer with a slope equal to 0.1
- Add the final output layer with nodes equal to the number of classes and softmax activation
- Compile the model with the categorical_crossentropy loss, adam optimizers (learning_rate = 0.001), and accuracy metric.
- Print the summary of the model.
- Fit the model on the train data with a validation split of 0.2, batch size = 32, verbose = 1, and 20 epochs. Store the model building history to use later for visualization.
# Define model
from tensorflow.keras import losses
from tensorflow.keras import optimizers
model1 = Sequential()
model1.add(Conv2D(filters=16, kernel_size=(3,3), padding="same", input_shape=(32, 32, 1)))
model1.add(LeakyReLU(0.1))
model1.add(Conv2D(filters=32, kernel_size=(3,3), padding='same'))
model1.add(LeakyReLU(0.1))
model1.add(MaxPool2D(pool_size=(2,2)))
model1.add(Flatten())
model1.add(Dense(32))
model1.add(LeakyReLU(0.1))
model1.add(Dense(10, activation='softmax'))
adam = optimizers.Adam(learning_rate=0.001)
model1.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])
# Model summary
model1.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 32, 32, 16) 160 leaky_re_lu (LeakyReLU) (None, 32, 32, 16) 0 conv2d_1 (Conv2D) (None, 32, 32, 32) 4640 leaky_re_lu_1 (LeakyReLU) (None, 32, 32, 32) 0 max_pooling2d (MaxPooling2D (None, 16, 16, 32) 0 ) flatten (Flatten) (None, 8192) 0 dense (Dense) (None, 32) 262176 leaky_re_lu_2 (LeakyReLU) (None, 32) 0 dense_1 (Dense) (None, 10) 330 ================================================================= Total params: 267,306 Trainable params: 267,306 Non-trainable params: 0 _________________________________________________________________
- The model has 2,67,306 parameters. The majority of parameters belong to the single dense layer with 32 nodes.
- All the parameters are trainable.
# Fit the model
history_model_1 = model1.fit(X_train, y_train, validation_split=0.2, epochs=20, batch_size=32, verbose=1)
Epoch 1/20 1050/1050 [==============================] - 16s 4ms/step - loss: 1.1700 - accuracy: 0.6114 - val_loss: 0.6528 - val_accuracy: 0.8100 Epoch 2/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.5289 - accuracy: 0.8503 - val_loss: 0.5177 - val_accuracy: 0.8546 Epoch 3/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.4371 - accuracy: 0.8733 - val_loss: 0.4935 - val_accuracy: 0.8617 Epoch 4/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.3815 - accuracy: 0.8893 - val_loss: 0.4525 - val_accuracy: 0.8752 Epoch 5/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.3379 - accuracy: 0.8989 - val_loss: 0.4589 - val_accuracy: 0.8719 Epoch 6/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.2940 - accuracy: 0.9140 - val_loss: 0.4758 - val_accuracy: 0.8675 Epoch 7/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.2630 - accuracy: 0.9201 - val_loss: 0.4476 - val_accuracy: 0.8783 Epoch 8/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.2366 - accuracy: 0.9276 - val_loss: 0.4872 - val_accuracy: 0.8740 Epoch 9/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.2114 - accuracy: 0.9360 - val_loss: 0.4958 - val_accuracy: 0.8748 Epoch 10/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.1868 - accuracy: 0.9426 - val_loss: 0.4869 - val_accuracy: 0.8800 Epoch 11/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.1639 - accuracy: 0.9496 - val_loss: 0.5503 - val_accuracy: 0.8727 Epoch 12/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.1535 - accuracy: 0.9513 - val_loss: 0.5518 - val_accuracy: 0.8719 Epoch 13/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.1338 - accuracy: 0.9576 - val_loss: 0.5839 - val_accuracy: 0.8724 Epoch 14/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.1214 - accuracy: 0.9621 - val_loss: 0.6357 - val_accuracy: 0.8698 Epoch 15/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.1098 - accuracy: 0.9652 - val_loss: 0.6370 - val_accuracy: 0.8740 Epoch 16/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.0990 - accuracy: 0.9692 - val_loss: 0.6700 - val_accuracy: 0.8708 Epoch 17/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.0892 - accuracy: 0.9716 - val_loss: 0.7333 - val_accuracy: 0.8700 Epoch 18/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.0887 - accuracy: 0.9710 - val_loss: 0.7378 - val_accuracy: 0.8686 Epoch 19/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.0734 - accuracy: 0.9763 - val_loss: 0.7868 - val_accuracy: 0.8660 Epoch 20/20 1050/1050 [==============================] - 4s 4ms/step - loss: 0.0652 - accuracy: 0.9786 - val_loss: 0.8368 - val_accuracy: 0.8642
Plotting the validation and training accuracies¶
# Plotting the accuracies
dict_hist = history_model_1.history
list_ep = [i for i in range(1,21)]
plt.figure(figsize = (8,8))
plt.plot(list_ep,dict_hist['accuracy'],ls = '--', label = 'accuracy')
plt.plot(list_ep,dict_hist['val_accuracy'],ls = '--', label = 'val_accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()
Observations:
- The accuracy on the train set is much better (about 12%) than the validation set. We can say that model is overfitting the training data.
- The plot shows that training accuracy is increasing with epochs but the validation accuracy is more or less constant after 5 epochs.
- We can try adding dropout layers to the model's architecture to avoid overfitting. Also, we can add more convolutional layers for feature extraction.
Let's build another model and see if we can get a better model with generalized performance.
First, we need to clear the previous model's history from the keras backend. Also, let's fix the seed again after clearing the backend.
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
# Fixing the seed for random number generators
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
Second Model Architecture¶
- Let's build a second model with the following architecture,
- First Convolutional layer with 16 filters and a kernel size of 3x3. Use the 'same' padding and provide the input shape = (32, 32, 1)
- Add a LeakyRelu layer with the slope equal to 0.1
- Second Convolutional layer with 32 filters and a kernel size of 3x3 with 'same' padding
- Add LeakyRelu with the slope equal to 0.1
- Add a max-pooling layer with a pool size of 2x2
- Add a BatchNormalization layer
- Third Convolutional layer with 32 filters and a kernel size of 3x3 with 'same' padding
- Add a LeakyRelu layer with a slope equal to 0.1
- Fourth Convolutional layer 64 filters and a kernel size of 3x3 with 'same' padding
- Add a LeakyRelu layer with a slope equal to 0.1
- Add a max-pooling layer with a pool size of 2x2
- Add a BatchNormalization layer
- Flatten the output from the previous layer
- Add a dense layer with 32 nodes
- Add a LeakyRelu layer with a slope equal to 0.1
- Add a dropout layer with a rate equal to 0.5
- Add the final output layer with nodes equal to the number of classes and softmax activation
- Compile the model with the categorical_crossentropy loss, adam optimizers (learning_rate = 0.001), and accuracy metric.
- Print the summary of the model.
- Fit the model on the train data with a validation split of 0.2, batch size = 128, verbose = 1, and 30 epochs. Store the model building history to use later for visualization.
Build and train the second CNN model as per the above mentioned architecture¶
model2 = Sequential()
model2.add(Conv2D(filters=16, kernel_size=(3,3), padding="same", input_shape=(32, 32, 1)))
model2.add(LeakyReLU(0.1))
model2.add(Conv2D(filters=32, kernel_size=(3,3), padding='same'))
model2.add(LeakyReLU(0.1))
model2.add(MaxPool2D(pool_size=(2,2)))
model2.add(BatchNormalization())
model2.add(Conv2D(filters=32, kernel_size=(3,3), padding='same'))
model2.add(LeakyReLU(0.1))
model2.add(Conv2D(filters=64, kernel_size=(3,3), padding='same'))
model2.add(LeakyReLU(0.1))
model2.add(MaxPool2D(pool_size=(2,2)))
model2.add(BatchNormalization())
model2.add(Flatten())
model2.add(Dense(32))
model2.add(LeakyReLU(0.1))
model2.add(Dropout(0.5))
model2.add(Dense(10, activation='softmax'))
adam = optimizers.Adam(learning_rate=1e-3)
model2.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])
# Model summary
model2.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 32, 32, 16) 160 leaky_re_lu (LeakyReLU) (None, 32, 32, 16) 0 conv2d_1 (Conv2D) (None, 32, 32, 32) 4640 leaky_re_lu_1 (LeakyReLU) (None, 32, 32, 32) 0 max_pooling2d (MaxPooling2D (None, 16, 16, 32) 0 ) batch_normalization (BatchN (None, 16, 16, 32) 128 ormalization) conv2d_2 (Conv2D) (None, 16, 16, 32) 9248 leaky_re_lu_2 (LeakyReLU) (None, 16, 16, 32) 0 conv2d_3 (Conv2D) (None, 16, 16, 64) 18496 leaky_re_lu_3 (LeakyReLU) (None, 16, 16, 64) 0 max_pooling2d_1 (MaxPooling (None, 8, 8, 64) 0 2D) batch_normalization_1 (Batc (None, 8, 8, 64) 256 hNormalization) flatten (Flatten) (None, 4096) 0 dense (Dense) (None, 32) 131104 leaky_re_lu_4 (LeakyReLU) (None, 32) 0 dropout (Dropout) (None, 32) 0 dense_1 (Dense) (None, 10) 330 ================================================================= Total params: 164,362 Trainable params: 164,170 Non-trainable params: 192 _________________________________________________________________
Observations:
- Although we have added layers to the model's architecture, the total number of parameters has decreased substantially (approx 40%) compared to the previous model.
- This is due to the additional max-pooling layer that has further reduced the size of images before passing them to the dense layer. Also, we have added a dropout layer to the model.
- All the parameters are trainable.
# Fit the model
history_model_2 = model2.fit(X_train, y_train, validation_split=0.2, epochs=30, batch_size=128, verbose=1)
Epoch 1/30 263/263 [==============================] - 4s 11ms/step - loss: 1.4630 - accuracy: 0.5010 - val_loss: 2.8419 - val_accuracy: 0.1656 Epoch 2/30 263/263 [==============================] - 2s 9ms/step - loss: 0.6960 - accuracy: 0.7839 - val_loss: 0.6896 - val_accuracy: 0.7892 Epoch 3/30 263/263 [==============================] - 3s 10ms/step - loss: 0.5587 - accuracy: 0.8307 - val_loss: 0.4496 - val_accuracy: 0.8699 Epoch 4/30 263/263 [==============================] - 2s 9ms/step - loss: 0.5012 - accuracy: 0.8467 - val_loss: 0.5299 - val_accuracy: 0.8365 Epoch 5/30 263/263 [==============================] - 2s 9ms/step - loss: 0.4546 - accuracy: 0.8624 - val_loss: 0.3936 - val_accuracy: 0.8899 Epoch 6/30 263/263 [==============================] - 2s 9ms/step - loss: 0.4163 - accuracy: 0.8713 - val_loss: 0.3844 - val_accuracy: 0.8919 Epoch 7/30 263/263 [==============================] - 3s 10ms/step - loss: 0.3848 - accuracy: 0.8822 - val_loss: 0.4160 - val_accuracy: 0.8849 Epoch 8/30 263/263 [==============================] - 3s 10ms/step - loss: 0.3587 - accuracy: 0.8882 - val_loss: 0.3522 - val_accuracy: 0.9002 Epoch 9/30 263/263 [==============================] - 2s 9ms/step - loss: 0.3339 - accuracy: 0.8958 - val_loss: 0.3742 - val_accuracy: 0.8980 Epoch 10/30 263/263 [==============================] - 3s 10ms/step - loss: 0.3221 - accuracy: 0.8993 - val_loss: 0.4067 - val_accuracy: 0.8918 Epoch 11/30 263/263 [==============================] - 2s 9ms/step - loss: 0.2965 - accuracy: 0.9055 - val_loss: 0.4151 - val_accuracy: 0.8927 Epoch 12/30 263/263 [==============================] - 2s 9ms/step - loss: 0.2892 - accuracy: 0.9093 - val_loss: 0.3647 - val_accuracy: 0.9093 Epoch 13/30 263/263 [==============================] - 2s 9ms/step - loss: 0.2704 - accuracy: 0.9143 - val_loss: 0.3358 - val_accuracy: 0.9139 Epoch 14/30 263/263 [==============================] - 2s 9ms/step - loss: 0.2625 - accuracy: 0.9182 - val_loss: 0.4111 - val_accuracy: 0.9050 Epoch 15/30 263/263 [==============================] - 3s 10ms/step - loss: 0.2581 - accuracy: 0.9175 - val_loss: 0.4845 - val_accuracy: 0.8824 Epoch 16/30 263/263 [==============================] - 2s 9ms/step - loss: 0.2434 - accuracy: 0.9232 - val_loss: 0.4082 - val_accuracy: 0.8977 Epoch 17/30 263/263 [==============================] - 2s 9ms/step - loss: 0.2360 - accuracy: 0.9243 - val_loss: 0.3794 - val_accuracy: 0.8996 Epoch 18/30 263/263 [==============================] - 2s 9ms/step - loss: 0.2249 - accuracy: 0.9282 - val_loss: 0.3653 - val_accuracy: 0.9104 Epoch 19/30 263/263 [==============================] - 2s 9ms/step - loss: 0.2191 - accuracy: 0.9288 - val_loss: 0.4608 - val_accuracy: 0.8995 Epoch 20/30 263/263 [==============================] - 2s 9ms/step - loss: 0.2074 - accuracy: 0.9323 - val_loss: 0.4097 - val_accuracy: 0.9045 Epoch 21/30 263/263 [==============================] - 2s 9ms/step - loss: 0.2026 - accuracy: 0.9352 - val_loss: 0.3751 - val_accuracy: 0.9064 Epoch 22/30 263/263 [==============================] - 2s 9ms/step - loss: 0.1884 - accuracy: 0.9370 - val_loss: 0.4227 - val_accuracy: 0.9096 Epoch 23/30 263/263 [==============================] - 2s 9ms/step - loss: 0.1873 - accuracy: 0.9382 - val_loss: 0.4186 - val_accuracy: 0.9055 Epoch 24/30 263/263 [==============================] - 3s 10ms/step - loss: 0.1798 - accuracy: 0.9402 - val_loss: 0.3876 - val_accuracy: 0.9118 Epoch 25/30 263/263 [==============================] - 2s 9ms/step - loss: 0.1694 - accuracy: 0.9441 - val_loss: 0.4175 - val_accuracy: 0.9140 Epoch 26/30 263/263 [==============================] - 2s 9ms/step - loss: 0.1784 - accuracy: 0.9415 - val_loss: 0.4477 - val_accuracy: 0.9048 Epoch 27/30 263/263 [==============================] - 2s 9ms/step - loss: 0.1684 - accuracy: 0.9442 - val_loss: 0.4627 - val_accuracy: 0.8974 Epoch 28/30 263/263 [==============================] - 3s 10ms/step - loss: 0.1690 - accuracy: 0.9445 - val_loss: 0.4774 - val_accuracy: 0.9087 Epoch 29/30 263/263 [==============================] - 2s 9ms/step - loss: 0.1571 - accuracy: 0.9471 - val_loss: 0.4835 - val_accuracy: 0.9036 Epoch 30/30 263/263 [==============================] - 3s 10ms/step - loss: 0.1601 - accuracy: 0.9457 - val_loss: 0.4030 - val_accuracy: 0.9148
Plotting the validation and training accuracies¶
# Plotting the accuracies
dict_hist = history_model_2.history
list_ep = [i for i in range(1,31)]
plt.figure(figsize = (8,8))
plt.plot(list_ep,dict_hist['accuracy'],ls = '--', label = 'accuracy')
plt.plot(list_ep,dict_hist['val_accuracy'],ls = '--', label = 'val_accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()
Observations:
- The second model which has more convolutional layers and less complex in terms of the number of parameters is performing significantly better than the first model.
- The overfitting has reduced significantly. The model is giving a generalized performance.
- As you can see in the graph, validation is almost constant after the epoch 15 which might be due to the plateau region (saddle point or local extremum). We can try adjusting the learning rate during the plateau region.
Predictions on the test data¶
- Make predictions on the test set using the second model.
- Print the obtained results using the classification report and the confusion matrix.
- Final observations on the obtained results.
test_pred = model2.predict(X_test)
test_pred = np.argmax(test_pred, axis=-1)
Note: Earlier, we noticed that each entry of the test data is a one-hot encoded vector, but to print the classification report and confusion matrix, we must convert each entry of y_test to a single label.
# Converting each entry to single label from one-hot encoded vector
y_test = np.argmax(y_test, axis=-1)
# Importing required functions
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
# Printing the classification report
print(classification_report(y_test, test_pred))
# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_test, test_pred)
plt.figure(figsize=(8,5))
sns.heatmap(cm, annot=True, fmt='.0f')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
precision recall f1-score support 0 0.92 0.95 0.93 1814 1 0.89 0.92 0.91 1828 2 0.93 0.92 0.93 1803 3 0.91 0.88 0.89 1719 4 0.91 0.94 0.93 1812 5 0.93 0.89 0.91 1768 6 0.91 0.88 0.90 1832 7 0.94 0.93 0.93 1808 8 0.85 0.92 0.88 1812 9 0.93 0.89 0.91 1804 accuracy 0.91 18000 macro avg 0.91 0.91 0.91 18000 weighted avg 0.91 0.91 0.91 18000
Observations:¶
- The accuracy is 91% on the test set. This is comparable with the results on the validation set which implies that the model is giving a generalized performance.
- This performance is significantly better than the performance of the final model using simple feed-forward neural networks. This suggests that CNNs are a better choice for this particular dataset.
- The recall values for all the digits are higher than or equal to 88% with 3 having the least recall. The confusion matrix shows that the model has confused 3 with digits 5 and 8 the most number of times.
- The same can be observed for digits 1 and 8 as well. The model has confused digits 1 and 8 with 4 and 6 respectively.
- The highest recall of about 95% is for digit 0 i.e. the model can identify 95% of images with digit 0.
- The precision values show that the model is also precise in its prediction, and not just sacrificing precision for recall. All digits have precision greater than or equal to 85% with 7 having the highest precision of 94%. This indicates that it is easier for the model to identify images with digits 7.
- The class with the least precision is digit 8 and 1.
Note:
- We can try hyperparameter tuning to get an even better performance.
- Data Augmentation might help to make the model more robust and invariant toward different orientations.
- We can also try techniques like transfer learning and see if we can get better results.
# Convert notebook to html
!jupyter nbconvert --to html "/content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_Six_-_Deep_Learning/Street_View_Housing_Number_Digit_Recognition_using_ANNs/CNN_Practice_Project_SVHN.ipynb"