Neural Networks: Street View House Numbers Digit Recognition¶
Welcome to the project on classification using Artificial Neural Networks. We will work with the Street View House Numbers (SVHN) image dataset for this project.
Context:¶
One of the most interesting tasks in deep learning is recognizing objects in natural scenes. The ability to process visual information using machine learning algorithms can be very useful, as demonstrated in a variety of applications.
The SVHN dataset contains over 600,000 labeled digits cropped from street-level photos. It is one of the most popular image recognition datasets. It has been used in neural networks created by Google to improve the map quality by automatically transcribing the address numbers from a patch of pixels. The transcribed number with a known street address helps pinpoint the location of the building it represents.
Objective:¶
To build a feed-forward neural network model that can recognize the digits in the images.
Dataset¶
Here, we will use a subset of the original data to save some computation time. The dataset is provided as a .h5 file, and basic preprocessing steps have already been applied to it.
Mount the drive¶
Let us start by mounting Google Drive. Run the cell below to mount it.
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
Importing the necessary libraries¶
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, BatchNormalization
from tensorflow.keras.utils import to_categorical
Let us check the version of TensorFlow.
print(tf.__version__)
2.8.0
Load the dataset¶
- Let us now load the dataset, which is available as a .h5 file.
- Split the data into the train and test sets.
import h5py
# Open the file as read only
# User can make changes in the path as required
h5f = h5py.File('/content/drive/MyDrive/SVHN_single_grey1.h5', 'r')
# Load the training and the test set
X_train = h5f['X_train'][:]
y_train = h5f['y_train'][:]
X_test = h5f['X_test'][:]
y_test = h5f['y_test'][:]
# Close this file
h5f.close()
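If you are unsure what the file contains, a quick sanity check of the stored keys, shapes, and dtypes can help before loading. The snippet below is an optional sketch and assumes the same file path used above.
# Optional sanity check: list the datasets stored in the .h5 file
with h5py.File('/content/drive/MyDrive/SVHN_single_grey1.h5', 'r') as f:
    for key in f.keys():
        print(key, f[key].shape, f[key].dtype)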
Let's check the number of images in the training and the testing dataset.
len(X_train), len(X_test)
(42000, 18000)
Observations:
- There are 42,000 images in the training data and 18,000 images in the testing data.
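It can also be worth confirming that the ten digit classes are reasonably balanced before modeling. The optional check below assumes y_train still holds integer labels (i.e., it is run before one-hot encoding).
# Optional: count how many training images belong to each digit class
classes, counts = np.unique(y_train, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))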
Visualizing images¶
- Use X_train to visualize the first 10 images.
- Use y_train to print the first 10 labels.
# Visualizing the first 10 images in the dataset and their labels
plt.figure(figsize=(10, 1))
for i in range(10):
plt.subplot(1, 10, i+1)
plt.imshow(X_train[i], cmap="gray")
plt.axis('off')
plt.show()
print('Labels for the above images: %s' % (y_train[0:10]))
Labels for the above images: [2 6 7 4 4 0 3 0 7 3]
Data preparation¶
- Print the shape and the array of pixels for the first image in the training dataset.
- Reshape (flatten) the train and the test datasets, because a feed-forward network expects each image as a 1D feature vector, i.e., a 2D input array of shape (number of samples, number of pixels).
- Normalize the train and the test dataset by dividing by 255.
- Print the new shapes of the train and the test dataset.
- One-hot encode the target variable.
# Shape and the array of pixels for the first image
print("Shape:", X_train[0].shape)
print()
print("First image:\n", X_train[0])
Shape: (32, 32)

First image:
 [[ 33.0704  30.2601  26.852  ...  71.4471  58.2204  42.9939]
 [ 25.2283  25.5533  29.9765 ... 113.0209 103.3639  84.2949]
 [ 26.2775  22.6137  40.4763 ... 113.3028 121.775  115.4228]
 ...
 [ 28.5502  36.212   45.0801 ...  24.1359  25.0927  26.0603]
 [ 38.4352  26.4733  23.2717 ...  28.1094  29.4683  30.0661]
 [ 50.2984  26.0773  24.0389 ...  49.6682  50.853   53.0377]]
# Flatten the dataset: each 2D image (32 x 32) becomes a 1D array of 1024 pixel values
X_train = X_train.reshape(X_train.shape[0], 1024)
X_test = X_test.reshape(X_test.shape[0], 1024)
# Normalize inputs from 0-255 to 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0
# New shape
print('Training set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)
Training set: (42000, 1024) (42000,)
Test set: (18000, 1024) (18000,)
# One-hot encode output
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# Check the one-hot encoded target (one column per class)
y_test
array([[0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 1., ..., 0., 0., 0.]], dtype=float32)
Observations:
- Notice that each entry of the target variable is a one-hot encoded vector instead of a single label.
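As a quick check, np.argmax recovers the original integer label from a one-hot vector (the reverse of to_categorical); the optional line below should print the first label shown earlier.
# Optional: recover the integer label from the one-hot encoded vector
print(np.argmax(y_train[0]))  # expected output: 2, matching the first label printed above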
Model Building¶
Now that the data preprocessing is done, let's build an ANN model.
# Fixing the seed for random number generators
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
Model Architecture¶
Build a sequential model with the following architecture:
- First hidden layer with 64 nodes, relu activation, and input shape = (1024, )
- Second hidden layer with 32 nodes and relu activation
- Output layer with 'softmax' activation and number of nodes equal to the number of classes, i.e., 10
- Compile the model with the loss equal to categorical_crossentropy, optimizer equal to Adam(learning_rate = 0.001), and metric equal to 'accuracy'.
- Print the summary of the model.
- Fit on the train data with a validation split of 0.2, batch size = 128, verbose = 1, and epochs = 20. Store the model building history to use later for visualization.
# Define the model
from tensorflow.keras import losses
from tensorflow.keras import optimizers
# Create model
model1 = Sequential()
model1.add(Dense(64, activation='relu', input_shape = (1024,)))
model1.add(Dense(32, activation='relu'))
model1.add(Dense(10, activation='softmax'))
# Compile the model
adam = optimizers.Adam(learning_rate=0.001)
model1.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])
# Model summary
model1.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 64) 65600 dense_1 (Dense) (None, 32) 2080 dense_2 (Dense) (None, 10) 330 ================================================================= Total params: 68,010 Trainable params: 68,010 Non-trainable params: 0 _________________________________________________________________
Observations:
- The model has 68,010 parameters.
- All the parameters are trainable.
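As a quick sanity check, each Dense layer's parameter count can be recomputed by hand as (number of inputs x number of nodes) plus one bias per node; the short calculation below just reproduces the numbers from the summary.
# Parameters of a Dense layer = (inputs * nodes) + biases
params_hidden1 = 1024 * 64 + 64   # 65,600
params_hidden2 = 64 * 32 + 32     # 2,080
params_output  = 32 * 10 + 10     # 330
print(params_hidden1 + params_hidden2 + params_output)  # 68,010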
# Fit the model
history_model_1 = model1.fit(X_train, y_train, validation_split=0.2, epochs=20, batch_size=128, verbose=1)
Epoch 1/20 263/263 [==============================] - 4s 4ms/step - loss: 2.3008 - accuracy: 0.1144 - val_loss: 2.2670 - val_accuracy: 0.1352
Epoch 2/20 263/263 [==============================] - 1s 3ms/step - loss: 2.1307 - accuracy: 0.2299 - val_loss: 1.9434 - val_accuracy: 0.3024
Epoch 3/20 263/263 [==============================] - 1s 3ms/step - loss: 1.8054 - accuracy: 0.3639 - val_loss: 1.6951 - val_accuracy: 0.4092
Epoch 4/20 263/263 [==============================] - 1s 3ms/step - loss: 1.6367 - accuracy: 0.4389 - val_loss: 1.5765 - val_accuracy: 0.4687
Epoch 5/20 263/263 [==============================] - 1s 3ms/step - loss: 1.5370 - accuracy: 0.4800 - val_loss: 1.4827 - val_accuracy: 0.5077
Epoch 6/20 263/263 [==============================] - 1s 3ms/step - loss: 1.4697 - accuracy: 0.5068 - val_loss: 1.4335 - val_accuracy: 0.5269
Epoch 7/20 263/263 [==============================] - 1s 3ms/step - loss: 1.4359 - accuracy: 0.5207 - val_loss: 1.4063 - val_accuracy: 0.5386
Epoch 8/20 263/263 [==============================] - 1s 3ms/step - loss: 1.4061 - accuracy: 0.5334 - val_loss: 1.3790 - val_accuracy: 0.5501
Epoch 9/20 263/263 [==============================] - 1s 3ms/step - loss: 1.3800 - accuracy: 0.5468 - val_loss: 1.3579 - val_accuracy: 0.5590
Epoch 10/20 263/263 [==============================] - 1s 3ms/step - loss: 1.3612 - accuracy: 0.5553 - val_loss: 1.3392 - val_accuracy: 0.5688
Epoch 11/20 263/263 [==============================] - 1s 3ms/step - loss: 1.3399 - accuracy: 0.5690 - val_loss: 1.3638 - val_accuracy: 0.5569
Epoch 12/20 263/263 [==============================] - 1s 3ms/step - loss: 1.3186 - accuracy: 0.5778 - val_loss: 1.3065 - val_accuracy: 0.5879
Epoch 13/20 263/263 [==============================] - 1s 3ms/step - loss: 1.3025 - accuracy: 0.5879 - val_loss: 1.3119 - val_accuracy: 0.5818
Epoch 14/20 263/263 [==============================] - 1s 3ms/step - loss: 1.2831 - accuracy: 0.5951 - val_loss: 1.2772 - val_accuracy: 0.6001
Epoch 15/20 263/263 [==============================] - 1s 3ms/step - loss: 1.2642 - accuracy: 0.6021 - val_loss: 1.2839 - val_accuracy: 0.5877
Epoch 16/20 263/263 [==============================] - 1s 3ms/step - loss: 1.2583 - accuracy: 0.6060 - val_loss: 1.2592 - val_accuracy: 0.6062
Epoch 17/20 263/263 [==============================] - 1s 3ms/step - loss: 1.2459 - accuracy: 0.6103 - val_loss: 1.2521 - val_accuracy: 0.6061
Epoch 18/20 263/263 [==============================] - 1s 3ms/step - loss: 1.2375 - accuracy: 0.6117 - val_loss: 1.2350 - val_accuracy: 0.6148
Epoch 19/20 263/263 [==============================] - 1s 3ms/step - loss: 1.2308 - accuracy: 0.6143 - val_loss: 1.2356 - val_accuracy: 0.6155
Epoch 20/20 263/263 [==============================] - 1s 3ms/step - loss: 1.2240 - accuracy: 0.6166 - val_loss: 1.2246 - val_accuracy: 0.6210
Plotting the validation and training accuracies¶
# Plotting the accuracies
dict_hist = history_model_1.history
list_ep = [i for i in range(1,21)]
plt.figure(figsize = (8,8))
plt.plot(list_ep,dict_hist['accuracy'],ls = '--', label = 'accuracy')
plt.plot(list_ep,dict_hist['val_accuracy'],ls = '--', label = 'val_accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()
Observations:
- The training and validation accuracies are very close, so the model appears to be generalizing well.
- The plot shows that the training accuracy increases steadily with the number of epochs, while the validation accuracy starts to fluctuate after about 10 epochs; overall, the validation accuracy is still trending upward.
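Before moving to a bigger model, you could optionally measure how this baseline does on the held-out test set. The call below is a sketch (not part of the original workflow); note that y_test is already one-hot encoded at this point.
# Optional: evaluate the baseline model on the test set (y_test is one-hot encoded here)
test_loss_1, test_acc_1 = model1.evaluate(X_test, y_test, verbose=0)
print('Baseline test accuracy: %.4f' % test_acc_1)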
Let's build one more model with higher complexity and see if we can improve its performance.
First, we need to clear the previous model's state from the Keras backend. Let's also fix the seeds again after clearing the backend.
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
# Fixing the seed for random number generators
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
Second Model Architecture¶
Build a sequential model with the following architecture:
- First hidden layer with 256 nodes and the relu activation and the input shape = (1024, )
- Second hidden layer with 128 nodes and the relu activation
- Add the Dropout layer with the rate equal to 0.2
- Third hidden layer with 64 nodes and the relu activation
- Fourth hidden layer with 64 nodes and the relu activation
- Fifth hidden layer with 32 nodes and the relu activation
- Add the BatchNormalization layer
- Output layer with activation as 'softmax' and number of nodes equal to the number of classes, i.e., 10
- Compile the model with the loss equal to categorical_crossentropy, optimizer equal to Adam(learning_rate = 0.0005), and metric equal to 'accuracy'.
- Print the summary of the model.
- Fit on the train data with a validation split of 0.2, batch size = 128, verbose = 1, and epochs = 30. Store the model building history to use later for visualization.
# Define model
from tensorflow.keras import losses
from tensorflow.keras import optimizers
# Create model
model2 = Sequential()
model2.add(Dense(256, activation='relu', input_shape = (1024,)))
model2.add(Dense(128, activation='relu'))
model2.add(Dropout(0.2))
model2.add(Dense(64, activation='relu'))
model2.add(Dense(64, activation='relu'))
model2.add(Dense(32, activation='relu'))
model2.add(BatchNormalization())
model2.add(Dense(10, activation='softmax'))
# Compile model
adam = optimizers.Adam(learning_rate=0.0005)
model2.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])
# Model summary
model2.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 256) 262400 dense_1 (Dense) (None, 128) 32896 dropout (Dropout) (None, 128) 0 dense_2 (Dense) (None, 64) 8256 dense_3 (Dense) (None, 64) 4160 dense_4 (Dense) (None, 32) 2080 batch_normalization (BatchN (None, 32) 128 ormalization) dense_5 (Dense) (None, 10) 330 ================================================================= Total params: 310,250 Trainable params: 310,186 Non-trainable params: 64 _________________________________________________________________
Observations:
- The total number of parameters (310,250) is roughly 4.5 times that of the previous model (68,010), i.e., the second model is much more complex than the first model.
- There are 64 non-trainable parameters. They belong to the batch normalization layer.
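To see where the 64 non-trainable parameters come from: a BatchNormalization layer over 32 features keeps four vectors of length 32, gamma and beta (trainable) plus the moving mean and moving variance (non-trainable). The optional snippet below verifies this from the model itself.
# Optional: inspect the BatchNormalization layer's variables.
# gamma and beta are trainable (2 * 32 = 64 parameters);
# the moving mean and moving variance are not updated by backprop (2 * 32 = 64 non-trainable)
bn_layer = model2.layers[-2]
for v in bn_layer.weights:
    print(v.name, tuple(v.shape), 'trainable' if v.trainable else 'non-trainable')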
Let's fit the model and plot the accuracies of the training and the validation data.
# Fit the model
history_model_2 = model2.fit(X_train, y_train, validation_split=0.2, epochs=30, batch_size=128, verbose=1)
Epoch 1/30 263/263 [==============================] - 3s 5ms/step - loss: 2.3507 - accuracy: 0.0979 - val_loss: 2.3047 - val_accuracy: 0.1106
Epoch 2/30 263/263 [==============================] - 1s 4ms/step - loss: 2.1886 - accuracy: 0.1686 - val_loss: 2.0674 - val_accuracy: 0.2562
Epoch 3/30 263/263 [==============================] - 1s 4ms/step - loss: 1.7453 - accuracy: 0.3871 - val_loss: 1.5670 - val_accuracy: 0.4777
Epoch 4/30 263/263 [==============================] - 1s 5ms/step - loss: 1.4380 - accuracy: 0.5112 - val_loss: 1.3193 - val_accuracy: 0.5718
Epoch 5/30 263/263 [==============================] - 1s 4ms/step - loss: 1.2700 - accuracy: 0.5784 - val_loss: 1.1708 - val_accuracy: 0.6149
Epoch 6/30 263/263 [==============================] - 1s 4ms/step - loss: 1.1703 - accuracy: 0.6196 - val_loss: 1.0695 - val_accuracy: 0.6592
Epoch 7/30 263/263 [==============================] - 1s 5ms/step - loss: 1.1118 - accuracy: 0.6419 - val_loss: 1.0547 - val_accuracy: 0.6620
Epoch 8/30 263/263 [==============================] - 1s 4ms/step - loss: 1.0657 - accuracy: 0.6578 - val_loss: 1.0521 - val_accuracy: 0.6573
Epoch 9/30 263/263 [==============================] - 1s 4ms/step - loss: 1.0215 - accuracy: 0.6730 - val_loss: 0.9803 - val_accuracy: 0.6917
Epoch 10/30 263/263 [==============================] - 1s 4ms/step - loss: 0.9898 - accuracy: 0.6836 - val_loss: 1.0238 - val_accuracy: 0.6787
Epoch 11/30 263/263 [==============================] - 1s 4ms/step - loss: 0.9717 - accuracy: 0.6895 - val_loss: 0.9391 - val_accuracy: 0.7024
Epoch 12/30 263/263 [==============================] - 1s 4ms/step - loss: 0.9396 - accuracy: 0.7002 - val_loss: 0.8684 - val_accuracy: 0.7246
Epoch 13/30 263/263 [==============================] - 1s 4ms/step - loss: 0.9093 - accuracy: 0.7135 - val_loss: 0.9138 - val_accuracy: 0.7121
Epoch 14/30 263/263 [==============================] - 1s 4ms/step - loss: 0.8993 - accuracy: 0.7153 - val_loss: 0.8496 - val_accuracy: 0.7332
Epoch 15/30 263/263 [==============================] - 1s 4ms/step - loss: 0.8756 - accuracy: 0.7225 - val_loss: 0.8766 - val_accuracy: 0.7195
Epoch 16/30 263/263 [==============================] - 1s 4ms/step - loss: 0.8445 - accuracy: 0.7337 - val_loss: 0.8252 - val_accuracy: 0.7432
Epoch 17/30 263/263 [==============================] - 1s 4ms/step - loss: 0.8383 - accuracy: 0.7357 - val_loss: 0.8077 - val_accuracy: 0.7494
Epoch 18/30 263/263 [==============================] - 1s 4ms/step - loss: 0.8253 - accuracy: 0.7374 - val_loss: 0.7905 - val_accuracy: 0.7524
Epoch 19/30 263/263 [==============================] - 1s 4ms/step - loss: 0.8066 - accuracy: 0.7450 - val_loss: 0.7995 - val_accuracy: 0.7437
Epoch 20/30 263/263 [==============================] - 1s 4ms/step - loss: 0.8033 - accuracy: 0.7452 - val_loss: 0.7711 - val_accuracy: 0.7585
Epoch 21/30 263/263 [==============================] - 1s 4ms/step - loss: 0.7944 - accuracy: 0.7502 - val_loss: 0.7974 - val_accuracy: 0.7496
Epoch 22/30 263/263 [==============================] - 1s 4ms/step - loss: 0.7780 - accuracy: 0.7538 - val_loss: 0.7902 - val_accuracy: 0.7532
Epoch 23/30 263/263 [==============================] - 1s 4ms/step - loss: 0.7677 - accuracy: 0.7565 - val_loss: 0.7841 - val_accuracy: 0.7510
Epoch 24/30 263/263 [==============================] - 1s 4ms/step - loss: 0.7675 - accuracy: 0.7582 - val_loss: 0.7841 - val_accuracy: 0.7602
Epoch 25/30 263/263 [==============================] - 1s 4ms/step - loss: 0.7522 - accuracy: 0.7607 - val_loss: 0.7625 - val_accuracy: 0.7635
Epoch 26/30 263/263 [==============================] - 1s 4ms/step - loss: 0.7393 - accuracy: 0.7642 - val_loss: 0.7293 - val_accuracy: 0.7726
Epoch 27/30 263/263 [==============================] - 1s 4ms/step - loss: 0.7293 - accuracy: 0.7700 - val_loss: 0.7446 - val_accuracy: 0.7664
Epoch 28/30 263/263 [==============================] - 1s 4ms/step - loss: 0.7392 - accuracy: 0.7662 - val_loss: 0.7721 - val_accuracy: 0.7593
Epoch 29/30 263/263 [==============================] - 1s 4ms/step - loss: 0.7196 - accuracy: 0.7713 - val_loss: 0.7433 - val_accuracy: 0.7660
Epoch 30/30 263/263 [==============================] - 1s 5ms/step - loss: 0.7184 - accuracy: 0.7721 - val_loss: 0.7046 - val_accuracy: 0.7793
Plotting the validation and training accuracies¶
# Plotting the accuracies
dict_hist = history_model_2.history
list_ep = [i for i in range(1,31)]
plt.figure(figsize = (8,8))
plt.plot(list_ep,dict_hist['accuracy'],ls = '--', label = 'accuracy')
plt.plot(list_ep,dict_hist['val_accuracy'],ls = '--', label = 'val_accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()
Observations:
- The second model, which is more complex than the first, performs significantly better.
- Both the training and the validation accuracy have improved significantly.
- The validation accuracy is slightly higher than the training accuracy (partly because dropout is only active during training), which suggests the model is not overfitting and its complexity could be increased further.
- The plot shows that the training and validation accuracies are still trending upward after 30 epochs, which implies the number of epochs can be increased (one way to do this is sketched below).
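One hedged option, not part of the original notebook, is to simply allow more epochs and let an EarlyStopping callback stop training once the validation accuracy stops improving:
# Sketch: train for more epochs and stop automatically when val_accuracy plateaus.
# Running the fit call would continue training the already-fitted model2, so it is left commented out.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_accuracy', patience=5, restore_best_weights=True)
# history_model_2b = model2.fit(X_train, y_train, validation_split=0.2,
#                               epochs=100, batch_size=128, verbose=1,
#                               callbacks=[early_stop])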
Predictions on the test data¶
- Make predictions on the test set using the second model.
- Print the obtained results using the classification report and the confusion matrix.
- Final observations on the obtained results.
# Predict class probabilities for the test set
test_pred = model2.predict(X_test)
# Convert the predicted probabilities to class labels
test_pred = np.argmax(test_pred, axis=-1)
Note: Earlier, we saw that each entry of the target variable is a one-hot encoded vector, but to print the classification report and the confusion matrix, we must convert each entry of y_test back to a single label.
# Converting each entry to single label from one-hot encoded vector
y_test = np.argmax(y_test, axis=-1)
# Importing required functions
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
# Printing the classification report
print(classification_report(y_test, test_pred))
# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_test, test_pred)
plt.figure(figsize=(8,5))
sns.heatmap(cm, annot=True, fmt='.0f')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
              precision    recall  f1-score   support

           0       0.79      0.80      0.80      1814
           1       0.75      0.83      0.79      1828
           2       0.82      0.78      0.80      1803
           3       0.79      0.71      0.75      1719
           4       0.77      0.85      0.81      1812
           5       0.77      0.73      0.75      1768
           6       0.75      0.80      0.77      1832
           7       0.84      0.81      0.83      1808
           8       0.76      0.72      0.74      1812
           9       0.78      0.75      0.76      1804

    accuracy                           0.78     18000
   macro avg       0.78      0.78      0.78     18000
weighted avg       0.78      0.78      0.78     18000
Observations:¶
- The accuracy on the test set is about 78%, comparable to the training and validation accuracies, which implies that the model generalizes well.
- The recall for every digit is at least 71%, with digit 3 having the lowest recall (71%).
- The confusion matrix shows that the model most often confuses digit 5 with digit 6, and digit 6 with digit 4 (see the check after this list).
- The highest recall, about 85%, is for digit 4, i.e., the model correctly identifies 85% of the images containing digit 4.
- Precision varies less across digits (75% to 84%) than recall does (71% to 85%).
- The lowest precision, 75%, is shared by digits 1 and 6; the confusion matrix shows the model most often confuses them with digit 4.
- This indicates that the model struggles to pick up the small visual differences between similar-looking digits.
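The pairwise confusions mentioned above can also be read off programmatically. The optional snippet below zeroes out the diagonal of the confusion matrix and lists the largest remaining entries (variable names are illustrative).
# Optional: list the most frequent misclassifications (largest off-diagonal entries)
cm_off = cm.copy()
np.fill_diagonal(cm_off, 0)
top_idx = np.argsort(cm_off, axis=None)[::-1][:5]
for flat in top_idx:
    actual, predicted = np.unravel_index(flat, cm_off.shape)
    print('actual %d predicted as %d: %d times' % (actual, predicted, cm_off[actual, predicted]))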
Note: We can try tuning this model further or increasing its complexity to see if we can get better results. As this is image data, we can also try convolutional neural networks, which might be better at capturing small spatial variations in the digits and give better results than simple feed-forward neural networks; a minimal sketch of such a model is shown below.
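For reference, here is a minimal, hedged sketch of what such a convolutional model could look like. It is only an illustration (not trained or tuned here), and it expects the original 32 x 32 grayscale images, so X_train/X_test would have to be reshaped from (n, 1024) back to (n, 32, 32, 1) before fitting.
# Illustrative CNN sketch (assumes inputs reshaped to (32, 32, 1); not trained in this notebook)
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten

cnn_model = Sequential()
cnn_model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 1)))
cnn_model.add(MaxPooling2D((2, 2)))
cnn_model.add(Conv2D(64, (3, 3), activation='relu'))
cnn_model.add(MaxPooling2D((2, 2)))
cnn_model.add(Flatten())
cnn_model.add(Dense(64, activation='relu'))
cnn_model.add(Dense(10, activation='softmax'))
cnn_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])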
# Convert notebook to html
!jupyter nbconvert --to html "/content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_Six_-_Deep_Learning/Street_View_Housing_Number_Digit_Recognition_using_ANNs/NN_Practice_Project_SVHN.ipynb"
[NbConvertApp] Converting notebook /content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_Six_-_Deep_Learning/Street_View_Housing_Number_Digit_Recognition_using_ANNs/NN_Practice_Project_SVHN.ipynb to html [NbConvertApp] WARNING | Alternative text is missing on 4 image(s). [NbConvertApp] Writing 484066 bytes to /content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_Six_-_Deep_Learning/Street_View_Housing_Number_Digit_Recognition_using_ANNs/NN_Practice_Project_SVHN.html