Fashion MNIST¶
In this practical application notebook, we will work with fashion MNIST dataset to carry out a classification exercise using Artificial Neural Networks.¶
Dataset¶
The Fashion MNIST dataset is a collection of apparel images belonging to 10 classes. The classes are numbered from 0 to 9, with T-shirt/Top represented as 0 and Ankle Boot as 9, as listed below.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
Objective¶
In this exercise, we will create a simple ANN model to classify the images into their 10 apparel categories.
Toolkit¶
We will use TensorFlow and its Keras API (tf.keras) on Google Colab for this exercise.
Loading the libraries¶
#!pip install tensorflow
import warnings
warnings.filterwarnings("ignore")
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
tf.__version__
'2.18.0'
Loading the Data¶
Let's import the data from the tf.keras.datasets and prepare the train and the test set.
# Load the data
(X_train, trainY), (X_test,testY) = tf.keras.datasets.fashion_mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
X_train.shape, X_test.shape
((60000, 28, 28), (10000, 28, 28))
X_train.shape[1] * X_train.shape[2]
784
- This suggests that there are 60000 images of size 28*28 in the training set and 10000 images of size 28*28 in the test set.
- Note that we will need to flatten these images before fitting an ANN model.
- Let us now explore the classes present in the dataset.
np.unique(trainY)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)
- This suggests that the train set has 10 classes, where each class denotes one type of apparel (a quick per-class count is sketched below).
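The per-class count can be checked with np.unique and return_counts (a small sketch, not part of the original notebook; Fashion MNIST is balanced, so each class should appear 6,000 times in the training set):
# Count the number of training images per class (sanity check)
labels, counts = np.unique(trainY, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))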
Encoding the target variable¶
- We need to one-hot encode the target variable so that each label becomes a 10-element target vector for training.
- Hint: check tf.keras.utils.to_categorical() - https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical
y_train = tf.keras.utils.to_categorical(trainY,num_classes=10)
y_test = tf.keras.utils.to_categorical(testY,num_classes=10)
# Let's have a look at the shapes of all the datasets
X_train.shape, y_train.shape, X_test.shape, y_test.shape
((60000, 28, 28), (60000, 10), (10000, 28, 28), (10000, 10))
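As a quick illustration (not part of the original notebook), we can look at one label before and after encoding:
# The integer label of the first training image and its one-hot encoded counterpart
print(trainY[0], y_train[0])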
# Let's normalize the dataset. Since the pixel values range from 0 to 255, we divide by 255 to rescale them to the range 0-1
X_train = X_train/255
X_test = X_test/255
Visualization¶
- Now, let us visualize the data items.
- We will visualize the first 24 images in the training dataset.
class_names_list = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
plt.figure(figsize=(8,8))
for i in range(24):
    plt.subplot(4, 6, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names_list[trainY[i]])
plt.show()
Model Building¶
- We will now start with the model building process.
- We will create a model with
- A layer to flatten the input
- A hidden layer with 64 nodes (You can play around with this number) and 'relu' activation.
- Output layer
Model-1¶
Question 1: Add the output layer with activation function and number of neurons required based on the problem statement.¶
# Initialize sequential model
model_1 = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')  # Output layer: one neuron per class with softmax activation
])
model_1.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ flatten (Flatten)                    │ (None, 784)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense)                        │ (None, 64)                  │          50,240 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 10)                  │             650 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 50,890 (198.79 KB)
Trainable params: 50,890 (198.79 KB)
Non-trainable params: 0 (0.00 B)
Observations
- The summary of the model shows each layer's name, type, output shape, and the number of parameters at that particular layer.
- It also shows the total number of trainable and non-trainable parameters in the model. A parameter whose value is learned while training the model is called a trainable parameter; otherwise, it is called a non-trainable parameter.
- The Flatten layer simply flattens each image into a size of 784 (28*28) and there is no learning or training at this layer. Hence, the number of parameters is 0 for the Flatten layer.
- Each image, in the form of 784 nodes, is the input for the 'dense' layer. Each node of the previous layer is connected to each node of the current layer. Each connection has one weight to learn and each node has one bias, so the total number of parameters is (784*64)+64 = 50,240.
- Similarly, the last layer, 'dense_1', has (64*10)+10 = 650 parameters (see the quick check below).
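As a quick sanity check (not part of the original notebook), we can compare this arithmetic with the per-layer parameter counts reported by Keras:
# Per-layer parameter counts; expected: 0, (784*64)+64 = 50240, (64*10)+10 = 650
for layer in model_1.layers:
    print(layer.name, layer.count_params())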
Let us now compile the model.
- We will use the 'adam' optimizer and 'categorical_crossentropy' as the loss, and track accuracy as the metric.
model_1.compile(optimizer='adam', loss='categorical_crossentropy', metrics = ['accuracy'])
# Let us now fit the model
fit_history = model_1.fit(X_train, y_train,validation_split=0.1, verbose=1, epochs=10, batch_size=64)
Epoch 1/10 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.7527 - loss: 0.7361 - val_accuracy: 0.8458 - val_loss: 0.4365
Epoch 2/10 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.8519 - loss: 0.4276 - val_accuracy: 0.8587 - val_loss: 0.4032
Epoch 3/10 844/844 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8643 - loss: 0.3854 - val_accuracy: 0.8668 - val_loss: 0.3669
Epoch 4/10 844/844 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8739 - loss: 0.3532 - val_accuracy: 0.8637 - val_loss: 0.3781
Epoch 5/10 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.8822 - loss: 0.3305 - val_accuracy: 0.8717 - val_loss: 0.3559
Epoch 6/10 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8843 - loss: 0.3179 - val_accuracy: 0.8783 - val_loss: 0.3324
Epoch 7/10 844/844 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8893 - loss: 0.3064 - val_accuracy: 0.8692 - val_loss: 0.3570
Epoch 8/10 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.8926 - loss: 0.2940 - val_accuracy: 0.8783 - val_loss: 0.3407
Epoch 9/10 844/844 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8997 - loss: 0.2806 - val_accuracy: 0.8798 - val_loss: 0.3364
Epoch 10/10 844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9006 - loss: 0.2736 - val_accuracy: 0.8763 - val_loss: 0.3418
Observation
- We can observe that the training accuracy increases as the number of epochs increases; a quick way to see this is to plot the training history, as sketched below.
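To visualize this trend, we can plot the accuracy and loss curves stored in the History object returned by fit() (a minimal sketch, not part of the original notebook):
# Plot training vs. validation accuracy and loss across epochs for model_1
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].plot(fit_history.history['accuracy'], label='train')
axes[0].plot(fit_history.history['val_accuracy'], label='validation')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Accuracy')
axes[0].legend()
axes[1].plot(fit_history.history['loss'], label='train')
axes[1].plot(fit_history.history['val_loss'], label='validation')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Loss')
axes[1].legend()
plt.show()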
Evaluate the model on the test set¶
- Let's predict using the test data. The .predict() method in Keras models returns the probabilities of each observation belonging to each class. We will choose the class where the predicted probability is the highest.
- Also, let's build a function to print the classification report and confusion matrix.
def metrics_score(actual, predicted):
    from sklearn.metrics import classification_report, confusion_matrix
    print(classification_report(actual, predicted))
    cm = confusion_matrix(actual, predicted)
    plt.figure(figsize=(8, 5))
    sns.heatmap(cm, annot=True, fmt='.0f', xticklabels=class_names_list, yticklabels=class_names_list)
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    plt.show()
Question 2: What is the test accuracy of model_1?¶
model_1.evaluate(X_test, y_test, verbose = 1)
test_pred1 = np.argmax(model_1.predict(X_test), axis = -1)
test_pred1
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8663 - loss: 0.3692
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
array([9, 2, 1, ..., 8, 1, 5])
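Note that evaluate() returns the loss followed by the compiled metrics, so the test accuracy can be captured explicitly (a small sketch, not part of the original notebook):
# Capture the test loss and accuracy returned by evaluate()
test_loss_1, test_acc_1 = model_1.evaluate(X_test, y_test, verbose=0)
print(f"model_1 test accuracy: {test_acc_1:.4f}")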
Question 3: Which category has been classified most correctly by model_1?¶
metrics_score(testY, test_pred1)
              precision    recall  f1-score   support

           0       0.78      0.85      0.82      1000
           1       0.98      0.97      0.97      1000
           2       0.82      0.72      0.77      1000
           3       0.83      0.91      0.87      1000
           4       0.81      0.75      0.78      1000
           5       0.93      0.98      0.95      1000
           6       0.66      0.67      0.67      1000
           7       0.95      0.92      0.94      1000
           8       0.97      0.95      0.96      1000
           9       0.96      0.95      0.95      1000

    accuracy                           0.87     10000
   macro avg       0.87      0.87      0.87     10000
weighted avg       0.87      0.87      0.87     10000
Observations
- Class 6 (Shirt) has the lowest recall and precision, i.e., the model struggles to identify shirts. The confusion matrix shows that the model has predicted shirts as T-shirt/top, Pullover, and Coat, which is understandable as these items look similar.
- Let's try changing the learning rate and train the model for more epochs and see if the model can identify even subtle differences in different objects.
Further Iterations to model building¶
- Let's change the learning rate and the number of epochs and observe the effect on the accuracy of the earlier network.
- Let's build a bigger network with the new learning rate and epochs.
Model-2¶
# Initialize sequential model
model_2 = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),  # Hidden layer widened from 64 to 128 nodes
    tf.keras.layers.Dense(10, activation='softmax')
])
model_2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss= 'categorical_crossentropy', metrics= ['accuracy'])
model_2.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ flatten_1 (Flatten)                  │ (None, 784)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 128)                 │         100,480 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_3 (Dense)                      │ (None, 10)                  │           1,290 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 101,770 (397.54 KB)
Trainable params: 101,770 (397.54 KB)
Non-trainable params: 0 (0.00 B)
Observation
- The structure is similar to the previous model, but the hidden layer now has 128 nodes instead of 64, so the number of parameters has roughly doubled (101,770 vs. 50,890).
fit_history_2 = model_2.fit(X_train, y_train, epochs=30, validation_split=0.1, batch_size=64, verbose = 2)
Epoch 1/30 844/844 - 4s - 4ms/step - accuracy: 0.8162 - loss: 0.5298 - val_accuracy: 0.8440 - val_loss: 0.4440
Epoch 2/30 844/844 - 6s - 7ms/step - accuracy: 0.8625 - loss: 0.3883 - val_accuracy: 0.8645 - val_loss: 0.3723
Epoch 3/30 844/844 - 4s - 5ms/step - accuracy: 0.8738 - loss: 0.3491 - val_accuracy: 0.8633 - val_loss: 0.3801
Epoch 4/30 844/844 - 5s - 6ms/step - accuracy: 0.8825 - loss: 0.3251 - val_accuracy: 0.8733 - val_loss: 0.3457
Epoch 5/30 844/844 - 3s - 4ms/step - accuracy: 0.8883 - loss: 0.3047 - val_accuracy: 0.8760 - val_loss: 0.3313
Epoch 6/30 844/844 - 5s - 5ms/step - accuracy: 0.8933 - loss: 0.2894 - val_accuracy: 0.8802 - val_loss: 0.3203
Epoch 7/30 844/844 - 5s - 6ms/step - accuracy: 0.8978 - loss: 0.2769 - val_accuracy: 0.8835 - val_loss: 0.3239
Epoch 8/30 844/844 - 6s - 7ms/step - accuracy: 0.9013 - loss: 0.2666 - val_accuracy: 0.8793 - val_loss: 0.3421
Epoch 9/30 844/844 - 4s - 5ms/step - accuracy: 0.9056 - loss: 0.2554 - val_accuracy: 0.8902 - val_loss: 0.3163
Epoch 10/30 844/844 - 5s - 6ms/step - accuracy: 0.9106 - loss: 0.2442 - val_accuracy: 0.8888 - val_loss: 0.3177
Epoch 11/30 844/844 - 5s - 6ms/step - accuracy: 0.9124 - loss: 0.2385 - val_accuracy: 0.8915 - val_loss: 0.3186
Epoch 12/30 844/844 - 3s - 3ms/step - accuracy: 0.9154 - loss: 0.2311 - val_accuracy: 0.8883 - val_loss: 0.3177
Epoch 13/30 844/844 - 5s - 6ms/step - accuracy: 0.9171 - loss: 0.2235 - val_accuracy: 0.8878 - val_loss: 0.3395
Epoch 14/30 844/844 - 4s - 5ms/step - accuracy: 0.9191 - loss: 0.2187 - val_accuracy: 0.8862 - val_loss: 0.3336
Epoch 15/30 844/844 - 3s - 3ms/step - accuracy: 0.9208 - loss: 0.2113 - val_accuracy: 0.8903 - val_loss: 0.3443
Epoch 16/30 844/844 - 5s - 6ms/step - accuracy: 0.9254 - loss: 0.2043 - val_accuracy: 0.8912 - val_loss: 0.3157
Epoch 17/30 844/844 - 4s - 5ms/step - accuracy: 0.9264 - loss: 0.1988 - val_accuracy: 0.8943 - val_loss: 0.3190
Epoch 18/30 844/844 - 4s - 4ms/step - accuracy: 0.9275 - loss: 0.1952 - val_accuracy: 0.8897 - val_loss: 0.3445
Epoch 19/30 844/844 - 4s - 5ms/step - accuracy: 0.9306 - loss: 0.1879 - val_accuracy: 0.8897 - val_loss: 0.3435
Epoch 20/30 844/844 - 2s - 3ms/step - accuracy: 0.9305 - loss: 0.1829 - val_accuracy: 0.8847 - val_loss: 0.3600
Epoch 21/30 844/844 - 4s - 4ms/step - accuracy: 0.9339 - loss: 0.1786 - val_accuracy: 0.8875 - val_loss: 0.3539
Epoch 22/30 844/844 - 3s - 4ms/step - accuracy: 0.9342 - loss: 0.1756 - val_accuracy: 0.8972 - val_loss: 0.3281
Epoch 23/30 844/844 - 4s - 5ms/step - accuracy: 0.9381 - loss: 0.1697 - val_accuracy: 0.8940 - val_loss: 0.3397
Epoch 24/30 844/844 - 3s - 3ms/step - accuracy: 0.9381 - loss: 0.1657 - val_accuracy: 0.8890 - val_loss: 0.3539
Epoch 25/30 844/844 - 4s - 4ms/step - accuracy: 0.9395 - loss: 0.1627 - val_accuracy: 0.8957 - val_loss: 0.3529
Epoch 26/30 844/844 - 3s - 4ms/step - accuracy: 0.9402 - loss: 0.1602 - val_accuracy: 0.8940 - val_loss: 0.3682
Epoch 27/30 844/844 - 3s - 3ms/step - accuracy: 0.9424 - loss: 0.1536 - val_accuracy: 0.8902 - val_loss: 0.3696
Epoch 28/30 844/844 - 3s - 3ms/step - accuracy: 0.9439 - loss: 0.1505 - val_accuracy: 0.8855 - val_loss: 0.3650
Epoch 29/30 844/844 - 7s - 8ms/step - accuracy: 0.9458 - loss: 0.1471 - val_accuracy: 0.8947 - val_loss: 0.3557
Epoch 30/30 844/844 - 3s - 3ms/step - accuracy: 0.9457 - loss: 0.1453 - val_accuracy: 0.8898 - val_loss: 0.3854
Observations
- Compared to the model trained for 10 epochs, the training accuracy has increased by ~4.5 percentage points (from ~0.90 to ~0.95), while the validation accuracy has increased by only ~1.4 percentage points (from ~0.876 to ~0.89).
- This indicates that if we further increase the number of epochs while keeping everything else the same then the model might start to overfit.
model_2.evaluate(X_test,y_test, verbose = 1)
test_pred2 = np.argmax(model_2.predict(X_test), axis = -1)
metrics_score(testY, test_pred2)
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8824 - loss: 0.4026
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
              precision    recall  f1-score   support

           0       0.80      0.86      0.83      1000
           1       0.99      0.97      0.98      1000
           2       0.78      0.85      0.81      1000
           3       0.87      0.90      0.89      1000
           4       0.79      0.84      0.82      1000
           5       0.98      0.96      0.97      1000
           6       0.78      0.60      0.68      1000
           7       0.91      0.98      0.94      1000
           8       0.97      0.97      0.97      1000
           9       0.98      0.93      0.95      1000

    accuracy                           0.88     10000
   macro avg       0.89      0.88      0.88     10000
weighted avg       0.89      0.88      0.88     10000
Model-3¶
Question 4: For the above model, i.e., Model-2, add 1 hidden layer with 128 neurons and a relu activation function after the flatten layer. The test accuracy of this model lies between,¶
# Initialize sequential model
model_3 = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),  # Added hidden layer: 128 neurons with relu activation
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model_3.summary()
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ flatten_2 (Flatten)                  │ (None, 784)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_4 (Dense)                      │ (None, 128)                 │         100,480 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_5 (Dense)                      │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_6 (Dense)                      │ (None, 10)                  │             650 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 109,386 (427.29 KB)
Trainable params: 109,386 (427.29 KB)
Non-trainable params: 0 (0.00 B)
Observations
- We can see that this model has ~2.15 times as many parameters as model_1 (and slightly more than model_2).
- Increasing the number of parameters can significantly increase the training time of the model.
model_3.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss= 'categorical_crossentropy', metrics= ['accuracy'])
fit_history_3 = model_3.fit(X_train, y_train, epochs=30, validation_split=0.1, batch_size=64, verbose = 1)
Epoch 1/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.7599 - loss: 0.6924 - val_accuracy: 0.8547 - val_loss: 0.4064
Epoch 2/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.8620 - loss: 0.3888 - val_accuracy: 0.8662 - val_loss: 0.3795
Epoch 3/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.8741 - loss: 0.3455 - val_accuracy: 0.8723 - val_loss: 0.3450
Epoch 4/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.8830 - loss: 0.3192 - val_accuracy: 0.8802 - val_loss: 0.3443
Epoch 5/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.8891 - loss: 0.2995 - val_accuracy: 0.8825 - val_loss: 0.3293
Epoch 6/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8957 - loss: 0.2810 - val_accuracy: 0.8805 - val_loss: 0.3311
Epoch 7/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.8999 - loss: 0.2634 - val_accuracy: 0.8842 - val_loss: 0.3247
Epoch 8/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9030 - loss: 0.2629 - val_accuracy: 0.8838 - val_loss: 0.3393
Epoch 9/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9070 - loss: 0.2469 - val_accuracy: 0.8837 - val_loss: 0.3285
Epoch 10/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9088 - loss: 0.2424 - val_accuracy: 0.8827 - val_loss: 0.3477
Epoch 11/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.9140 - loss: 0.2320 - val_accuracy: 0.8913 - val_loss: 0.3231
Epoch 12/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9128 - loss: 0.2311 - val_accuracy: 0.8903 - val_loss: 0.3259
Epoch 13/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9198 - loss: 0.2136 - val_accuracy: 0.8882 - val_loss: 0.3362
Epoch 14/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9205 - loss: 0.2116 - val_accuracy: 0.8845 - val_loss: 0.3503
Epoch 15/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9226 - loss: 0.2043 - val_accuracy: 0.8897 - val_loss: 0.3241
Epoch 16/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9269 - loss: 0.1937 - val_accuracy: 0.8925 - val_loss: 0.3309
Epoch 17/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.9275 - loss: 0.1951 - val_accuracy: 0.8902 - val_loss: 0.3333
Epoch 18/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9297 - loss: 0.1908 - val_accuracy: 0.8868 - val_loss: 0.3480
Epoch 19/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9321 - loss: 0.1837 - val_accuracy: 0.8917 - val_loss: 0.3301
Epoch 20/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 7s 6ms/step - accuracy: 0.9355 - loss: 0.1719 - val_accuracy: 0.8882 - val_loss: 0.3459
Epoch 21/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9361 - loss: 0.1706 - val_accuracy: 0.8875 - val_loss: 0.3548
Epoch 22/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9376 - loss: 0.1651 - val_accuracy: 0.8825 - val_loss: 0.3784
Epoch 23/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.9388 - loss: 0.1639 - val_accuracy: 0.8908 - val_loss: 0.3658
Epoch 24/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9417 - loss: 0.1568 - val_accuracy: 0.8922 - val_loss: 0.3587
Epoch 25/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9423 - loss: 0.1535 - val_accuracy: 0.8793 - val_loss: 0.4636
Epoch 26/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.9430 - loss: 0.1507 - val_accuracy: 0.8868 - val_loss: 0.3975
Epoch 27/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9453 - loss: 0.1461 - val_accuracy: 0.8910 - val_loss: 0.3819
Epoch 28/30 844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9495 - loss: 0.1352 - val_accuracy: 0.8875 - val_loss: 0.4024
Epoch 29/30 650/844 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9509 - loss: 0.1312
Observations
- The training accuracy has increased slightly further, while the validation accuracy remains at about the same level as model_2's, so there is still a hint of overfitting.
- We can tune the model's hyperparameters or try different layer structures to improve performance and reduce the overfitting; one common option is early stopping, sketched after this list.
- We can see that training accuracy keeps increasing as the number of epochs increases, but validation accuracy becomes roughly constant after about 10 epochs.
- This indicates that the model learns the training data more closely after each epoch but cannot replicate that performance on the validation data, which is a sign of overfitting.
- The same pattern can be observed for loss as well. It keeps decreasing for the training data with the increase in epochs but becomes somewhat constant for the validation data after 10 epochs.
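For example, a minimal early-stopping sketch (not used in the original notebook) that halts training when the validation loss stops improving:
# Stop training once val_loss has not improved for 3 epochs and
# roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3, restore_best_weights=True
)
# Hypothetical usage (a re-run of model_3 with the callback attached):
# model_3.fit(X_train, y_train, epochs=30, validation_split=0.1,
#             batch_size=64, callbacks=[early_stop], verbose=1)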
Now, let's make final predictions on the test data using the last model we built.
Final Predictions on the Test Data¶
final_pred = np.argmax(model_3.predict(X_test), axis = -1)
metrics_score(testY, final_pred)
- The precision and recall for class 6 (Shirt) have increased. The confusion matrix shows that the model still struggles to differentiate between T-shirt/top and Shirt, but it has become better at differentiating Shirt from Pullover and Coat.
- The model has become even better at identifying Trouser. It has an f1-score of 98% for class 1 (Trouser).
- The overall accuracy on the test data is approximately the same as the validation accuracy.
Let's visualize the images from the test data.¶
- We will randomly select 24 images from the test data and visualize them.
- The title of each image would show the actual and predicted label of that image and the probability of the predicted class.
- The higher the probability, the more confident the model is about its prediction.
rows = 4
cols = 6

# Compute each test image's highest predicted probability once, outside the loop
y_pred_test_max_probas = np.max(model_3.predict(X_test), axis=1)

fig = plt.figure(figsize=(15, 15))
for i in range(cols):
    for j in range(rows):
        random_index = np.random.randint(0, len(testY))
        ax = fig.add_subplot(rows, cols, i * rows + j + 1)
        ax.imshow(X_test[random_index, :])
        pred_label = class_names_list[final_pred[random_index]]
        true_label = class_names_list[testY[random_index]]
        pred_proba = y_pred_test_max_probas[random_index]
        ax.set_title("actual: {}\npredicted: {}\nprobability: {:.3}\n".format(
            true_label, pred_label, pred_proba
        ))
plt.show()
Comments¶
- We trained 3 different models with incremental changes to the network size and the number of epochs.
- The plots track the variation in accuracy and loss across epochs and allow us to see how well these models generalize.
- We observed good performance on the train set, but there is some overfitting in the models, which becomes more prominent as we increase the number of epochs.
- We went ahead with model 3 and evaluated it on the test data.
- Finally, we visualized some of the images from the test data.
# Convert notebook to html
!jupyter nbconvert --to html "/content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_Six_-_Deep_Learning/Hand_On_Quiz_ANN/Hands_on_quiz_ANN.ipynb"