Audio MNIST Digit Recognition¶
Context¶
In the past decades, significant advances have been made in audio recognition, and research on recognizing audio and speech with Deep Learning is ongoing worldwide. A common first step in this field is converting audio into spectrograms (and, for synthesis tasks, converting spectrograms back into audio).
Audio in its raw form is usually a wave, and capturing it in a data structure requires a large array of amplitudes even for a very short clip. The exact size depends on the sampling rate, but the array is large even at low sampling rates. This makes raw audio costly to store and computationally expensive to process, even for simple calculations.
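To make the size argument concrete, here is a minimal sketch of the arithmetic. The helper name `raw_audio_footprint` and the choice of 32-bit floats are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def raw_audio_footprint(duration_s, sample_rate, dtype=np.float32):
    """Return (number of samples, size in bytes) for a mono clip."""
    n_samples = int(round(duration_s * sample_rate))
    n_bytes = n_samples * np.dtype(dtype).itemsize
    return n_samples, n_bytes

# Compare a one-second clip at a few common sampling rates
for sr in (8000, 22050, 48000):
    n_samples, n_bytes = raw_audio_footprint(1.0, sr)
    print(f"{sr:>6} Hz -> {n_samples:>6} samples, {n_bytes / 1024:.1f} KiB")
```

Even a one-second clip at 22050 Hz already needs 22050 amplitude values, which is why a more compact representation is attractive.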
One of the most economical alternatives is to use spectrograms. Spectrograms are created by applying the Fourier Transform or the Short-Time Fourier Transform (STFT) to sound waves. There are various kinds of spectrograms; the features we will use are Mel-frequency cepstral coefficients (MFCCs), which are derived from Mel spectrograms. In simple terms, a spectrogram is a way to represent audio data visually: a graph on a 2-D plane where the X-axis represents time and the Y-axis represents frequency (Mel coefficients, in our case). Because it is a 2-D grid of values, we can treat it as an image.
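The idea of turning a wave into a 2-D time-frequency grid can be sketched with NumPy alone. This is a simplified Short-Time Fourier Transform under stated assumptions: the function name `stft_magnitude`, the frame length of 512, and the hop length of 256 are illustrative choices, and the notebook itself relies on librosa rather than this code.

```python
import numpy as np

def stft_magnitude(signal, frame_length=512, hop_length=256):
    """Magnitude spectrogram with shape (n_freq_bins, n_frames)."""
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    # Slice the signal into overlapping, windowed frames
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 8000                                   # samples per second
t = np.arange(sr) / sr                      # one second of time stamps
tone = np.sin(2 * np.pi * 1000 * t)         # synthetic 1000 Hz tone

spec = stft_magnitude(tone)
freqs = np.fft.rfftfreq(512, d=1 / sr)      # frequency of each row in Hz
peak_hz = freqs[spec.mean(axis=1).argmax()]
print(spec.shape, peak_hz)                  # rows are frequency bins, columns are time frames
```

The resulting array has one row per frequency bin and one column per time frame, which is what lets us treat it like an image.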
Objective¶
The objective here is to build an Artificial Neural Network that can look at Mel or MFCC spectrograms of audio files and classify them into 10 classes. The audio files are recordings of different speakers uttering a particular digit and the corresponding class to be predicted is the digit itself.
Dataset¶
The dataset we will use is the Audio MNIST dataset, which has audio files (with the .wav extension) stored in 10 different folders. Each folder contains recordings of the digits spoken by one particular speaker.
Understanding the required packages¶
Librosa
: Librosa is a Python package that helps in dealing with audio data. librosa.display visualizes and displays the audio data using Matplotlib. Similarly, there is a collection of submodules under librosa that provide various other functionalities. Run the command in the cell below to install the library.
IPython.display
: IPython.display is the public API for the display tools available in IPython. In this case study, we will create an Audio object to play the digits in the Audio MNIST data.
tqdm
: tqdm is a Python package that allows us to add a progress bar to our application. This package will help us track progress while iterating over the audio data.
Installing Librosa¶
!pip install librosa
Importing the necessary libraries and loading the data¶
# For Audio Preprocessing
import librosa
import librosa.display as dsp
from IPython.display import Audio
# For Data Preprocessing
import pandas as pd
import numpy as np
import os
# For Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
# The data is provided as a zip file
import zipfile
sns.set_style("dark")
Mounting the Drive and Unzipping the file¶
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
path = '/content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_Six_-_Deep_Learning/Audio_MNIST_Digit_Recognition/Audio_MNIST_Archive.zip'
# The data is provided as a zip file, so we need to extract the files from it
with zipfile.ZipFile(path, 'r') as zip_ref:
    zip_ref.extractall()
Let's read and check some of the audio samples¶
The function below, get_audio(), takes a digit as an argument, plots the audio wave, and returns the audio for that digit.
Let's understand the functioning of some of the new functions used to create the get_audio() function.
.wav
: .wav is a file format, much as .csv is, that stores raw audio. We will load the .wav file using the librosa package.
dsp.waveshow()
: It visualizes the waveform in the time domain. This method creates a plot that alternates between a raw samples-based view of the signal and an amplitude-envelope view of the signal. The "sr" parameter is the sampling rate, i.e., samples per second.
Audio()
: From the IPython package, we can create an audio object.
def get_audio(digit = 0):
    # Audio Sample Directory (folders 01 to 10; the upper bound of randint is exclusive)
    sample = np.random.randint(1, 11)
    # Index of Audio
    index = np.random.randint(1, 5)
    # Modified file location (folder names are zero-padded to two digits)
    if sample < 10:
        file = f"/content/data/0{sample}/{digit}_0{sample}_{index}.wav"
    else:
        file = f"/content/data/{sample}/{digit}_{sample}_{index}.wav"
    # Get Audio from the location
    # Audio will be automatically resampled to the given rate (default sr = 22050)
    data, sample_rate = librosa.load(file)
    # Plot the audio wave
    dsp.waveshow(data, sr = sample_rate)
    plt.show()
    # Show the widget
    return Audio(data = data, rate = sample_rate)
# Show the audio and plot of digit 0
get_audio(0)
# Show the audio and plot of digit 1
get_audio(1)
# Show the audio and plot of digit 2
get_audio(2)
# Show the audio and plot of digit 3
get_audio(3)
# Show the audio and plot of digit 4
get_audio(4)
# Show the audio and plot of digit 5
get_audio(5)
# Show the audio and plot of digit 6
get_audio(6)
# Show the audio and plot of digit 7
get_audio(7)
# Show the audio and plot of digit 8
get_audio(8)
# Show the audio and plot of digit 9
get_audio(9)
Observations:
- The X-axis represents time and the Y-axis represents the amplitude of the vibrations. The intuition behind the Fourier Transform is that any wave can be broken down into a sum of many component sine waves. Because each sine wave oscillates equally above and below zero, the resulting waveforms look roughly symmetric about the time axis, i.e., they extend about equally above and below it.
- From the various audio plots ranging from 0 to 9, we can observe the amplitude at a given point in time. For example, when we say "Zero", the "Z" sound has low amplitude and the "ero" sound has higher amplitude. Similarly, the remaining digits can be interpreted by looking at the visualizations.
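The Fourier intuition mentioned in the observations can be checked on a synthetic signal. This is a minimal sketch, assuming a made-up wave built from two sine components (50 Hz and 120 Hz, chosen only for illustration); it does not use the Audio MNIST recordings.

```python
import numpy as np

sr = 1000                                   # samples per second
t = np.arange(sr) / sr                      # one second of time stamps

# A wave built as the sum of two sine components
wave = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# The Fourier Transform recovers the amplitude of each component
spectrum = np.abs(np.fft.rfft(wave)) / (len(wave) / 2)
freqs = np.fft.rfftfreq(len(wave), d=1 / sr)

# Frequencies of the two strongest components, in ascending order
top_two = np.sort(freqs[np.argsort(spectrum)[-2:]])
print(top_two, spectrum[50], spectrum[120])
```

The two strongest peaks sit at the frequencies that were mixed in, with heights matching their amplitudes, which is the decomposition the Fourier Transform provides.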
Visualizing the spectrogram of the audio data¶
What is a spectrogram?¶
A spectrogram is a visual way of representing the signal strength or “loudness” of a signal over time at the various frequencies present in a particular waveform. A spectrogram gives a detailed view of audio: it represents amplitude, frequency, and time in a single plot. Since spectrograms are 2-D plots, they can be interpreted as images. Different kinds of spectrograms have different attributes on their axes, so they are interpreted differently. In a Research and Development scenario, we make use of a vocoder, which converts spectrograms back to audio using parameters learned by machine learning. One well-known vocoder is the WaveNet vocoder, which has been used in many Text to Speech architectures.
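Here, we will be using MFCCs (Mel-frequency cepstral coefficients). They are closely related to Mel spectrograms: MFCCs are obtained by taking the logarithm of a Mel spectrogram and applying a discrete cosine transform to it.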
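The Mel scale behind these features maps frequency in Hz onto a scale that follows perceived pitch more closely. The sketch below uses one common convention (often called the HTK formula); the function names are illustrative, and libraries may use a different variant by default, so treat the exact numbers as an assumption rather than what librosa computes internally.

```python
import numpy as np

def hz_to_mel(freq_hz):
    """Convert Hz to Mel using the HTK-style formula (one common convention)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Equal steps in Hz become smaller steps in Mel at higher frequencies
for f in (250, 500, 1000, 2000, 4000, 8000):
    print(f"{f:>5} Hz -> {float(hz_to_mel(f)):7.1f} mel")
```

Low frequencies are spread out and high frequencies are compressed, which is why Mel-based features give more resolution to the range where speech carries most of its information.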
# A function which returns the audio data for a given digit
def get_audio_raw(digit = 0):
    # Audio Sample Directory (folders 01 to 10; the upper bound of randint is exclusive)
    sample = np.random.randint(1, 11)
    # Index of Audio
    index = np.random.randint(1, 5)
    # Modified file location (folder names are zero-padded to two digits)
    if sample < 10:
        file = f"/content/data/0{sample}/{digit}_0{sample}_{index}.wav"
    else:
        file = f"/content/data/{sample}/{digit}_{sample}_{index}.wav"
    # Get Audio from the location
    data, sample_rate = librosa.load(file)
    # Return audio
    return data, sample_rate
Extracting features from the audio file¶
Mel-frequency cepstral coefficients (MFCCs) Feature Extraction
MFCCs are usually the final features used in many machine learning models trained on audio data. They are usually a set of mel coefficients defined for each time step through which the raw audio data can be encoded. So for example, if we have an audio sample extending for 30 time steps, and we are defining each time step by 40 Mel Coefficients, our entire sample can be represented by 40 * 30 Mel Coefficients. And if we want to create a Mel Spectrogram out of it, our spectrogram will resemble a 2-D array of 40 horizontal rows and 30 vertical columns.
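The shapes in this example can be illustrated with a placeholder array. This is a sketch only: the values are random numbers standing in for real MFCCs, and the variable names are chosen for illustration. It also shows how averaging over the time axis, as the feature extraction function in this notebook does, reduces the 40 x 30 grid to a single 40-value vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mfcc, n_frames = 40, 30

# Placeholder values, NOT real MFCCs: rows are coefficients, columns are time steps
mfcc_matrix = rng.normal(size=(n_mfcc, n_frames))
print(mfcc_matrix.shape)        # (40, 30)

# Averaging each coefficient over all time steps gives one fixed-length vector
feature_vector = np.mean(mfcc_matrix.T, axis=0)
print(feature_vector.shape)     # (40,)
```

A fixed-length vector is convenient for a fully connected network because clips of different durations all end up with the same number of features, at the cost of discarding how the coefficients change over time.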
In this step, we will first extract the Mel Coefficients for each audio file and add them to our dataset.
extract_features
: Returns the MFCC extracted features for an audio file.
preprocess_and_create_dataset
: Iterates through the audio of each digit, extracts the features using the extract_features() function, and appends the data to a DataFrame.
Creating a function that extracts the data from audio files
# Takes an audio file as input and returns its Mel-frequency cepstral coefficient (MFCC) features
def extract_features(file):
    # Load audio and its sample rate
    audio, sample_rate = librosa.load(file)
    # Extract features using mel-frequency coefficient
    extracted_features = librosa.feature.mfcc(y = audio,
                                              sr = sample_rate,
                                              n_mfcc = 40)
    # Average each coefficient over all time frames to get one 40-value vector
    extracted_features = np.mean(extracted_features.T, axis = 0)
    # Return the extracted features
    return extracted_features
def preprocess_and_create_dataset():
    # Path of the folder where the audio files are present
    root_folder_path = "/content/data/"
    # Empty List to create dataset
    dataset = []
    # Iterating through the folders 01 to 10, each holding the audio files of one speaker
    for folder in tqdm(range(1, 11)):
        if folder < 10:
            # Path of the folder
            folder = os.path.join(root_folder_path, "0" + str(folder))
        else:
            folder = os.path.join(root_folder_path, str(folder))
        # Iterate through each file of the present folder
        for file in tqdm(os.listdir(folder)):
            # Path of the file
            abs_file_path = os.path.join(folder, file)
            # Pass path of file to the extract_features() function to create features
            extracted_features = extract_features(abs_file_path)
            # Class of the audio, i.e., the digit it represents
            class_label = file[0]
            # Append a list where the feature represents a column and class of the digit represents another column
            dataset.append([extracted_features, class_label])
    # After iterating through all the folders, convert the list to a DataFrame
    return pd.DataFrame(dataset, columns = ['features', 'class'])
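The class label above is read from the first character of the file name. As a sketch of how the full name could be unpacked, assuming files follow the digit_speaker_index.wav pattern used in the file paths earlier in this notebook, a small helper might look like this. The function name `parse_audio_filename` is hypothetical and not part of the original code.

```python
from pathlib import Path

def parse_audio_filename(path):
    """Split a name like '7_03_12.wav' into (digit, speaker, index) integers.

    Assumes the '<digit>_<speaker>_<index>.wav' naming pattern.
    """
    digit, speaker, index = Path(path).stem.split("_")
    return int(digit), int(speaker), int(index)

# Example path chosen only for illustration
digit, speaker, index = parse_audio_filename("/content/data/03/7_03_12.wav")
print(digit, speaker, index)
```

Keeping the speaker alongside the digit can be useful later, for example to check whether a model generalizes to speakers it has not heard.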
Now, let's create the dataset using the function defined above.
# Create the dataset by calling the function
dataset = preprocess_and_create_dataset()
17/500 [00:00<00:06, 79.81it/s] 6%|▌ | 28/500 [00:00<00:05, 89.79it/s] 8%|▊ | 39/500 [00:00<00:04, 94.71it/s] 10%|▉ | 49/500 [00:00<00:04, 94.84it/s] 12%|█▏ | 59/500 [00:00<00:04, 91.34it/s] 14%|█▍ | 69/500 [00:00<00:04, 92.87it/s] 16%|█▌ | 80/500 [00:00<00:04, 95.59it/s] 18%|█▊ | 90/500 [00:00<00:04, 96.37it/s] 20%|██ | 100/500 [00:01<00:04, 96.89it/s] 22%|██▏ | 111/500 [00:01<00:03, 99.29it/s] 24%|██▍ | 121/500 [00:01<00:03, 94.91it/s] 26%|██▌ | 131/500 [00:01<00:04, 86.85it/s] 28%|██▊ | 141/500 [00:01<00:04, 88.25it/s] 30%|███ | 150/500 [00:01<00:04, 80.42it/s] 32%|███▏ | 160/500 [00:01<00:04, 84.02it/s] 34%|███▍ | 169/500 [00:01<00:04, 75.34it/s] 35%|███▌ | 177/500 [00:02<00:04, 70.33it/s] 37%|███▋ | 185/500 [00:02<00:04, 63.05it/s] 38%|███▊ | 192/500 [00:02<00:05, 58.86it/s] 40%|███▉ | 199/500 [00:02<00:05, 56.16it/s] 41%|████ | 205/500 [00:02<00:05, 53.54it/s] 42%|████▏ | 211/500 [00:02<00:05, 51.74it/s] 43%|████▎ | 217/500 [00:02<00:05, 49.63it/s] 45%|████▍ | 224/500 [00:03<00:05, 53.85it/s] 47%|████▋ | 235/500 [00:03<00:03, 67.30it/s] 49%|████▉ | 246/500 [00:03<00:03, 77.25it/s] 51%|█████▏ | 257/500 [00:03<00:02, 84.76it/s] 53%|█████▎ | 267/500 [00:03<00:02, 88.27it/s] 55%|█████▌ | 277/500 [00:03<00:02, 91.19it/s] 58%|█████▊ | 288/500 [00:03<00:02, 94.12it/s] 60%|█████▉ | 298/500 [00:03<00:02, 89.03it/s] 62%|██████▏ | 308/500 [00:03<00:02, 91.80it/s] 64%|██████▎ | 318/500 [00:03<00:01, 93.41it/s] 66%|██████▌ | 329/500 [00:04<00:01, 96.31it/s] 68%|██████▊ | 340/500 [00:04<00:01, 98.00it/s] 70%|███████ | 351/500 [00:04<00:01, 99.80it/s] 72%|███████▏ | 362/500 [00:04<00:01, 97.14it/s] 74%|███████▍ | 372/500 [00:04<00:01, 96.52it/s] 76%|███████▋ | 382/500 [00:04<00:01, 95.60it/s] 79%|███████▊ | 393/500 [00:04<00:01, 97.71it/s] 81%|████████ | 403/500 [00:04<00:01, 91.69it/s] 83%|████████▎ | 414/500 [00:04<00:00, 94.55it/s] 85%|████████▌ | 425/500 [00:05<00:00, 97.02it/s] 87%|████████▋ | 435/500 [00:05<00:00, 97.54it/s] 89%|████████▉ | 445/500 [00:05<00:00, 
97.88it/s] 91%|█████████ | 455/500 [00:05<00:00, 96.18it/s] 93%|█████████▎| 466/500 [00:05<00:00, 99.68it/s] 95%|█████████▌| 476/500 [00:05<00:00, 97.57it/s] 97%|█████████▋| 486/500 [00:05<00:00, 97.19it/s] 100%|██████████| 500/500 [00:05<00:00, 85.16it/s] 90%|█████████ | 9/10 [00:59<00:06, 6.57s/it] 0%| | 0/500 [00:00<?, ?it/s] 2%|▏ | 11/500 [00:00<00:04, 102.77it/s] 4%|▍ | 22/500 [00:00<00:04, 102.90it/s] 7%|▋ | 33/500 [00:00<00:04, 102.25it/s] 9%|▉ | 44/500 [00:00<00:04, 102.72it/s] 11%|█ | 55/500 [00:00<00:04, 101.95it/s] 13%|█▎ | 66/500 [00:00<00:04, 102.01it/s] 15%|█▌ | 77/500 [00:00<00:04, 100.72it/s] 18%|█▊ | 88/500 [00:00<00:04, 100.07it/s] 20%|█▉ | 99/500 [00:01<00:04, 93.73it/s] 22%|██▏ | 110/500 [00:01<00:04, 96.06it/s] 24%|██▍ | 121/500 [00:01<00:03, 98.52it/s] 26%|██▋ | 132/500 [00:01<00:03, 100.40it/s] 29%|██▊ | 143/500 [00:01<00:03, 99.46it/s] 31%|███ | 153/500 [00:01<00:03, 95.72it/s] 33%|███▎ | 163/500 [00:01<00:03, 95.78it/s] 35%|███▍ | 173/500 [00:01<00:03, 95.37it/s] 37%|███▋ | 183/500 [00:01<00:03, 95.56it/s] 39%|███▉ | 194/500 [00:01<00:03, 97.78it/s] 41%|████ | 204/500 [00:02<00:03, 90.00it/s] 43%|████▎ | 214/500 [00:02<00:03, 92.67it/s] 45%|████▌ | 225/500 [00:02<00:02, 95.63it/s] 47%|████▋ | 235/500 [00:02<00:02, 94.86it/s] 49%|████▉ | 245/500 [00:02<00:02, 94.84it/s] 51%|█████ | 255/500 [00:02<00:02, 95.71it/s] 53%|█████▎ | 265/500 [00:02<00:02, 94.31it/s] 55%|█████▌ | 275/500 [00:02<00:02, 93.92it/s] 57%|█████▋ | 286/500 [00:02<00:02, 96.48it/s] 59%|█████▉ | 296/500 [00:03<00:02, 93.55it/s] 61%|██████ | 306/500 [00:03<00:02, 93.68it/s] 63%|██████▎ | 317/500 [00:03<00:01, 97.29it/s] 66%|██████▌ | 328/500 [00:03<00:01, 99.66it/s] 68%|██████▊ | 338/500 [00:03<00:01, 99.36it/s] 70%|██████▉ | 348/500 [00:03<00:01, 98.75it/s] 72%|███████▏ | 359/500 [00:03<00:01, 100.74it/s] 74%|███████▍ | 370/500 [00:03<00:01, 101.09it/s] 76%|███████▌ | 381/500 [00:03<00:01, 101.80it/s] 78%|███████▊ | 392/500 [00:04<00:01, 102.78it/s] 81%|████████ | 403/500 
[00:04<00:01, 96.15it/s] 83%|████████▎ | 413/500 [00:04<00:00, 93.85it/s] 85%|████████▍ | 423/500 [00:04<00:00, 85.48it/s] 86%|████████▋ | 432/500 [00:04<00:00, 72.63it/s] 88%|████████▊ | 441/500 [00:04<00:00, 74.40it/s] 90%|████████▉ | 449/500 [00:04<00:00, 70.22it/s] 91%|█████████▏| 457/500 [00:04<00:00, 64.12it/s] 93%|█████████▎| 464/500 [00:05<00:00, 62.57it/s] 94%|█████████▍| 471/500 [00:05<00:00, 57.59it/s] 95%|█████████▌| 477/500 [00:05<00:00, 55.26it/s] 97%|█████████▋| 483/500 [00:05<00:00, 54.22it/s] 98%|█████████▊| 489/500 [00:05<00:00, 54.14it/s] 100%|██████████| 500/500 [00:05<00:00, 86.84it/s] 100%|██████████| 10/10 [01:05<00:00, 6.55s/it]
View first 5 rows of the data
# View the head of the DataFrame
dataset.head()
| | features | class |
|---|---|---|
| 0 | [-624.9928, 26.151525, 29.532307, 30.333982, 1... | 6 |
| 1 | [-640.4427, 76.820496, 15.161342, 52.54554, 35... | 3 |
| 2 | [-560.63367, 96.26038, 6.234333, 20.001095, 23... | 0 |
| 3 | [-560.98895, 82.12716, -8.395368, 25.753336, 1... | 5 |
| 4 | [-654.49335, 147.02797, 18.860567, 20.616182, ... | 1 |
# Storing the class as int
dataset['class'] = [int(x) for x in dataset['class']]
# Check the frequency of classes in the dataset
dataset['class'].value_counts()
class | count |
---|---|
6 | 500 |
3 | 500 |
0 | 500 |
5 | 500 |
1 | 500 |
4 | 500 |
7 | 500 |
8 | 500 |
2 | 500 |
9 | 500 |
Visualizing the Mel Frequency Cepstral Coefficients Using a Spectrogram¶
draw_spectrograms
: This function extracts the Mel coefficients for a particular audio clip and returns them as a 2-D array. When the array is plotted, the X-axis represents time and the Y-axis shows the corresponding Mel coefficients at each time step.
# A function which returns MFCC
def draw_spectrograms(audio_data, sample_rate):
    # Extract features
    extracted_features = librosa.feature.mfcc(y = audio_data,
                                              sr = sample_rate,
                                              n_mfcc = 40)
    # Return features without scaling
    return extracted_features
The very first MFCC coefficient (0th coefficient) does not provide information about the overall shape of the spectrum. It simply communicates a constant offset or the addition of a constant value to the full spectrum. As a result, when performing classification, many practitioners will disregard the initial MFCC. In the images, you can see those represented by blue pixels.
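As a small illustration of the practice described above, the sketch below drops the 0th coefficient from an MFCC matrix. The array is random stand-in data with the same (n_mfcc, frames) layout that `librosa.feature.mfcc` returns; it is not taken from the dataset, and this notebook itself keeps all 40 coefficients.

```python
import numpy as np

# Stand-in for an MFCC matrix of shape (n_mfcc, frames); values are random, for illustration only
rng = np.random.default_rng(0)
mfcc_example = rng.normal(size = (40, 34))

# Row 0 holds the 0th coefficient for every frame; slicing from row 1 discards it
mfcc_without_c0 = mfcc_example[1:, :]

print(mfcc_example.shape, "->", mfcc_without_c0.shape)
```

Whether dropping the coefficient helps depends on the task, so it is worth comparing both variants rather than assuming one is better.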
We can plot the MFCCs, but it's difficult to tell what kind of signal is hiding behind such representation.
# Creating subplots
fig, ax = plt.subplots(5, 2, figsize = (15, 30))
# Initializing row and column variables for subplots
row = 0
column = 0
for digit in range(10):
    # Get the audio of different classes (0-9)
    audio_data, sample_rate = get_audio_raw(digit)
    # Extract their MFCC
    mfcc = draw_spectrograms(audio_data, sample_rate)
    print(f"Shape of MFCC of audio digit {digit} ---> ", mfcc.shape)
    # Display the plots and its title
    ax[row, column].set_title(f"MFCC of audio class {digit} across time")
    librosa.display.specshow(mfcc, sr = 22050, ax = ax[row, column])
    # Set X-labels and Y-labels
    ax[row, column].set_xlabel("Time")
    ax[row, column].set_ylabel("MFCC Coefficients")
    # Conditions for positioning of the plots
    if column == 1:
        column = 0
        row += 1
    else:
        column += 1
plt.tight_layout(pad = 3)
plt.show()
Shape of MFCC of audio digit 0 ---> (40, 34)
Shape of MFCC of audio digit 1 ---> (40, 24)
Shape of MFCC of audio digit 2 ---> (40, 25)
Shape of MFCC of audio digit 3 ---> (40, 24)
Shape of MFCC of audio digit 4 ---> (40, 24)
Shape of MFCC of audio digit 5 ---> (40, 28)
Shape of MFCC of audio digit 6 ---> (40, 36)
Shape of MFCC of audio digit 7 ---> (40, 28)
Shape of MFCC of audio digit 8 ---> (40, 20)
Shape of MFCC of audio digit 9 ---> (40, 22)
Visual Inspection of MFCC Spectrograms:
On visual inspection, the spectrograms differ noticeably from one audio clip to another. The small rectangles and bars sit at positions that are distinctive for each clip, so the Artificial Neural Network should be able to perform decently in telling these clips apart.
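The shapes printed above differ in their second dimension because the clips have different durations, whereas the features column holds one fixed-length vector per clip. One common way to bridge the two is to average each coefficient over time. That this is how the features column was built is an assumption, since that step is not shown in this section. The sketch below uses random stand-in data, and the frame-count helper assumes librosa's default `hop_length` of 512 with centered frames.

```python
import numpy as np

def expected_frames(n_samples, hop_length = 512):
    # Frame count for a centered short-time transform (assumed librosa defaults)
    return 1 + n_samples // hop_length

def mean_pool_mfcc(mfcc):
    # Average every coefficient across the time axis: (n_mfcc, frames) -> (n_mfcc,)
    return mfcc.mean(axis = 1)

rng = np.random.default_rng(0)

# Two hypothetical clips of different lengths, in samples
for n_samples in [17000, 12000]:
    frames = expected_frames(n_samples)
    mfcc = rng.normal(size = (40, frames))  # random values, for illustration only
    features = mean_pool_mfcc(mfcc)
    print(f"{n_samples} samples -> MFCC {mfcc.shape} -> features {features.shape}")
```

Averaging discards the order of events within a clip, which is the price paid for getting a fixed-size input that a fully connected network can consume.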
Perform Train-Test-Split¶
- Split the data into train and test sets
# Import train_test_split function
from sklearn.model_selection import train_test_split
X = np.array(dataset['features'].to_list())
Y = np.array(dataset['class'].to_list())
# Create train set and test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size = 0.75, shuffle = True, random_state = 8)
# Checking the shape of the data
X_train.shape
(3750, 40)
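With a plain shuffled split, the number of test samples per digit can vary a little from class to class. If equal representation is wanted, `train_test_split` accepts a `stratify` argument. The sketch below uses a small synthetic label array rather than the real dataset, and the variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 classes with 20 samples each, 40 features per sample
rng = np.random.default_rng(8)
X_demo = rng.normal(size = (200, 40))
Y_demo = np.repeat(np.arange(10), 20)

# stratify = Y_demo keeps the class proportions identical in both splits
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, train_size = 0.75, shuffle = True,
    random_state = 8, stratify = Y_demo)

print(np.bincount(Y_tr))  # samples per class in the train split
print(np.bincount(Y_te))  # samples per class in the test split
```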
Modeling¶
- Create an artificial neural network to recognize the digit.
About the libraries:
Keras
: Keras is an open-source deep-learning library in Python. Keras is popular because its API is clean and simple, allowing standard deep learning models to be defined, fit, and evaluated in just a few lines of code.
Sklearn
: Scikit-learn is a Python library for machine learning. It offers:
- Simple and efficient tools for predictive data analysis
- Accessible to everybody, and reusable in various contexts
- Built on NumPy, SciPy, and matplotlib
- Open source, commercially usable
Import necessary libraries for building the model¶
# To create an ANN model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
# To create a checkpoint and save the best model
from tensorflow.keras.callbacks import ModelCheckpoint
# To load the model
from tensorflow.keras.models import load_model
# To evaluate the model
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelBinarizer
Model Creation¶
Why are we using ANNs?¶
When we convert audio clips to their corresponding spectrograms, similar audio yields similar spectrograms irrespective of who the speaker is and what their pitch and timbre are like. So local spatiality is never going to be a problem, and adding convolutional layers to our fully connected layers would only add computational redundancy.
We will use a Sequential model with multiple fully connected hidden layers and an output layer that returns a probability for each of the 10 digit classes.
- A Sequential model is a linear stack of layers. Sequential models can be created by giving a list of layer instances.
- A dense layer of neurons is a simple layer of neurons in which each neuron receives input from all of the neurons in the previous layer.
- The most popular function employed for hidden layers is the rectified linear activation function, or ReLU activation function. It's popular because it's easy to use and effective in getting around the limitations of other popular activation functions like Sigmoid and Tanh.
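The three ideas in the list above can be sketched in plain NumPy: a dense layer is a matrix product plus a bias, ReLU clips negative values to zero, and stacking layers in sequence gives the forward pass. The weights here are random and untrained, and the softmax output is included to mirror the 10-class setup of this notebook. This is an illustrative sketch, not how Keras implements these layers internally.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: negative values become zero
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability
    z = z - z.max(axis = 1, keepdims = True)
    e = np.exp(z)
    return e / e.sum(axis = 1, keepdims = True)

def dense(x, weights, bias):
    # Every output neuron sees every input value
    return x @ weights + bias

rng = np.random.default_rng(0)
layer_sizes = [40, 100, 100, 100, 10]
params = [(rng.normal(scale = 0.1, size = (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# Five random input vectors with 40 features each
x = rng.normal(size = (5, 40))

# Hidden layers use ReLU; the last layer uses softmax
h = x
for weights, bias in params[:-1]:
    h = relu(dense(h, weights, bias))
probs = softmax(dense(h, *params[-1]))

# The predicted class is the index of the largest probability
predicted = probs.argmax(axis = 1)
print(probs.shape, predicted)
```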
# Create a Sequential object
model = Sequential()
# Add first layer with 100 neurons to the sequential object
model.add(Dense(100, input_shape = (40, ), activation = 'relu'))
# Add second layer with 100 neurons to the sequential object
model.add(Dense(100, activation = 'relu'))
# Add third layer with 100 neurons to the sequential object
model.add(Dense(100, activation = 'relu'))
# Output layer with 10 neurons as it has 10 classes
model.add(Dense(10, activation = 'softmax'))
/usr/local/lib/python3.10/dist-packages/keras/src/layers/core/dense.py:87: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead. super().__init__(activity_regularizer=activity_regularizer, **kwargs)
# Print Summary of the model
model.summary()
Model: "sequential_1"
Layer (type) | Output Shape | Param # |
---|---|---|
dense_4 (Dense) | (None, 100) | 4,100 |
dense_5 (Dense) | (None, 100) | 10,100 |
dense_6 (Dense) | (None, 100) | 10,100 |
dense_7 (Dense) | (None, 10) | 1,010 |
Total params: 25,310 (98.87 KB)
Trainable params: 25,310 (98.87 KB)
Non-trainable params: 0 (0.00 B)
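The parameter counts in the summary can be checked by hand: a dense layer with `n_in` inputs and `n_out` neurons has `n_in * n_out` weights plus `n_out` biases. A short sketch of that arithmetic for the layer sizes used here:

```python
def dense_params(n_in, n_out):
    # One weight per input-neuron pair, plus one bias per neuron
    return n_in * n_out + n_out

layer_sizes = [40, 100, 100, 100, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [4100, 10100, 10100, 1010]
print(sum(counts))  # 25310
```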
# Compile the model
model.compile(loss = 'sparse_categorical_crossentropy',
metrics = ['accuracy'],
optimizer = 'adam')
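`sparse_categorical_crossentropy` suits this setup because the labels are plain integers (0 to 9) rather than one-hot vectors. For each sample, the loss is the negative log of the probability the model assigned to the true class, averaged over the batch. A minimal NumPy sketch with made-up probabilities for three classes (the clipping constant is an assumption chosen for illustration):

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps = 1e-7):
    # Clip to avoid log(0), then pick the probability of the true class per row
    probs = np.clip(probs, eps, 1.0)
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())

# Two samples whose true classes are 0 and 2
y_true = np.array([0, 2])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])

loss = sparse_categorical_crossentropy(y_true, probs)
print(round(loss, 4))
```

A confident correct prediction contributes a loss near zero, while a confident wrong one contributes a large value, which is why a few badly misclassified samples can move the validation loss noticeably.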
Model Checkpoint & Training¶
# Set the number of epochs for training
num_epochs = 100
# Set the batch size for training
batch_size = 32
# Fit the model
model.fit(X_train, Y_train, validation_data = (X_test, Y_test), epochs = num_epochs, batch_size = batch_size, verbose = 1)
Epoch 1/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.3217 - loss: 8.3036 - val_accuracy: 0.7176 - val_loss: 0.7922 Epoch 2/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7643 - loss: 0.6603 - val_accuracy: 0.6984 - val_loss: 0.9958 Epoch 3/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8312 - loss: 0.5183 - val_accuracy: 0.9464 - val_loss: 0.2023 Epoch 4/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9347 - loss: 0.2126 - val_accuracy: 0.8848 - val_loss: 0.3381 Epoch 5/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9246 - loss: 0.2124 - val_accuracy: 0.9496 - val_loss: 0.1648 Epoch 6/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9580 - loss: 0.1467 - val_accuracy: 0.8240 - val_loss: 0.5829 Epoch 7/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9220 - loss: 0.2242 - val_accuracy: 0.9376 - val_loss: 0.1729 Epoch 8/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9210 - loss: 0.2432 - val_accuracy: 0.9696 - val_loss: 0.0823 Epoch 9/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9644 - loss: 0.0971 - val_accuracy: 0.9648 - val_loss: 0.1145 Epoch 10/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9664 - loss: 0.1077 - val_accuracy: 0.9536 - val_loss: 0.1233 Epoch 11/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9666 - loss: 0.0932 - val_accuracy: 0.9496 - val_loss: 0.1567 Epoch 12/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9681 - loss: 0.1010 - val_accuracy: 0.8968 - val_loss: 0.2989 Epoch 13/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9393 - loss: 0.1721 - val_accuracy: 0.9520 - val_loss: 0.1396 Epoch 14/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9676 - loss: 0.0911 - val_accuracy: 0.9256 - val_loss: 0.2266 Epoch 15/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9427 - loss: 0.1789 - val_accuracy: 0.9368 - val_loss: 0.2089 Epoch 16/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - 
accuracy: 0.9665 - loss: 0.0984 - val_accuracy: 0.9664 - val_loss: 0.0975 Epoch 17/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9814 - loss: 0.0546 - val_accuracy: 0.9688 - val_loss: 0.0961 Epoch 18/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9734 - loss: 0.0773 - val_accuracy: 0.9776 - val_loss: 0.0713 Epoch 19/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9741 - loss: 0.0742 - val_accuracy: 0.9688 - val_loss: 0.1106 Epoch 20/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9722 - loss: 0.0777 - val_accuracy: 0.9688 - val_loss: 0.0876 Epoch 21/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9800 - loss: 0.0600 - val_accuracy: 0.9576 - val_loss: 0.1198 Epoch 22/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9712 - loss: 0.0806 - val_accuracy: 0.9848 - val_loss: 0.0560 Epoch 23/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9789 - loss: 0.0568 - val_accuracy: 0.9760 - val_loss: 0.0746 Epoch 24/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9779 - loss: 0.0519 - val_accuracy: 0.9784 - val_loss: 0.0620 Epoch 25/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9813 - loss: 0.0456 - val_accuracy: 0.9896 - val_loss: 0.0310 Epoch 26/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9733 - loss: 0.0837 - val_accuracy: 0.9816 - val_loss: 0.0518 Epoch 27/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9661 - loss: 0.1009 - val_accuracy: 0.9536 - val_loss: 0.1363 Epoch 28/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9750 - loss: 0.0807 - val_accuracy: 0.9880 - val_loss: 0.0400 Epoch 29/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9910 - loss: 0.0249 - val_accuracy: 0.9032 - val_loss: 0.4782 Epoch 30/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9653 - loss: 0.1130 - val_accuracy: 0.9584 - val_loss: 0.1406 Epoch 31/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9712 - loss: 0.0758 - val_accuracy: 
0.9600 - val_loss: 0.1058 Epoch 32/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9839 - loss: 0.0414 - val_accuracy: 0.9712 - val_loss: 0.0947 Epoch 33/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9825 - loss: 0.0441 - val_accuracy: 0.9728 - val_loss: 0.0893 Epoch 34/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9867 - loss: 0.0408 - val_accuracy: 0.9864 - val_loss: 0.0373 Epoch 35/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9946 - loss: 0.0150 - val_accuracy: 0.9760 - val_loss: 0.0799 Epoch 36/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9910 - loss: 0.0259 - val_accuracy: 0.9800 - val_loss: 0.0729 Epoch 37/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9730 - loss: 0.0621 - val_accuracy: 0.9896 - val_loss: 0.0314 Epoch 38/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9869 - loss: 0.0392 - val_accuracy: 0.9784 - val_loss: 0.0669 Epoch 39/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9859 - loss: 0.0406 - val_accuracy: 0.9816 - val_loss: 0.0514 Epoch 40/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9888 - loss: 0.0377 - val_accuracy: 0.9632 - val_loss: 0.1168 Epoch 41/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9897 - loss: 0.0297 - val_accuracy: 0.9856 - val_loss: 0.0361 Epoch 42/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9859 - loss: 0.0379 - val_accuracy: 0.9704 - val_loss: 0.1161 Epoch 43/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9841 - loss: 0.0446 - val_accuracy: 0.9592 - val_loss: 0.1443 Epoch 44/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9866 - loss: 0.0427 - val_accuracy: 0.9896 - val_loss: 0.0340 Epoch 45/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9882 - loss: 0.0335 - val_accuracy: 0.9864 - val_loss: 0.0634 Epoch 46/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9761 - loss: 0.0822 - val_accuracy: 0.9824 - val_loss: 0.0626 Epoch 47/100 118/118 
━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9958 - loss: 0.0141 - val_accuracy: 0.9864 - val_loss: 0.0568 Epoch 48/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9943 - loss: 0.0152 - val_accuracy: 0.9848 - val_loss: 0.0476 Epoch 49/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9961 - loss: 0.0181 - val_accuracy: 0.9856 - val_loss: 0.0413 Epoch 50/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9970 - loss: 0.0103 - val_accuracy: 0.9728 - val_loss: 0.1119 Epoch 51/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9970 - loss: 0.0153 - val_accuracy: 0.9872 - val_loss: 0.0497 Epoch 52/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9980 - loss: 0.0063 - val_accuracy: 0.9848 - val_loss: 0.0615 Epoch 53/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9787 - loss: 0.0645 - val_accuracy: 0.9576 - val_loss: 0.2024 Epoch 54/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9704 - loss: 0.1012 - val_accuracy: 0.9832 - val_loss: 0.0504 Epoch 55/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9928 - loss: 0.0189 - val_accuracy: 0.9856 - val_loss: 0.0455 Epoch 56/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9953 - loss: 0.0163 - val_accuracy: 0.9832 - val_loss: 0.0532 Epoch 57/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9904 - loss: 0.0246 - val_accuracy: 0.9912 - val_loss: 0.0296 Epoch 58/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.9954 - loss: 0.0120 - val_accuracy: 0.9904 - val_loss: 0.0243 Epoch 59/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 1.0000 - loss: 0.0012 - val_accuracy: 0.9912 - val_loss: 0.0270 Epoch 60/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 1.0000 - loss: 6.3428e-04 - val_accuracy: 0.9920 - val_loss: 0.0279 Epoch 61/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 1.0000 - loss: 5.6069e-04 - val_accuracy: 0.9904 - val_loss: 0.0293 Epoch 62/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - 
accuracy: 1.0000 - loss: 4.6827e-04 - val_accuracy: 0.9912 - val_loss: 0.0283 Epoch 63/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 1.0000 - loss: 3.9484e-04 - val_accuracy: 0.9896 - val_loss: 0.0293 Epoch 64/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 1.0000 - loss: 3.5886e-04 - val_accuracy: 0.9912 - val_loss: 0.0288 Epoch 65/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 1.0000 - loss: 3.1961e-04 - val_accuracy: 0.9920 - val_loss: 0.0290 Epoch 66/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 1.0000 - loss: 3.5523e-04 - val_accuracy: 0.9896 - val_loss: 0.0274 Epoch 67/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 1.0000 - loss: 2.4252e-04 - val_accuracy: 0.9912 - val_loss: 0.0294 Epoch 68/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 1.0000 - loss: 2.1483e-04 - val_accuracy: 0.9912 - val_loss: 0.0280 Epoch 69/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 1.0000 - loss: 2.8091e-04 - val_accuracy: 0.9912 - val_loss: 0.0290 Epoch 70/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 1.0000 - loss: 1.9054e-04 - val_accuracy: 0.9896 - val_loss: 0.0317 Epoch 71/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 1.0000 - loss: 2.5850e-04 - val_accuracy: 0.9928 - val_loss: 0.0281 Epoch 72/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 1.0000 - loss: 2.2865e-04 - val_accuracy: 0.9912 - val_loss: 0.0291 Epoch 73/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 1.0000 - loss: 1.4773e-04 - val_accuracy: 0.9912 - val_loss: 0.0293 Epoch 74/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 1.0000 - loss: 1.7660e-04 - val_accuracy: 0.9904 - val_loss: 0.0326 Epoch 75/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 1.0000 - loss: 1.6848e-04 - val_accuracy: 0.9920 - val_loss: 0.0307 Epoch 76/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 1.0000 - loss: 1.7139e-04 - val_accuracy: 0.9936 - val_loss: 0.0297 Epoch 77/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 
4ms/step - accuracy: 1.0000 - loss: 1.5607e-04 - val_accuracy: 0.9912 - val_loss: 0.0296 Epoch 78/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 1.0000 - loss: 1.5534e-04 - val_accuracy: 0.9904 - val_loss: 0.0327 Epoch 79/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 1.0000 - loss: 1.7493e-04 - val_accuracy: 0.9928 - val_loss: 0.0282 Epoch 80/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 1.0000 - loss: 1.5362e-04 - val_accuracy: 0.9784 - val_loss: 0.0750 Epoch 81/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9449 - loss: 0.2314 - val_accuracy: 0.9664 - val_loss: 0.1038 Epoch 82/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9771 - loss: 0.0689 - val_accuracy: 0.9032 - val_loss: 0.3474 Epoch 83/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9679 - loss: 0.1102 - val_accuracy: 0.9816 - val_loss: 0.0749 Epoch 84/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9922 - loss: 0.0267 - val_accuracy: 0.9872 - val_loss: 0.0366 Epoch 85/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9904 - loss: 0.0293 - val_accuracy: 0.9912 - val_loss: 0.0358 Epoch 86/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9956 - loss: 0.0095 - val_accuracy: 0.9864 - val_loss: 0.0356 Epoch 87/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9940 - loss: 0.0172 - val_accuracy: 0.9896 - val_loss: 0.0319 Epoch 88/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9988 - loss: 0.0065 - val_accuracy: 0.9904 - val_loss: 0.0366 Epoch 89/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9969 - loss: 0.0149 - val_accuracy: 0.9784 - val_loss: 0.1229 Epoch 90/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9921 - loss: 0.0307 - val_accuracy: 0.9800 - val_loss: 0.0719 Epoch 91/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9892 - loss: 0.0367 - val_accuracy: 0.9848 - val_loss: 0.0592 Epoch 92/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9989 - 
loss: 0.0042 - val_accuracy: 0.9880 - val_loss: 0.0472 Epoch 93/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9995 - loss: 0.0021 - val_accuracy: 0.9832 - val_loss: 0.0844 Epoch 94/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9942 - loss: 0.0151 - val_accuracy: 0.9880 - val_loss: 0.0458 Epoch 95/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9965 - loss: 0.0108 - val_accuracy: 0.9872 - val_loss: 0.0548 Epoch 96/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9907 - loss: 0.0355 - val_accuracy: 0.9888 - val_loss: 0.0371 Epoch 97/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9991 - loss: 0.0036 - val_accuracy: 0.9832 - val_loss: 0.0775 Epoch 98/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9988 - loss: 0.0029 - val_accuracy: 0.9920 - val_loss: 0.0266 Epoch 99/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 1.0000 - loss: 3.7711e-04 - val_accuracy: 0.9944 - val_loss: 0.0219 Epoch 100/100 118/118 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 1.0000 - loss: 1.9965e-04 - val_accuracy: 0.9944 - val_loss: 0.0229
<keras.src.callbacks.history.History at 0x7e8715c620e0>
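`ModelCheckpoint` and `load_model` are imported earlier, but the training call above does not pass a callback, so the model kept in memory is the one from the final epoch. The log shows validation loss fluctuating between epochs, which is the situation a checkpoint is meant for. Below is a hedged sketch of how one could be wired in, using small random data so that it runs on its own. The file name `best_model.keras`, the tiny network, and the data are placeholders, not the notebook's actual setup.

```python
import numpy as np
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.callbacks import ModelCheckpoint

# Random stand-in data: 200 samples, 40 features, 10 classes
rng = np.random.default_rng(0)
X_demo = rng.normal(size = (200, 40)).astype("float32")
Y_demo = rng.integers(0, 10, size = 200)

# A deliberately small network, for illustration only
demo_model = Sequential([Input(shape = (40,)),
                         Dense(16, activation = 'relu'),
                         Dense(10, activation = 'softmax')])
demo_model.compile(loss = 'sparse_categorical_crossentropy',
                   metrics = ['accuracy'],
                   optimizer = 'adam')

# Save the model only when validation loss improves (file name is a placeholder)
checkpoint = ModelCheckpoint("best_model.keras", monitor = 'val_loss',
                             save_best_only = True, verbose = 0)

history = demo_model.fit(X_demo, Y_demo, validation_split = 0.25,
                         epochs = 3, batch_size = 32,
                         callbacks = [checkpoint], verbose = 0)

# Reload the best-scoring epoch from disk
best_model = load_model("best_model.keras")
print(best_model.predict(X_demo[:2], verbose = 0).shape)
```

If the checkpoint is selected on the same data that is later used for the final evaluation, the reported score will be slightly optimistic; holding out a separate validation split avoids that.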
Model Evaluation¶
# Make predictions on the test set
Y_pred = model.predict(X_test)
Y_pred = [np.argmax(i) for i in Y_pred]
# Set style as dark
sns.set_style("dark")
# Set figure size
plt.figure(figsize = (15, 8))
# Plot the title
plt.title("CONFUSION MATRIX FOR MNIST AUDIO PREDICTION")
# Confusion matrix
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)
# Plot the confusion matrix as heatmap
sns.heatmap(cm, annot = True, cmap = "cool", fmt = 'g', cbar = False)
# Set X-label and Y-label (rows of cm are actual classes, columns are predicted classes)
plt.xlabel("PREDICTED VALUES")
plt.ylabel("ACTUAL VALUES")
# Show the plot
plt.show()
# Print the metrics
print(classification_report(Y_test, Y_pred))
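class | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.99 | 0.99 | 0.99 | 125 |
1 | 1.00 | 1.00 | 1.00 | 141 |
2 | 0.99 | 0.99 | 0.99 | 137 |
3 | 0.99 | 0.98 | 0.98 | 125 |
4 | 1.00 | 1.00 | 1.00 | 122 |
5 | 0.99 | 1.00 | 1.00 | 134 |
6 | 1.00 | 1.00 | 1.00 | 106 |
7 | 0.98 | 0.99 | 0.99 | 118 |
8 | 1.00 | 1.00 | 1.00 | 123 |
9 | 1.00 | 0.99 | 1.00 | 119 |
accuracy | | | 0.99 | 1250 |
macro avg | 0.99 | 0.99 | 0.99 | 1250 |
weighted avg | 0.99 | 0.99 | 0.99 | 1250 |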
Observations:
- From the confusion matrix, we can observe that most of the observations are correctly identified by the model.
- In very few cases, the model is not able to identify the correct digit. The classification report shows where these errors fall: digit 3 has the lowest recall (0.98) and digit 7 has the lowest precision (0.98).
- The model performs very well, with about 99% precision, recall, and F1-score on the test set.
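Precision and recall in the report are derived from the confusion matrix. In scikit-learn's convention, row i counts samples whose actual class is i and column j counts samples predicted as j, so recall divides the diagonal by the row sums and precision divides it by the column sums. The sketch below builds a small matrix by hand from made-up labels for three classes:

```python
import numpy as np

def confusion(y_true, y_pred, n_classes):
    # cm[i, j] = number of samples with actual class i predicted as class j
    cm = np.zeros((n_classes, n_classes), dtype = int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm

# Made-up labels, for illustration only
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 2, 1, 1, 2, 2, 0]

cm = confusion(y_true, y_pred, 3)
recall = cm.diagonal() / cm.sum(axis = 1)     # correct / all actual members of the class
precision = cm.diagonal() / cm.sum(axis = 0)  # correct / all predictions of the class

print(cm)
print("recall   ", recall.round(2))
print("precision", precision.round(2))
```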