Hospital Length of Stay (LOS) Prediction¶
Context:¶
Hospital management is a vital area that gained a lot of attention during the COVID-19 pandemic. Inefficient distribution of resources like beds, ventilators might lead to a lot of complications. However, this can be mitigated by predicting the length of stay (LOS) of a patient before getting admitted. Once this is determined, the hospital can plan a suitable treatment, resources, and staff to reduce the LOS and increase the chances of recovery. The rooms and bed can also be planned in accordance with that.
HealthPlus hospital has been incurring a lot of losses in revenue and life due to its inefficient management system. They have been unsuccessful in allocating pieces of equipment, beds, and hospital staff fairly. A system that could estimate the length of stay (LOS) of a patient can solve this problem to a great extent.
Objective:¶
As a Data Scientist, you have been hired by HealthPlus to analyze the data, find out what factors affect the LOS the most, and come up with a machine learning model which can predict the LOS of a patient using the data available during admission and after running a few tests. Also, bring about useful insights and policies from the data, which can help the hospital to improve their health care infrastructure and revenue.
Data Dictionary:¶
The data contains various information recorded during the time of admission of the patient. It only contains records of patients who were admitted to the hospital. The detailed data dictionary is given below:
- patientid: Patient ID
- Age: Range of age of the patient
- gender: Gender of the patient
- Type of Admission: Trauma, emergency or urgent
- Severity of Illness: Extreme, moderate, or minor
- health_conditions: Any previous health conditions suffered by the patient
- Visitors with Patient: The number of patients who accompany the patient
- Insurance: Does the patient have health insurance or not?
- Admission_Deposit: The deposit paid by the patient during admission
- Stay (in days): The number of days that the patient has stayed in the hospital. This is the target variable
- Available Extra Rooms in Hospital: The number of rooms available during admission
- Department: The department which will be treating the patient
- Ward_Facility_Code: The code of the ward facility in which the patient will be admitted
- doctor_name: The doctor who will be treating the patient
- staff_available: The number of staff who are not occupied at the moment in the ward
Approach to solve the problem:¶
- Import the necessary libraries
- Read the dataset and get an overview
- Exploratory data analysis - a. Univariate b. Bivariate
- Data preprocessing if any
- Define the performance metric and build ML models
- Checking for assumptions
- Compare models and determine the best one
- Observations and business insights
Importing Libraries¶
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# To build models for prediction
from Scikit-learn.model_selection import train_test_split, cross_val_score, KFold
from Scikit-learn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from Scikit-learn.tree import DecisionTreeRegressor
from Scikit-learn.ensemble import RandomForestRegressor,BaggingRegressor
# To encode categorical variables
from Scikit-learn.preprocessing import LabelEncoder
# For tuning the model
from Scikit-learn.model_selection import GridSearchCV
# To check model performance
from Scikit-learn.metrics import make_scorer,mean_squared_error, r2_score, mean_absolute_error
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
# Read the healthcare dataset file
data = pd.read_csv("filepath/healthcare_data.csv")
# Copying data to another variable to avoid any changes to original data
same_data = data.copy()
Data Overview¶
# View the first 5 rows of the dataset
data.head()
Available Extra Rooms in Hospital | Department | Ward_Facility_Code | doctor_name | staff_available | patientid | Age | gender | Type of Admission | Severity of Illness | health_conditions | Visitors with Patient | Insurance | Admission_Deposit | Stay (in days) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | gynecology | D | Dr Sophia | 0 | 33070 | 41-50 | Female | Trauma | Extreme | Diabetes | 4 | Yes | 2966.408696 | 8 |
1 | 4 | gynecology | B | Dr Sophia | 2 | 34808 | 31-40 | Female | Trauma | Minor | Heart disease | 2 | No | 3554.835677 | 9 |
2 | 2 | gynecology | B | Dr Sophia | 8 | 44577 | 21-30 | Female | Trauma | Extreme | Diabetes | 2 | Yes | 5624.733654 | 7 |
3 | 4 | gynecology | D | Dr Olivia | 7 | 3695 | 31-40 | Female | Urgent | Moderate | NaN | 4 | No | 4814.149231 | 8 |
4 | 2 | anesthesia | E | Dr Mark | 10 | 108956 | 71-80 | Male | Trauma | Moderate | Diabetes | 2 | No | 5169.269637 | 34 |
# View the last 5 rows of the dataset
data.tail()
Available Extra Rooms in Hospital | Department | Ward_Facility_Code | doctor_name | staff_available | patientid | Age | gender | Type of Admission | Severity of Illness | health_conditions | Visitors with Patient | Insurance | Admission_Deposit | Stay (in days) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
499995 | 4 | gynecology | F | Dr Sarah | 2 | 43001 | 11-20 | Female | Trauma | Minor | High Blood Pressure | 3 | No | 4105.795901 | 10 |
499996 | 13 | gynecology | F | Dr Olivia | 8 | 85601 | 31-40 | Female | Emergency | Moderate | Other | 2 | No | 4631.550257 | 11 |
499997 | 2 | gynecology | B | Dr Sarah | 3 | 22447 | 11-20 | Female | Emergency | Moderate | High Blood Pressure | 2 | No | 5456.930075 | 8 |
499998 | 2 | radiotherapy | A | Dr John | 1 | 29957 | 61-70 | Female | Trauma | Extreme | Diabetes | 2 | No | 4694.127772 | 23 |
499999 | 3 | gynecology | F | Dr Sophia | 3 | 45008 | 41-50 | Female | Trauma | Moderate | Heart disease | 4 | Yes | 4713.868519 | 10 |
# Understand the shape of the data
data.shape
(500000, 15)
- The dataset has 5,00,000 rows and 15 columns.
# Checking the info of the data
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 500000 entries, 0 to 499999 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Available Extra Rooms in Hospital 500000 non-null int64 1 Department 500000 non-null object 2 Ward_Facility_Code 500000 non-null object 3 doctor_name 500000 non-null object 4 staff_available 500000 non-null int64 5 patientid 500000 non-null int64 6 Age 500000 non-null object 7 gender 500000 non-null object 8 Type of Admission 500000 non-null object 9 Severity of Illness 500000 non-null object 10 health_conditions 348112 non-null object 11 Visitors with Patient 500000 non-null int64 12 Insurance 500000 non-null object 13 Admission_Deposit 500000 non-null float64 14 Stay (in days) 500000 non-null int64 dtypes: float64(1), int64(5), object(9) memory usage: 57.2+ MB
Observations:
- Available Extra Rooms in Hospital, staff_available, patientid, Visitors with Patient, Admission_Deposit, and Stay (in days) are of numeric data type and the rest of the columns are of object data type.
- The number of non-null values is the same as the total number of entries in the data, i.e., there are no null values.
- The column patientid is an identifier for patients in the data. This column will not help with our analysis so we can drop it.
# To view patientid and the number of times they have been admitted to the hospital
data['patientid'].value_counts()
patientid 126719 21 125695 21 44572 21 126623 21 125625 19 .. 37634 1 91436 1 118936 1 52366 1 105506 1 Name: count, Length: 126399, dtype: int64
Observation:
- The maximum number of times the same patient admitted to the hospital is 21 and minimum is 1.
# Dropping patientid from the data as it is an identifier and will not add value to the analysis
data=data.drop(columns=["patientid"])
# Checking for duplicate values in the data
data.duplicated().sum()
0
Observation:
- Data contains unique rows. There is no need to remove any rows.
# Checking the descriptive statistics of the columns
data.describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
Available Extra Rooms in Hospital | 500000.0 | 3.638800 | 2.698124 | 0.000000 | 2.000000 | 3.000000 | 4.000000 | 24.00000 |
staff_available | 500000.0 | 5.020470 | 3.158103 | 0.000000 | 2.000000 | 5.000000 | 8.000000 | 10.00000 |
Visitors with Patient | 500000.0 | 3.549414 | 2.241054 | 0.000000 | 2.000000 | 3.000000 | 4.000000 | 32.00000 |
Admission_Deposit | 500000.0 | 4722.315734 | 1047.324220 | 1654.005148 | 4071.714532 | 4627.003792 | 5091.612717 | 10104.72639 |
Stay (in days) | 500000.0 | 12.381062 | 7.913174 | 3.000000 | 8.000000 | 9.000000 | 11.000000 | 51.00000 |
Observations:
- There are around 3 rooms available in the hospital on average and there are times when the hospital is full and there are no rooms available (minimum value is 0). The maximum number of rooms available in the hospital is 24.
- On average, there are around 5 staff personnel available to treat the new patients but it can also be zero at times. The maximum number of staff available in the hospital is 10.
- On average, around 3 visitors accompany the patient. Some patients come on their own (minimum value is zero) and a few cases have 32 visitors. It will be interesting to see if there is any relationship between the number of visitors and the severity of the patient.
- The average admission deposit lies around 4,722 dollars and a minimum of 1,654 dollars is paid on every admission.
- Patient's stay ranges from 3 to 51 days. There might be outliers in this variable. The median length of stay is 9 days.
# List of all important categorical variables
cat_col = ["Department", "Type of Admission", 'Severity of Illness', 'gender', 'Insurance', 'health_conditions', 'doctor_name', "Ward_Facility_Code", "Age"]
# Printing the number of occurrences of each unique value in each categorical column
for column in cat_col:
print(data[column].value_counts(1))
print("-" * 50)
Department gynecology 0.686956 radiotherapy 0.168630 anesthesia 0.088358 TB & Chest disease 0.045780 surgery 0.010276 Name: proportion, dtype: float64 -------------------------------------------------- Type of Admission Trauma 0.621072 Emergency 0.271568 Urgent 0.107360 Name: proportion, dtype: float64 -------------------------------------------------- Severity of Illness Moderate 0.560394 Minor 0.263074 Extreme 0.176532 Name: proportion, dtype: float64 -------------------------------------------------- gender Female 0.74162 Male 0.20696 Other 0.05142 Name: proportion, dtype: float64 -------------------------------------------------- Insurance Yes 0.78592 No 0.21408 Name: proportion, dtype: float64 -------------------------------------------------- health_conditions Other 0.271209 High Blood Pressure 0.228093 Diabetes 0.211553 Asthama 0.188198 Heart disease 0.100947 Name: proportion, dtype: float64 -------------------------------------------------- doctor_name Dr Sarah 0.199192 Dr Olivia 0.196704 Dr Sophia 0.149506 Dr Nathan 0.141554 Dr Sam 0.111422 Dr John 0.102526 Dr Mark 0.088820 Dr Isaac 0.006718 Dr Simon 0.003558 Name: proportion, dtype: float64 -------------------------------------------------- Ward_Facility_Code F 0.241076 D 0.238110 B 0.207770 E 0.190748 A 0.093102 C 0.029194 Name: proportion, dtype: float64 -------------------------------------------------- Age 21-30 0.319586 31-40 0.266746 41-50 0.160812 11-20 0.093072 61-70 0.053112 51-60 0.043436 71-80 0.037406 81-90 0.016362 0-10 0.006736 91-100 0.002732 Name: proportion, dtype: float64 --------------------------------------------------
Observations:
- The majority of patients (~82%) admit to the hospital with moderate and minor illness, which is understandable as extreme illness is less frequent than moderate and minor illness.
- Gynecology department gets the most number of patients (~68%) in the hospital, whereas patients in Surgery department are very few (~1%).
- Ward A and C accommodate the least number of patients (~12%). These might be wards reserved for patient with extreme illness and patients who need surgery. It would be interesting to see if patients from these wards also stay for longer duration.
- The majority of patients belong to the age group of 21-50 (~75%), and the majority of patients are women (~74%). The most number of patients in the gynecology department of the hospital can justify this.
- Most of the patients admitted to the hospital are the cases of trauma (~62%).
- After 'Other' category, High Blood Pressure and Diabetes are the most common health conditions.
Exploratory Data Analysis (EDA)¶
Univariate Analysis¶
# Function to plot a boxplot and a histogram along the same scale
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
"""
Boxplot and histogram combined
data: dataframe
feature: dataframe column
figsize: size of figure (default (12,7))
kde: whether to the show density curve (default False)
bins: number of bins for histogram (default None)
"""
f2, (ax_box2, ax_hist2) = plt.subplots(
nrows = 2, # Number of rows of the subplot grid = 2
sharex = True, # x-axis will be shared among all subplots
gridspec_kw = {"height_ratios": (0.25, 0.75)},
figsize = figsize,
) # Creating the 2 subplots
sns.boxplot(data = data, x = feature, ax = ax_box2, showmeans = True, color = "violet"
) # Boxplot will be created and a star will indicate the mean value of the column
sns.histplot(
data = data, x = feature, kde = kde, ax = ax_hist2, bins = bins, palette = "winter"
) if bins else sns.histplot(
data = data, x = feature, kde = kde, ax = ax_hist2
) # For histogram
ax_hist2.axvline(
data[feature].mean(), color = "green", linestyle = "--"
) # Add mean to the histogram
ax_hist2.axvline(
data[feature].median(), color = "black", linestyle = "-"
) # Add median to the histogram
Length of stay¶
histogram_boxplot(data, "Stay (in days)", kde = True, bins = 30)
Observations:
- Fewer patients are staying more than 10 days in the hospital and very few stay for more than 40 days. This might be because the majority of patients are admitted for moderate or minor illnesses.
- The peak of the distribution shows that most of the patients stay for 8-9 days in the hospital.
Admission Deposit¶
histogram_boxplot(data, "Admission_Deposit", kde = True, bins = 30)
Observation:
- The distribution of admission fees is close to normal with outliers on both sides. Few patients are paying a high amount of admission fees and few patients are paying a low amount of admission fees.
Visitors with Patients¶
histogram_boxplot(data, "Visitors with Patient", kde = True, bins = 30)
Observations:
- The distribution of the number of visitors with the patient is highly skewed towards the right.
- 2 and 4 are the most common number of visitors with patients.
Bivariate Analysis¶
# Finding the correlation between various columns of the dataset
plt.figure(figsize = (15,7))
sns.heatmap(data.corr(numeric_only = True), annot = True, vmin = -1, vmax = 1, fmt = ".2f", cmap = "Spectral")
<Axes: >
Observations:
- The heatmap shows that there is no correlation between variables.
- The continuous variables show no correlation with the target variable (Stay (in days)), which indicates that the categorical variables might be more important for the prediction.
# Function to plot stacked bar plots
def stacked_barplot(data, predictor, target):
"""
Print the category counts and plot a stacked bar chart
data: dataframe
predictor: independent variable
target: target variable
"""
count = data[predictor].nunique()
sorter = data[target].value_counts().index[-1]
tab1 = pd.crosstab(data[predictor], data[target], margins = True).sort_values(
by = sorter, ascending = False
)
print(tab1)
print("-" * 120)
tab = pd.crosstab(data[predictor], data[target], normalize = "index").sort_values(
by = sorter, ascending = False
)
tab.plot(kind = "bar", stacked = True, figsize = (count + 1, 5))
plt.legend(
loc = "lower left",
frameon = False,
)
plt.legend(loc = "upper left", bbox_to_anchor = (1, 1))
plt.show()
Let's start by checking the distribution of the LOS for the various wards
sns.barplot(y = 'Ward_Facility_Code', x = 'Stay (in days)', data = data)
plt.show()
Observation:
- The hypothesis we made earlier is correct, i.e., wards A and C has the patients staying for the longest duration, which implies these wards might be for patients with serious illnesses.
stacked_barplot(data, "Ward_Facility_Code", "Department")
Department TB & Chest disease anesthesia gynecology radiotherapy \ Ward_Facility_Code A 4709 15611 0 21093 All 22890 44179 343478 84315 B 0 0 103885 0 C 1319 4199 0 9079 D 0 0 119055 0 E 16862 24369 0 54143 F 0 0 120538 0 Department surgery All Ward_Facility_Code A 5138 46551 All 5138 500000 B 0 103885 C 0 14597 D 0 119055 E 0 95374 F 0 120538 ------------------------------------------------------------------------------------------------------------------------
Observations:
- Ward Facility B, D, and F are dedicated only to the gynecology department.
- Wards A, C, and E have patients with all other diseases, and patients undergoing surgery are admitted to ward A only.
Usually, the more severe the illness, the more the LOS, let's check the distribution of severe patients in various wards.
stacked_barplot(data, "Ward_Facility_Code", "Severity of Illness")
Severity of Illness Extreme Minor Moderate All Ward_Facility_Code All 88266 131537 280197 500000 D 29549 27220 62286 119055 B 24222 23579 56084 103885 A 13662 7877 25012 46551 E 11488 22254 61632 95374 F 5842 47594 67102 120538 C 3503 3013 8081 14597 ------------------------------------------------------------------------------------------------------------------------
Observations:
- Ward A has the highest number of extreme cases. We observed earlier that ward A has the longest length of stay in the hospital as well. It might require more staff and resources as compared to other wards.
- Ward F has the highest number of minor cases and Ward E has the highest number of moderate cases.
Age can also be an important factor to find the length of stay. Let's check the same.
sns.barplot(y = 'Age', x = 'Stay (in days)', data = data)
plt.show()
Observation:
- Patients aged between 1-10 and 51-100 tend to stay the most number of days in the hospital. This might be because the majority of the patients between the 21-50 age group get admitted to the gynecology department and patients in age groups 1-10 and 5-100 might get admitted due to some serious illness.
Let's look at the doctors, their department names, and the total number of patients they have treated.
data.groupby(['doctor_name'])['Department'].agg(Department_Name='unique',Patients_Treated='count')
Department_Name | Patients_Treated | |
---|---|---|
doctor_name | ||
Dr Isaac | [surgery] | 3359 |
Dr John | [TB & Chest disease, anesthesia, radiotherapy] | 51263 |
Dr Mark | [anesthesia, TB & Chest disease] | 44410 |
Dr Nathan | [gynecology] | 70777 |
Dr Olivia | [gynecology] | 98352 |
Dr Sam | [radiotherapy] | 55711 |
Dr Sarah | [gynecology] | 99596 |
Dr Simon | [surgery] | 1779 |
Dr Sophia | [gynecology] | 74753 |
Observations:
- The hospital employs a total of 9 doctors. Four of the doctors work in the department of gynecology, which sees the most patients.
- The majority of patients that attended the hospital were treated by Dr. Sarah and Olivia.
- Two doctors are working in the surgical department (Dr. Isaac and Dr. Simon), while Dr. Sam works in the radiotherapy department.
- The only two doctors who work in several departments are Dr. John and Dr. Mark.
Data Preparation for Model Building¶
- Before we proceed to build a model, we'll have to encode categorical features.
- Separate the independent variables and dependent Variables.
- We'll split the data into train and test to be able to evaluate the model that we train on the training data.
# Creating dummy variables for the categorical columns
# drop_first=True is used to avoid redundant variables
data = pd.get_dummies(
data,
columns = data.select_dtypes(include = ["object", "category"]).columns.tolist(),
drop_first = True,
)
# Check the data after handling categorical data
data
Available Extra Rooms in Hospital | staff_available | Visitors with Patient | Admission_Deposit | Stay (in days) | Department_anesthesia | Department_gynecology | Department_radiotherapy | Department_surgery | Ward_Facility_Code_B | Ward_Facility_Code_C | Ward_Facility_Code_D | Ward_Facility_Code_E | Ward_Facility_Code_F | doctor_name_Dr John | doctor_name_Dr Mark | doctor_name_Dr Nathan | doctor_name_Dr Olivia | doctor_name_Dr Sam | doctor_name_Dr Sarah | doctor_name_Dr Simon | doctor_name_Dr Sophia | Age_11-20 | Age_21-30 | Age_31-40 | Age_41-50 | Age_51-60 | Age_61-70 | Age_71-80 | Age_81-90 | Age_91-100 | gender_Male | gender_Other | Type of Admission_Trauma | Type of Admission_Urgent | Severity of Illness_Minor | Severity of Illness_Moderate | health_conditions_Diabetes | health_conditions_Heart disease | health_conditions_High Blood Pressure | health_conditions_Other | Insurance_Yes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | 0 | 4 | 2966.408696 | 8 | False | True | False | False | False | False | True | False | False | False | False | False | False | False | False | False | True | False | False | False | True | False | False | False | False | False | False | False | True | False | False | False | True | False | False | False | True |
1 | 4 | 2 | 2 | 3554.835677 | 9 | False | True | False | False | True | False | False | False | False | False | False | False | False | False | False | False | True | False | False | True | False | False | False | False | False | False | False | False | True | False | True | False | False | True | False | False | False |
2 | 2 | 8 | 2 | 5624.733654 | 7 | False | True | False | False | True | False | False | False | False | False | False | False | False | False | False | False | True | False | True | False | False | False | False | False | False | False | False | False | True | False | False | False | True | False | False | False | True |
3 | 4 | 7 | 4 | 4814.149231 | 8 | False | True | False | False | False | False | True | False | False | False | False | False | True | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | True | False | True | False | False | False | False | False |
4 | 2 | 10 | 2 | 5169.269637 | 34 | True | False | False | False | False | False | False | True | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | True | False | True | False | False | True | True | False | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
499995 | 4 | 2 | 3 | 4105.795901 | 10 | False | True | False | False | False | False | False | False | True | False | False | False | False | False | True | False | False | True | False | False | False | False | False | False | False | False | False | False | True | False | True | False | False | False | True | False | False |
499996 | 13 | 8 | 2 | 4631.550257 | 11 | False | True | False | False | False | False | False | False | True | False | False | False | True | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | True | False |
499997 | 2 | 3 | 2 | 5456.930075 | 8 | False | True | False | False | True | False | False | False | False | False | False | False | False | False | True | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | True | False | False |
499998 | 2 | 1 | 2 | 4694.127772 | 23 | False | False | True | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | True | False | False | False | True | False | False | False | False |
499999 | 3 | 3 | 4 | 4713.868519 | 10 | False | True | False | False | False | False | False | False | True | False | False | False | False | False | False | False | True | False | False | False | True | False | False | False | False | False | False | False | True | False | False | True | False | True | False | False | True |
500000 rows × 42 columns
# Separating independent variables and the target variable
x = data.drop('Stay (in days)',axis=1)
y = data['Stay (in days)']
# Splitting the dataset into train and test datasets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, shuffle = True, random_state = 1)
# Checking the shape of the train and test data
print("Shape of Training set : ", x_train.shape)
print("Shape of test set : ", x_test.shape)
Shape of Training set : (400000, 41) Shape of test set : (100000, 41)
Model Building¶
- We will be using different metrics functions defined in Scikit-learn like RMSE, MAE, 𝑅2, Adjusted 𝑅2, and MAPE for regression models evaluation. We will define a function to calculate these metric.
- The mean absolute percentage error (MAPE) measures the accuracy of predictions as a percentage, and can be calculated as the average of absolute percentage error for all data points. The absolute percentage error is defined as predicted value minus actual values divided by actual values. It works best if there are no extreme values in the data and none of the actual values are 0.
# Function to compute adjusted R-squared
def adj_r2_score(predictors, targets, predictions):
r2 = r2_score(targets, predictions)
n = predictors.shape[0]
k = predictors.shape[1]
return 1 - ((1 - r2) * (n - 1) / (n - k - 1))
# Function to compute MAPE
def mape_score(targets, predictions):
return np.mean(np.abs(targets - predictions) / targets) * 100
# Function to compute different metrics to check performance of a regression model
def model_performance_regression(model, predictors, target):
"""
Function to compute different metrics to check regression model performance
model: regressor
predictors: independent variables
target: dependent variable
"""
pred = model.predict(predictors) # Predict using the independent variables
r2 = r2_score(target, pred) # To compute R-squared
adjr2 = adj_r2_score(predictors, target, pred) # To compute adjusted R-squared
rmse = np.sqrt(mean_squared_error(target, pred)) # To compute RMSE
mae = mean_absolute_error(target, pred) # To compute MAE
mape = mape_score(target, pred) # To compute MAPE
# Creating a dataframe of metrics
df_perf = pd.DataFrame(
{
"RMSE": rmse,
"MAE": mae,
"R-squared": r2,
"Adj. R-squared": adjr2,
"MAPE": mape,
},
index=[0],
)
return df_perf
Linear Regression¶
from Scikit-learn.linear_model import LinearRegression
# Initialize the model
model = LinearRegression()
# Fit the model on the training data
model.fit(x_train, y_train)
LinearRegression()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
LinearRegression()
# Checking performance on the training data
linear_reg = model_performance_regression(model, x_train, y_train)
linear_reg
RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|
0 | 3.135093 | 2.146244 | 0.842813 | 0.842796 | 19.591833 |
# Checking performance on the testing data
linear_reg_test = model_performance_regression(model, x_test, y_test)
linear_reg_test
RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|
0 | 3.144055 | 2.155765 | 0.843028 | 0.842964 | 19.676966 |
Observations:
The Root Mean Squared Error and the adjusted R2 of train and test data are very close, indicating that our model is not overfitting to the training data.
The adjusted R2 of ~0.84 implies that the independent variables are able to explain ~84% variance in the target variable.
Mean Absolute Error (MAE) indicates that the current model can predict LOS of patients within mean error of 2.15 days on the test data.
The units of both RMSE and MAE are the same, i.e., days in this case. But RMSE is greater than MAE because it penalizes the outliers more.
Mean Absolute Percentage Error is ~19% on the test data, indicating that the average difference between the predicted value and the actual value is ~19%.
Regularization¶
Regularization is a fundamental concept in machine learning. It is a method of preventing the model from overfitting by adding additional information to it.
The machine learning model may perform well with training data but not with test data. It means that when dealing with unseen data, the model cannot anticipate the result since it introduces noise into the output, and so the model is termed overfit. A regularization technique can be used to solve this problem.
By lowering the magnitude of the variables, this technique allows for the preservation of all variables or features in the model. As a result, it maintains accuracy as well as model generalization.
Its primary function is to regularize or lower the coefficient of features towards zero. In other words, "the regularization strategy reduces the magnitude of the features while maintaining the same number of features."
Regularization is accomplished by introducing a penalty or complexity term into the complex model.
Regularization procedures are classified into two types, which are listed below:
- Ridge Regression
- Lasso Regression
Ridge Regression¶
Ridge regression is a sort of linear regression in which a small amount of bias is introduced to improve long-term predictions.
Ridge regression is a regularization technique that is used to reduce model complexity. It's also known as L2 regularization.
The penalty term is added to the cost function in this technique. The amount of bias introduced into the model is referred to as the Ridge Regression penalty.
We may compute it by multiplying the squared weight of each individual feature by the alpha.
In general, Ridge Regression calculates the equation's parameters:
ˆy =slope×X+y intercept
By minimizing the:
the sum of squared residuals+α×slope2
As we can see from the above equation, if the values of α tend to zero, the equation becomes the linear regression model's cost function. As a result, for the minimum value of α, the model will be similar to the linear regression model.
Because a general linear or polynomial regression will fail if the independent variables are highly collinear, Ridge regression can be utilized to tackle such situations.
When we have more parameters than samples, it is easier to solve problems using Ridge Regression.
Ridge Regression with default parameters¶
ridge_model = Ridge() #creating Ridge Regression model
ridge_model.fit(x_train, y_train) # Fitting the data into the model
Ridge()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Ridge()
ridge_reg = model_performance_regression(ridge_model, x_test, y_test) #getting performance metrics on test data
ridge_reg
RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|
0 | 3.144057 | 2.155826 | 0.843028 | 0.842963 | 19.677968 |
Observations:
- The performance metrics are showing almost similar results as compared to the Least Squares method.
Ridge Regression with optimized α¶
folds = KFold(n_splits=10, shuffle=True, random_state=1)
params = {'alpha':[0.001, 0.01, 0.1, 0.2, 0.5, 0.9, 1, 5,10,20]}
model = Ridge()
model_cv = GridSearchCV(estimator=model, param_grid=params, scoring='r2', cv=folds, return_train_score=True)
model_cv.fit(x_train,y_train)
GridSearchCV(cv=KFold(n_splits=10, random_state=1, shuffle=True), estimator=Ridge(), param_grid={'alpha': [0.001, 0.01, 0.1, 0.2, 0.5, 0.9, 1, 5, 10, 20]}, return_train_score=True, scoring='r2')In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(cv=KFold(n_splits=10, random_state=1, shuffle=True), estimator=Ridge(), param_grid={'alpha': [0.001, 0.01, 0.1, 0.2, 0.5, 0.9, 1, 5, 10, 20]}, return_train_score=True, scoring='r2')
Ridge()
Ridge()
model_cv.best_params_ #getting optimised parameters for alpha
{'alpha': 0.1}
ridge_model_tuned = Ridge(alpha=0.1) ##creating Tuned Ridge Regression model using optimised alpha value
ridge_model_tuned.fit(x_train, y_train) # Fitting the data into the tuned model
Ridge(alpha=0.1)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Ridge(alpha=0.1)
ridge_reg_tuned = model_performance_regression(ridge_model_tuned, x_test, y_test) #getting performance metrics on test data
ridge_reg_tuned
RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|
0 | 3.144055 | 2.155771 | 0.843028 | 0.842964 | 19.677066 |
Observations:
- After applying the Grid SearchCV, the optimized value of alpha results out to be 0.1.
- It can be observed that after tuning the parameters of Ridge Regression, the performance parameters does not change implying that Ridge Regression does not help in improving the model.
Lasso Regression¶
Lasso regression is another regularisation technique for reducing model complexity. It is an abbreviation for Least Absolute and Selection Operator.
It is identical to Ridge Regression except that the penalty term only contains absolute weights rather than a square of weights.
Because it uses absolute data, it can decrease the slope to zero, whereas Ridge Regression can only get close to zero.
It is also known as L1 regularisation.
Fundamentally, Lasso Regression calculates the equation's parameters:
ˆy =slope×X+y intercept
By minimizing the:
the sum of squared residuals+α×|slope|
Lasso Regression with default parameters¶
lasso_model = Lasso()
lasso_model.fit(x_train, y_train)
Lasso()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Lasso()
lasso_reg = model_performance_regression(lasso_model, x_test, y_test)
lasso_reg
RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|
0 | 6.064339 | 3.873332 | 0.416006 | 0.415766 | 34.652716 |
Observations:
- After fitting the data into Lasso Regression Model with default value of alpha (=1), the performance metrics are showing poor results as compared to Least Squares method and Ridge Regression.
- We can tune the alpha to get the optimized value similar to Ridge Regression using Grid SearchCV.
Lasso Regression with optimized α¶
folds = KFold(n_splits=10, shuffle=True, random_state=1)
params = {'alpha':[0.001, 0.01, 0.1, 0.2, 0.5, 0.9, 1, 5,10,20]}
model = Lasso()
model_cv = GridSearchCV(estimator=model, param_grid=params, scoring='r2', cv=folds, return_train_score=True)
model_cv.fit(x_train,y_train)
GridSearchCV(cv=KFold(n_splits=10, random_state=1, shuffle=True), estimator=Lasso(), param_grid={'alpha': [0.001, 0.01, 0.1, 0.2, 0.5, 0.9, 1, 5, 10, 20]}, return_train_score=True, scoring='r2')In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(cv=KFold(n_splits=10, random_state=1, shuffle=True), estimator=Lasso(), param_grid={'alpha': [0.001, 0.01, 0.1, 0.2, 0.5, 0.9, 1, 5, 10, 20]}, return_train_score=True, scoring='r2')
Lasso()
Lasso()
model_cv.best_params_
{'alpha': 0.001}
lasso_model_tuned = Lasso(alpha=0.001)
lasso_model_tuned.fit(x_train, y_train)
Lasso(alpha=0.001)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Lasso(alpha=0.001)
lasso_reg_tuned = model_performance_regression(lasso_model_tuned, x_test, y_test)
lasso_reg_tuned
RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|
0 | 3.144315 | 2.157198 | 0.843002 | 0.842938 | 19.702959 |
Observation:
- After applying the Grid SearchCV, the optimized value of alpha results out to be 0.001.
- The performance metrics are showing similar results as compared to Least Squares method and Ridge Regression, implying that after adding the penalty, the model does not improve.
Elastic Net Regression¶
Elastic Net is a regularized regression model that combines L1 and L2 penalties, i.e., lasso and ridge regression. As a result, it performs a more efficient smoothing process.
- The elastic net includes the penalty of lasso regression, and when used in isolation, it becomes the ridge regression.
- In the procedure of regularization with an elastic net, first, the coefficient of ridge regression is determined.
- After this, a lasso algorithm is performed on the ridge regression coefficient to shrink the coefficient.
- It has two parameters to be set, α1 and α2 where α1 controls the L1 penalty and α2 controls the L2 penalty.
Instead of utilising two α-parameters, we can use simply one α and one L1-ratio-parameter, which sets the proportion of our L1 penalty in relation to α. If α=1 and L1-ratio = 0.3, our L1 penalty is multiplied by 0.3, and our L2 penalty is multiplied by 1−L1−ratio=0.7.
ElasticNetMSE=MSE(y,ypred)+α⋅(1−L1Ratio)m∑i=1|θi|+α⋅L1Ratiom∑i=1|θi|
Elastic Net Regression with default parameters¶
elasticnet_model = ElasticNet()
elasticnet_model.fit(x_train, y_train)
ElasticNet()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
ElasticNet()
elasticnet_reg = model_performance_regression(elasticnet_model, x_test, y_test)
elasticnet_reg
RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|
0 | 6.556087 | 4.678504 | 0.317455 | 0.317175 | 40.121657 |
Observations:
- After fitting the data into Elastic Net Model with default value of alpha (=1) and l1_ratio, the performance metrics are showing poor results as compared to Least Squares method and Ridge Regression.
- We can tune the alpha to get the optimized value similar to Ridge Regression using Grid SearchCV.
Elastic Net Regression with optimized α and L1−ratio¶
folds = KFold(n_splits=10, shuffle=True, random_state=1)
params = {'alpha':[0.001, 0.01, 0.1, 0.2, 0.5, 0.9],
'l1_ratio': [0.001, 0.01, 0.02, 0.03, 0.04, 0.05]}
model = ElasticNet()
model_cv = GridSearchCV(estimator=model, param_grid=params, scoring='r2', cv=folds, return_train_score=True)
model_cv.fit(x_train,y_train)
GridSearchCV(cv=KFold(n_splits=10, random_state=1, shuffle=True), estimator=ElasticNet(), param_grid={'alpha': [0.001, 0.01, 0.1, 0.2, 0.5, 0.9], 'l1_ratio': [0.001, 0.01, 0.02, 0.03, 0.04, 0.05]}, return_train_score=True, scoring='r2')In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GridSearchCV(cv=KFold(n_splits=10, random_state=1, shuffle=True), estimator=ElasticNet(), param_grid={'alpha': [0.001, 0.01, 0.1, 0.2, 0.5, 0.9], 'l1_ratio': [0.001, 0.01, 0.02, 0.03, 0.04, 0.05]}, return_train_score=True, scoring='r2')
ElasticNet()
ElasticNet()
model_cv.best_params_
{'alpha': 0.001, 'l1_ratio': 0.05}
elasticnet_model_tuned = ElasticNet(alpha=0.001, l1_ratio=0.05)
elasticnet_model_tuned.fit(x_train, y_train)
ElasticNet(alpha=0.001, l1_ratio=0.05)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
ElasticNet(alpha=0.001, l1_ratio=0.05)
elasticnet_reg_tuned = model_performance_regression(elasticnet_model_tuned, x_test, y_test)
elasticnet_reg_tuned
RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|
0 | 3.157478 | 2.178911 | 0.841685 | 0.84162 | 19.981572 |
Observation
- After applying the Grid SearchCV, the optimized value of alpha results out to be 0.001, and l1_ratio = 0.05.
- The performance metrics are showing almost similar results as compared to Least Squares method, Ridge Regression and Lasso Regression, implying that after tuning the Elastic Net, the model does not improve.
models= pd.concat([linear_reg_test,ridge_reg,ridge_reg_tuned,lasso_reg,lasso_reg_tuned,elasticnet_reg,
elasticnet_reg_tuned], axis=0) #combining all models into a single dataframe
models['Models'] = ['Least Squares', 'Ridge Regression', 'Ridge Regression Tuned', 'Lasso Regression',
'Lasso Regression Tuned', 'Elastic Net Regression',
'Elastic Net Regression Tuned'] #adding names of the models as a column to the dataframe
models = models.iloc[:,[5, 0,1,2,3,4]] #ordering names of the models as the first column
models
Models | RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|---|
0 | Least Squares | 3.144055 | 2.155765 | 0.843028 | 0.842964 | 19.676966 |
0 | Ridge Regression | 3.144057 | 2.155826 | 0.843028 | 0.842963 | 19.677968 |
0 | Ridge Regression Tuned | 3.144055 | 2.155771 | 0.843028 | 0.842964 | 19.677066 |
0 | Lasso Regression | 6.064339 | 3.873332 | 0.416006 | 0.415766 | 34.652716 |
0 | Lasso Regression Tuned | 3.144315 | 2.157198 | 0.843002 | 0.842938 | 19.702959 |
0 | Elastic Net Regression | 6.556087 | 4.678504 | 0.317455 | 0.317175 | 40.121657 |
0 | Elastic Net Regression Tuned | 3.157478 | 2.178911 | 0.841685 | 0.841620 | 19.981572 |
Observations:
- As per the above result, the Least Squares Method is giving the best results as compared to other models.
- Regularization technique does not offer any significant improvement to the performance metrics.
- So, we will apply some Non Linear models to check if the model performance improves or not.
Forward Feature Selection using SequentialFeatureSelector¶
We will see how to use SequentialFeatureSelector to select a subset of key features using forward feature selection. It is a greedy search algorithm that is used to reduce an initial d-dimensional feature space to a k-dimensional feature subspace where k < d. It is useful to automatically select a subset of the most relevant featuresthat are most relevant to the problem.
Why should we do feature selection?
- Reduces dimensionality
- Discards deceptive features; Deceptive features appear to aid learning on the training set but impair generalization
- Speeds training/testing
How does forward feature selection work?
- It starts with an empty model and adds variables one by one.
- In each forward step, you add the one variable that gives the highest improvement to your model.
We will use forward feature selection on all the variables.
# Installing mlxtend library. You need to run the below code only once if mlxtend library is not previously installed.
!pip install mlxtend
Requirement already satisfied: mlxtend in /usr/local/lib/python3.10/dist-packages (0.22.0) Requirement already satisfied: scipy>=1.2.1 in /usr/local/lib/python3.10/dist-packages (from mlxtend) (1.11.4) Requirement already satisfied: numpy>=1.16.2 in /usr/local/lib/python3.10/dist-packages (from mlxtend) (1.25.2) Requirement already satisfied: pandas>=0.24.2 in /usr/local/lib/python3.10/dist-packages (from mlxtend) (2.0.3) Requirement already satisfied: scikit-learn>=1.0.2 in /usr/local/lib/python3.10/dist-packages (from mlxtend) (1.2.2) Requirement already satisfied: matplotlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from mlxtend) (3.7.1) Requirement already satisfied: joblib>=0.13.2 in /usr/local/lib/python3.10/dist-packages (from mlxtend) (1.4.0) Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from mlxtend) (67.7.2) Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->mlxtend) (1.2.1) Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->mlxtend) (0.12.1) Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->mlxtend) (4.51.0) Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->mlxtend) (1.4.5) Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->mlxtend) (24.0) Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->mlxtend) (9.4.0) Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->mlxtend) (3.1.2) Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.0.0->mlxtend) (2.8.2) Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=0.24.2->mlxtend) (2023.4) Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=0.24.2->mlxtend) (2024.1) Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=1.0.2->mlxtend) (3.5.0) Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib>=3.0.0->mlxtend) (1.16.0)
# Importing Sequential Feature Selector
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
Parameters to pass in SequentialFeatureSelector:
estimator: scikit-learn classifier or regressor.
k_features: int or tuple or str (default: 1).
- The number of features to choose, where k features equals the entire feature collection, can be specified as an integer.
- The SFS will consider returning any feature combination between min and max that scored highest in cross-validation if a tuple containing a min and max value is provided. For example, instead of a set amount of characteristics k, the tuple (1, 4) will return any combination of 1 to 4 features.
- A string argument such as "best" or "parsimonious". If you choose "best," the feature selector will provide the feature subset with the best cross-validation performance. If the input "parsimonious" is provided, the smallest feature subset that is within one standard error of the cross-validation performance will be chosen.
forward: bool (default: True). Forward selection if True, backward selection otherwise.
floating: bool (default: False). Adds a conditional exclusion/inclusion if True:
- Sequential floating forward selection (SFFS) starts from the empty set.
- After each forward step, it performs backward steps as long as the objective function increases.
- Once it stops increasing, the forward selection is continued.
verbose: int (default: 0), level of verbosity to use in logging. If 0 then no output, if 1then the number of features in the current set, and if 2 then detailed logging including timestamp and cv scores at each step.
scoring: str, callable, or None (default: None). If None (default), uses 'accuracy' for Scikit-learn classifiers and 'r2' for Scikit-learn regressors.
cv: int (default: 5). Integer or iterable yielding train, test splits. If cv is an integer and estimator is a classifier (or y consists of integer class labels) stratified k-fold. Otherwise, regular k-fold cross-validation is performed. No cross-validation if cv is None, False, or 0.
n_jobs: int (default: 1). The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'.
# Initializing the model to pass to SFS
reg = LinearRegression()
# Forward Feature Selection
sfs = SFS(
reg,
k_features=x_train.shape[1],
forward=True,
floating=False,
scoring="r2",
n_jobs=-1,
verbose=2,
cv=5,
)
# Perform SFS
sfs = sfs.fit(x_train, y_train)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 1.2min [Parallel(n_jobs=-1)]: Done 41 out of 41 | elapsed: 1.3min finished [2024-05-06 08:43:34] Features: 1/41 -- score: 0.49188988610314494[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 58.9s [Parallel(n_jobs=-1)]: Done 40 out of 40 | elapsed: 1.1min finished [2024-05-06 08:44:38] Features: 2/41 -- score: 0.6046160397618378[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 39 out of 39 | elapsed: 1.2min finished [2024-05-06 08:45:48] Features: 3/41 -- score: 0.6461909142668075[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 38 out of 38 | elapsed: 1.2min finished [2024-05-06 08:47:00] Features: 4/41 -- score: 0.7013054914238064[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 37 out of 37 | elapsed: 1.3min finished [2024-05-06 08:48:15] Features: 5/41 -- score: 0.7323069421611198[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 36 out of 36 | elapsed: 1.3min finished [2024-05-06 08:49:36] Features: 6/41 -- score: 0.8191351388509401[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 35 out of 35 | elapsed: 1.3min finished [2024-05-06 08:50:57] Features: 7/41 -- score: 0.8303282862064949[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 out of 34 | elapsed: 1.4min finished [2024-05-06 08:52:22] Features: 8/41 -- score: 0.8395075505200165[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 33 out of 33 | elapsed: 1.5min finished [2024-05-06 08:53:53] Features: 9/41 -- score: 0.8406253593745727[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 32 out of 32 | elapsed: 1.5min finished [2024-05-06 08:55:24] Features: 10/41 -- score: 0.8414600432551443[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 31 out of 31 | elapsed: 1.6min finished [2024-05-06 08:57:01] Features: 11/41 -- score: 0.8422172722438954[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 1.6min finished [2024-05-06 08:58:36] Features: 12/41 -- score: 0.8423261376358377[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 29 out of 29 | elapsed: 1.7min finished [2024-05-06 09:00:15] Features: 13/41 -- score: 0.8423952895273421[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 28 out of 28 | elapsed: 1.9min finished [2024-05-06 09:02:07] Features: 14/41 -- score: 0.8424554518867557[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 27 out of 27 | elapsed: 1.9min finished [2024-05-06 09:03:58] Features: 15/41 -- score: 0.8425213613275844[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 26 out of 26 | elapsed: 1.9min finished [2024-05-06 09:05:52] Features: 16/41 -- score: 0.8425660490400733[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 out of 25 | elapsed: 1.9min finished [2024-05-06 09:07:44] Features: 17/41 -- score: 0.8426615241241737[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 24 out of 24 | elapsed: 1.9min finished [2024-05-06 09:09:38] Features: 18/41 -- score: 0.8426866078058511[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 23 out of 23 | elapsed: 1.9min finished [2024-05-06 09:11:33] Features: 19/41 -- score: 0.8427094423867043[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 22 out of 22 | elapsed: 2.1min finished [2024-05-06 09:13:40] Features: 20/41 -- score: 0.8427304449567397[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 21 out of 21 | elapsed: 1.9min finished [2024-05-06 09:15:34] Features: 21/41 -- score: 0.8427404093833673[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 20 out of 20 | elapsed: 1.9min finished [2024-05-06 09:17:25] Features: 22/41 -- score: 0.8427505409320879[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 19 out of 19 | elapsed: 1.9min finished [2024-05-06 09:19:18] Features: 23/41 -- score: 0.8427572335567[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 18 out of 18 | elapsed: 1.9min finished [2024-05-06 09:21:10] Features: 24/41 -- score: 0.8427691076570655[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 17 out of 17 | elapsed: 1.9min finished [2024-05-06 09:23:06] Features: 25/41 -- score: 0.8427740700470843[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 16 out of 16 | elapsed: 1.7min finished [2024-05-06 09:24:49] Features: 26/41 -- score: 0.8427758697137149[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 15 out of 15 | elapsed: 1.7min finished [2024-05-06 09:26:32] Features: 27/41 -- score: 0.8427778696285013[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 14 out of 14 | elapsed: 1.6min finished [2024-05-06 09:28:10] Features: 28/41 -- score: 0.8427791238823727[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 13 out of 13 | elapsed: 1.6min finished [2024-05-06 09:29:44] Features: 29/41 -- score: 0.8427807461672205[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 12 out of 12 | elapsed: 1.5min finished [2024-05-06 09:31:16] Features: 30/41 -- score: 0.8427809322153685[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 11 out of 11 | elapsed: 1.6min finished [2024-05-06 09:32:49] Features: 31/41 -- score: 0.8427809322153685[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 10 out of 10 | elapsed: 1.3min finished [2024-05-06 09:34:07] Features: 32/41 -- score: 0.8427809322153685[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 out of 9 | elapsed: 1.2min finished [2024-05-06 09:35:22] Features: 33/41 -- score: 0.8427803380446244[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 8 out of 8 | elapsed: 1.1min finished [2024-05-06 09:36:30] Features: 34/41 -- score: 0.8427796949690372[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 7 out of 7 | elapsed: 1.0min finished [2024-05-06 09:37:33] Features: 35/41 -- score: 0.8427789615641776[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 6 out of 6 | elapsed: 53.3s finished [2024-05-06 09:38:26] Features: 36/41 -- score: 0.8427779197388698[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 47.9s finished [2024-05-06 09:39:14] Features: 37/41 -- score: 0.8427759963463547[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 4 out of 4 | elapsed: 38.4s finished [2024-05-06 09:39:52] Features: 38/41 -- score: 0.8427733960444744[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 3 out of 3 | elapsed: 32.3s finished [2024-05-06 09:40:25] Features: 39/41 -- score: 0.8427687463538682[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 2 out of 2 | elapsed: 19.6s finished [2024-05-06 09:40:45] Features: 40/41 -- score: 0.8427687463538682[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [2024-05-06 09:40:58] Features: 41/41 -- score: 0.8427687463538682
Now, let's plot the the model performance with addition of each feature. We will use the plot_sequential_feature_selection function for this. It has the following parameters:
metric_dict: mlxtend.SequentialFeatureSelector.get_metric_dict() object, which is a dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows:
- 'feature_idx': tuple of the indices of the feature subset
- 'cv_scores': list with individual CV scores
- 'avg_score': of CV average scores
- 'feature_names': Name of features in the subset
- 'ci_bound': confidence interval bound of the CV score average
- 'std_dev': standard deviation of the CV score average
- 'std_err': standard error of the CV score average
figsize: tuple (default: None). Height and width of the figure.
kind: str (default: "std_dev"). The kind of error bar or confidence interval in {'std_dev', 'std_err', 'ci', None}.
color: str (default: "blue"). Color of the lineplot (accepts any matplotlib color name).
bcolor: str (default: "steelblue"). Color of the error bars / confidence intervals (accepts any matplotlib color name).
marker: str (default: "o"). Marker of the line plot (accepts any matplotlib marker name).
alpha: float in [0, 1] (default: 0.2). Transparency of the error bars / confidence intervals.
ylabel: str (default: "Performance"). Y-axis label.
confidence_interval: float (default: 0.95). Confidence level if kind='ci'.
sfs.get_metric_dict()
{1: {'feature_idx': (5,), 'cv_scores': array([0.48930483, 0.48939957, 0.49302582, 0.48922331, 0.4984959 ]), 'avg_score': 0.49188988610314494, 'feature_names': ('Department_gynecology',), 'ci_bound': 0.004631493101657075, 'std_dev': 0.0036034589801322166, 'std_err': 0.0018017294900661083}, 2: {'feature_idx': (5, 6), 'cv_scores': array([0.60393142, 0.60360277, 0.60160635, 0.60347042, 0.61046924]), 'avg_score': 0.6046160397618378, 'feature_names': ('Department_gynecology', 'Department_radiotherapy'), 'ci_bound': 0.003903818905262394, 'std_dev': 0.0030373037338530533, 'std_err': 0.0015186518669265266}, 3: {'feature_idx': (5, 6, 23), 'cv_scores': array([0.64519204, 0.6466788 , 0.64365579, 0.64406087, 0.65136707]), 'avg_score': 0.6461909142668075, 'feature_names': ('Department_gynecology', 'Department_radiotherapy', 'Age_31-40'), 'ci_bound': 0.0035892679605198543, 'std_dev': 0.0027925724125139285, 'std_err': 0.0013962862062569643}, 4: {'feature_idx': (5, 6, 23, 24), 'cv_scores': array([0.70130514, 0.70174535, 0.69909054, 0.70011695, 0.70426948]), 'avg_score': 0.7013054914238064, 'feature_names': ('Department_gynecology', 'Department_radiotherapy', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.0022481465718417765, 'std_dev': 0.001749134409820941, 'std_err': 0.0008745672049104705}, 5: {'feature_idx': (5, 6, 22, 23, 24), 'cv_scores': array([0.73232311, 0.73311883, 0.73025991, 0.73047301, 0.73535986]), 'avg_score': 0.7323069421611198, 'feature_names': ('Department_gynecology', 'Department_radiotherapy', 'Age_21-30', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.0024068522854820544, 'std_dev': 0.0018726128467877884, 'std_err': 0.0009363064233938942}, 6: {'feature_idx': (5, 6, 21, 22, 23, 24), 'cv_scores': array([0.81759505, 0.82082637, 0.81776443, 0.81792308, 0.82156676]), 'avg_score': 0.8191351388509401, 'feature_names': ('Department_gynecology', 'Department_radiotherapy', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.00218823495067042, 'std_dev': 0.0017025211331549424, 'std_err': 0.0008512605665774712}, 7: {'feature_idx': (4, 5, 6, 21, 22, 23, 24), 'cv_scores': array([0.82910903, 0.83188966, 0.82885568, 0.82915996, 0.8326271 ]), 'avg_score': 0.8303282862064949, 'feature_names': ('Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.002051848579320233, 'std_dev': 0.0015964079027512008, 'std_err': 0.0007982039513756004}, 8: {'feature_idx': (4, 5, 6, 7, 21, 22, 23, 24), 'cv_scores': array([0.83841525, 0.84068 , 0.83778767, 0.83893676, 0.84171807]), 'avg_score': 0.8395075505200165, 'feature_names': ('Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.0018835514590598713, 'std_dev': 0.0014654670255823212, 'std_err': 0.0007327335127911606}, 9: {'feature_idx': (4, 5, 6, 7, 18, 21, 22, 23, 24), 'cv_scores': array([0.8395841 , 0.84177861, 0.83891449, 0.84007992, 0.84276968]), 'avg_score': 0.8406253593745727, 'feature_names': ('Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.0018390805560939815, 'std_dev': 0.0014308671522521225, 'std_err': 0.0007154335761260613}, 10: {'feature_idx': (4, 5, 6, 7, 18, 21, 22, 23, 24, 37), 'cv_scores': array([0.84042973, 0.84272043, 0.8397547 , 0.84087775, 0.84351761]), 'avg_score': 0.8414600432551443, 'feature_names': ('Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'health_conditions_Heart disease'), 'ci_bound': 0.0018295399651148694, 'std_dev': 0.0014234442483451405, 'std_err': 0.0007117221241725703}, 11: {'feature_idx': (0, 4, 5, 6, 7, 18, 21, 22, 23, 24, 37), 'cv_scores': array([0.84122914, 0.84335217, 0.84047911, 0.84174549, 0.84428045]), 'avg_score': 0.8422172722438954, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'health_conditions_Heart disease'), 'ci_bound': 0.0017961805756157043, 'std_dev': 0.0013974895099866864, 'std_err': 0.0006987447549933432}, 12: {'feature_idx': (0, 4, 5, 6, 7, 12, 18, 21, 22, 23, 24, 37), 'cv_scores': array([0.84133146, 0.84346648, 0.84059185, 0.84183307, 0.84440783]), 'avg_score': 0.8423261376358377, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'health_conditions_Heart disease'), 'ci_bound': 0.001806196554090374, 'std_dev': 0.001405282281515688, 'std_err': 0.0007026411407578439}, 13: {'feature_idx': (0, 4, 5, 6, 7, 12, 18, 21, 22, 23, 24, 32, 37), 'cv_scores': array([0.84143434, 0.84354144, 0.84068201, 0.84186707, 0.84445158]), 'avg_score': 0.8423952895273421, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'health_conditions_Heart disease'), 'ci_bound': 0.0017883348234278233, 'std_dev': 0.0013913852482385265, 'std_err': 0.0006956926241192632}, 14: {'feature_idx': (0, 4, 5, 6, 7, 11, 12, 18, 21, 22, 23, 24, 32, 37), 'cv_scores': array([0.84149253, 0.84359962, 0.84074705, 0.84189538, 0.84454268]), 'avg_score': 0.8424554518867557, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'health_conditions_Heart disease'), 'ci_bound': 0.0018017191642011664, 'std_dev': 0.001401798720070234, 'std_err': 0.0007008993600351169}, 15: {'feature_idx': (0, 4, 5, 6, 7, 9, 11, 12, 18, 21, 22, 23, 24, 32, 37), 'cv_scores': array([0.84151756, 0.84365472, 0.84083538, 0.84196472, 0.84463442]), 'avg_score': 0.8425213613275844, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'health_conditions_Heart disease'), 'ci_bound': 0.0018094336278431788, 'std_dev': 0.0014078008348694048, 'std_err': 0.0007039004174347023}, 16: {'feature_idx': (0, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 32, 37), 'cv_scores': array([0.84157076, 0.84368731, 0.84088641, 0.84200681, 0.84467896]), 'avg_score': 0.8425660490400733, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'health_conditions_Heart disease'), 'ci_bound': 0.0018036427274464757, 'std_dev': 0.00140329531762485, 'std_err': 0.0007016476588124251}, 17: {'feature_idx': (0, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 32, 36, 37), 'cv_scores': array([0.84163027, 0.84381662, 0.8409956 , 0.84209143, 0.84477371]), 'avg_score': 0.8426615241241737, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.00181397721035091, 'std_dev': 0.0014113358964208509, 'std_err': 0.0007056679482104253}, 18: {'feature_idx': (0, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 32, 33, 36, 37), 'cv_scores': array([0.84165314, 0.84383133, 0.84104803, 0.84211931, 0.84478123]), 'avg_score': 0.8426866078058511, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017968945257366444, 'std_dev': 0.0013980449874360099, 'std_err': 0.0006990224937180049}, 19: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 32, 33, 36, 37), 'cv_scores': array([0.84165556, 0.84387059, 0.8410869 , 0.84214295, 0.84479122]), 'avg_score': 0.8427094423867043, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017944661334763027, 'std_dev': 0.0013961556157569975, 'std_err': 0.0006980778078784988}, 20: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 28, 32, 33, 36, 37), 'cv_scores': array([0.84169653, 0.84389602, 0.84109857, 0.84214173, 0.84481937]), 'avg_score': 0.8427304449567397, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017994745127761138, 'std_dev': 0.0014000523050033567, 'std_err': 0.0007000261525016782}, 21: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 26, 28, 32, 33, 36, 37), 'cv_scores': array([0.84171921, 0.84389189, 0.84112059, 0.84214082, 0.84482953]), 'avg_score': 0.8427404093833673, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_61-70', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.001791726954504796, 'std_dev': 0.001394024441458253, 'std_err': 0.0006970122207291264}, 22: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 25, 26, 28, 32, 33, 36, 37), 'cv_scores': array([0.84172767, 0.84391208, 0.84112203, 0.84215577, 0.84483515]), 'avg_score': 0.8427505409320879, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017945182676311698, 'std_dev': 0.0013961961779006527, 'std_err': 0.0006980980889503263}, 23: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 28, 32, 33, 36, 37), 'cv_scores': array([0.84173583, 0.84392189, 0.84113036, 0.84216222, 0.84483587]), 'avg_score': 0.8427572335567, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017921537624081036, 'std_dev': 0.0013943565125070934, 'std_err': 0.0006971782562535467}, 24: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 28, 32, 33, 35, 36, 37), 'cv_scores': array([0.84174792, 0.84392579, 0.84114267, 0.84217859, 0.84485057]), 'avg_score': 0.8427691076570655, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017908723320688754, 'std_dev': 0.0013933595161687736, 'std_err': 0.0006966797580843868}, 25: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 28, 32, 33, 35, 36, 37), 'cv_scores': array([0.84175575, 0.84392909, 0.84114694, 0.84217963, 0.84485894]), 'avg_score': 0.8427740700470843, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017919222433743585, 'std_dev': 0.0013941763828333774, 'std_err': 0.0006970881914166886}, 26: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 32, 33, 35, 36, 37), 'cv_scores': array([0.84176058, 0.84392785, 0.84114221, 0.84218609, 0.84486261]), 'avg_score': 0.8427758697137149, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017928808477604453, 'std_dev': 0.0013949222096126944, 'std_err': 0.0006974611048063472}, 27: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 35, 36, 37), 'cv_scores': array([0.84175977, 0.84393144, 0.8411444 , 0.84218713, 0.84486661]), 'avg_score': 0.8427778696285013, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017945631293705157, 'std_dev': 0.0013962310818579646, 'std_err': 0.0006981155409289823}, 28: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 35, 36, 37, 38), 'cv_scores': array([0.84175803, 0.84393421, 0.8411476 , 0.8421875 , 0.84486829]), 'avg_score': 0.8427791238823727, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure'), 'ci_bound': 0.001795122470303991, 'std_dev': 0.0013966662681068583, 'std_err': 0.0006983331340534291}, 29: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38), 'cv_scores': array([0.84176298, 0.84393899, 0.84113744, 0.84219395, 0.84487037]), 'avg_score': 0.8427807461672205, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure'), 'ci_bound': 0.0017983700228782758, 'std_dev': 0.0013991929743396865, 'std_err': 0.0006995964871698432}, 30: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176214, 0.84393978, 0.84113795, 0.84219458, 0.84487021]), 'avg_score': 0.8427809322153685, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.0017984071199195594, 'std_dev': 0.0013992218370981139, 'std_err': 0.0006996109185490568}, 31: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176214, 0.84393978, 0.84113795, 0.84219458, 0.84487021]), 'avg_score': 0.8427809322153685, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.0017984071199195896, 'std_dev': 0.0013992218370981369, 'std_err': 0.0006996109185490685}, 32: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176214, 0.84393978, 0.84113795, 0.84219458, 0.84487021]), 'avg_score': 0.8427809322153685, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.001798407119919604, 'std_dev': 0.0013992218370981484, 'std_err': 0.0006996109185490742}, 33: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176243, 0.84394197, 0.84113616, 0.84218883, 0.8448723 ]), 'avg_score': 0.8427803380446244, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.0018007868937655249, 'std_dev': 0.0014010733819990139, 'std_err': 0.0007005366909995069}, 34: {'feature_idx': (0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176239, 0.84394137, 0.84113397, 0.84218859, 0.84487215]), 'avg_score': 0.8427796949690372, 'feature_names': ('Available Extra Rooms in Hospital', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.0018012958615181624, 'std_dev': 0.0014014693762018733, 'std_err': 0.0007007346881009366}, 35: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176142, 0.84394176, 0.84113311, 0.84218763, 0.8448709 ]), 'avg_score': 0.8427789615641776, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.0018014433625029125, 'std_dev': 0.0014015841369791023, 'std_err': 0.000700792068489551}, 36: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.84175766, 0.84394198, 0.84113194, 0.84218758, 0.84487043]), 'avg_score': 0.8427779197388698, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.0018023700056068854, 'std_dev': 0.0014023050967951049, 'std_err': 0.0007011525483975524}, 37: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.8417447 , 0.84394054, 0.8411334 , 0.84218957, 0.84487178]), 'avg_score': 0.8427759963463547, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.00180436460146003, 'std_dev': 0.0014038569601318282, 'std_err': 0.0007019284800659141}, 38: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.8417445 , 0.84393137, 0.84113339, 0.84218943, 0.84486828]), 'avg_score': 0.8427733960444744, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'gender_Other', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.0018011225528318932, 'std_dev': 0.0014013345361560902, 'std_err': 0.0007006672680780451}, 39: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.84174483, 0.84393285, 0.84112937, 0.84218622, 0.84485046]), 'avg_score': 0.8427687463538682, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr John', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'gender_Other', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.0017960998677658592, 'std_dev': 0.0013974267165375962, 'std_err': 0.000698713358268798}, 40: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.84174483, 0.84393285, 0.84112937, 0.84218622, 0.84485046]), 'avg_score': 0.8427687463538682, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr John', 'doctor_name_Dr Mark', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'gender_Other', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.0017960998677658809, 'std_dev': 0.001397426716537613, 'std_err': 0.0006987133582688064}, 41: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.84174483, 0.84393285, 0.84112937, 0.84218622, 0.84485046]), 'avg_score': 0.8427687463538682, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr John', 'doctor_name_Dr Mark', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'gender_Male', 'gender_Other', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.0017960998677653498, 'std_dev': 0.0013974267165371996, 'std_err': 0.0006987133582685998}}
# To plot the performance of the model with addition of each feature
from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs
fig1 = plot_sfs(sfs.get_metric_dict(), kind="std_err", figsize=(15, 5))
plt.title("Sequential Forward Selection")
plt.xticks(rotation=90)
plt.show()
sfs.get_metric_dict()
{1: {'feature_idx': (5,), 'cv_scores': array([0.48930483, 0.48939957, 0.49302582, 0.48922331, 0.4984959 ]), 'avg_score': 0.49188988610314494, 'feature_names': ('Department_gynecology',), 'ci_bound': 0.004631493101657075, 'std_dev': 0.0036034589801322166, 'std_err': 0.0018017294900661083}, 2: {'feature_idx': (5, 6), 'cv_scores': array([0.60393142, 0.60360277, 0.60160635, 0.60347042, 0.61046924]), 'avg_score': 0.6046160397618378, 'feature_names': ('Department_gynecology', 'Department_radiotherapy'), 'ci_bound': 0.003903818905262394, 'std_dev': 0.0030373037338530533, 'std_err': 0.0015186518669265266}, 3: {'feature_idx': (5, 6, 23), 'cv_scores': array([0.64519204, 0.6466788 , 0.64365579, 0.64406087, 0.65136707]), 'avg_score': 0.6461909142668075, 'feature_names': ('Department_gynecology', 'Department_radiotherapy', 'Age_31-40'), 'ci_bound': 0.0035892679605198543, 'std_dev': 0.0027925724125139285, 'std_err': 0.0013962862062569643}, 4: {'feature_idx': (5, 6, 23, 24), 'cv_scores': array([0.70130514, 0.70174535, 0.69909054, 0.70011695, 0.70426948]), 'avg_score': 0.7013054914238064, 'feature_names': ('Department_gynecology', 'Department_radiotherapy', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.0022481465718417765, 'std_dev': 0.001749134409820941, 'std_err': 0.0008745672049104705}, 5: {'feature_idx': (5, 6, 22, 23, 24), 'cv_scores': array([0.73232311, 0.73311883, 0.73025991, 0.73047301, 0.73535986]), 'avg_score': 0.7323069421611198, 'feature_names': ('Department_gynecology', 'Department_radiotherapy', 'Age_21-30', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.0024068522854820544, 'std_dev': 0.0018726128467877884, 'std_err': 0.0009363064233938942}, 6: {'feature_idx': (5, 6, 21, 22, 23, 24), 'cv_scores': array([0.81759505, 0.82082637, 0.81776443, 0.81792308, 0.82156676]), 'avg_score': 0.8191351388509401, 'feature_names': ('Department_gynecology', 'Department_radiotherapy', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.00218823495067042, 'std_dev': 0.0017025211331549424, 'std_err': 0.0008512605665774712}, 7: {'feature_idx': (4, 5, 6, 21, 22, 23, 24), 'cv_scores': array([0.82910903, 0.83188966, 0.82885568, 0.82915996, 0.8326271 ]), 'avg_score': 0.8303282862064949, 'feature_names': ('Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.002051848579320233, 'std_dev': 0.0015964079027512008, 'std_err': 0.0007982039513756004}, 8: {'feature_idx': (4, 5, 6, 7, 21, 22, 23, 24), 'cv_scores': array([0.83841525, 0.84068 , 0.83778767, 0.83893676, 0.84171807]), 'avg_score': 0.8395075505200165, 'feature_names': ('Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.0018835514590598713, 'std_dev': 0.0014654670255823212, 'std_err': 0.0007327335127911606}, 9: {'feature_idx': (4, 5, 6, 7, 18, 21, 22, 23, 24), 'cv_scores': array([0.8395841 , 0.84177861, 0.83891449, 0.84007992, 0.84276968]), 'avg_score': 0.8406253593745727, 'feature_names': ('Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50'), 'ci_bound': 0.0018390805560939815, 'std_dev': 0.0014308671522521225, 'std_err': 0.0007154335761260613}, 10: {'feature_idx': (4, 5, 6, 7, 18, 21, 22, 23, 24, 37), 'cv_scores': array([0.84042973, 0.84272043, 0.8397547 , 0.84087775, 0.84351761]), 'avg_score': 0.8414600432551443, 'feature_names': ('Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'health_conditions_Heart disease'), 'ci_bound': 0.0018295399651148694, 'std_dev': 0.0014234442483451405, 'std_err': 0.0007117221241725703}, 11: {'feature_idx': (0, 4, 5, 6, 7, 18, 21, 22, 23, 24, 37), 'cv_scores': array([0.84122914, 0.84335217, 0.84047911, 0.84174549, 0.84428045]), 'avg_score': 0.8422172722438954, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'health_conditions_Heart disease'), 'ci_bound': 0.0017961805756157043, 'std_dev': 0.0013974895099866864, 'std_err': 0.0006987447549933432}, 12: {'feature_idx': (0, 4, 5, 6, 7, 12, 18, 21, 22, 23, 24, 37), 'cv_scores': array([0.84133146, 0.84346648, 0.84059185, 0.84183307, 0.84440783]), 'avg_score': 0.8423261376358377, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'health_conditions_Heart disease'), 'ci_bound': 0.001806196554090374, 'std_dev': 0.001405282281515688, 'std_err': 0.0007026411407578439}, 13: {'feature_idx': (0, 4, 5, 6, 7, 12, 18, 21, 22, 23, 24, 32, 37), 'cv_scores': array([0.84143434, 0.84354144, 0.84068201, 0.84186707, 0.84445158]), 'avg_score': 0.8423952895273421, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'health_conditions_Heart disease'), 'ci_bound': 0.0017883348234278233, 'std_dev': 0.0013913852482385265, 'std_err': 0.0006956926241192632}, 14: {'feature_idx': (0, 4, 5, 6, 7, 11, 12, 18, 21, 22, 23, 24, 32, 37), 'cv_scores': array([0.84149253, 0.84359962, 0.84074705, 0.84189538, 0.84454268]), 'avg_score': 0.8424554518867557, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'health_conditions_Heart disease'), 'ci_bound': 0.0018017191642011664, 'std_dev': 0.001401798720070234, 'std_err': 0.0007008993600351169}, 15: {'feature_idx': (0, 4, 5, 6, 7, 9, 11, 12, 18, 21, 22, 23, 24, 32, 37), 'cv_scores': array([0.84151756, 0.84365472, 0.84083538, 0.84196472, 0.84463442]), 'avg_score': 0.8425213613275844, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'health_conditions_Heart disease'), 'ci_bound': 0.0018094336278431788, 'std_dev': 0.0014078008348694048, 'std_err': 0.0007039004174347023}, 16: {'feature_idx': (0, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 32, 37), 'cv_scores': array([0.84157076, 0.84368731, 0.84088641, 0.84200681, 0.84467896]), 'avg_score': 0.8425660490400733, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'health_conditions_Heart disease'), 'ci_bound': 0.0018036427274464757, 'std_dev': 0.00140329531762485, 'std_err': 0.0007016476588124251}, 17: {'feature_idx': (0, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 32, 36, 37), 'cv_scores': array([0.84163027, 0.84381662, 0.8409956 , 0.84209143, 0.84477371]), 'avg_score': 0.8426615241241737, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.00181397721035091, 'std_dev': 0.0014113358964208509, 'std_err': 0.0007056679482104253}, 18: {'feature_idx': (0, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 32, 33, 36, 37), 'cv_scores': array([0.84165314, 0.84383133, 0.84104803, 0.84211931, 0.84478123]), 'avg_score': 0.8426866078058511, 'feature_names': ('Available Extra Rooms in Hospital', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017968945257366444, 'std_dev': 0.0013980449874360099, 'std_err': 0.0006990224937180049}, 19: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 32, 33, 36, 37), 'cv_scores': array([0.84165556, 0.84387059, 0.8410869 , 0.84214295, 0.84479122]), 'avg_score': 0.8427094423867043, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017944661334763027, 'std_dev': 0.0013961556157569975, 'std_err': 0.0006980778078784988}, 20: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 28, 32, 33, 36, 37), 'cv_scores': array([0.84169653, 0.84389602, 0.84109857, 0.84214173, 0.84481937]), 'avg_score': 0.8427304449567397, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017994745127761138, 'std_dev': 0.0014000523050033567, 'std_err': 0.0007000261525016782}, 21: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 26, 28, 32, 33, 36, 37), 'cv_scores': array([0.84171921, 0.84389189, 0.84112059, 0.84214082, 0.84482953]), 'avg_score': 0.8427404093833673, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_61-70', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.001791726954504796, 'std_dev': 0.001394024441458253, 'std_err': 0.0006970122207291264}, 22: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 18, 20, 21, 22, 23, 24, 25, 26, 28, 32, 33, 36, 37), 'cv_scores': array([0.84172767, 0.84391208, 0.84112203, 0.84215577, 0.84483515]), 'avg_score': 0.8427505409320879, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017945182676311698, 'std_dev': 0.0013961961779006527, 'std_err': 0.0006980980889503263}, 23: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 28, 32, 33, 36, 37), 'cv_scores': array([0.84173583, 0.84392189, 0.84113036, 0.84216222, 0.84483587]), 'avg_score': 0.8427572335567, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017921537624081036, 'std_dev': 0.0013943565125070934, 'std_err': 0.0006971782562535467}, 24: {'feature_idx': (0, 3, 4, 5, 6, 7, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 28, 32, 33, 35, 36, 37), 'cv_scores': array([0.84174792, 0.84392579, 0.84114267, 0.84217859, 0.84485057]), 'avg_score': 0.8427691076570655, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017908723320688754, 'std_dev': 0.0013933595161687736, 'std_err': 0.0006966797580843868}, 25: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 28, 32, 33, 35, 36, 37), 'cv_scores': array([0.84175575, 0.84392909, 0.84114694, 0.84217963, 0.84485894]), 'avg_score': 0.8427740700470843, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017919222433743585, 'std_dev': 0.0013941763828333774, 'std_err': 0.0006970881914166886}, 26: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 32, 33, 35, 36, 37), 'cv_scores': array([0.84176058, 0.84392785, 0.84114221, 0.84218609, 0.84486261]), 'avg_score': 0.8427758697137149, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017928808477604453, 'std_dev': 0.0013949222096126944, 'std_err': 0.0006974611048063472}, 27: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 35, 36, 37), 'cv_scores': array([0.84175977, 0.84393144, 0.8411444 , 0.84218713, 0.84486661]), 'avg_score': 0.8427778696285013, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease'), 'ci_bound': 0.0017945631293705157, 'std_dev': 0.0013962310818579646, 'std_err': 0.0006981155409289823}, 28: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 35, 36, 37, 38), 'cv_scores': array([0.84175803, 0.84393421, 0.8411476 , 0.8421875 , 0.84486829]), 'avg_score': 0.8427791238823727, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure'), 'ci_bound': 0.001795122470303991, 'std_dev': 0.0013966662681068583, 'std_err': 0.0006983331340534291}, 29: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38), 'cv_scores': array([0.84176298, 0.84393899, 0.84113744, 0.84219395, 0.84487037]), 'avg_score': 0.8427807461672205, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure'), 'ci_bound': 0.0017983700228782758, 'std_dev': 0.0013991929743396865, 'std_err': 0.0006995964871698432}, 30: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176214, 0.84393978, 0.84113795, 0.84219458, 0.84487021]), 'avg_score': 0.8427809322153685, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.0017984071199195594, 'std_dev': 0.0013992218370981139, 'std_err': 0.0006996109185490568}, 31: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176214, 0.84393978, 0.84113795, 0.84219458, 0.84487021]), 'avg_score': 0.8427809322153685, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.0017984071199195896, 'std_dev': 0.0013992218370981369, 'std_err': 0.0006996109185490685}, 32: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176214, 0.84393978, 0.84113795, 0.84219458, 0.84487021]), 'avg_score': 0.8427809322153685, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.001798407119919604, 'std_dev': 0.0013992218370981484, 'std_err': 0.0006996109185490742}, 33: {'feature_idx': (0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176243, 0.84394197, 0.84113616, 0.84218883, 0.8448723 ]), 'avg_score': 0.8427803380446244, 'feature_names': ('Available Extra Rooms in Hospital', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.0018007868937655249, 'std_dev': 0.0014010733819990139, 'std_err': 0.0007005366909995069}, 34: {'feature_idx': (0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176239, 0.84394137, 0.84113397, 0.84218859, 0.84487215]), 'avg_score': 0.8427796949690372, 'feature_names': ('Available Extra Rooms in Hospital', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.0018012958615181624, 'std_dev': 0.0014014693762018733, 'std_err': 0.0007007346881009366}, 35: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40), 'cv_scores': array([0.84176142, 0.84394176, 0.84113311, 0.84218763, 0.8448709 ]), 'avg_score': 0.8427789615641776, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'Insurance_Yes'), 'ci_bound': 0.0018014433625029125, 'std_dev': 0.0014015841369791023, 'std_err': 0.000700792068489551}, 36: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.84175766, 0.84394198, 0.84113194, 0.84218758, 0.84487043]), 'avg_score': 0.8427779197388698, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.0018023700056068854, 'std_dev': 0.0014023050967951049, 'std_err': 0.0007011525483975524}, 37: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.8417447 , 0.84394054, 0.8411334 , 0.84218957, 0.84487178]), 'avg_score': 0.8427759963463547, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.00180436460146003, 'std_dev': 0.0014038569601318282, 'std_err': 0.0007019284800659141}, 38: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.8417445 , 0.84393137, 0.84113339, 0.84218943, 0.84486828]), 'avg_score': 0.8427733960444744, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'gender_Other', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.0018011225528318932, 'std_dev': 0.0014013345361560902, 'std_err': 0.0007006672680780451}, 39: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.84174483, 0.84393285, 0.84112937, 0.84218622, 0.84485046]), 'avg_score': 0.8427687463538682, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr John', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'gender_Other', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.0017960998677658592, 'std_dev': 0.0013974267165375962, 'std_err': 0.000698713358268798}, 40: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.84174483, 0.84393285, 0.84112937, 0.84218622, 0.84485046]), 'avg_score': 0.8427687463538682, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr John', 'doctor_name_Dr Mark', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'gender_Other', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.0017960998677658809, 'std_dev': 0.001397426716537613, 'std_err': 0.0006987133582688064}, 41: {'feature_idx': (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), 'cv_scores': array([0.84174483, 0.84393285, 0.84112937, 0.84218622, 0.84485046]), 'avg_score': 0.8427687463538682, 'feature_names': ('Available Extra Rooms in Hospital', 'staff_available', 'Visitors with Patient', 'Admission_Deposit', 'Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Ward_Facility_Code_B', 'Ward_Facility_Code_C', 'Ward_Facility_Code_D', 'Ward_Facility_Code_E', 'Ward_Facility_Code_F', 'doctor_name_Dr John', 'doctor_name_Dr Mark', 'doctor_name_Dr Nathan', 'doctor_name_Dr Olivia', 'doctor_name_Dr Sam', 'doctor_name_Dr Sarah', 'doctor_name_Dr Simon', 'doctor_name_Dr Sophia', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50', 'Age_51-60', 'Age_61-70', 'Age_71-80', 'Age_81-90', 'Age_91-100', 'gender_Male', 'gender_Other', 'Type of Admission_Trauma', 'Type of Admission_Urgent', 'Severity of Illness_Minor', 'Severity of Illness_Moderate', 'health_conditions_Diabetes', 'health_conditions_Heart disease', 'health_conditions_High Blood Pressure', 'health_conditions_Other', 'Insurance_Yes'), 'ci_bound': 0.0017960998677653498, 'std_dev': 0.0013974267165371996, 'std_err': 0.0006987133582685998}}
Observations:
- We can observe that the performance increases till the 8th feature and then becomes constant.
- The decision to choose the k_features now depends on the R2 vs the complexity of the model.
- With 8 features, we are getting an R2 of 0.840.
- With 20 features, we are getting an R2 of 0.844.
- With 42 features, we are getting an R2 of 0.843.
- The increase in R2 is not very significant as we are getting approximately the same values with a less complex model.
- So we'll use 8 features only to build the Linear Regression model, but you can experiment by taking a different number.
- Number of features chosen can also depend on the business context and use case of the model.
Let's run the Sequential Feature Selector again to find the best 8 features for the model.
reg = LinearRegression()
# Forward feature selection with 8 features
sfs = SFS(
reg,
k_features=8,
forward=True,
floating=False,
scoring="r2",
n_jobs=-1,
verbose=2,
cv=5,
)
# Perform SFFS
sfs = sfs.fit(x_train, y_train)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 49.2s [Parallel(n_jobs=-1)]: Done 41 out of 41 | elapsed: 52.9s finished [2024-05-06 09:41:52] Features: 1/8 -- score: 0.49188988610314494[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 1.1min [Parallel(n_jobs=-1)]: Done 40 out of 40 | elapsed: 1.2min finished [2024-05-06 09:43:04] Features: 2/8 -- score: 0.6046160397618378[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 39 out of 39 | elapsed: 1.1min finished [2024-05-06 09:44:09] Features: 3/8 -- score: 0.6461909142668075[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 38 out of 38 | elapsed: 1.2min finished [2024-05-06 09:45:21] Features: 4/8 -- score: 0.7013054914238064[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 37 out of 37 | elapsed: 1.2min finished [2024-05-06 09:46:35] Features: 5/8 -- score: 0.7323069421611198[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 36 out of 36 | elapsed: 1.3min finished [2024-05-06 09:47:56] Features: 6/8 -- score: 0.8191351388509401[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 35 out of 35 | elapsed: 1.4min finished [2024-05-06 09:49:22] Features: 7/8 -- score: 0.8303282862064949[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers. [Parallel(n_jobs=-1)]: Done 34 out of 34 | elapsed: 1.4min finished [2024-05-06 09:50:46] Features: 8/8 -- score: 0.8395075505200165
# Selecting the features which are important for the model
feat_cols = list(sfs.k_feature_idx_)
print(feat_cols)
[4, 5, 6, 7, 21, 22, 23, 24]
# Checking the names of the important features
x_train.columns[feat_cols]
Index(['Department_anesthesia', 'Department_gynecology', 'Department_radiotherapy', 'Department_surgery', 'Age_11-20', 'Age_21-30', 'Age_31-40', 'Age_41-50'], dtype='object')
Now, we will fit the Linear Regression model using these 8 features only.
# Creating the new x_train data
x_train_final = x_train[x_train.columns[feat_cols]]
# Creating the new x_test data
x_test_final = x_test[x_train_final.columns]
# Fitting Linear Regression model on the new training data
lin_reg_model2 = LinearRegression()
lin_reg_model2.fit(x_train_final, y_train)
LinearRegression()In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
LinearRegression()
# Checking model performance on the training data
lin_reg_model2_train_perf = model_performance_regression(lin_reg_model2, x_train_final, y_train)
lin_reg_model2_train_perf
RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|
0 | 3.167762 | 2.16747 | 0.83952 | 0.839516 | 19.769004 |
# Checking model performance on the testing data
lin_reg_model2_test_perf = model_performance_regression(lin_reg_model2, x_test_final, y_test)
lin_reg_model2_test_perf
RMSE | MAE | R-squared | Adj. R-squared | MAPE | |
---|---|---|---|---|---|
0 | 3.175516 | 2.174951 | 0.839871 | 0.839858 | 19.83425 |
Observations:
- The performance looks approximately the same as the previous model with all the variables.
- Let's compare the two models we built.
# Training performance comparison
models_train_comp_df = pd.concat(
[linear_reg.T, lin_reg_model2_train_perf.T], axis=1,
)
models_train_comp_df.columns = [
"Linear Regression Scikit-learn",
"Linear Regression Scikit-learn (SFS features)",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
Linear Regression Scikit-learn | Linear Regression Scikit-learn (SFS features) | |
---|---|---|
RMSE | 3.135093 | 3.167762 |
MAE | 2.146244 | 2.167470 |
R-squared | 0.842813 | 0.839520 |
Adj. R-squared | 0.842796 | 0.839516 |
MAPE | 19.591833 | 19.769004 |
# Testing performance comparison
models_test_comp_df = pd.concat(
[linear_reg_test.T, lin_reg_model2_test_perf.T], axis=1,
)
models_test_comp_df.columns = [
"Linear Regression Scikit-learn",
"Linear Regression Scikit-learn (SFS features)",
]
print("Test performance comparison:")
models_test_comp_df
Test performance comparison:
Linear Regression Scikit-learn | Linear Regression Scikit-learn (SFS features) | |
---|---|---|
RMSE | 3.144055 | 3.175516 |
MAE | 2.155765 | 2.174951 |
R-squared | 0.843028 | 0.839871 |
Adj. R-squared | 0.842964 | 0.839858 |
MAPE | 19.676966 | 19.834250 |
- The new model (lin_reg_model2) uses 8 features in comparison to 42 features for the previous model (linear_reg), i.e., the number of features has reduced by ~81%.
- The performance of the new model, however, is very close to our previous model.
- Depending upon time sensitivity and storage restrictions, we can choose between the models.
Next Steps¶
We have explored building a Linear Regression model for this problem statement of predicting the likely length of stay of a patient for a hospital visit, and we've also identifies the most important features for the model, and trained the model using only those features, without compromising the model performance by much.
However, being a linear model, it is more interpretable than a model with high predictive power. The performance metrics of our attempt at prediction can be improved with more complex and non-linear models.
In the coming section, we will explore building models on more complex regularized versions of Linear Regression, and also get into non-linear tree-based regression models, to see if we can improve on the model's predictive performance.
# Convert notebook to html
!jupyter nbconvert --to html "/content/drive/My Drive/Colab Notebooks/Copy of FDS_Project_LearnerNotebook_FullCode.ipynb"