TY - JOUR
T1 - A comparison of machine learning techniques for the detection of type-2 diabetes mellitus
T2 - Experiences from Bangladesh
AU - Uddin, Md Jamal
AU - Ahamad, Md Martuza
AU - Hoque, Md. Nesarul
AU - Walid, Md. Abul Ala
AU - Aktar, Sakifa
AU - Alotaibi, Naif
AU - Alyami, Salem A.
AU - Kabir, Ashad
AU - Moni, Mohammad Ali
N1 - Publisher Copyright:
© 2023 by the authors.
PY - 2023/7
Y1 - 2023/7
N2 - Diabetes is a chronic disease caused by a persistently high blood sugar level, causing other chronic diseases, including cardiovascular, kidney, eye, and nerve damage. Prompt detection plays a vital role in reducing the risk and severity associated with diabetes, and identifying key risk factors can help individuals become more mindful of their lifestyles. In this study, we conducted a questionnaire-based survey utilizing standard diabetes risk variables to examine the prevalence of diabetes in Bangladesh. To enable prompt detection of diabetes, we compared different machine learning techniques and proposed an ensemble-based machine learning framework that incorporated algorithms such as decision tree, random forest, and extreme gradient boost algorithms. In order to address class imbalance within the dataset, we initially applied the synthetic minority oversampling technique (SMOTE) and random oversampling (ROS) techniques. We evaluated the performance of various classifiers, including decision tree (DT), logistic regression (LR), support vector machine (SVM), gradient boost (GB), extreme gradient boost (XGBoost), random forest (RF), and ensemble technique (ET), on our diabetes datasets. Our experimental results showed that the ET outperformed other classifiers; to further enhance its effectiveness, we fine-tuned and evaluated the hyperparameters of the ET. Using statistical and machine learning techniques, we also ranked features and identified that age, extreme thirst, and diabetes in the family are significant features that prove instrumental in the detection of diabetes patients. This method has great potential for clinicians to effectively identify individuals at risk of diabetes, facilitating timely intervention and care.
AB - Diabetes is a chronic disease caused by a persistently high blood sugar level, causing other chronic diseases, including cardiovascular, kidney, eye, and nerve damage. Prompt detection plays a vital role in reducing the risk and severity associated with diabetes, and identifying key risk factors can help individuals become more mindful of their lifestyles. In this study, we conducted a questionnaire-based survey utilizing standard diabetes risk variables to examine the prevalence of diabetes in Bangladesh. To enable prompt detection of diabetes, we compared different machine learning techniques and proposed an ensemble-based machine learning framework that incorporated algorithms such as decision tree, random forest, and extreme gradient boost algorithms. In order to address class imbalance within the dataset, we initially applied the synthetic minority oversampling technique (SMOTE) and random oversampling (ROS) techniques. We evaluated the performance of various classifiers, including decision tree (DT), logistic regression (LR), support vector machine (SVM), gradient boost (GB), extreme gradient boost (XGBoost), random forest (RF), and ensemble technique (ET), on our diabetes datasets. Our experimental results showed that the ET outperformed other classifiers; to further enhance its effectiveness, we fine-tuned and evaluated the hyperparameters of the ET. Using statistical and machine learning techniques, we also ranked features and identified that age, extreme thirst, and diabetes in the family are significant features that prove instrumental in the detection of diabetes patients. This method has great potential for clinicians to effectively identify individuals at risk of diabetes, facilitating timely intervention and care.
KW - diabetes
KW - mellitus
KW - machine learning
KW - survey
KW - feature selection
KW - feature importance
KW - diabetes mellitus
UR - http://www.scopus.com/inward/record.url?scp=85166253894&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85166253894&partnerID=8YFLogxK
U2 - 10.3390/info14070376
DO - 10.3390/info14070376
M3 - Article
AN - SCOPUS:85166253894
SN - 2078-2489
VL - 14
SP - 1
EP - 19
JO - Information (Basel)
JF - Information (Basel)
IS - 7
M1 - 376
ER -