TY - JOUR
T1 - Deep learning for predicting the onset of type 2 diabetes
T2 - enhanced ensemble classifier using modified t-SNE
AU - Pokharel, Monima
AU - Alsadoon, Abeer
AU - Nguyen, Tran Quoc Vinh
AU - Al-Dala’in, Thair
AU - Pham, Duong Thu Hang
AU - Prasad, P. W.C.
AU - Mai, Ha Thi
N1 - Funding Information:
This research is partially supported by The University of Da Nang - University of Science and Education, Vietnam, under the grant “T2020-TD-03-BS”.
Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2022/8
Y1 - 2022/8
AB - Several methods have been used to detect type 2 diabetes mellitus (T2DM), but deep learning has not been used successfully to predict T2DM because of its low accuracy and performance. Using a traditional method such as the synthetic minority over-sampling technique (SMOTE) affects the system’s accuracy. This study proposes an enhanced embedding technique that aims to increase the accuracy of T2DM prediction with minimum error. The proposed system uses t-distributed Stochastic Neighbor Embedding (t-SNE), which visualizes high-dimensional data, including imbalanced and insufficient data, to improve the accuracy, sensitivity, and specificity of T2DM prediction. It consists of three components: pre-processing, feature extraction and selection, and classification. Three datasets are used for the proposed solution: Pima Indians Diabetes, Polarity, and Luzhou. The proposed system increased the overall performance of the model compared to the state of the art: accuracy rose from 83.96% to 85.34%, sensitivity from 31.22% to 33.06%, and specificity from 96.00% to 97.26%. The proposed system reduced the overfitting problem, which affects the model’s accuracy. It also uses a non-linear dimension-reduction technique, used for visualizing high-dimensional datasets, to deal with large, insufficient, and inconsistent datasets.
KW - Dimension reduction
KW - Embedding
KW - Overfitting
KW - Prediction of type 2 diabetes mellitus
KW - t-distributed Stochastic Neighbor Embedding (t-SNE)
KW - Wide and deep learning
UR - http://www.scopus.com/inward/record.url?scp=85127335925&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127335925&partnerID=8YFLogxK
U2 - 10.1007/s11042-022-12950-9
DO - 10.1007/s11042-022-12950-9
M3 - Article
AN - SCOPUS:85127335925
SN - 1380-7501
VL - 81
SP - 27837
EP - 27852
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 19
ER -