TY - JOUR
T1 - Enhancing the prediction of type 2 diabetes mellitus using sparse balanced SVM
AU - Shrestha, Bibek
AU - Alsadoon, Abeer
AU - Prasad, P. W.C.
AU - Al-Naymat, Ghazi
AU - Al-Dala’in, Thair
AU - Rashid, Tarik A.
AU - Alsadoon, Omar Hisham
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2022/11
Y1 - 2022/11
AB - Natural population-based prediction of type 2 diabetes is costly because it requires a large amount of resources. Although much research has used machine learning algorithms to predict type 2 diabetes, these approaches have not achieved sufficient sensitivity because of imbalanced and sparse data. This research aims to use noninvasive features from electronic health records with a machine learning algorithm, namely the Sparse Balanced Support Vector Machine (SB-SVM), to handle the imbalanced data and achieve high precision. The proposed system uses SB-SVM to create sparsity and to implicitly select the most relevant features from the imbalanced data. First, we preprocess the data using different baseline variables and filters. Second, different features are extracted from the preprocessed data using inclusion and exclusion criteria as filters. Third, we select 12 features that are highly relevant to diabetes prediction using statistical analysis and logistic regression. Then, we train and test the proposed model using the nested stratified cross-validation method. Finally, the optimal model performance is evaluated on the test set. The proposed model predicts type 2 diabetes mellitus using the noninvasive features, with enhanced sensitivity and less processing time. Our solution outperforms the state-of-the-art in most performance metrics. The accuracy, precision, recall, and Area Under the Curve (AUC) of the best state-of-the-art solution are 67.22%, 62.93%, 69.96%, and 69.96%, respectively. In comparison, our solution achieves accuracy, precision, recall, and AUC of 76.39%, 66.86%, 76.74%, and 85.08%, respectively. The average processing time is decreased from 40 ~ 85 folds/sec to 8.9 ~ 10.7 folds/sec. To conclude, the proposed system improves the precision and sensitivity of diabetes prediction with minimal processing time.
KW - Machine learning
KW - Noninvasive attributes
KW - Screening
KW - Support Vector Machine
KW - Type 2 diabetes mellitus
UR - http://www.scopus.com/inward/record.url?scp=85128832850&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128832850&partnerID=8YFLogxK
U2 - 10.1007/s11042-022-13087-5
DO - 10.1007/s11042-022-13087-5
M3 - Article
AN - SCOPUS:85128832850
SN - 1380-7501
VL - 81
SP - 38945
EP - 38969
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 27
ER -