TY - JOUR
T1 - A novel modified undersampling (MUS) technique for software defect prediction
AU - Lingden, P.
AU - Alsadoon, Abeer
AU - Prasad, P. W.C.
AU - Alsadoon, Omar Hisham
AU - Ali, Rasha S.
AU - Nguyen, Vinh Tran Quoc
PY - 2019/11
Y1 - 2019/11
N2 - Background and aim: Many sophisticated data mining and machine learning algorithms have been used for software defect prediction (SDP) to enhance the quality of software. However, real-world SDP data sets suffer from class imbalance, which biases the classifier and reduces the performance of existing classification algorithms, resulting in inaccurate classification and prediction. This work aims to reduce the class imbalance of SDP data sets in order to increase the accuracy of defect prediction and decrease processing time. Methodology: The proposed model balances the classes of the data sets to increase the accuracy of prediction and decrease processing time. It consists of a modified undersampling method and a correlation feature selection (CFS) method. Results: Results from ten open source project data sets showed that the proposed model improves accuracy, measured by F1-score, to between 0.52 and 0.96; the best value, 0.96, is close to the maximum of 1, indicating near-perfect performance in the prediction process. Conclusion: The proposed model focuses on balancing the classes of the data sets to increase the accuracy of prediction and decrease processing time.
AB - Background and aim: Many sophisticated data mining and machine learning algorithms have been used for software defect prediction (SDP) to enhance the quality of software. However, real-world SDP data sets suffer from class imbalance, which biases the classifier and reduces the performance of existing classification algorithms, resulting in inaccurate classification and prediction. This work aims to reduce the class imbalance of SDP data sets in order to increase the accuracy of defect prediction and decrease processing time. Methodology: The proposed model balances the classes of the data sets to increase the accuracy of prediction and decrease processing time. It consists of a modified undersampling method and a correlation feature selection (CFS) method. Results: Results from ten open source project data sets showed that the proposed model improves accuracy, measured by F1-score, to between 0.52 and 0.96; the best value, 0.96, is close to the maximum of 1, indicating near-perfect performance in the prediction process. Conclusion: The proposed model focuses on balancing the classes of the data sets to increase the accuracy of prediction and decrease processing time.
KW - correlation feature selection
KW - machine learning
KW - modified undersampling
KW - software defect prediction
UR - http://www.scopus.com/inward/record.url?scp=85069811147&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069811147&partnerID=8YFLogxK
U2 - 10.1111/coin.12229
DO - 10.1111/coin.12229
M3 - Article
AN - SCOPUS:85069811147
SN - 0824-7935
VL - 35
SP - 1003
EP - 1020
JO - Computational Intelligence
JF - Computational Intelligence
IS - 4
ER -