A novel modified undersampling (MUS) technique for software defect prediction

P. Lingden, Abeer Alsadoon, P. W.C. Prasad, Omar Hisham Alsadoon, Rasha S. Ali, Vinh Tran Quoc Nguyen

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Background and aim: Many sophisticated data mining and machine learning algorithms have been used for software defect prediction (SDP) to improve software quality. However, real-world SDP data sets suffer from class imbalance, which biases the classifier and degrades the performance of existing classification algorithms, resulting in inaccurate classification and prediction. This work aims to reduce the class imbalance of SDP data sets in order to increase the accuracy of defect prediction and decrease processing time. Methodology: The proposed model balances the classes of the data sets. It consists of a modified undersampling (MUS) method and a correlation feature selection (CFS) method. Results: On ten open-source project data sets, the proposed model achieved F1-scores ranging from 0.52 to 0.96. The best value, 0.96, is close to the maximum of 1, which indicates near-perfect prediction performance in the best case. Conclusion: The proposed model balances the classes of SDP data sets in order to increase prediction accuracy and decrease processing time.
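The abstract does not describe how the modified undersampling step works, so the following is only a minimal sketch of the two general ideas it relies on: plain random undersampling of the majority class (a stand-in, not the authors' MUS method) and the F1-score used to report results. The function names `random_undersample` and `f1_score` and the toy data are assumptions made for illustration, and the CFS step is not shown.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class.

    This is plain random undersampling, used here as a stand-in.
    It is NOT the modified undersampling (MUS) method from the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 8 clean modules (label 0), 2 defective modules (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))

# A classifier that always predicts "clean" is right 8 times out of 10 on the
# imbalanced data, yet it never finds a defect, so its F1-score is zero.
always_clean = [0] * len(labels)
print(f1_score(labels, always_clean))
```

The always-clean example illustrates why results on imbalanced SDP data are usually reported as F1-scores rather than raw accuracy: a model biased toward the majority class can look accurate while missing every defective module.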

Original language: English
Journal: Computational Intelligence
DOI: 10.1111/coin.12229
Publication status: E-pub ahead of print - 18 Jul 2019


Cite this

Lingden, P. ; Alsadoon, Abeer ; Prasad, P. W.C. ; Alsadoon, Omar Hisham ; Ali, Rasha S. ; Nguyen, Vinh Tran Quoc. / A novel modified undersampling (MUS) technique for software defect prediction. In: Computational Intelligence. 2019.
@article{1096cf47c190442bbf2b0288edf19123,
title = "A novel modified undersampling (MUS) technique for software defect prediction",
keywords = "correlation feature selection, machine learning, modified undersampling, software defect prediction",
author = "P. Lingden and Abeer Alsadoon and Prasad, {P. W.C.} and Alsadoon, {Omar Hisham} and Ali, {Rasha S.} and Nguyen, {Vinh Tran Quoc}",
year = "2019",
month = "7",
day = "18",
doi = "10.1111/coin.12229",
language = "English",
journal = "Computational Intelligence",
issn = "0824-7935",
publisher = "Wiley-Blackwell",

}


TY - JOUR
T1 - A novel modified undersampling (MUS) technique for software defect prediction
AU - Lingden, P.
AU - Alsadoon, Abeer
AU - Prasad, P. W.C.
AU - Alsadoon, Omar Hisham
AU - Ali, Rasha S.
AU - Nguyen, Vinh Tran Quoc
PY - 2019/7/18
Y1 - 2019/7/18

KW - correlation feature selection
KW - machine learning
KW - modified undersampling
KW - software defect prediction
UR - http://www.scopus.com/inward/record.url?scp=85069811147&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069811147&partnerID=8YFLogxK
U2 - 10.1111/coin.12229
DO - 10.1111/coin.12229
M3 - Article
AN - SCOPUS:85069811147
JO - Computational Intelligence
JF - Computational Intelligence
SN - 0824-7935
ER -