Addressing class imbalance and cost sensitivity in software defect prediction by combining domain costs and balancing costs

Michael Siers, Md Zahidul Islam

Research output: Book chapter/Published conference paper > Conference paper > peer-review

Abstract

Effective methods for identifying software defects help minimize the business costs of software development. Classification methods can be used to perform software defect prediction. When cost-sensitive methods are used, the predictions are optimized for business cost. The data sets used as input for these methods typically suffer from the class imbalance problem. That is, there are many more defect-free code examples than defective code examples to learn from. This negatively impacts the classifier’s ability to correctly predict defective code examples. Cost-sensitive classification can also be used to mitigate the effects of the class imbalance problem by setting the costs to reflect the level of imbalance in the training data set. Through an experimental process, we have developed a method for combining these two different types of costs. We demonstrate that by using our proposed approach, we can produce more cost-effective predictions than several recent cost-sensitive methods used for software defect prediction. Furthermore, we examine the software defect prediction models built by our method and present the discovered insights.
Original language: English
Title of host publication: Proceedings of the 12th International Conference on Advanced Data Mining and Applications, ADMA 2016
Place of Publication: Switzerland
Publisher: Springer
Pages: 156-171
Number of pages: 16
Volume: 10086
ISBN (Print): 9783319495859
Publication status: Published - 2016
Event: Advanced Data Mining and Applications (ADMA) 12th International Conference - Mantra Legends Hotel, Surfers Paradise, Gold Coast, Australia
Duration: 12 Dec 2016 to 15 Dec 2016
https://cs.adelaide.edu.au/~adma2016/

Conference

Conference: Advanced Data Mining and Applications (ADMA) 12th International Conference
Country/Territory: Australia
City: Surfers Paradise, Gold Coast
Period: 12/12/16 to 15/12/16