Addressing class imbalance and cost sensitivity in software defect prediction by combining domain costs and balancing costs

Michael Siers, Md Zahidul Islam

Research output: Book chapter/Published conference paper > Conference paper > peer-review



Effective methods for identifying software defects help minimize the business costs of software development. Classification methods can be used to perform software defect prediction. When cost-sensitive methods are used, the predictions are optimized for business cost. The data sets used as input for these methods typically suffer from the class imbalance problem: there are many more defect-free code examples than defective code examples to learn from. This negatively impacts the classifier’s ability to correctly predict defective code examples. Cost-sensitive classification can also be used to mitigate the effects of the class imbalance problem by setting the costs to reflect the level of imbalance in the training data set. Through an experimental process, we have developed a method for combining these two different types of costs. We demonstrate that by using our proposed approach, we can produce more cost-effective predictions than several recent cost-sensitive methods used for software defect prediction. Furthermore, we examine the software defect prediction models built by our method and present the discovered insights.
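As a rough illustration of the balancing-cost idea mentioned in the abstract, the sketch below derives misclassification costs from class frequencies and then picks the label with the lowest expected cost. It is not the authors' method: the function names, the inverse-frequency rule, and the example numbers are all assumptions made for illustration. Domain (business) costs would populate the same kind of cost table, but the abstract does not say how the paper combines the two, so no combination rule is shown.

```python
from collections import Counter

def balancing_costs(labels):
    """Hypothetical rule: cost of misclassifying a class is the
    majority-class count divided by that class's count."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: majority / n for cls, n in counts.items()}

def build_cost_table(per_class_cost):
    """cost[(true, predicted)]: 0 when correct, otherwise the cost
    attached to the true class."""
    classes = list(per_class_cost)
    return {
        (true, pred): 0.0 if true == pred else per_class_cost[true]
        for true in classes
        for pred in classes
    }

def min_expected_cost_label(probs, cost):
    """Return the label whose expected misclassification cost is lowest.
    probs maps each class to the classifier's estimated probability."""
    def expected(pred):
        return sum(p * cost[(true, pred)] for true, p in probs.items())
    return min(probs, key=expected)

# Illustrative training labels: 90 defect-free modules, 10 defective ones.
labels = ["clean"] * 90 + ["defective"] * 10
costs = balancing_costs(labels)
table = build_cost_table(costs)

# A module the classifier considers only 20% likely to be defective.
probs = {"clean": 0.8, "defective": 0.2}
uniform = build_cost_table({"clean": 1.0, "defective": 1.0})
print(min_expected_cost_label(probs, uniform))
print(min_expected_cost_label(probs, table))
```

With uniform costs the module is labelled with its most probable class. With the imbalance-derived costs, a missed defect is weighted nine times more heavily than a false alarm in this example, so the decision shifts toward the minority class.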
Original language: English
Title of host publication: Advanced data mining and applications
Subtitle of host publication: 12th international conference, ADMA 2016, proceedings
Editors: Jinyan Li, Shuliang Wang, Xue Li, Jianxin Li, Quan Z. Sheng
Place of Publication: Switzerland
Number of pages: 16
ISBN (Electronic): 9783319495866
ISBN (Print): 9783319495859
Publication status: Published - 2016
Event: Advanced Data Mining and Applications (ADMA) 12th International Conference - Mantra Legends Hotel, Gold Coast, Australia
Duration: 12 Dec 2016 to 15 Dec 2016 (Conference program)

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: Advanced Data Mining and Applications (ADMA) 12th International Conference
City: Gold Coast
Other: The year 2016 marks the 12th anniversary of the International Conference on Advanced Data Mining and Applications (ADMA 2016).
The conference aims at bringing together the experts on data mining from around the world, and providing a leading international forum for the dissemination of original research findings in data mining, spanning applications, algorithms, software and systems, as well as different applied disciplines with potential in data mining, such as smartphone and social network mining, bio-medical science and green computing. ADMA 2016 will promote the same close interaction and collaboration among practitioners and researchers. Published papers will go through a full peer review process.

