Data science for class imbalanced and cost-sensitive data and its application to software defect prediction

Michael Siers

Research output: Thesis › Doctoral Thesis

Class imbalance and cost-sensitivity are two prominent challenges in classification. The overwhelming majority of techniques that address these issues focus only on predictive performance rather than suitability for knowledge discovery. This thesis addresses both issues by proposing the design of an approach with four important characteristics. Firstly, a cost-sensitive decision forest is generated which avoids the negative effects of class imbalance. Secondly, the forest is generated using the entirety of the original training dataset, which means that the knowledge it contains directly matches the original data. Thirdly, a clear process is proposed which automatically extracts, ranks, and values the forest’s discovered knowledge. Lastly, the resulting classifier achieves competitive performance compared to several existing techniques. The knowledge discovery approach is demonstrated by discovering patterns in software bugs present in several National Aeronautics and Space Administration (NASA) programs. The conceptual design of a tool for real-time integration of the proposed techniques into the software development process is also presented at the end of this thesis.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Charles Sturt University
Supervisors
  • Islam, Zahid, Principal Supervisor
  • Bossomaier, Terry, Co-Supervisor
Award date: 21 Apr 2019
Place of Publication: Australia
Publication status: Published - 2019
