Abstract
Class imbalance and cost-sensitivity are two prominent challenges in classification. The overwhelming majority of techniques that address these issues focus only on predictive performance rather than on suitability for knowledge discovery. This thesis addresses both issues and proposes the design of an approach with four important characteristics. First, a cost-sensitive decision forest is generated that avoids the negative effects of class imbalance. Second, the forest is generated using the entirety of the original training dataset, which means that the knowledge it contains directly matches the original data. Third, a clear process is proposed that automatically extracts, ranks, and values the forest’s discovered knowledge. Lastly, the resulting classifier achieves competitive performance compared to several existing techniques. The knowledge discovery approach is demonstrated by discovering patterns in software bugs present in several National Aeronautics and Space Administration (NASA) programs. The conceptual design of a tool for real-time integration of the proposed techniques into the software development process is also presented at the end of this thesis.
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Awarding Institution | |
Supervisors/Advisors | |
Award date | 21 Apr 2019 |
Place of Publication | Australia |
Publisher | |
Publication status | Published - 2019 |