Abstract
Software defect prediction (SDP) uses machine learning to locate bugs in source code. Datasets used for SDP are typically affected by class imbalance, and traditional learning algorithms perform poorly on class-imbalanced datasets. Cost-sensitive learning has been used in SDP to minimise the monetary costs incurred by predictions. We propose a framework that produces cost-sensitive predictions and also mitigates class imbalance. Since our algorithm builds a decision forest classifier, knowledge can be extracted by manually inspecting the individual decision trees. To enhance this knowledge discovery process, we propose an algorithm for extracting the most interesting patterns from a decision forest. Our algorithm calculates the interestingness of a pattern as the potential financial gain of knowing it. We then combine these techniques into an end-to-end cost-sensitive knowledge discovery process. This process is demonstrated by extracting knowledge from four software projects undertaken by the National Aeronautics and Space Administration (NASA).
Original language | English |
---|---|
Pages (from-to) | 53-70 |
Number of pages | 18 |
Journal | Information Sciences |
Volume | 459 |
Early online date | May 2018 |
Publication status | Published - Aug 2018 |