Optimizing the number of trees in a decision forest to discover a subforest with high ensemble accuracy using a genetic algorithm

Md Nasim Adnan, Md Zahidul Islam

Research output: Contribution to journalArticlepeer-review

67 Citations (Scopus)

Abstract

A decision forest is an ensemble of decision trees, and it is often built to discover more patterns (i.e. logic rules) and predict/classify class values more accurately than a single decision tree. Existing decision forest algorithms are typically used for building huge numbers of decision trees, involving large memory and computational overhead, in order to achieve high accuracy. Generally, many of the trees do not contribute to improving the ensemble accuracy of a forest. As a result, ensemble pruning algorithms aim to get rid of those trees while generating a subforest in order to achieve higher (or comparable) ensemble accuracy than the original forest. The objectives are two fold: select as small number of trees as possible, and maintain the ensemble accuracy of the subforest as high as possible. An optimal subforest can be found by exhaustive search; however it is not practical for any standard-sized forest as the number of candidate subforests grows exponentially. In order to avoid the computational burden of an exhaustive search, many greedy and genetic algorithm-based subforest selection techniques have been proposed in literature. In this paper, we propose a subforest selection technique that achieves small size as well as high accuracy. We use a genetic algorithm where we carefully select high quality individual trees for the initial population of the genetic algorithm in order to improve the final output of the algorithm. Experiments are conducted on 20 data sets from the UCI Machine Learning Repository to compare the proposed technique with several existing state-of-the-art techniques. The results indicate that the proposed technique can select effective subforests which are significantly smaller than original forests while achieving better (or comparable) accuracy than the original forests.
Original languageEnglish
Pages (from-to)86-97
Number of pages12
JournalKnowledge-Based Systems
Volume110
DOIs
Publication statusPublished - 2016

Fingerprint

Dive into the research topics of 'Optimizing the number of trees in a decision forest to discover a subforest with high ensemble accuracy using a genetic algorithm'. Together they form a unique fingerprint.

Cite this