Structurally diverse decision trees are important for knowledge discovery and for classification/prediction accuracy. Over the years, researchers have devoted much effort to developing algorithms that increase diversity among the trees within an ensemble. While Kappa is commonly used to measure diversity among decision trees, it does not measure the ability of a tree-building algorithm to introduce diversity, nor does it consider the structural diversity among the trees. Instead, Kappa measures the diversity of the predictions made by the trees produced, and is therefore dependent on the datasets used. This paper presents a novel data-independent metric, called the R index, for measuring the diversity that a decision forest algorithm can introduce, without building the entire decision forest. The proposed measure is applied to five well-known algorithms that involve bagging and random subspacing. An efficient practical approach for calculating the R index empirically, called R finder, is also proposed and implemented. Both R finder and Kappa were applied to thirty-two publicly available benchmark datasets under various algorithms to estimate the resulting diversity. The results indicate a generally strong negative correlation between R finder and Kappa, implying that R finder is effective at estimating the diversity of trees without the added computational cost of calculating Kappa.
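To make the prediction-based nature of Kappa concrete, the following is a minimal sketch of the standard Cohen's kappa statistic averaged over all pairs of trees. It assumes each tree is represented only by its list of predicted labels on a shared dataset; the function names `cohen_kappa` and `mean_pairwise_kappa` are illustrative and are not taken from the paper, whose exact Kappa procedure is not described in this abstract. Under this convention, a lower average kappa indicates more diverse predictions.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(preds_a, preds_b):
    """Cohen's kappa between two label sequences of equal length."""
    if len(preds_a) != len(preds_b) or len(preds_a) == 0:
        raise ValueError("prediction lists must be non-empty and equal length")
    n = len(preds_a)
    # Observed agreement: fraction of records on which both trees agree.
    observed = sum(a == b for a, b in zip(preds_a, preds_b)) / n
    # Chance agreement: derived from each tree's marginal label frequencies.
    count_a, count_b = Counter(preds_a), Counter(preds_b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both trees predict one identical label everywhere; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(all_preds):
    """Average kappa over every pair of trees (hypothetical helper)."""
    pairs = list(combinations(all_preds, 2))
    if not pairs:
        raise ValueError("need at least two trees")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy predictions from three trees on four records (made-up data).
    forest_preds = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
    print(mean_pairwise_kappa(forest_preds))
```

Because this computation needs the predictions of every tree on a dataset, it illustrates why Kappa requires a fully built forest and depends on the data, which is the limitation the abstract contrasts with the R index.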
Original language: English
Article number: 111435
Journal: Knowledge-Based Systems
Publication status: Published - 28 Feb 2024


