Rank forest: Systematic attribute sub-spacing in decision forest

Zaheer Babar, Md Zahidul Islam, Sameen Mansha

Research output: Book chapter/Published conference paper › Conference paper › peer-review

Abstract

Decision trees are well-known classification algorithms that are also valued for their capacity for knowledge discovery. The literature points out two major shortcomings of decision trees: (1) instability, and (2) high computational cost. These problems have been addressed to some extent through ensemble learning techniques such as Random Forest. Unlike a decision tree, where the whole attribute space of a dataset is searched to find the best test attribute for a node, Random Forest first selects a random subspace of attributes and then identifies the test attribute for the node from that subspace. Because the subspace is selected at random, it can consist entirely or largely of poor-quality attributes, resulting in an individual tree with low accuracy. Therefore, in this paper we propose a probabilistic selection of attributes (instead of a random selection) in which the probability of selecting an attribute is proportional to its quality. Although we developed this approach independently, after the research was completed we discovered that some existing techniques take the same approach. While this paper uses mutual information as the measure of attribute quality, the papers in the literature use information gain ratio and a t-test as the measure. The proposed technique has been evaluated on nine datasets and shows stable performance in terms of accuracy (ensemble accuracy and individual tree accuracy) and efficiency.
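The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes discrete attributes, draws the subspace without replacement, and adds a small smoothing constant `eps` so that zero-scoring attributes remain selectable. The function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def sample_subspace(scores, k, rng, eps=1e-9):
    """Draw up to k distinct attribute names, weighting each draw by score.

    eps is an assumption of this sketch: it keeps attributes with a score
    of zero selectable with a very small probability.
    """
    remaining = dict(scores)
    chosen = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        weights = [remaining[name] + eps for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


# Hypothetical toy data: "a" determines the label, "b" is independent of it.
columns = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [0, 0, 1, 1, 0, 0, 1, 1]

scores = {name: mutual_information(col, labels) for name, col in columns.items()}
subspace = sample_subspace(scores, k=1, rng=random.Random(0))
```

In a forest, a subspace like this would be drawn for each node (or each tree), and the best test attribute would then be chosen from within it, so informative attributes appear in subspaces more often than under uniform sampling.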
Original language: English
Title of host publication: Proceedings of the 15th Australasian Data Mining Conference (AusDM 2017)
Editors: Yee Ling Boo, David Stirling, Lianhua Chi, Lin Liu, Kok-Leong Ong, Graham Williams
Place of Publication: Singapore
Publisher: Springer
Pages: 24-37
Number of pages: 14
ISBN (Electronic): 9789811302923
ISBN (Print): 9789811302916
Publication status: Published - 01 Jan 2018
Event: 15th Australasian Data Mining Conference: AusDM 2017 - Crown Metropol, Melbourne, Australia
Duration: 19 Aug 2017 → 25 Aug 2017
http://ausdm17.azurewebsites.net/ (Conference website)

Publication series

Name: Communications in Computer and Information Science
Publisher: Springer
Volume: 845
ISSN (Electronic): 1865-0929

Conference

Conference: 15th Australasian Data Mining Conference
Country: Australia
City: Melbourne
Period: 19/08/17 → 25/08/17
Other: The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. It is devoted to the art and science of intelligent analysis of (usually big) data sets for meaningful (and previously unknown) insights. This conference will enable the sharing and learning of research and progress in the local context and new breakthroughs in data mining algorithms and their applications across all industries.

Since AusDM’02 the conference has showcased research in data mining, providing a forum for presenting and discussing the latest research and developments. Since 2006, all proceedings have been printed as volumes in the CRPIT series. Building on this tradition, AusDM’17 will facilitate the cross-disciplinary exchange of ideas, experience and potential research directions. Specifically, the conference seeks to showcase: Research Prototypes; Industry Case Studies; Practical Analytics Technology; and Research Student Projects. AusDM’17 will be a meeting place for pushing forward the frontiers of data mining in academia and industry. This year, AusDM’17 is proud to be co-located with numerous conferences including IJCAI, AAI, KSEM and IFIP in Melbourne, Australia.
