Decision trees are well-known classification algorithms that are also valued for their capacity for knowledge discovery. The literature has pointed out two major shortcomings of decision trees: (1) instability and (2) high computational cost. These problems have been addressed to some extent through ensemble learning techniques such as Random Forest. In a decision tree, the whole attribute space of a dataset is searched for the best test attribute at a node. Random Forest instead first selects a random subspace of attributes and then identifies the test attribute for the node from within that subspace. Because the subspace is chosen at random, it may contain only or mostly poor-quality attributes, which results in an individual tree with low accuracy. Therefore, in this paper we propose a probabilistic selection of attributes (instead of a random selection), where the probability of selecting an attribute is proportional to its quality. Although we developed this approach independently, after the research was completed we discovered that some existing techniques take the same approach. This paper uses mutual information as the measure of attribute quality, whereas those earlier papers used the information gain ratio and a t-test. The proposed technique has been evaluated on nine datasets and shows stable performance in terms of accuracy (both ensemble accuracy and individual tree accuracy) and efficiency.
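The abstract does not specify how attribute quality is computed or how the subspace is drawn, so the following is only a minimal Python sketch under stated assumptions: attributes are categorical (continuous attributes would need to be discretized first), quality is the empirical mutual information between an attribute and the class label, and a subspace of `k` attributes is drawn without replacement, with each draw choosing among the remaining attributes with probability proportional to their mutual information. The function names, the subspace size `k` and the `eps` smoothing term are hypothetical illustrations, not the authors' implementation. A uniform draw in the style of Random Forest is included for contrast.

```python
import math
import random
from collections import Counter


def mutual_information(column, labels):
    """Empirical mutual information I(A; C), in bits, between one categorical
    attribute column and the class labels."""
    n = len(labels)
    joint = Counter(zip(column, labels))  # counts of (attribute value, class) pairs
    count_a = Counter(column)             # marginal counts of attribute values
    count_c = Counter(labels)             # marginal counts of class values
    mi = 0.0
    for (a, c), n_ac in joint.items():
        p_ac = n_ac / n
        # p(a,c) * log2( p(a,c) / (p(a) * p(c)) )
        mi += p_ac * math.log2(p_ac / ((count_a[a] / n) * (count_c[c] / n)))
    return mi


def weighted_subspace(weights, k, rng, eps=1e-6):
    """Hypothetical helper: pick k distinct attribute indices without
    replacement. At each draw, a remaining attribute is chosen with
    probability proportional to its weight. eps (an assumption) keeps
    zero-weight attributes selectable."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        w = [weights[i] + eps for i in remaining]
        pos = rng.choices(range(len(remaining)), weights=w, k=1)[0]
        chosen.append(remaining.pop(pos))
    return chosen


def mi_subspace(rows, labels, k, rng):
    """Score every attribute by mutual information with the class, then draw
    a quality-weighted subspace of size k."""
    columns = list(zip(*rows))  # transpose rows into attribute columns
    scores = [mutual_information(col, labels) for col in columns]
    return weighted_subspace(scores, k, rng)


def uniform_subspace(num_attributes, k, rng):
    """Random Forest style baseline: every attribute is equally likely."""
    return rng.sample(range(num_attributes), k)


if __name__ == "__main__":
    rng = random.Random(0)
    # Attribute 0 determines the class, attribute 1 is independent of it,
    # and attribute 2 is constant.
    rows = [("a", "p", "z"), ("a", "q", "z"), ("b", "p", "z"), ("b", "q", "z")]
    labels = ["yes", "yes", "no", "no"]
    print([mutual_information(col, labels) for col in zip(*rows)])  # → [1.0, 0.0, 0.0]
    print(mi_subspace(rows, labels, k=2, rng=rng))
```

With sequential draws, each individual draw is proportional to quality, but an attribute's overall chance of landing in a subspace of `k > 1` attributes is not exactly proportional to its score; the scheme used in the paper may differ.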
|Title of host publication||Proceedings of the 15th Australasian Data Mining Conference (AusDM 2017)|
|Editors||Yee Ling Boo, David Stirling, Lianhua Chi, Lin Liu, Kok-Leong Ong, Graham Williams|
|Place of Publication||Singapore|
|Number of pages||14|
|Publication status||Published - 01 Jan 2018|
|Event||The 15th Australasian Data Mining Conference: AusDM 2017 - Crown Metropol, Melbourne, Australia. Duration: 19 Aug 2017 → 25 Aug 2017. Conference website: https://web.archive.org/web/20170725000803/http://ausdm17.azurewebsites.net/|
|Name||Communications in Computer and Information Science|
|Other||The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. It is devoted to the art and science of intelligent analysis of (usually big) data sets for meaningful (and previously unknown) insights. This conference will enable participants to share and learn about research progress in the local context, as well as new breakthroughs in data mining algorithms and their applications across all industries. |
Since AusDM’02, the conference has showcased research in data mining, providing a forum for presenting and discussing the latest research and developments. Since 2006, all proceedings have been printed as volumes in the CRPIT series. Building on this tradition, AusDM’17 will facilitate the cross-disciplinary exchange of ideas, experience and potential research directions. Specifically, the conference seeks to showcase: Research Prototypes; Industry Case Studies; Practical Analytics Technology; and Research Student Projects. AusDM’17 will be a meeting place for pushing forward the frontiers of data mining in academia and industry. This year, AusDM’17 is proud to be co-located with numerous conferences, including IJCAI, AAI, KSEM and IFIP, in Melbourne, Australia.