Abstract
Decision trees are a popular method of data mining and knowledge discovery, capable of extracting hidden information from datasets consisting of both nominal and numerical attributes. However, their need to test the suitability of every attribute at every tree node, in addition to testing every possible split-point for every numerical attribute, can be computationally expensive, particularly for datasets with high dimensionality. This paper proposes SPAARC, a method for speeding up the decision tree induction process. It consists of two components that address these issues: sampling of the numeric attribute tree-node split-points and dynamic adjustment of the node attribute selection space. Further, these methods can be applied to almost any decision tree algorithm. To confirm its validity, SPAARC has been tested and compared against an implementation of the CART algorithm using 18 freely available datasets from the UCI data repository. Results from this testing indicate that the two components of SPAARC combined have minimal effect on decision tree classification accuracy yet reduce model build times by as much as 69%.
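To illustrate the general idea behind the first component, the following is a minimal sketch of split-point sampling for a single numeric attribute. It is not the SPAARC implementation: the abstract does not specify the sampling rule, so the evenly spaced selection, the Gini criterion, and all names (`candidate_thresholds`, `best_split`, `max_candidates`) are assumptions made for illustration only. The second component, dynamic adjustment of the attribute selection space, is not sketched here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_score(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def candidate_thresholds(values, max_candidates=None):
    """Midpoints between adjacent distinct values, optionally subsampled."""
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if max_candidates is None or len(midpoints) <= max_candidates:
        return midpoints  # exhaustive search: every possible split-point
    # Assumed sampling rule (hypothetical): keep evenly spaced midpoints.
    step = len(midpoints) / max_candidates
    return [midpoints[int(i * step)] for i in range(max_candidates)]


def best_split(values, labels, max_candidates=None):
    """Return (threshold, score) with the lowest weighted Gini, or None."""
    candidates = candidate_thresholds(values, max_candidates)
    if not candidates:
        return None
    scored = ((t, split_score(values, labels, t)) for t in candidates)
    return min(scored, key=lambda pair: pair[1])
```

Calling `best_split(values, labels)` evaluates every midpoint, while `best_split(values, labels, max_candidates=10)` evaluates at most ten. The sampled search does less work per node but can miss the best threshold, which is the accuracy versus build-time trade-off the abstract refers to.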
Original language | English |
---|---|
Title of host publication | Data mining |
Subtitle of host publication | 16th Australasian conference, AusDM 2018, revised selected papers |
Editors | Rafiqul Islam, Yun Sing Koh, Yanchang Zhao, Graco Warwick, David Stirling, Chang-Tsun Li, Zahidul Islam |
Place of Publication | Singapore |
Publisher | Springer |
Pages | 43-55 |
Number of pages | 13 |
Volume | 996 |
ISBN (Electronic) | 9789811366611 |
ISBN (Print) | 9789811366604 |
Publication status | Published - 2019 |
Event | 16th Australasian Data Mining Conference (AusDM 2018), Charles Sturt University, Bathurst, Australia. Duration: 28 Nov 2018 → 30 Nov 2018. Conference website: https://web.archive.org/web/20181122224709/https://ausdm18.ausdm.org/. Conference proceedings book: https://www.springer.com/us/book/9789811366604 |
Publication series
Name | Communications in Computer and Information Science |
---|---|
Publisher | Springer |
Volume | 996 |
ISSN (Print) | 1865-0929 |
ISSN (Electronic) | 1865-0937 |
Conference
Conference | 16th Australasian Data Mining Conference (AusDM 2018) |
---|---|
Country/Territory | Australia |
City | Bathurst |
Period | 28/11/18 → 30/11/18 |
Other | The Australasian Data Mining Conference (AusDM) has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. It is devoted to the art and science of intelligent analysis of (usually big) data sets for meaningful (and previously unknown) insights. This conference will enable the sharing and learning of research and progress in the local context and new breakthroughs in data mining algorithms and their applications across all industries. |