Abstract

Decision trees are a popular method for data mining and knowledge discovery, capable of extracting hidden information from datasets containing both nominal and numerical attributes. However, they must test the suitability of every attribute at every tree node and, for every numerical attribute, every possible split point, which can be computationally expensive, particularly for high-dimensional datasets. This paper proposes SPAARC, a method for speeding up decision tree induction that addresses these issues with two components: sampling the tree-node split points of numeric attributes, and dynamically adjusting the node attribute selection space. Both components can be applied to almost any decision tree algorithm. To confirm its validity, SPAARC was tested and compared against an implementation of the CART algorithm using 18 freely available datasets from the UCI data repository. The results indicate that the two components of SPAARC combined have minimal effect on decision tree classification accuracy yet reduce model build times by as much as 69%.
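The abstract does not say how SPAARC chooses which split points to sample, so the following is only a minimal sketch of the general idea under stated assumptions: candidate thresholds for one numeric attribute are thinned to an evenly spaced subset before each is scored with the Gini impurity that CART uses. The function names, the `max_candidates` parameter, and the even-spacing rule are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def candidate_thresholds(
    values: Sequence[float], max_candidates: Optional[int] = None
) -> List[float]:
    """Midpoints between consecutive distinct values.

    With max_candidates=None every midpoint is returned (exhaustive search).
    Otherwise the list is thinned to at most max_candidates evenly spaced
    midpoints. The even spacing is an assumption made for illustration.
    """
    distinct = sorted(set(values))
    mids = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    if max_candidates is None or len(mids) <= max_candidates:
        return mids
    step = len(mids) / max_candidates
    # Take the midpoint nearest the centre of each of max_candidates bins.
    return [mids[int(i * step + step / 2)] for i in range(max_candidates)]


def best_split(
    values: Sequence[float],
    labels: Sequence[int],
    max_candidates: Optional[int] = None,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted Gini) of the best candidate split, or None."""
    n = len(values)
    best = None
    for t in candidate_thresholds(values, max_candidates):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


if __name__ == "__main__":
    xs = list(range(100))
    ys = [0] * 50 + [1] * 50
    print("exhaustive:", best_split(xs, ys))
    print("sampled:   ", best_split(xs, ys, max_candidates=9))
```

In this sketch the sampled search scores at most `max_candidates` thresholds per attribute instead of one per pair of adjacent distinct values, which is where the time saving would come from. The cost is that the sampled optimum may differ from the exhaustive one when the best threshold falls between the sampled points.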
Original language: English
Title of host publication: Data mining
Subtitle of host publication: 16th Australasian conference, AusDM 2018, revised selected papers
Editors: Rafiqul Islam, Yun Sing Koh, Yanchang Zhao, Graco Warwick, David Stirling, Chang-Tsun Li, Zahidul Islam
Place of Publication: Singapore
Publisher: Springer
Pages: 43-55
Number of pages: 13
Volume: 996
ISBN (Electronic): 9789811366611
ISBN (Print): 9789811366604
Publication status: Published - 2019
Event: 16th Australasian Data Mining Conference (AusDM 2018) - Charles Sturt University, Bathurst, Australia
Duration: 28 Nov 2018 to 30 Nov 2018
https://web.archive.org/web/20181122224709/https://ausdm18.ausdm.org/ (Conference website)
https://www.springer.com/us/book/9789811366604 (link to conf proceedings book)

Publication series

Name: Communications in Computer and Information Science
Publisher: Springer
Volume: 996
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 16th Australasian Data Mining Conference (AusDM 2018)
Country/Territory: Australia
City: Bathurst
Period: 28/11/18 to 30/11/18
Other: The Australasian Data Mining Conference (AusDM) has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. It is devoted to the art and science of intelligent analysis of (usually big) data sets for meaningful (and previously unknown) insights. This conference will enable the sharing and learning of research and progress in the local context and new breakthroughs in data mining algorithms and their applications across all industries.

