Knowledge Discovery through SysFor: A Systematically Developed Forest of Multiple Decision Trees

Md Zahidul Islam, Helen Giggins

Research output: Book chapter/Published conference paper › Conference paper › peer-review

Abstract

Decision tree based classification algorithms such as C4.5 and Explore build a single tree from a data set. A decision tree serves two main purposes: extracting the patterns (logic rules) that exist in a data set, and predicting the class attribute value of an unlabeled record. Sometimes a set of decision trees, rather than a single tree, is generated from a data set. A set of multiple trees, when used wisely, typically has better prediction accuracy on unlabeled records. Existing multiple tree techniques are designed for high dimensional data sets and are therefore unable to build many trees from low dimensional data sets. In this paper we present a novel technique called SysFor that can build many trees even from a low dimensional data set. Another strength of the technique is that, instead of building multiple trees using any attribute (good or bad), it uses only those attributes that have high classification capabilities. We also present two novel voting techniques for predicting the class value of an unlabeled record through the collective use of multiple trees. Experimental results demonstrate that SysFor is suitable for multiple pattern extraction and knowledge discovery from both low dimensional and high dimensional data sets by building a number of good quality decision trees. Moreover, its prediction accuracy is higher than that of several existing techniques that have previously been shown to perform well.
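
The general idea in the abstract (keep only attributes with high classification capability, build one tree per good attribute, then combine the trees by voting) can be illustrated with a minimal sketch. This is not the SysFor algorithm and not either of its two voting techniques, which the abstract does not specify. It assumes information gain as the measure of attribute quality, one-level trees (stumps) in place of full decision trees, and plain majority voting. The function names, the `min_gain` and `max_trees` parameters, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split_labels(rows, labels, attr):
    """Group class labels by the value each record has for a categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attr (assumed quality measure)."""
    groups = split_labels(rows, labels, attr)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_stump(rows, labels, attr):
    """One-level tree: majority class per attribute value, plus a fallback class."""
    groups = split_labels(rows, labels, attr)
    leaves = {value: Counter(g).most_common(1)[0][0] for value, g in groups.items()}
    default = Counter(labels).most_common(1)[0][0]
    return attr, leaves, default


def build_forest(rows, labels, min_gain=0.05, max_trees=3):
    """Build one stump per attribute whose gain is at least min_gain (hypothetical cutoff)."""
    gains = {attr: info_gain(rows, labels, attr) for attr in rows[0]}
    ranked = sorted(gains, key=gains.get, reverse=True)
    good = [attr for attr in ranked if gains[attr] >= min_gain][:max_trees]
    return [build_stump(rows, labels, attr) for attr in good]


def predict(forest, row):
    """Plain majority vote over the stumps (not one of the paper's voting techniques)."""
    votes = Counter(leaves.get(row[attr], default) for attr, leaves, default in forest)
    return votes.most_common(1)[0][0]


# Toy low dimensional data set (made up for illustration only).
rows = [
    {"outlook": "sunny", "windy": "no", "humidity": "normal", "noise": "a"},
    {"outlook": "sunny", "windy": "no", "humidity": "normal", "noise": "b"},
    {"outlook": "rain", "windy": "yes", "humidity": "high", "noise": "a"},
    {"outlook": "rain", "windy": "yes", "humidity": "high", "noise": "b"},
    {"outlook": "sunny", "windy": "yes", "humidity": "high", "noise": "a"},
    {"outlook": "rain", "windy": "no", "humidity": "normal", "noise": "a"},
]
labels = ["play", "play", "stay", "stay", "play", "stay"]

forest = build_forest(rows, labels)
used_attributes = {attr for attr, _, _ in forest}
prediction = predict(forest, {"outlook": "sunny", "windy": "yes",
                              "humidity": "normal", "noise": "b"})
```

In this toy run the uninformative `noise` attribute falls below the gain cutoff, so no tree is built from it, and the remaining three stumps vote on the unlabeled record. A full implementation would grow complete trees and would need the paper itself for the actual attribute selection and voting rules.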
Original language: English
Title of host publication: 9th Australasian Data Mining Conference
Subtitle of host publication: AusDM 2011
Editors: V Estivill-Castro, S Simoff
Place of Publication: Sydney, Australia
Publisher: Australian Computer Society Inc
Pages: 195-204
Number of pages: 6
Volume: 121
ISBN (Print): 978-192177002-9
Publication status: Published - 2011
Event: The 9th Australasian Data Mining Conference: AusDM 2011 - University of Ballarat, Ballarat, Australia
Duration: 01 Dec 2011 → 02 Dec 2011

Publication series

Name: Conferences in Research and Practice in Information Technology Series
Publisher: Australian Computer Society
Volume: 121
ISSN (Print): 1445-1336

Conference

Conference: The 9th Australasian Data Mining Conference
Country/Territory: Australia
City: Ballarat
Period: 01/12/11 → 02/12/11
