
Abstract

Data pre-processing plays a vital role in data mining by ensuring good data quality. In general, data pre-processing tasks include imputation of missing values, identification of outliers, smoothing of noisy data and correction of inconsistent data. In this paper, we present an efficient missing value imputation technique called DMI, which makes use of a decision tree and the expectation maximization (EM) algorithm. We argue that the correlations among attributes within a horizontal partition of a data set can be higher than the correlations over the whole data set. For some existing algorithms, such as EM-based imputation (EMI), imputation accuracy is expected to be better on a data set with higher correlations than on one with lower correlations. Therefore, DMI applies EMI separately to the horizontal segments of a data set in which correlations among attributes are high. We evaluate DMI on two publicly available natural data sets by comparing its performance with that of EMI, using various patterns of missing values with missing ratios of up to 10%. Several evaluation criteria are used, including the coefficient of determination (R²), the index of agreement (d_2) and the root mean squared error (RMSE). Our initial experimental results indicate that DMI performs significantly better than EMI.
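The core idea behind DMI is to run EMI separately inside horizontal segments where attributes are more strongly correlated. The small sketch below illustrates that idea; it is not the authors' implementation. It makes three simplifying assumptions:

- There are only two numeric attributes: x, which is always observed, and y, which is sometimes missing.
- The EM step models (x, y) as bivariate normal.
- The decision tree is replaced by a single hand-picked split on x (the hypothetical `segment_of` argument), which stands in for the tree's leaves.

The helper names `em_impute_bivariate`, `dmi_impute` and `rmse` are illustrative only.

```python
import math


def em_impute_bivariate(rows, iters=200):
    """EM imputation for (x, y) pairs under a bivariate normal model.

    Simplifying assumption: x is always observed and only y may be None.
    Returns the y column with observed values kept and missing values
    replaced by their conditional mean E[y | x] at the EM fixed point.
    """
    n = len(rows)
    xs = [x for x, _ in rows]
    obs_y = [y for _, y in rows if y is not None]
    if not obs_y:
        raise ValueError("each segment needs at least one observed y")
    # Start by filling missing y with the mean of the observed y values.
    start = sum(obs_y) / len(obs_y)
    ys = [start if y is None else y for _, y in rows]
    extra_var = [0.0] * n  # Var(y | x) for missing entries, 0 for observed ones
    for _ in range(iters):
        # M-step: means and (co)variances from the expected sufficient statistics.
        mx = sum(xs) / n
        my = sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
        syy = sum((y - my) ** 2 + v for y, v in zip(ys, extra_var)) / n
        # E-step: conditional mean and variance of each missing y given its x.
        beta = sxy / sxx if sxx > 0 else 0.0
        cond_var = max(syy - beta * sxy, 0.0)
        for i, (x, y) in enumerate(rows):
            if y is None:
                ys[i] = my + beta * (x - mx)
                extra_var[i] = cond_var
    return ys


def dmi_impute(rows, segment_of):
    """Run EM imputation independently inside each horizontal segment.

    segment_of is a hypothetical stand-in for the decision tree: it maps a
    row to the label of the leaf (segment) that the row falls into.
    """
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(segment_of(row), []).append(i)
    ys = [None] * len(rows)
    for idx in groups.values():
        imputed = em_impute_bivariate([rows[i] for i in idx])
        for i, y in zip(idx, imputed):
            ys[i] = y
    return ys


def rmse(pred, truth):
    """Root mean squared error between imputed and true values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


# Toy data: y rises with x in one segment and falls with x in the other.
xs = list(range(10))
true_y = [2 * x if x < 5 else 28 - 2 * x for x in xs]
missing = [2, 7]  # indices whose y value is hidden from the imputers
rows = [(float(x), None if i in missing else float(y))
        for i, (x, y) in enumerate(zip(xs, true_y))]

emi = em_impute_bivariate(rows)                          # one model for all rows
dmi = dmi_impute(rows, segment_of=lambda r: r[0] < 5)    # one model per segment

truth = [true_y[i] for i in missing]
emi_rmse = rmse([emi[i] for i in missing], truth)
dmi_rmse = rmse([dmi[i] for i in missing], truth)
print(f"EMI RMSE: {emi_rmse:.3f}, DMI RMSE: {dmi_rmse:.3f}")
```

Each segment of this toy data is perfectly linear, so segment-wise EM recovers the hidden values. A single global model has to fit one line across both opposing trends. This is the kind of situation in which the abstract argues that applying EMI within highly correlated segments should help. Real DMI would derive the segments from a decision tree rather than from a fixed threshold.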
Original language: English
Title of host publication: 9th Australasian Data Mining Conference
Subtitle of host publication: AusDM 2011
Editors: V Estivill-Castro, S Simoff
Place of publication: Sydney, Australia
Publisher: Australian Computer Society Inc
Pages: 41-50
Number of pages: 10
Volume: 121
ISBN (Electronic): 9781921770029
Publication status: Published - 2011
Event: The 9th Australasian Data Mining Conference: AusDM 2011 - University of Ballarat, Ballarat, Australia
Duration: 01 Dec 2011 to 02 Dec 2011

Publication series

Name: Conferences in Research and Practice in Information Technology Series
Publisher: Australian Computer Society
Volume: 121
ISSN (Print): 1445-1336

Conference

Conference: The 9th Australasian Data Mining Conference
Country/Territory: Australia
City: Ballarat
Period: 01/12/11 to 02/12/11

Paper title: A Decision Tree-based Missing Value Imputation Technique for Data Pre-processing