Abstract

In this study we present a novel framework that uses two layers (steps) of imputation: an Early-Imputation step and an Advanced-Imputation step. In the Early-Imputation step, we first impute the missing values (both numerical and categorical) using existing techniques. The main goal of this step is to carry out an initial imputation and thereby refine the records having missing values, so that they can be used in the second layer of imputation, which applies an existing technique called DMI. The original DMI ignores records that have missing values. We therefore argue that if a data set has a large number of missing values, the imputation accuracy of DMI may suffer significantly, since it then ignores a large number of records. We present four versions of the framework and compare them with three existing techniques on two publicly available natural data sets. We use four evaluation criteria and two statistical significance analyses. Our experimental results indicate a clear superiority of the proposed framework over the existing techniques.
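The two-layer idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `early_impute` and `advanced_impute`, the use of column mean/mode as the "existing technique" in the first step, and above all the second step, where a simple nearest-neighbour re-estimate stands in for DMI. It is not the authors' implementation and does not reproduce DMI; it only shows how early-imputed records remain available to the second layer instead of being ignored.

```python
from collections import Counter
from statistics import mean


def _is_num(v):
    return isinstance(v, (int, float))


def early_impute(rows):
    """Step 1 (assumed technique): fill each None with the column mean
    (numerical) or mode (categorical). Assumes every column has at least
    one observed value."""
    fills = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if all(_is_num(v) for v in observed):
            fills.append(mean(observed))
        else:
            fills.append(Counter(observed).most_common(1)[0][0])
    return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a, b, skip):
    """Mixed-type distance over all columns except the one being imputed."""
    d = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j == skip:
            continue
        d += abs(x - y) if _is_num(x) and _is_num(y) else float(x != y)
    return d


def advanced_impute(rows, filled, k=1):
    """Step 2 (stand-in for DMI, NOT DMI itself): re-estimate every cell that
    was originally missing from the k nearest early-imputed records."""
    result = [list(r) for r in filled]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            others = [filled[m] for m in range(len(filled)) if m != i]
            nearest = sorted(others, key=lambda r: _distance(filled[i], r, j))[:k]
            vals = [r[j] for r in nearest]
            if all(_is_num(x) for x in vals):
                result[i][j] = mean(vals)
            else:
                result[i][j] = Counter(vals).most_common(1)[0][0]
    return result


# Hypothetical toy data: None marks a missing value.
rows = [
    [25, "a", 50.0],
    [27, "a", None],
    [60, "b", 90.0],
    [62, None, 94.0],
]
early = early_impute(rows)
final = advanced_impute(rows, early, k=1)
```

In this toy data the first step fills the missing category in the last record with the overall mode, while the second step revises it using the most similar record. Because the second step sees all four records, none is discarded for having a missing value, which is the motivation the abstract gives for the first layer.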
Original language: English
Title of host publication: AusDM 2013
Subtitle of host publication: Proceedings of the Eleventh Australasian Data Mining Conference (AusDM 13)
Place of Publication: Australia
Publisher: Australian Computer Society Inc
Pages: 1-12
Number of pages: 12
Volume: 146
Publication status: Published - 2013
Event: The 11th Australian Data Mining Conference: AusDM 2013 - Australian National University, Canberra, Australia
Duration: 13 Nov 2013 to 15 Nov 2013

Conference

Conference: The 11th Australian Data Mining Conference
Country/Territory: Australia
City: Canberra
Period: 13/11/13 to 15/11/13
