Abstract
In this study we present a novel framework that uses two layers (steps) of imputation: the Early-Imputation step and the Advanced-Imputation step. In the Early-Imputation step we first impute the missing values (both numerical and categorical) using existing techniques. The main goal of this step is to carry out an initial imputation and thereby refine the records having missing values, so that they can be used in the second layer of imputation, which applies an existing technique called DMI. The original DMI ignores the records having missing values. We therefore argue that if a data set has a large number of missing values, the imputation accuracy of DMI may suffer significantly, since it ignores a large number of records. We present four versions of the framework and compare them with three existing techniques on two publicly available natural data sets. We use four evaluation criteria and two statistical significance analyses. Our experimental results indicate a clear superiority of the proposed framework over the existing techniques.
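
The two-layer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which existing techniques are used in the first layer, and DMI itself is not reproduced here. As labeled assumptions, the first layer uses column mean/mode filling, the second layer uses a simple nearest-neighbour refinement as a stand-in for DMI, missing values are encoded as `None`, and the function names `early_impute` and `advanced_impute` are hypothetical. The point it shows is only the structure: after the first layer no record has to be discarded, so the second layer can draw on every record.

```python
from collections import Counter
from statistics import mean

MISSING = None  # assumption: missing values are encoded as None


def early_impute(records):
    """Layer 1 stand-in: column mean for numeric columns, mode for categorical.

    Assumes every column has at least one observed value.
    """
    fill = []
    for j in range(len(records[0])):
        observed = [r[j] for r in records if r[j] is not MISSING]
        if all(isinstance(v, (int, float)) for v in observed):
            fill.append(mean(observed))
        else:
            fill.append(Counter(observed).most_common(1)[0][0])
    return [[fill[j] if v is MISSING else v for j, v in enumerate(r)]
            for r in records]


def distance(a, b, skip):
    """Mixed-type distance ignoring column `skip` (numeric columns unscaled)."""
    d = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j == skip:
            continue
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            d += (x - y) ** 2
        else:
            d += 0.0 if x == y else 1.0
    return d


def advanced_impute(records, filled, k=2):
    """Layer 2 stand-in (NOT DMI): re-estimate each originally missing cell
    from its k nearest neighbours among the early-imputed records."""
    result = [row[:] for row in filled]
    for i, row in enumerate(records):
        for j, v in enumerate(row):
            if v is not MISSING:
                continue  # observed values are never overwritten
            others = [r for idx, r in enumerate(filled) if idx != i]
            others.sort(key=lambda r: distance(filled[i], r, skip=j))
            neighbours = [r[j] for r in others[:k]]
            if all(isinstance(n, (int, float)) for n in neighbours):
                result[i][j] = mean(neighbours)
            else:
                result[i][j] = Counter(neighbours).most_common(1)[0][0]
    return result


# Toy data, invented purely for illustration.
data = [
    [1.0, "a"],
    [1.2, "a"],
    [None, "a"],
    [9.0, "b"],
    [8.8, "b"],
    [9.4, None],
]
filled = early_impute(data)                 # layer 1: every record is now complete
final = advanced_impute(data, filled, k=2)  # layer 2: refine the filled cells
```

In this toy data the first layer fills the missing category in the last record with the global mode, while the second layer revises it using the two records closest to it in the numeric column. The choice of a nearest-neighbour refinement is only a placeholder; any second-layer technique that benefits from complete records could be substituted.
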
Original language | English |
---|---|
Title of host publication | AusDM 2013 |
Subtitle of host publication | Proceedings of the Eleventh Australasian Data Mining Conference (AusDM 13) |
Place of Publication | Australia |
Publisher | Australian Computer Society Inc |
Pages | 1-12 |
Number of pages | 12 |
Volume | 146 |
Publication status | Published - 2013 |
Event | The 11th Australian Data Mining Conference: AusDM 2013, Australian National University, Canberra, Australia. Duration: 13 Nov 2013 → 15 Nov 2013. http://ausdm13.ausdm.org/ |
Conference
Conference | The 11th Australian Data Mining Conference |
---|---|
Country/Territory | Australia |
City | Canberra |
Period | 13/11/13 → 15/11/13 |
Internet address | http://ausdm13.ausdm.org/ |