Data Quality Improvement by Imputation of Missing Values

Research output: Book chapter/Published conference paper > Conference paper

Abstract

Missing values are common in data sets for reasons including human error, misunderstanding and equipment malfunction. Imputing missing values is therefore important for improving the quality of a data set. In our previous study we presented an imputation technique called DMI, which we found to be better than an existing technique called EMI in terms of several commonly used imputation evaluation criteria, namely coefficient of determination (R²), index of agreement (d₂), root mean squared error (RMSE) and mean absolute error (MAE). These evaluation methods compare an imputed value with the actual value, which is artificially treated as missing for the purpose of assessing imputation techniques. However, it is also important to directly evaluate how effective an imputation technique is at producing a data set that is useful for data mining tasks such as classification and clustering. In this study we compare the effectiveness of three imputation techniques, DMI, EMI and SRD, in producing data sets that are useful for data mining tasks such as classification. We use two natural data sets, Pima and Credit Approval, introduce artificial missing values (using 32 missing combinations to simulate a range of possible scenarios), impute them separately with the three techniques to obtain three imputed data sets, build decision trees from the imputed data sets, and finally apply the trees to a previously unseen testing data set. Our initial experiments indicate that trees obtained from DMI-imputed data sets generally have higher prediction accuracies than trees obtained from data sets imputed by SRD and EMI. The results therefore suggest that DMI is effective in supporting data mining tasks such as classification by decision trees.
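
The four value-level evaluation criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions here are assumptions: the coefficient of determination is taken as the squared Pearson correlation between actual and imputed values, and the index of agreement is taken in Willmott's squared-difference form. The function name `imputation_scores` is hypothetical and is not part of DMI, EMI or SRD.

```python
import math


def imputation_scores(actual, imputed):
    """Compare imputed values with the actual values that were hidden.

    Hypothetical helper for illustration; the formulas are assumed,
    not taken from the paper.
    """
    n = len(actual)
    if n == 0 or n != len(imputed):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_a = sum(actual) / n
    mean_p = sum(imputed) / n
    errors = [p - a for a, p in zip(actual, imputed)]
    sq_err = sum(e * e for e in errors)

    # Mean absolute error and root mean squared error.
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sq_err / n)

    # Coefficient of determination, assumed here to be the squared
    # Pearson correlation between actual and imputed values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, imputed))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in imputed)
    if var_a > 0 and var_p > 0:
        r2 = (cov * cov) / (var_a * var_p)
    else:
        r2 = float("nan")

    # Index of agreement, assumed here to be Willmott's form:
    # 1 - sum((p - a)^2) / sum((|p - mean_a| + |a - mean_a|)^2).
    denom = sum(
        (abs(p - mean_a) + abs(a - mean_a)) ** 2
        for a, p in zip(actual, imputed)
    )
    d2 = 1 - sq_err / denom if denom > 0 else float("nan")

    return {"r2": r2, "d2": d2, "rmse": rmse, "mae": mae}


if __name__ == "__main__":
    # Toy values only; these are not data from the study.
    print(imputation_scores([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

Under these assumed definitions, higher R² and index of agreement and lower RMSE and MAE indicate imputed values closer to the hidden actual values. None of the four measures shows whether the imputed data set supports a downstream task such as classification, which is the gap the study addresses.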
Original language: English
Title of host publication: CSIT 2013
Place of Publication: Germany
Publisher: Springer-Verlag London Ltd.
Pages: 82-88
Number of pages: 7
ISBN (Electronic): 9789793812205
Publication status: Published - 2013
Event: International Conference on Computer Science and Information Technology - Yogyakarta, Indonesia
Duration: 16 Jun 2013 to 18 Jun 2013

Conference

Conference: International Conference on Computer Science and Information Technology
Country: Indonesia
Period: 16/06/13 to 18/06/13
