Abstract
Missing values are common in data sets for many reasons, including human error, misunderstanding and equipment malfunction. Imputing them is therefore important for improving the quality of a data set. In a previous study we presented an imputation technique called DMI and found it to outperform an existing technique called EMI on several commonly used imputation evaluation measures, namely the coefficient of determination (R²), index of agreement (d₂), root mean squared error (RMSE) and mean absolute error (MAE). These measures compare each imputed value with the actual value, which is artificially removed for the purpose of assessing the imputation techniques. However, it is also important to evaluate directly how well an imputation technique produces a data set that is useful for data mining tasks such as classification and clustering. In this study we compare three imputation techniques, DMI, EMI and SRD, on their effectiveness in producing data sets that are useful for classification. We take two natural data sets, Pima and Credit Approval, introduce artificial missing values (using 32 missing combinations to simulate a range of possible scenarios), impute them separately with each of the three techniques to obtain three imputed data sets, build decision trees from the imputed data sets, and finally apply the trees to a previously unseen testing data set. Our initial experiments indicate that trees built from DMI-imputed data sets generally achieve higher prediction accuracy than trees built from data sets imputed by SRD and EMI. The results therefore suggest that DMI is effective in supporting data mining tasks such as classification by decision trees.
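The four value-level evaluation measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions here are assumptions: the coefficient of determination is taken as the squared Pearson correlation between actual and imputed values, and the index of agreement is taken as Willmott's index with exponent 2. The function name `imputation_metrics` and its return keys are hypothetical, chosen only for illustration.

```python
import math


def imputation_metrics(actual, imputed):
    """Compare imputed values with the actual values that were removed.

    Assumed definitions (not taken from the paper):
      r2   - squared Pearson correlation between actual and imputed values
      d2   - Willmott's index of agreement with exponent 2
      rmse - root mean squared error
      mae  - mean absolute error
    """
    n = len(actual)
    if n == 0 or n != len(imputed):
        raise ValueError("actual and imputed must be non-empty and equal in length")

    mean_actual = sum(actual) / n
    mean_imputed = sum(imputed) / n

    squared_error = sum((p - o) ** 2 for o, p in zip(actual, imputed))
    absolute_error = sum(abs(p - o) for o, p in zip(actual, imputed))

    rmse = math.sqrt(squared_error / n)
    mae = absolute_error / n

    # Squared Pearson correlation; undefined when either series is constant.
    covariance = sum((o - mean_actual) * (p - mean_imputed)
                     for o, p in zip(actual, imputed))
    var_actual = sum((o - mean_actual) ** 2 for o in actual)
    var_imputed = sum((p - mean_imputed) ** 2 for p in imputed)
    if var_actual > 0 and var_imputed > 0:
        r2 = covariance ** 2 / (var_actual * var_imputed)
    else:
        r2 = float("nan")

    # Index of agreement: 1 minus squared error over "potential" error,
    # where both deviations are measured from the mean of the actual values.
    potential_error = sum((abs(p - mean_actual) + abs(o - mean_actual)) ** 2
                          for o, p in zip(actual, imputed))
    d2 = 1 - squared_error / potential_error if potential_error > 0 else float("nan")

    return {"r2": r2, "d2": d2, "rmse": rmse, "mae": mae}


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    print(imputation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

Under these definitions, higher R² and d₂ and lower RMSE and MAE indicate imputed values closer to the actual ones. None of the four says anything about how well a classifier trained on the imputed data performs, which is the gap the study addresses.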
Original language | English |
---|---|
Title of host publication | CSIT 2013 |
Place of Publication | Germany |
Publisher | Springer-Verlag London Ltd. |
Pages | 82-88 |
Number of pages | 7 |
ISBN (Electronic) | 9789793812205 |
Publication status | Published - 2013 |
Event | International Conference on Computer Science and Information Technology - Yogyakarta, Indonesia. Duration: 16 Jun 2013 → 18 Jun 2013 |
Conference
Conference | International Conference on Computer Science and Information Technology |
---|---|
Country/Territory | Indonesia |
Period | 16/06/13 → 18/06/13 |