Abstract
Data pre-processing and cleansing play a vital role in data mining by ensuring good data quality. Data cleansing tasks include the imputation of missing values and the identification and correction of incorrect or noisy data. In this paper, we present a novel approach called Co-appearance based Analysis for Incorrect Records and Attribute-values Detection (CAIRAD). For a data set containing incorrect or noisy values, CAIRAD separates the noisy records from the clean records. It thereby produces two data sets: a clean data set and a data set containing all noisy records. It also reports the noisy attribute values of each noisy record. We evaluate CAIRAD on four publicly available natural data sets by comparing its performance with that of two high-quality existing techniques, namely RDCL and EDIR. We use various patterns of noisy values, each with different noise levels. Several evaluation criteria are used, such as error recall (ER), error precision (EP), F-measure, record removal ratio (rRR), and area under the receiver operating characteristic curve (AUC). Our experimental results indicate that CAIRAD performs significantly better (based on t-test analysis) than RDCL and EDIR.
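To make the detection-based criteria concrete, the following is a minimal sketch of how error recall, error precision, and F-measure could be computed at the record level. It assumes the usual recall/precision definitions applied to sets of record identifiers; the function name `detection_scores` and its arguments are hypothetical and are not taken from the paper, whose exact formulations may differ.

```python
def detection_scores(true_noisy, flagged):
    """Return (error recall, error precision, F-measure).

    true_noisy: ids of records that really contain noise (assumed known).
    flagged:    ids of records a detector reported as noisy.
    Both names are illustrative assumptions, not the paper's notation.
    """
    true_noisy, flagged = set(true_noisy), set(flagged)
    hits = len(true_noisy & flagged)  # noisy records correctly flagged

    # Recall: share of truly noisy records that were detected.
    recall = hits / len(true_noisy) if true_noisy else 0.0
    # Precision: share of flagged records that are truly noisy.
    precision = hits / len(flagged) if flagged else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return recall, precision, f_measure


if __name__ == "__main__":
    er, ep, f = detection_scores({1, 2, 3, 4}, {2, 3, 5})
    print(f"ER={er:.3f} EP={ep:.3f} F={f:.3f}")
```

In this toy call, two of the four noisy records are flagged and one clean record is flagged by mistake, so recall and precision diverge and the F-measure falls between them.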
Original language | English |
---|---|
Title of host publication | Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN) |
Place of Publication | United States |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 2190-2199 |
Number of pages | 10 |
ISBN (Electronic) | 9781467314886 |
Publication status | Published - 2012 |
Event | IEEE International Joint Conference on Neural Networks: IJCNN 2012 - Brisbane Convention Centre, Brisbane, Australia. Duration: 10 Jun 2012 → 15 Jun 2012. Conference website: https://web.archive.org/web/20120510232940/http://www.ieee-wcci2012.org/ |
Publication series
Name | |
---|---|
ISSN (Print) | 2161-4393 |
Conference
Conference | IEEE International Joint Conference on Neural Networks |
---|---|
Country/Territory | Australia |
City | Brisbane |
Period | 10/06/12 → 15/06/12 |
Other | The 2012 IEEE World Congress on Computational Intelligence (IEEE WCCI 2012) is the largest technical event in the field of computational intelligence. It will host three conferences: the 2012 International Joint Conference on Neural Networks (IJCNN 2012), the 2012 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2012), and the 2012 IEEE Congress on Evolutionary Computation (IEEE CEC 2012). IEEE WCCI 2012 will be held in Brisbane, an awe-inspiring city situated along the Brisbane River and the eastern coastline of Australia. The congress will provide a stimulating forum for scientists, engineers, educators, and students from all over the world to discuss and present their research findings on computational intelligence. |