Abstract

Data pre-processing and cleansing play a vital role in data mining by ensuring good data quality. Data cleansing tasks include the imputation of missing values and the identification and correction of incorrect or noisy data. In this paper, we present a novel approach called Co-appearance based Analysis for Incorrect Records and Attribute-values Detection (CAIRAD). Given a data set containing incorrect or noisy values, CAIRAD separates the noisy records from the clean records. It thereby produces two data sets: a clean data set and a data set containing all noisy records. It also reports the noisy attribute values of each noisy record. We evaluate CAIRAD on four publicly available natural data sets by comparing its performance with that of two high-quality existing techniques, namely RDCL and EDIR. We use various patterns of noisy values, each with different noise levels. Several evaluation criteria are used: error recall (ER), error precision (EP), F-measure, record removal ratio (rRR), and area under a receiver operating characteristic curve (AUC). Our experimental results indicate that CAIRAD performs significantly better (based on t-test analysis) than RDCL and EDIR.
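The abstract names error recall (ER), error precision (EP), and F-measure without defining them. As a rough illustration only, the sketch below computes them under the usual information-retrieval definitions, which are assumed here and may differ from the paper's exact formulas. The function name `error_metrics`, the `(record index, attribute name)` cell representation, and the sample data are all hypothetical.

```python
from typing import Set, Tuple

# A noisy cell is identified by (record index, attribute name).
# This representation is an assumption made for illustration.
Cell = Tuple[int, str]


def error_metrics(detected: Set[Cell], actual: Set[Cell]) -> Tuple[float, float, float]:
    """Return (error recall, error precision, F-measure).

    Assumed definitions (standard IR-style, not taken from the paper):
      recall    = correctly detected errors / all actual errors
      precision = correctly detected errors / all detected errors
      F-measure = harmonic mean of recall and precision
    """
    true_positives = len(detected & actual)
    recall = true_positives / len(actual) if actual else 0.0
    precision = true_positives / len(detected) if detected else 0.0
    if recall + precision == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# Made-up example: four cells are truly noisy, three are flagged,
# and two of the flagged cells are truly noisy.
actual = {(0, "age"), (2, "income"), (5, "city"), (7, "age")}
detected = {(0, "age"), (2, "income"), (3, "city")}
er, ep, f = error_metrics(detected, actual)
print(f"ER={er:.3f} EP={ep:.3f} F={f:.3f}")
```

With this made-up input, two of the four actual errors are found and two of the three flagged cells are correct, so ER is 0.5, EP is about 0.667, and the F-measure is about 0.571.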
Original language: English
Title of host publication: Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN)
Place of Publication: United States
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 2190-2199
Number of pages: 10
ISBN (Electronic): 9781467314886
Publication status: Published - 2012
Event: IEEE International Joint Conference on Neural Networks: IJCNN 2012 - Brisbane Convention Centre, Brisbane, Australia
Duration: 10 Jun 2012 to 15 Jun 2012
https://web.archive.org/web/20120510232940/http://www.ieee-wcci2012.org/ (Conference website)

Publication series

Name
ISSN (Print): 2161-4393

Conference

Conference: IEEE International Joint Conference on Neural Networks
Country/Territory: Australia
City: Brisbane
Period: 10/06/12 to 15/06/12
Other: The 2012 IEEE World Congress on Computational Intelligence (IEEE WCCI 2012) is the largest technical event in the field of computational intelligence. It will host three conferences: the 2012 International Joint Conference on Neural Networks (IJCNN 2012), the 2012 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2012), and the 2012 IEEE Congress on Evolutionary Computation (IEEE CEC 2012). IEEE WCCI 2012 will be held in Brisbane, an awe-inspiring city situated along the Brisbane River and the eastern coastline of Australia. The congress will provide a stimulating forum for scientists, engineers, educators, and students from all over the world to discuss and present their research findings on computational intelligence.