Abstract

Data pre-processing and cleansing play a vital role in data mining by ensuring good data quality. Data cleansing tasks include imputation of missing values, identification of outliers, and identification and correction of noisy data. In this paper, we present a novel technique called \textit{A \textbf{F}uzzy \textbf{E}xpectation \textbf{M}aximisation and Fuzzy Clustering based Missing Value \textbf{I}mputation Framework for Data Pre-processing (FEMI)}. It imputes numerical and categorical missing values by making an educated guess based on records that are similar to the record having a missing value. To identify a group of similar records and make a guess based on that group, it applies a fuzzy clustering approach and our novel fuzzy expectation maximisation algorithm. We evaluate FEMI on eight publicly available natural data sets by comparing its performance with that of two high-quality existing techniques, namely EMI and IBLLS. We use thirty-two types (patterns) of missing values for each data set. We use several evaluation criteria, namely the coefficient of determination ($R^2$), the index of agreement ($d_2$), the root mean squared error ($RMSE$), and the mean absolute error ($MAE$). Our experimental results indicate (according to a confidence interval and t-test analysis) that FEMI performs significantly better than EMI and IBLLS.
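
The four evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `imputation_metrics` and its arguments are hypothetical, $R^2$ is assumed here to be the squared Pearson correlation between actual and imputed values, and $d_2$ is assumed to be Willmott's index of agreement with exponent 2. The paper's exact definitions may differ.

```python
import math


def imputation_metrics(actual, imputed):
    """Compare imputed values with the known original values.

    Hypothetical helper for illustration only; the definitions of R^2 and d2
    below are assumptions, not formulas quoted from the paper.
    """
    n = len(actual)
    if n == 0 or n != len(imputed):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_o = sum(actual) / n   # mean of the original (observed) values
    mean_p = sum(imputed) / n  # mean of the imputed (predicted) values
    errors = [p - o for o, p in zip(actual, imputed)]
    sq_err = sum(e * e for e in errors)

    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(e) for e in errors) / n

    # Assumed R^2: squared Pearson correlation between actual and imputed.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(actual, imputed))
    var_o = sum((o - mean_o) ** 2 for o in actual)
    var_p = sum((p - mean_p) ** 2 for p in imputed)
    r2 = cov * cov / (var_o * var_p) if var_o > 0 and var_p > 0 else float("nan")

    # Assumed d2: Willmott's index of agreement with exponent 2.
    denom = sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(actual, imputed)
    )
    d2 = 1 - sq_err / denom if denom > 0 else float("nan")

    return {"R2": r2, "d2": d2, "RMSE": rmse, "MAE": mae}


if __name__ == "__main__":
    # Toy values standing in for the masked originals and their imputations.
    print(imputation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

Under these assumed definitions, higher $R^2$ and $d_2$ and lower $RMSE$ and $MAE$ indicate better imputation. A constant offset between imputed and actual values leaves the correlation-based $R^2$ at 1 while the other three criteria register the error, which is one reason to report several criteria together.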
Original language: English
Pages (from-to): 389-422
Number of pages: 34
Journal: Knowledge and Information Systems
Volume: 46
Issue number: 2
Early online date: 2015
DOIs
Publication status: Published - 2016

Title: Missing value imputation using a fuzzy clustering-based EM approach
