Missing value imputation using a fuzzy clustering-based EM approach

Research output: Contribution to journal › Article › peer-review

49 Citations (Scopus)

Abstract

Data pre-processing and cleansing play a vital role in data mining by ensuring good quality of data. Data cleansing tasks include imputation of missing values, identification of outliers, and identification and correction of noisy data. In this paper, we present a novel technique called A Fuzzy Expectation Maximisation and Fuzzy Clustering based Missing Value Imputation Framework for Data Pre-processing (FEMI). It imputes numerical and categorical missing values by making an educated guess based on records that are similar to the record having a missing value. While identifying a group of similar records and making a guess based on the group, it applies a fuzzy clustering approach and our novel fuzzy expectation maximisation algorithm. We evaluate FEMI on eight publicly available natural data sets by comparing its performance with that of two high-quality existing techniques, namely EMI and IBLLS. We use thirty-two types (patterns) of missing values for each data set. Several evaluation criteria are used, namely coefficient of determination ($R^2$), index of agreement ($d_2$), root mean squared error ($RMSE$), and mean absolute error ($MAE$). Our experimental results indicate (according to a confidence interval and t-test analysis) that FEMI performs significantly better than EMI and IBLLS.
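
The four evaluation criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not give their formulas, so this assumes commonly used definitions: $R^2$ as the squared Pearson correlation between original and imputed values, and $d_2$ as Willmott's index of agreement with exponent 2. The function name `imputation_metrics` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def imputation_metrics(actual, imputed):
    """Compare imputed values against the known original values.

    Assumed definitions (not confirmed by the abstract):
    R2 = squared Pearson correlation, d2 = Willmott's index of agreement.
    """
    o = np.asarray(actual, dtype=float)   # original (ground-truth) values
    p = np.asarray(imputed, dtype=float)  # imputed values
    err = p - o

    rmse = np.sqrt(np.mean(err ** 2))     # root mean squared error
    mae = np.mean(np.abs(err))            # mean absolute error

    # Squared Pearson correlation between original and imputed values.
    r2 = np.corrcoef(o, p)[0, 1] ** 2

    # Index of agreement: 1 minus squared error over "potential error".
    o_mean = o.mean()
    potential = np.sum((np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2)
    d2 = 1.0 - np.sum(err ** 2) / potential

    return {"R2": r2, "d2": d2, "RMSE": rmse, "MAE": mae}

# Usage: values that were artificially removed, and their imputed estimates.
scores = imputation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

Under these definitions, higher $R^2$ and $d_2$ and lower $RMSE$ and $MAE$ indicate better imputation. In the usage example the constant offset leaves $R^2$ at 1 while $d_2$, $RMSE$ and $MAE$ all register the error, which is one reason to report several criteria together.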
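Original language: English
Pages (from-to): 389-422
Number of pages: 34
Journal: Knowledge and Information Systems
Volume: 46
Issue number: 2
Early online date: 2015
DOI: https://doi.org/10.1007/s10115-015-0822-y
Publication status: Published - 2016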
