Unique neighborhood set parameter independent density-based clustering with outlier detection

Md Anisur Rahman, Li-Minn Ang, Kah Phooi Seng

Research output: Contribution to journal › Article › peer-review

22 Citations (Scopus)
36 Downloads (Pure)

Abstract

Machine learning algorithms for clustering, classification, and regression typically require the user to supply a set of parameters before they can perform well. In this paper, we present parameter independent density-based clustering (PIDC) algorithms built on two novel neighborhood-function concepts, which we term Unique Closest Neighbor (UCN) and Unique Neighborhood Set (UNS). We discuss two variants of the proposed PIDC algorithms, termed PIDC-WO and PIDC-O. PIDC-WO is designed for datasets that do not contain explicit outliers, whereas PIDC-O performs well even on datasets that contain outliers. PIDC-O uses two-stage processing: the first stage identifies and removes outliers, and the second stage performs density-based clustering on the remaining records. The PIDC algorithms are extensively evaluated and compared with other well-known clustering algorithms on several datasets using three cluster evaluation criteria from the literature (F-measure, entropy, and purity), and are shown to perform effectively for both the clustering and outlier detection objectives.
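For readers unfamiliar with two of the evaluation criteria named in the abstract, the following is a minimal sketch of purity and entropy using their common textbook definitions. It is an illustration only: the function names are hypothetical, and the paper may use different normalizations or variants. F-measure is omitted because its clustering formulation varies more between sources.

```python
from collections import Counter
from math import log2


def _group_by_cluster(labels_true, labels_pred):
    """Collect the ground-truth class labels of the records in each cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of records that belong to the majority class of their cluster.

    Ranges from 0 to 1; higher is better.
    """
    clusters = _group_by_cluster(labels_true, labels_pred)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def entropy(labels_true, labels_pred):
    """Size-weighted average of the class entropy (in bits) of each cluster.

    Zero when every cluster contains a single class; lower is better.
    """
    clusters = _group_by_cluster(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        cluster_entropy = -sum(
            (count / size) * log2(count / size)
            for count in Counter(members).values()
        )
        total += (size / n) * cluster_entropy
    return total


if __name__ == "__main__":
    truth = ["a", "a", "b", "b"]
    perfect = [0, 0, 1, 1]   # each cluster holds exactly one class
    mixed = [0, 1, 0, 1]     # each cluster holds one record of each class
    print(purity(truth, perfect), entropy(truth, perfect))
    print(purity(truth, mixed), entropy(truth, mixed))
```

Both functions take a list of ground-truth class labels and a list of cluster assignments of the same length; the cluster identifiers themselves are arbitrary.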
Original language: English
Pages (from-to): 44707-44717
Number of pages: 11
Journal: IEEE Access
Volume: 6
Issue number: 1
Publication status: Published - 13 Aug 2018
