Data Convexity and Parameter Independent Clustering for Biomedical Datasets

Md Anisur Rahman, Li-Minn Ang, Kah Phooi Seng

Research output: Contribution to journal › Article

Abstract

In machine learning, properties of the dataset itself, such as the convexity of its data point sets, affect which clustering algorithm will perform well. This brief paper first examines how data convexity influences clustering performance on biomedical datasets. It then addresses the main challenges of two well-known families of clustering methods: centroid-based and density-based clustering. These techniques typically require the user to supply a set of parameters before the algorithms can produce good clusterings and find the optimal number of clusters. Two parameter-independent clustering techniques that utilize unique neighborhood sets (UNSs) are introduced: Parameter Independent Convex Centroid-based Clustering (ConvexClust) for convex-dominated datasets and Parameter Independent Non-Convex Density-based Clustering (NonConvexClust) for nonconvex-dominated datasets. The ConvexClust and NonConvexClust algorithms are extensively evaluated on real-world biomedical datasets, and their performance is compared with that of other clustering algorithms using evaluation criteria such as the sum of squared errors (SSE), entropy, and purity. The results show that the proposed parameter-independent techniques perform well and that most of the biomedical datasets in the experiments tend towards convex-dominated data point sets.
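
The abstract names SSE, entropy and purity as evaluation criteria but does not define them. As a rough illustration only, the sketch below computes all three on a toy example using common textbook definitions. These definitions are assumptions and may differ from the paper's exact formulations: SSE is the sum of squared distances to each cluster's mean, purity is the fraction of points belonging to their cluster's majority class, and entropy is the size-weighted, base-2 class entropy per cluster. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def sse(points, labels):
    """Sum of squared distances from each point to its cluster mean."""
    groups = defaultdict(list)
    for point, cluster in zip(points, labels):
        groups[cluster].append(point)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


def _class_counts(labels, classes):
    """Count how often each true class occurs inside each cluster."""
    counts = defaultdict(Counter)
    for cluster, cls in zip(labels, classes):
        counts[cluster][cls] += 1
    return counts


def purity(labels, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    counts = _class_counts(labels, classes)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def entropy(labels, classes):
    """Size-weighted average of the class entropy (base 2) of each cluster."""
    counts = _class_counts(labels, classes)
    n = len(labels)
    total = 0.0
    for c in counts.values():
        size = sum(c.values())
        h = -sum((v / size) * math.log2(v / size) for v in c.values())
        total += (size / n) * h
    return total


# Hypothetical toy data: four 2-D points, two clusters, two true classes.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]            # cluster assignments
classes = ["a", "a", "b", "a"]   # ground-truth classes

print(sse(points, labels), purity(labels, classes), entropy(labels, classes))
```

Under these definitions, lower SSE and entropy and higher purity indicate a better clustering. SSE needs only the data and the cluster assignments, whereas entropy and purity also need ground-truth class labels.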
Original language: English
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication status: Published - 06 Mar 2020
