Abstract
In machine learning, properties of the dataset itself, such as the convexity of its data point sets, determine which clustering algorithm will perform well. This brief paper first examines how data convexity influences clustering performance on biomedical datasets. It then addresses the main challenges of two well-known groups of clustering techniques: centroid-based and density-based clustering. These techniques typically require the user to supply a set of parameters before the algorithms can produce good clusterings and identify the optimal number of clusters. Two parameter-independent clustering techniques that use unique neighborhood sets (UNSs) are introduced: Parameter Independent Convex Centroid-based Clustering (ConvexClust) for convex-dominated datasets and Parameter Independent Non-Convex Density-based Clustering (NonConvexClust) for nonconvex-dominated datasets. The ConvexClust and NonConvexClust algorithms are extensively evaluated on real-world biomedical datasets, and their performance is compared with that of other clustering algorithms using evaluation criteria such as the sum of squared errors (SSE), entropy and purity. The results demonstrate the good performance of the proposed parameter-independent techniques and also show that most of the biomedical datasets in the experiments tend towards convex-dominated data point sets.
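The abstract names SSE, entropy and purity as evaluation criteria without defining them. The following is a minimal Python sketch of the common textbook definitions, not the paper's own evaluation code. The exact variants used in the paper are not stated here, so the choices below are assumptions: Euclidean distance to the cluster mean for SSE, base-2 logarithms, and cluster entropy weighted by cluster size. The function names `sse`, `purity` and `entropy` are hypothetical. Lower SSE and entropy, and higher purity, indicate better clusterings.

```python
import math
from collections import Counter, defaultdict


def sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, cluster in zip(points, labels):
        clusters[cluster].append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = per-dimension mean of the cluster's members.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )
    return total


def _contingency(labels, classes):
    """Count how many points of each true class fall into each cluster."""
    table = defaultdict(Counter)
    for cluster, cls in zip(labels, classes):
        table[cluster][cls] += 1
    return table


def purity(labels, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    table = _contingency(labels, classes)
    return sum(max(counts.values()) for counts in table.values()) / len(labels)


def entropy(labels, classes):
    """Size-weighted average of per-cluster class entropy (base-2 logarithm assumed)."""
    table = _contingency(labels, classes)
    n = len(labels)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


if __name__ == "__main__":
    points = [(0, 0), (0, 2), (10, 0), (10, 2)]
    classes = ["a", "a", "b", "b"]
    good = [0, 0, 1, 1]  # clusters match the true classes
    bad = [0, 0, 0, 0]   # everything lumped into one cluster
    print(sse(points, good), purity(good, classes), entropy(good, classes))  # 4.0 1.0 0.0
    print(sse(points, bad), purity(bad, classes), entropy(bad, classes))     # 104.0 0.5 1.0
```

In this toy example, splitting the points along the true classes yields low SSE, perfect purity and zero entropy. Merging everything into one cluster worsens all three scores.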
Original language | English |
---|---|
Pages (from-to) | 765-772 |
Number of pages | 8 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 18 |
Issue number | 2 |
Early online date | 06 Mar 2020 |
Publication status | Published - 01 Apr 2021 |