Clustering Biomedical and Gene Expression Datasets with Kernel Density and Unique Neighborhood Set based Vein Detection  

Md Anisur Rahman, Li-Minn Ang, Kah Phooi Seng

Research output: Contribution to journal › Article

Abstract

A clustering technique that produces high-quality clusters from biomedical and gene expression datasets without requiring any user input is crucially needed. In this paper, we therefore present a clustering technique called KUVClust that produces high-quality clusters when applied to biomedical and gene expression datasets without requiring any user input. The KUVClust algorithm uses three concepts, namely multivariate kernel density estimation, unique closest neighborhood set, and vein-based clustering. Although these concepts are known in the literature, KUVClust combines them in a novel manner to achieve high-quality clustering results. The performance of KUVClust is compared with established clustering techniques on real-world biomedical and gene expression datasets. The comparisons were evaluated in terms of three criteria: purity, entropy, and sum of squared error (SSE). Experimental results demonstrated the superiority of the proposed technique over the existing techniques for clustering both the low-dimensional biomedical and high-dimensional gene expression datasets used in the experiments.
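The three evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It uses common textbook definitions of purity, size-weighted cluster entropy (base 2), and SSE around cluster centroids. The abstract does not give the paper's exact formulas, so these definitions, the function names, and the toy data are all assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def _group(values, labels_pred):
    """Group values by their predicted cluster label."""
    groups = {}
    for value, cluster in zip(values, labels_pred):
        groups.setdefault(cluster, []).append(value)
    return groups


def purity(labels_true, labels_pred):
    """Fraction of records that belong to the majority class of their cluster (higher is better)."""
    clusters = _group(labels_true, labels_pred)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(labels_true)


def entropy(labels_true, labels_pred):
    """Size-weighted average of the class entropy inside each cluster (lower is better)."""
    n = len(labels_true)
    total = 0.0
    for members in _group(labels_true, labels_pred).values():
        size = len(members)
        h = -sum((c / size) * math.log2(c / size) for c in Counter(members).values())
        total += (size / n) * h
    return total


def sse(points, labels_pred):
    """Sum of squared Euclidean distances from each point to its cluster centroid (lower is better)."""
    total = 0.0
    for members in _group(points, labels_pred).values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Hypothetical toy data: four records, two true classes, two predicted clusters.
labels_true = ["a", "a", "b", "b"]
labels_pred = [0, 0, 0, 1]
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (5.0, 5.0)]

print(purity(labels_true, labels_pred))   # 0.75
print(entropy(labels_true, labels_pred))
print(sse(points, labels_pred))           # 8.0
```

Purity and entropy need ground-truth class labels, whereas SSE depends only on the data and the cluster assignment, which is presumably why such criteria are reported together.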
Original language: English
Journal: Information Systems
Publication status: Accepted/In press - 07 Jan 2020

Fingerprint

Gene expression
Entropy
Experiments

Cite this

@article{64b63d1032a847c2a6a94e9e190142e3,
title = "Clustering Biomedical and Gene Expression Datasets with Kernel Density and Unique Neighborhood Set based Vein Detection",
abstract = "It is a crucial need for a clustering technique to produce high-quality clusters from biomedical and gene expression datasets without requiring any user inputs. Therefore, in this paper we present a clustering technique called KUVClust that produces high-quality clusters when applied on biomedical and gene expression datasets without requiring any user inputs. The KUVClust algorithm uses three concepts namely multivariate kernel density estimation, unique closest neighborhood set and vein-based clustering. Although these concepts are known in the literature, KUVClust combines the concepts in a novel manner to achieve high-quality clustering results. The performance of KUVClust is compared with established clustering techniques on real-world biomedical and gene expression datasets. The comparisons were evaluated in terms of three criteria (purity, entropy, and sum of squared error (SSE)). Experimental results demonstrated the superiority of the proposed technique over the existing techniques for clustering both the low dimensional biomedical and high dimensional gene expressions datasets used in the experiments.",
author = "Rahman, {Md Anisur} and Ang, Li-Minn and Seng, {Kah Phooi}",
year = "2020",
month = "1",
day = "7",
language = "English",
journal = "Information Systems",
issn = "0306-4379",
publisher = "Elsevier",

}

TY - JOUR

T1 - Clustering Biomedical and Gene Expression Datasets with Kernel Density and Unique Neighborhood Set based Vein Detection  

AU - Rahman, Md Anisur

AU - Ang, Li-Minn

AU - Seng, Kah Phooi

PY - 2020/1/7

Y1 - 2020/1/7

N2 - It is a crucial need for a clustering technique to produce high-quality clusters from biomedical and gene expression datasets without requiring any user inputs. Therefore, in this paper we present a clustering technique called KUVClust that produces high-quality clusters when applied on biomedical and gene expression datasets without requiring any user inputs. The KUVClust algorithm uses three concepts namely multivariate kernel density estimation, unique closest neighborhood set and vein-based clustering. Although these concepts are known in the literature, KUVClust combines the concepts in a novel manner to achieve high-quality clustering results. The performance of KUVClust is compared with established clustering techniques on real-world biomedical and gene expression datasets. The comparisons were evaluated in terms of three criteria (purity, entropy, and sum of squared error (SSE)). Experimental results demonstrated the superiority of the proposed technique over the existing techniques for clustering both the low dimensional biomedical and high dimensional gene expressions datasets used in the experiments.

M3 - Article

JO - Information Systems

JF - Information Systems

SN - 0306-4379

ER -