A fast algorithm for finding correlation clusters in noise data

Jiuyong Li, Xiaodi Huang, Clinton Selke, Jianming Yong

Research output: Book chapter/Published conference paper; Conference paper; peer-reviewed

8 Citations (Scopus)
31 Downloads (Pure)

Abstract

Noise significantly affects cluster quality. Conventional clustering methods hardly detect clusters in a data set containing a large amount of noise. Projected clustering sheds light on identifying correlation clusters in such a dataset. In order to exclude noise points which are usually scattered in a subspace, data points are projected to form dense areas in the subspace that are regarded as correlation clusters. However, we found that the existing methods for the projected clustering did not work very well with noise data, since they employ randomly generated seeds (micro clusters) to trade off the clustering quality. In this paper, we propose a divisive method for the projected clustering that does not rely on random seeds. The proposed algorithm is capable of producing higher quality correlation clusters from noise data in a more efficient way than an agglomeration projected algorithm. We experimentally show that our algorithm captures correlation clusters in noise data better than a well-known projected clustering method.

Original language: English
Title of host publication: PAKDD 2007
Place of Publication: Berlin
Publisher: Springer-Verlag London Ltd.
Pages: 639-647
Number of pages: 9
ISBN (Electronic): 9783540717003
Publication status: Published - 2007
Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) - Nanjing, China
Duration: 22 May 2007 to 25 May 2007

Conference

Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)
Country/Territory: China
Period: 22/05/07 to 25/05/07
