Abstract

Partitioning methods, such as k-means, are popular and useful for clustering. Recently, we proposed a new partitioning method for clustering categorical data: using the transfer algorithm to optimize an objective function called within-cluster dispersion. Preliminary experimental results showed that this method outperforms a standard method called k-modes in terms of the average quality of clustering results. In this paper, we compare the performance of objective functions for categorical data in more depth. First, we analytically compare the quality of three objective functions: k-medoids, k-modes and within-cluster dispersion. Second, we measure how well these objectives find true structures in real data sets by finding their global optima, which we argue is a better measure than average clustering results. We conclude that within-cluster dispersion is generally a better objective for discovering cluster structures. Moreover, we evaluate the performance of various distance measures on within-cluster dispersion and report some useful observations.
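To make the three objectives named in the abstract concrete, the following is a minimal sketch of how each could be evaluated for a fixed partition of categorical records. It is illustrative only and not taken from the paper: the function names are hypothetical, the simple matching (Hamming) distance is assumed as the distance measure, and within-cluster dispersion is assumed to be the sum over clusters of the pairwise distances divided by the cluster size, which may differ from the authors' exact definition. The transfer algorithm and the search for global optima are not shown.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Simple matching distance: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def clusters_of(data, labels):
    """Group records by cluster label and return the groups as lists."""
    groups = {}
    for row, lab in zip(data, labels):
        groups.setdefault(lab, []).append(row)
    return list(groups.values())


def k_modes_cost(data, labels):
    """Sum of distances from each record to the per-attribute mode of its cluster."""
    total = 0
    for members in clusters_of(data, labels):
        mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
        total += sum(hamming(row, mode) for row in members)
    return total


def k_medoids_cost(data, labels):
    """Sum of distances from each record to the best medoid (an actual record) of its cluster."""
    total = 0
    for members in clusters_of(data, labels):
        total += min(sum(hamming(row, m) for row in members) for m in members)
    return total


def within_cluster_dispersion(data, labels):
    """Assumed form: for each cluster, sum of pairwise distances divided by cluster size."""
    total = 0.0
    for members in clusters_of(data, labels):
        pair_sum = sum(hamming(a, b) for a, b in combinations(members, 2))
        total += pair_sum / len(members)
    return total


# Toy data: five records with two categorical attributes, split into two clusters.
data = [("a", "x"), ("a", "y"), ("b", "y"), ("c", "z"), ("c", "z")]
labels = [0, 0, 0, 1, 1]

print(k_modes_cost(data, labels))
print(k_medoids_cost(data, labels))
print(within_cluster_dispersion(data, labels))
```

Under these assumptions, all three functions score the same partition, so candidate partitions can be compared objective by objective. Unlike the other two, the dispersion form needs no cluster representative (mode or medoid).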
Original language: English
Title of host publication: PKAW 2014
Editors: Yang Sok Kim, Byeong Ho Kang, Deborah Richards
Place of Publication: Germany
Publisher: Springer International Publishing AG
Pages: 16-28
Number of pages: 13
Volume: 8863
ISBN (Electronic): 9783319133324
ISBN (Print): 9783319133317
Publication status: Published - 2014
Event: Pacific Rim Knowledge Acquisition Workshop - Gold Coast, Australia
Duration: 01 Dec 2014 to 02 Dec 2014

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
Volume: 8863
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Workshop

Workshop: Pacific Rim Knowledge Acquisition Workshop
Country/Territory: Australia
Period: 01/12/14 to 02/12/14

Title: The performance of objective functions for clustering categorical data