Abstract
We present a novel fuzzy clustering technique called CRUDAW that allows a data miner to assign weights to the attributes of a data set based on their importance (to the data miner) for clustering. The technique uses a novel approach to select initial seeds deterministically (not randomly) using the density of the records of a data set. CRUDAW also selects the initial fuzzy membership degrees deterministically. Moreover, it uses a novel approach for measuring distance that takes the user-defined attribute weights into account. When measuring the distance between two values of a categorical attribute, the technique considers the similarity of the values instead of treating the distance as either 0 or 1. The complete algorithm for CRUDAW is presented in the paper. We experimentally compare our technique with a few existing techniques, namely SABC, GFCM, and KL-FCM-GM, using several evaluation criteria: Silhouette coefficient, F-measure, purity, and entropy. We also use a t-test, a confidence interval test, and time complexity in evaluating the performance of our technique. Four data sets available from the UCI Machine Learning Repository are used in the experiments. Our experimental results indicate that CRUDAW performs significantly better than the existing techniques in producing high-quality clusters.
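The idea of an attribute-weighted distance with graded categorical similarity can be illustrated with a minimal sketch. This is not the CRUDAW formula, which is given only in the full paper; the function name `weighted_distance`, the range normalization of numerical attributes, the weighted-average aggregation, and the hand-written similarity table are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Sequence, Union

Value = Union[float, str]


def weighted_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    weights: Sequence[float],
    ranges: Dict[int, float],
    similarity: Dict[int, Dict[FrozenSet[str], float]],
) -> float:
    """Illustrative weighted distance between two mixed-type records.

    Assumptions (not taken from the paper):
    - numerical attributes are normalized by a known range, ranges[j];
    - categorical attributes use 1 - similarity, where similarity[j] maps
      an unordered pair of values to a number in [0, 1];
    - per-attribute distances are combined as a weighted average.
    """
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if isinstance(x, str):
            if x == y:
                d = 0.0  # identical categories are at distance 0
            else:
                # Unknown pairs default to similarity 0, i.e. distance 1,
                # which reduces to the usual 0/1 categorical distance.
                d = 1.0 - similarity.get(j, {}).get(frozenset((x, y)), 0.0)
        else:
            d = abs(x - y) / ranges[j]
        total += weights[j] * d
    return total / sum(weights)


# Hypothetical records: (colour, size). Colour is weighted twice as much as size.
r1 = ("red", 10.0)
r2 = ("orange", 20.0)
weights = [2.0, 1.0]
ranges = {1: 40.0}  # assumed value range of the numerical attribute
similarity = {0: {frozenset(("red", "orange")): 0.8}}  # assumed similarity

dist = weighted_distance(r1, r2, weights, ranges, similarity)
```

In this sketch, setting an attribute's weight to zero removes it from the distance entirely, which is one simple way a data miner's importance judgments could steer the clustering.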
Original language | English |
---|---|
Title of host publication | Conferences in Research and Practice in Information Technology Series |
Subtitle of host publication | AusDM 2012 |
Editors | Peter Christen, Y Zhao, J Li, PJ Kennedy |
Place of Publication | Sydney, NSW |
Publisher | Australian Computer Society Inc |
Pages | 27-42 |
Number of pages | 16 |
Volume | 134 |
ISBN (Electronic) | 9781921770142 |
Publication status | Published - 2013 |
Event | The 10th Australasian Data Mining Conference: AusDM 2012 - Sydney Harbour Marriott Hotel, Sydney, Australia. Duration: 05 Dec 2012 → 07 Dec 2012. http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=23833&copyownerid=2 |
Publication series
Name | Conferences in Research and Practice in Information Technology Series |
---|---|
Publisher | Australian Computer Society |
Volume | 134 |
ISSN (Print) | 1445-1336 |
Conference
Conference | The 10th Australasian Data Mining Conference |
---|---|
Country/Territory | Australia |
City | Sydney |
Period | 05/12/12 → 07/12/12 |
Internet address |