Clustering is an important data mining task that groups similar records into the same cluster and dissimilar records into different clusters. It is used in many fields for knowledge discovery and decision making. Many clustering techniques exist, but a number of them have limitations, such as requiring user inputs (for example, the number of clusters) and getting stuck at local optima. It can be difficult for a user to provide these inputs in advance. There is also room to improve the quality of the clusters that existing techniques produce. Because clustering is so widely used, it is important to develop techniques that deliver better clustering results. In this study we propose clustering techniques that produce high quality clusters without requiring any user input on the number of clusters. The proposed techniques produce high quality initial seeds, which are then fed into K-Means to produce high quality clusters. We argue that the user should be allowed, but not required, to assign attribute weights to suit the purpose of the clustering. Our techniques therefore accept user-assigned weights but can also run without any input on the weights. Moreover, we propose a technique that selects attribute weights automatically. Finally, we propose a technique called GenClust that does not require any user input and produces better quality clusters than many existing techniques in terms of six cluster evaluation criteria over the 20 datasets used in our experiments.
|Qualification|Doctor of Philosophy|
|Award date|01 Mar 2014|
|Place of Publication|Australia|
|Publication status|Published - 2014|