Automatic Selection of High Quality Initial Seeds for Generating High Quality Clusters without requiring any User Inputs

Research output: Thesis › Doctoral Thesis

Abstract

Clustering is an important data mining task that groups similar records into the same cluster and dissimilar records into different clusters. It is used in many fields for knowledge discovery and decision making. Many clustering techniques exist, but a number of them have limitations, such as requiring user inputs (for example, the number of clusters) and getting stuck at local optima. Such inputs can be difficult for a user to provide in advance, and there is room to improve the quality of the clusters these techniques produce. Because clustering is so widely used, it is important to develop techniques that yield better-quality results. In this study we propose clustering techniques that produce high-quality clusters without requiring any user input on the number of clusters. The proposed techniques generate high-quality initial seeds, which are then fed into K-Means to produce high-quality clusters. We argue that users should be allowed, if they wish, to assign attribute weights to suit their clustering purpose. Our techniques allow the user to assign weights, but they also permit clustering without any input on the weights. Moreover, we propose a technique that selects attribute weights automatically. Finally, we propose a technique called GenClust, which requires no user input and produces better-quality clusters than many existing techniques according to six cluster evaluation criteria over the 20 datasets used in our experiments.
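As a rough illustration of what "feeding initial seeds into K-Means" with optional attribute weights can look like, the sketch below runs plain K-Means (Lloyd's iterations) from caller-supplied seeds using a weighted Euclidean distance. It is not GenClust and not the seed-selection method of the thesis; the function names, the weighting scheme, the restriction to numeric attributes, and the toy data are all assumptions made for this example.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two numeric records (assumed metric)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def kmeans_from_seeds(records, seeds, weights=None, max_iter=100):
    """Run K-Means starting from the given seeds instead of random centers.

    The number of clusters is implied by len(seeds), so the caller never
    passes k explicitly. If no weights are given, all attributes count equally.
    """
    if weights is None:
        weights = [1.0] * len(records[0])
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest center.
        new_labels = [
            min(range(len(centers)),
                key=lambda k: weighted_distance(r, centers[k], weights))
            for r in records
        ]
        if new_labels == labels:  # no record changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical toy data: two well-separated groups and two hand-picked seeds.
records = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
seeds = [(0.0, 0.0), (6.0, 6.0)]
labels, centers = kmeans_from_seeds(records, seeds)
print(labels, centers)
```

Because the seeds fix both the number of clusters and the starting point, the quality of the final clusters in a sketch like this depends heavily on how the seeds were chosen, which is the step the thesis aims to automate.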
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Charles Sturt University
Supervisors/Advisors
  • Islam, Zahid, Principal Supervisor
  • Bossomaier, Terry, Co-Supervisor
  • Zia, Tanveer, Co-Supervisor
Award date: 01 Mar 2014
Place of Publication: Australia
Publisher: Charles Sturt University
Publication status: Published - 2014

Fingerprint

Data mining
Decision making
Experiments

Cite this

@phdthesis{2dc5d93953da436284e3a636e3dce739,
title = "Automatic Selection of High Quality Initial Seeds for Generating High Quality Clusters without requiring any User Inputs",
author = "Rahman, {Md Anisur}",
year = "2014",
language = "English",
publisher = "Charles Sturt University",
address = "Australia",
school = "Charles Sturt University",

}

Automatic Selection of High Quality Initial Seeds for Generating High Quality Clusters without requiring any User Inputs. / Rahman, Md Anisur.

Australia : Charles Sturt University, 2014. 326 p.


TY - THES

T1 - Automatic Selection of High Quality Initial Seeds for Generating High Quality Clusters without requiring any User Inputs

AU - Rahman, Md Anisur

PY - 2014

Y1 - 2014

M3 - Doctoral Thesis

PB - Charles Sturt University

CY - Australia

ER -