Abstract
In this dissertation we are concerned with the problem of subspace clustering. That is, clustering data points, based on their underlying subspace structure. This is a complex and challenging problem as the subspace structure is usually unknown and can vary within a single application. This is further compounded by imperfections in the data such as noise and outliers. In general, clustering algorithms are usually heuristic and thus it is hard to prove theoretical correctness. Fortunately two algorithms, Sparse Subspace Clustering (SSC) and Low-Rank Representation (LRR), have been demonstrated to correctly segment data under reasonable conditions. In this dissertation we provide a number of algorithms to address some of the drawbacks and to improve the performance of these two algorithms. First we discuss how to incorporate additional hints in the form of ambient structure to improve clustering accuracy. Second we present a method to fuse multiple observations of a single phenomenon or data point for more accurate and robust clustering. Third we discuss how to extend subspace clustering to non-linear subspaces or non-Euclidean data. Last we discuss how to reduce the computational and memory requirements of SSC, while provably maintaining clustering accuracy. Moreover we provide the community with a number of open source libraries of code for all the algorithms developed for this dissertation and algorithms that we have tested against.
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Awarding Institution |
|
Award date | 01 Aug 2017 |
Publisher | |
Publication status | Published - 2017 |