Date of Award

Fall 2005

Document Type

Thesis

Degree Name

Master of Science in Computer Engineering - (M.S.)

Department

Electrical and Computer Engineering

First Advisor

Atam P. Dhawan

Second Advisor

Constantine N. Manikopoulos

Third Advisor

Edwin Hou

Abstract

Online clustering is of significant interest for real-time data analysis. Generic offline clustering methods such as K-Means and C-Means are computationally expensive: their computational burden increases non-linearly with the size of the data set. In addition, these methods usually require a substantial amount of supervised knowledge and yield a non-unique solution. For real-time data analysis, there is an important tradeoff between accuracy and computational efficiency. An unsupervised one-pass clustering method that efficiently adapts to data distribution and evaluation is proposed. This method, Topology-Based Fuzzy Clustering (TFC), uses the topology of the data to discover clusters. TFC uses the Growing Neural Gas (GNG) method of creating linked sub-clusters and extends GNG by assigning a fuzzy membership to the sub-clusters, noting the link structure for creating clusters, and influencing the learning nodes at each sub-cluster. This also gives a fuzzy estimation of the data distribution within each cluster. The computational burden for TFC is proportional to the size of the initial data set and increases linearly with the addition of new data.
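
The idea described above (linked sub-clusters, a fuzzy membership per sub-cluster, and clusters read off the link structure) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not give the membership function or the update rule, so the sketch assumes a Fuzzy C-Means style membership with fuzzifier `m`, a membership-weighted node update with learning rate `lr`, and clusters taken as connected components of the link graph. All function names and parameters are hypothetical, and GNG's node insertion, edge aging and error accumulation are omitted.

```python
import math
from collections import defaultdict

def fuzzy_memberships(x, nodes, m=2.0):
    """Assumed Fuzzy C-Means style membership of point x to each node."""
    dists = [math.dist(x, n) for n in nodes]
    # A point lying exactly on a node belongs fully to that node.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(nodes))]
    inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

def online_update(x, nodes, edges, lr=0.1):
    """Process one sample: link the two nearest nodes, then move every
    node toward x in proportion to its fuzzy membership (an assumption)."""
    u = fuzzy_memberships(x, nodes)
    order = sorted(range(len(nodes)), key=lambda i: math.dist(x, nodes[i]))
    s1, s2 = order[0], order[1]
    edges.add((min(s1, s2), max(s1, s2)))  # GNG-style link between the two winners
    for i, node in enumerate(nodes):
        step = lr * u[i]
        nodes[i] = tuple(c + step * (xc - c) for c, xc in zip(node, x))
    return u

def clusters_from_links(n_nodes, edges):
    """Group nodes (sub-clusters) into clusters: connected components of the links."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for i in range(n_nodes):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

Each sample is visited once and costs work proportional to the number of nodes, which is in keeping with the one-pass, linear-growth behaviour the abstract claims for TFC.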

As TFC is based on GNG, it is an unsupervised algorithm. A supervised learning method is proposed that can be used in conjunction with TFC to increase its accuracy with minimal computational burden. This adaptive algorithm is called Adaptive Topology-Based Fuzzy Clustering (ATFC). In this study, the performance of ATFC and TFC is also evaluated against standard datasets.
