Data stream clustering is an active research area that has recently emerged to discover knowledge from large volumes of continuously generated data. Many clustering algorithms have been proposed for static data. Data stream clustering, however, poses additional challenges: handling dynamic data that arrive online, processing data objects quickly and incrementally, respecting time and memory limits, and tracking the evolving patterns that characterize streaming data with dynamic distributions. In this paper, we propose an algorithm that extends Affinity Propagation (AP) to handle evolving data streams with dynamic distributions. AP is a clustering algorithm that uses message passing to extract a set of exemplars that best represent the dataset. We present a semisupervised clustering technique (SSAP) that incorporates labeled exemplars into the AP algorithm to deal with changes in the data distribution, which require the stream model to be updated as soon as possible. Experimental comparisons with state-of-the-art data stream clustering methods demonstrate the effectiveness and efficiency of the proposed method.
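As an illustration of the message-passing step the abstract refers to, the following is a minimal sketch of the standard (batch) Affinity Propagation updates in NumPy. It is not the SSAP method proposed in the paper and does not handle streams or labeled exemplars. The function names, the damping value, the iteration count, and the choice of the minimum similarity as the preference are all assumptions made for this example.

```python
import numpy as np


def negative_squared_distances(X, preference=None):
    """Similarity matrix s(i, k) = -||x_i - x_k||^2 (hypothetical helper).

    The diagonal holds the "preference"; the minimum similarity is used
    by default, an assumption that tends to yield few exemplars.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    S = -np.sum(diff ** 2, axis=2)
    if preference is None:
        preference = S.min()
    np.fill_diagonal(S, preference)
    return S


def affinity_propagation(S, damping=0.7, max_iter=300):
    """Plain AP: alternate responsibility and availability updates."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)

    for _ in range(max_iter):
        # r(i, k) = s(i, k) - max_{k' != k} [a(i, k') + s(i, k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1.0 - damping) * R_new

        # a(i, k) = min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))
        # a(k, k) = sum_{i' != k} max(0, r(i', k))
        Rp = np.maximum(R, 0.0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0.0)
        A_new[rows, rows] = diag
        A = damping * A + (1.0 - damping) * A_new

    # Point k is an exemplar when a(k, k) + r(k, k) > 0.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Assign every point to its most similar exemplar.
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels
```

Calling `affinity_propagation(negative_squared_distances(X))` on an array `X` of shape `(n_points, n_features)` returns the indices of the chosen exemplars and, for each point, the index of the exemplar it is assigned to. A fixed iteration count is used here for brevity; a practical implementation would normally stop once the exemplar set has stayed unchanged for several iterations.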
Atwa, Walid and Li, Kan
"Affinity Propagation-based Clustering For Data Streams,"
Applied Mathematics & Information Sciences: Vol. 09, Article 58.
Available at: https://dc.naturalspublishing.com/amis/vol09/iss4/58