Clustering analysis seeks to partition a given dataset into groups, or clusters, so that the data objects within a cluster are more similar to each other than to objects in different clusters. A very rich literature on clustering analysis has developed over the past three decades, but a crucial question remains unanswered: how many clusters does the underlying population contain when only an observed set of samples is available? The goal of this paper is to provide a comprehensive review of approaches to determining the "correct" number of clusters. In particular, we divide these approaches into three categories: internal measures, external measures, and clustering-stability-based methods. We then introduce several representative examples and present the specific challenges pertinent to each category. Finally, we suggest promising trends in this field.
Xu, Shuo; Qiao, Xiaodong; Zhu, Lijun; Zhang, Yunliang; Xue, Chunxiang; and Li, Lin
"Reviews on Determining the Number of Clusters,"
Applied Mathematics & Information Sciences: Vol. 10, Article 28.
Available at: https://dc.naturalspublishing.com/amis/vol10/iss4/28