Journal of East China Normal University (Natural Science), 2014, Vol. 2014, Issue (6): 73-80. doi: 10.3969/j.issn.10005641.2014.06.011

• Geography; Estuarine and Coastal Science •

Adaptively determining the number of clusters for the K-means algorithm: A case study of remote sensing image clustering

袁周米琪¹, 周坚华²

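  1. Department of Geography, East China Normal University, Shanghai 200241; 2. Key Laboratory of Geographic Information Science (Ministry of Education), East China Normal University, Shanghai 200241
  • Publication date: 2014-11-25  Release date: 2015-02-07
  • Corresponding author: ZHOU Jian-Hua, female, associate professor and master's supervisor; research interests: remote sensing image analysis and ecological remote sensing. E-mail: jhzhou@geo.ecnu.edu.cn
  • About the authors: First author: YUAN Zhou-Mi-Qi, female, undergraduate student. E-mail: miki_yz@126.com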
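  • Funding: Supported by the National Natural Science Foundation of China (41071275) and by the National Natural Science Foundation of China's Research Training and Research Capability Enhancement Program for National Basic Science Bases (J1310028)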

Adaptively determining clustering number of K-means: A case study on the clustering from remotely sensed imagery

YUAN Zhou-Mi-Qi¹, ZHOU Jian-Hua²

  1. Department of Geography, East China Normal University, Shanghai 200241, China; 2. Key Laboratory of Geographic Information Science, Ministry of Education, East China Normal University, Shanghai 200241, China
  • Online: 2014-11-25  Published: 2015-02-07

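Abstract: The number of clusters directly determines the quality of a clustering algorithm's output, yet for classical algorithms such as K-means there is currently no suitable theory for choosing it; it is usually specified from experience or by trial and error. This not only requires considerable human-computer interaction and computational cost for trial runs, but also lowers the accuracy of the clustering result, because the optimal number of clusters is often hard to obtain. This paper proposes ADNC (adaptively determining the number of clusters), an algorithm that approaches the optimal number of clusters adaptively. The approach is an iterative process of repeated clustering. After each iteration, the resulting clusters are assessed by computing, for each image feature (i.e., each component of the input vector) in the classification space, the average error of its standard deviation, and these per-feature errors are combined into a multi-feature composite error. The number of clusters is then adjusted according to the principle of gradient descent, so that the composite error decreases step by step while the number of clusters approaches its optimum. This optimal number generally lies at the position immediately before the composite error begins to oscillate. Running K-means with this number minimizes the heterogeneity of feature values within each class and yields a satisfactory clustering result. In addition, the concept of a less suitable number of clusters, i.e., a number that may produce the largest clustering error, is proposed. Experiments show that both the most suitable and the less suitable numbers of clusters are of practical value for improving clustering accuracy.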

Keywords: K-means, number of clusters, self-adaptation
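The per-iteration assessment described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' published formula: the abstract does not say how the per-feature errors are combined, so the hypothetical function `composite_error` reads "average error of the standard deviation" as the within-cluster standard deviation of each feature averaged over clusters, and then takes an unweighted mean across features (population standard deviation, equal weight for every cluster).

```python
import numpy as np


def composite_error(X, labels) -> float:
    """Hypothetical multi-feature composite error for one clustering result.

    X: (n_samples, n_features) feature values, e.g. pixel band values.
    labels: cluster index of each sample, as produced by K-means.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # per_cluster_std[c, j]: population std (ddof=0) of feature j inside cluster c
    per_cluster_std = np.array([X[labels == c].std(axis=0) for c in clusters])
    # Average over clusters -> one error per feature (input-vector component)
    per_feature_error = per_cluster_std.mean(axis=0)
    # Assumed combination rule: unweighted mean across features
    return float(per_feature_error.mean())


# Toy data: two well-separated pairs of 2-feature samples.
X = [[0, 0], [2, 0], [10, 10], [10, 14]]
print(composite_error(X, [0, 0, 1, 1]))  # 0.75
```

A lower value means more homogeneous classes. Because this measure tends to shrink as the number of clusters grows, it is the point where it stops falling smoothly that the abstract uses, not its absolute minimum.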

Abstract: A new algorithm, adaptively determining the number of clusters (ADNC), is proposed. With ADNC, the optimal clustering number for K-means clustering, which is usually set by human guesswork or manual trial and error, can be determined by computer in a self-adaptive way. ADNC is an iterative process that alternates between adjusting the clustering number and assessing the average standard deviation of the clustering result. The adjustment is guided by this assessment according to the principle of gradient descent, so that the clustering number improves while the deviation decreases at the same time. The optimal clustering number most likely lies at the point just before the deviation begins to oscillate. Clustering with the number determined by ADNC gives reasonable results because it reduces the feature heterogeneity within each class to a minimum. In addition, the concept of an inappropriate clustering number, i.e., one at which the deviation may increase to its maximum, is tentatively proposed. Experiments show that both the optimal and the inappropriate clustering numbers have practical significance for improving clustering accuracy.

Key words: K-means, clustering number, self-adaptation
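The adjustment of the clustering number can be illustrated with a simplified, unit-step reading of the descent rule: increase the clustering number while the deviation keeps falling, and stop at the last value before it first rises (taken here as the onset of oscillation). The function `adaptive_k`, its parameters `k_start` and `k_max`, and this stopping test are hypothetical illustrations, not the published ADNC procedure. `error_of(k)` stands for any routine that runs K-means with `k` clusters and returns a deviation measure; in the example a toy lookup table stands in for real clustering runs.

```python
from typing import Callable, Dict, Tuple


def adaptive_k(
    error_of: Callable[[int], float], k_start: int = 2, k_max: int = 30
) -> Tuple[int, Dict[int, float]]:
    """Hypothetical unit-step search for the clustering number.

    Starting from k_start, k is increased by one while the error returned by
    error_of keeps decreasing. The first non-decrease is treated as the onset
    of oscillation, and the k just before it is returned together with all
    errors evaluated so far.
    """
    if k_start < 1 or k_max < k_start:
        raise ValueError("require 1 <= k_start <= k_max")
    errors: Dict[int, float] = {k_start: error_of(k_start)}
    best_k = k_start
    for k in range(k_start + 1, k_max + 1):
        errors[k] = error_of(k)
        if errors[k] >= errors[k - 1]:
            break  # error stopped falling: keep the k just before this point
        best_k = k
    return best_k, errors


# Toy error values standing in for real K-means runs (illustrative only).
toy = {2: 5.0, 3: 3.2, 4: 2.1, 5: 1.7, 6: 1.9, 7: 1.6}
best_k, seen = adaptive_k(toy.__getitem__, k_start=2, k_max=7)
print(best_k)  # 5
```

Stopping at the first rise, rather than searching for the global minimum, avoids evaluating large clustering numbers whose lower deviation comes mainly from fragmenting classes; the value at k = 7 in the toy table is never computed.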
