Journal of East China Normal University(Natural Sc ›› 2018, Vol. 2018 ›› Issue (5): 164-171. doi: 10.3969/j.issn.1000-5641.2018.05.014


A K-Means based clustering algorithm with balanced constraints

TANG Hai-bo1, LIN Yu-ming1, LI You2   

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China;
  2. Guangxi Key Laboratory of Automatic Testing and Instrument, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2018-07-04 Online: 2018-09-25 Published: 2018-09-26

Abstract: Clustering is a core data processing technique that is widely used across applications. Traditional clustering algorithms, however, cannot guarantee balanced cluster sizes, which conflicts with the needs of real-world applications. To address this problem, this paper proposes a K-Means based clustering algorithm with balanced constraints. The proposed algorithm modifies the data point assignment step in each iteration to produce balanced data blocks. The maximum cluster size threshold can be customized to meet different partitioning requirements. The algorithm is simple and efficient: only two parameters need to be specified, the number of clusters and the maximum cluster size threshold. A series of experiments carried out on six real UCI (University of California, Irvine) datasets shows that the proposed algorithm achieves better clustering performance and efficiency than other balanced clustering algorithms.
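
The idea of changing the assignment step under a maximum cluster size can be illustrated with a minimal sketch. The abstract does not give the paper's exact assignment procedure, so the greedy rule below (closest point-cluster pairs first, skipping full clusters) is an assumption made for illustration only, as are the names balanced_kmeans, max_size, iters, and seed. Only the two parameters named in the abstract, the number of clusters and the maximum cluster size, affect the balance constraint.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def balanced_kmeans(points, k, max_size, iters=20, seed=0):
    """Size-capped K-Means sketch; returns (labels, centers).

    Assumption: the greedy assignment rule here is illustrative and is
    not taken from the paper.
    """
    if k * max_size < len(points):
        raise ValueError("k * max_size must be at least the number of points")
    rng = random.Random(seed)
    # Initialize centers from k distinct sample positions.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(iters):
        # Rank every (point, cluster) pair by distance, closest first.
        pairs = sorted(
            (sq_dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centers)
        )
        new_labels = [-1] * len(points)
        sizes = [0] * k
        # Greedy pass: accept a pair only if the point is still
        # unassigned and the cluster is below the size threshold.
        for _, i, j in pairs:
            if new_labels[i] == -1 and sizes[j] < max_size:
                new_labels[i] = j
                sizes[j] += 1
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Standard K-Means update: move each center to its members' mean.
        for j in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

For example, calling balanced_kmeans(points, k=2, max_size=4) on eight points forces two clusters of exactly four points each, even when plain K-Means would split them six and two. Sorting all point-cluster pairs costs O(nk log(nk)) per iteration, which keeps the sketch short but is not a claim about the efficiency of the paper's method.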

Key words: balanced constraints, clustering, greedy algorithm, data management
