A K-Means based clustering algorithm with balanced constraints

TANG Hai-bo; LIN Yu-ming; LI You

doi:10.3969/j.issn.1000-5641.2018.05.014

Journal of East China Normal University(Natural Science) >

2018 , Vol. 2018 >Issue 5: 164 - 171

DOI: https://doi.org/10.3969/j.issn.1000-5641.2018.05.014

A K-Means based clustering algorithm with balanced constraints

TANG Hai-bo ,
LIN Yu-ming ,
LI You

Expand

1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin Guangxi 541004, China;
2. Guangxi Key Laboratory of Automatic Testing and Instrument, Guilin University of Electronic Technology, Guilin Guangxi 541004, China

Received date: 2018-07-04

Online published: 2018-09-26

Fold

Abstract

Clustering is a type of core data processing technology, which has been applied widely in various applications. Traditional clustering algorithms, however, cannot guarantee the balance of clustering results, which is inconsistent with the needs of real-world applications. To address this problem, this paper proposes a K-Means based clustering algorithm with balanced constraints. The proposed clustering algorithm changes the data point assignment process, in each iteration, to achieve balanced data blocks. The maximum cluster size threshold can be customized to meet different division requirements. The algorithm is simple and efficient-only two parameters need to be specified:the number of clusters and the threshold for the maximum cluster size. A series of experiments carried on six real UCI (University of California Irvine) datasets show that the algorithm proposed in this paper has better clustering performance and efficiency compared to other balanced clustering algorithms.

Key words： balanced constraints; clustering; greedy algorithm; data management

Cite this article

TANG Hai-bo , LIN Yu-ming , LI You . A K-Means based clustering algorithm with balanced constraints[J]. Journal of East China Normal University(Natural Science), 2018 , 2018(5) : 164 -171 . DOI: 10.3969/j.issn.1000-5641.2018.05.014

References

[1] AGGARWAL C C, REDDY C K. Data Clustering:Algorithms and Applications[M]. Raton:Chapman & Hall/CRC, 2013.
[2] VISWANATH P, PINKESH R. l-DBSCAN:A fast hybrid density based clustering method[C]//Proceedings of the 18th International Conference on Pattern Recognition. NJ:IEEE, 2006:912-915.
[3] PEIZHUANG W. Pattern recognition with fuzzy objective function algorithms (James C. Bezdek)[J]. SIAM Review, 1983, 25(3):442-442.
[4] MALINEN M I, FRÄNTI P. Balanced k-means for clustering[C]//Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). Berlin:Springer, 2014:32-41.
[5] LIU H, HAN J, NIE F, et al. Balanced clustering with least square regression[C]//Proceedings of the 31th AAAI Conference on Artificial Intelligence. CA:AAAI, 2017:2231-2237.
[6] BRADLEY P S, BENNETT K P, DEMIRIZ A. Constrained k-means clustering[J]. Microsoft Research Redmond, 2000, 59(1):1-34.
[7] KUHN H W. The Hungarian method for the assignment problem[J]. Naval Research Logistics, 2005, 52(1):7-21.
[8] WU B, ZHOU Y, YUAN P, et al. Semstore:A semantic-preserving distributed rdf triple store[C]//Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. NY:ACM, 2014:509-518.
[9] BANERJEE A, GHOSH J. On scaling up balanced clustering algorithms[C]//Proceedings of the 2002 SIAM International Conference on Data Mining. PA:Society for Industrial and Applied Mathematics, 2002:333-349.
[10] BANERJEE A, GHOSH J. Frequency-sensitive competitive learning for scalable balanced clustering on highdimensional hyperspheres[J]. IEEE Transactions on Neural Networks, 2004, 15(3):702-719.
[11] NG A Y, JORDAN M I, WEISS Y. On spectral clustering:Analysis and an algorithm[C]//Proceedings of the neural information processing systems. Massachusetts:MIT Press, 2002:849-856.
[12] CAI D, HE X, HAN J. Document clustering using locality preserving indexing[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(12):1624-1637.
[13] STREHL A, CHOSH J. Knowledge reuse framework for combining multiple partitions[J]. Journal of Machine learning Research, 2002, 33(3):583-617.

Options

Outlines

模态框（Modal）标题

Abstract

Cite this article

References