Journal of East China Normal University (Natural Science), 2018, Vol. 2018, Issue (5): 56-66. doi: 10.3969/j.issn.1000-5641.2018.05.005

Implementation of the parallel GroupBy and Aggregation functions in a distributed database system

XU Shi-lei, WEI Xing, JIANG Hong, QIAN Wei-ning, ZHOU Ao-ying   

  1. School of Computer Science and Software Engineering, East China Normal University, Shanghai 200062, China
  Received: 2018-07-04; Online: 2018-09-25; Published: 2018-09-26

Abstract: With the growing demand for data statistics and analysis in new Internet applications, data grouping and aggregation have become some of the most common operations in data analysis applications. This paper analyzes the operating principles of the Aggregation and GroupBy functions commonly used in analytical applications. To address the drawbacks of the sort-based grouping used in general transactional databases, two Hash GroupBy implementations are proposed, together with a strategy that uses statistical information to dynamically determine the number of Hash buckets and to select the Hash GroupBy scheme. Exploiting the characteristics of distributed clusters, push-down of the Hash GroupBy operator is also implemented. Experiments show that using statistical information to dynamically choose the Hash grouping scheme improves efficiency.
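The ideas in the abstract (hash-based grouping, sizing the hash table from statistics, and pushing the GroupBy operator down to data nodes) can be illustrated with a small sketch. This is not the paper's OceanBase implementation and does not reproduce its two specific Hash GroupBy variants; it assumes a hypothetical sizing rule (estimated number of distinct groups divided by a load factor of 0.75, rounded up to a power of two) and models push-down as two-phase aggregation, one common way to realize it: each node builds a local hash table of partial (SUM, COUNT) states, and a coordinator merges them. All function and class names are illustrative.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical row layout: (group_key, value).
Row = Tuple[Hashable, float]


def choose_bucket_count(estimated_ndv: int, load_factor: float = 0.75,
                        min_buckets: int = 16) -> int:
    """Hypothetical heuristic: size the table from an estimated number of
    distinct groups (NDV) so that groups/buckets stays at or below
    load_factor, rounded up to a power of two."""
    target = max(min_buckets, int(estimated_ndv / load_factor) + 1)
    n = 1
    while n < target:
        n <<= 1
    return n


class HashGroupBy:
    """Minimal chained hash table keeping a partial (SUM, COUNT) per group."""

    def __init__(self, num_buckets: int) -> None:
        self.buckets: List[List[list]] = [[] for _ in range(num_buckets)]

    def add(self, key: Hashable, total: float, count: int = 1) -> None:
        # Locate the bucket by hashing the group key, then scan its chain.
        chain = self.buckets[hash(key) % len(self.buckets)]
        for entry in chain:
            if entry[0] == key:
                entry[1] += total
                entry[2] += count
                return
        chain.append([key, total, count])

    def results(self) -> Dict[Hashable, Tuple[float, int]]:
        return {k: (s, c) for chain in self.buckets for k, s, c in chain}


def local_aggregate(rows: Iterable[Row],
                    estimated_ndv: int) -> Dict[Hashable, Tuple[float, int]]:
    """Phase 1 (pushed down to a data node): aggregate local rows only."""
    table = HashGroupBy(choose_bucket_count(estimated_ndv))
    for key, value in rows:
        table.add(key, value)
    return table.results()


def global_merge(partials: Iterable[Dict[Hashable, Tuple[float, int]]],
                 estimated_ndv: int) -> Dict[Hashable, Tuple[float, int]]:
    """Phase 2 (coordinator): merge partial states by adding SUMs and COUNTs."""
    table = HashGroupBy(choose_bucket_count(estimated_ndv))
    for partial in partials:
        for key, (s, c) in partial.items():
            table.add(key, s, c)
    return table.results()


if __name__ == "__main__":
    node_a = [("x", 1.0), ("y", 2.0), ("x", 3.0)]
    node_b = [("y", 4.0), ("z", 5.0)]
    partials = [local_aggregate(rows, estimated_ndv=3) for rows in (node_a, node_b)]
    final = global_merge(partials, estimated_ndv=3)
    for key in sorted(final):
        s, c = final[key]
        print(key, "SUM =", s, "COUNT =", c, "AVG =", s / c)
```

Carrying COUNT alongside SUM in the partial state is what makes push-down safe for AVG: averaging per-node averages would be wrong, whereas summing SUMs and COUNTs and dividing once at the coordinator gives the exact result. Only the small partial states, rather than every raw row, cross the network.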

Key words: OceanBase, GroupBy, Hash, Data distribution

CLC Number: