Journal of East China Normal University(Natural Sc ›› 2017, Vol. 2017 ›› Issue (5): 40-51.doi: 10.3969/j.issn.1000-5641.2017.05.005

• Data Management

An outer join algorithm based on Cuckoo filter

YU Yang, ZHOU Min-qi, FANG Zhu-he   

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2017-06-19   Online: 2017-09-25   Published: 2017-09-25

Abstract: In recent years, the growth of the Internet has caused data volumes to increase rapidly. In the era of big data, the analytical efficiency of distributed database systems urgently needs to be improved, and the join operation is the main performance bottleneck of such systems. Join operations are mainly divided into inner joins and outer joins, and outer joins are widely used in business scenarios. Distributed join algorithms involve a large amount of network transmission, which severely degrades system performance. Although the literature contains studies on inner join optimization, these methods cannot be applied directly to outer joins. This paper proposes a distributed outer join algorithm based on the Cuckoo filter. By building a Cuckoo filter combined with a replication subdivision technique for data filtering and allocation, the algorithm reduces the amount of data transmitted and increases the degree of parallelism, which in turn improves query performance. We implement this algorithm in Ginkgo. Extensive experiments show that the algorithm substantially improves the efficiency of outer joins.
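
To make the filtering idea concrete, the following is a minimal single-process sketch in Python, not the paper's implementation. It assumes a textbook Cuckoo filter (16-bit fingerprints, two candidate buckets per key) built over the join keys of the right table. Left rows whose key the filter rejects can have no match, so they can be NULL-padded locally instead of being sent over the network. The names `CuckooFilter` and `left_outer_join`, and all parameter values, are hypothetical. The replication subdivision technique and Ginkgo's actual interfaces are not modeled here.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class CuckooFilter:
    """Textbook Cuckoo filter sketch; all parameters are illustrative assumptions."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        # A power-of-two bucket count keeps the XOR-based alternate index reversible.
        assert num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(0)

    def _fingerprint(self, key) -> int:
        # 16-bit fingerprint; zero is remapped to 1 so it is never empty.
        return (_h(b"fp" + repr(key).encode()) & 0xFFFF) or 1

    def _index(self, key) -> int:
        return _h(repr(key).encode()) & self.mask

    def _alt(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other bucket depends only on (index, fp).
        return (index ^ _h(fp.to_bytes(2, "big"))) & self.mask

    def insert(self, key) -> bool:
        fp = self._fingerprint(key)
        i1 = self._index(key)
        i2 = self._alt(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a random fingerprint and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter is too full; the last evicted fingerprint is dropped

    def contains(self, key) -> bool:
        fp = self._fingerprint(key)
        i1 = self._index(key)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt(i1, fp)]


def left_outer_join(left, right):
    """Join lists of (key, value) pairs; returns (rows, number of left rows 'shipped')."""
    filt = CuckooFilter()
    for key in {k for k, _ in right}:  # distinct keys only
        if not filt.insert(key):
            raise RuntimeError("filter too small for the right table")
    right_index = {}
    for k, v in right:
        right_index.setdefault(k, []).append(v)

    rows, shipped = [], 0
    for k, lv in left:
        if not filt.contains(k):
            # No false negatives: this row has no partner, so pad it locally.
            rows.append((k, lv, None))
            continue
        shipped += 1  # stands in for sending the row to the node holding `right`
        matches = right_index.get(k)
        if matches:
            rows.extend((k, lv, rv) for rv in matches)
        else:
            rows.append((k, lv, None))  # filter false positive
    return rows, shipped


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (3, "c")]
    right = [(2, "x"), (2, "y"), (4, "z")]
    rows, shipped = left_outer_join(left, right)
    print(rows, shipped)
```

Because a Cuckoo filter can report false positives but never false negatives, the join result stays correct in this sketch. A false positive only means that a row is shipped unnecessarily and NULL-padded after the lookup.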

Key words: Cuckoo filter, Ginkgo, outer join

CLC Number: