华东师范大学学报(自然科学版) ›› 2017, Vol. 2017 ›› Issue (5): 40-51.doi: 10.3969/j.issn.1000-5641.2017.05.005

• 数据管理 • 上一篇    下一篇

基于布谷鸟过滤器的外连接算法

于洋, 周敏奇, 方祝和   

  1. 华东师范大学 数据科学与工程学院, 上海 200062
  • 收稿日期:2017-06-19 出版日期:2017-09-25 发布日期:2017-09-25
  • 通讯作者: 周敏奇,男,副教授,研究方向为对等计算、云计算、分布式数据管理、内存数据库管理系统.E-mail:mqzhou@sei.ecnu.edu.cn E-mail:mqzhou@sei.ecnu.edu.cn
  • 作者简介:于洋,男,硕士研究生,研究方向为内存数据库系统.E-mail:youngfish93@hotmail.com
  • 基金资助:
    国家自然科学基金重点项目(61332006)

An outer join algorithm based on Cuckoo filter

YU Yang, ZHOU Min-qi, FANG Zhu-he   

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received:2017-06-19 Online:2017-09-25 Published:2017-09-25

摘要: 近十几年,由于互联网的发展异常迅猛,数据规模不断增加,分布式数据库的分析效率亟待优化,其中连接操作更是分布式数据库的主要性能瓶颈.外连接在商业中运用非常广泛.分布式外连接算法涉及到大量的网络传输,严重影响系统性能,虽然有一些研究针对内连接进行了优化,但这些优化方法并不能直接应用于外连接.文章中基于Cuckoofilter(布谷鸟过滤器)的分布式外连接算法,通过构建Cuckoofilter对数据进行筛选和分配,减少数据传输量的同时,提高执行的并行度,使得查询性能得到提升.通过在Ginkgo上实现该算法,并加以充分实验,验证得出该算法提高了分布式外连接操作的效率.

关键词: Cuckoo filter, Ginkgo, 外连接

Abstract: In recent years, due to the development of the Internet, data size has been increased rapidly. At the era of big data, the analysis efficiency of distributed database system needs to be optimized urgently. Nevertheless, the join operation is the main performance bottleneck of a distributed database system. Join operations are mainly divided into inner join and outer join, and outer join is widely used in the business situations. Distributed join algorithm involves a large amount of network transmission, which affects the performance of the system severely. Although there are some studies in the literature of inner join optimized, these optimization methods cannot be directly applied to outer join. This paper proposes a distributed outer join algorithm based on Cuckoo filter. By building a Cuckoo filter with replication subdivision technology for data filtering and allocation, it reduces the amount of data transmission and improves the degree of parallelism accordingly. Finally, it improves the query performance. We implement this algorithm in Ginkgo. Based on the given extensive experimental verification, the algorithm largely improves the efficiency of the outer join.

Key words: Cuckoo filter, Ginkgo, outer join

中图分类号: