Computer Science

Optimization strategies of correlated subquery for distributed database

  • MAO Si-yu (毛思语),
  • ZHANG Li-jun (张利军),
  • ZHANG Xiao-fang (张小芳),
  • GAO Jin-tao (高锦涛),
  • LI Zhan-huai (李战怀)
  • School of Computer, Northwestern Polytechnical University, Xi'an 710129, China

Received date: 2016-06-24

  Online published: 2016-11-29

Funding

Supported by the Fundamental Research Funds for the Central Universities (3102015JSJ0004)
Cite this article

MAO Si-yu, ZHANG Li-jun, ZHANG Xiao-fang, GAO Jin-tao, LI Zhan-huai. Optimization strategies of correlated subquery for distributed database[J]. Journal of East China Normal University (Natural Science), 2016, 2016(5): 56-66. DOI: 10.3969/j.issn.1000-5641.2016.05.007

Abstract

A subquery is a query that appears inside another statement as part of its query condition; a correlated subquery is a subquery whose condition depends on its parent query. A correlated subquery must be evaluated repeatedly, which requires multiple disk accesses and, in a distributed environment, also incurs substantial communication overhead, so execution is inefficient. Building on an analysis of existing optimization strategies for correlated subqueries, and taking the characteristics of distributed systems into account, we apply subquery unnesting (pull-up), useless-subtree removal, and aggregate-function elimination to a distributed relational database system. We implement these strategies in the open-source distributed relational database OceanBase to optimize correlated subqueries under the EXISTS predicate. Experimental results show that these strategies significantly improve the query performance of correlated subqueries.
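To make the subquery unnesting idea from the abstract concrete, the following is a minimal single-node sketch using Python's built-in `sqlite3` module. The tables `customers` and `orders`, their columns, and their contents are hypothetical and are not taken from the paper, and the sketch does not reproduce the OceanBase implementation. It only illustrates that a correlated EXISTS query and a hand-unnested join over a deduplicated subquery return the same rows.

```python
import sqlite3


def build_db():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount INTEGER
        );
        INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO orders VALUES
            (10, 1, 50), (11, 1, 200), (12, 3, 150), (13, 3, 300);
        """
    )
    return conn


# Correlated form: the inner query refers to c.id from the parent query,
# so conceptually it is re-evaluated for every row of customers.
CORRELATED = """
    SELECT c.name
    FROM customers c
    WHERE EXISTS (
        SELECT 1 FROM orders o
        WHERE o.customer_id = c.id AND o.amount > 100
    )
    ORDER BY c.name
"""

# Unnested form: the subquery is evaluated once, deduplicated so that the
# join cannot multiply customer rows, and then joined with the parent table.
UNNESTED = """
    SELECT c.name
    FROM customers c
    JOIN (
        SELECT DISTINCT customer_id FROM orders WHERE amount > 100
    ) s ON s.customer_id = c.id
    ORDER BY c.name
"""


def run(conn, sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


conn = build_db()
correlated_rows = run(conn, CORRELATED)
unnested_rows = run(conn, UNNESTED)
print(correlated_rows, unnested_rows)
```

The `DISTINCT` in the unnested form is what preserves EXISTS semantics: without it, a customer with several qualifying orders would appear more than once. In a distributed setting the appeal of the unnested form is that the subquery result can be computed and shipped once instead of once per parent row, though the actual savings depend on the system and the data.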

