Journal of East China Normal University (Natural Science)

• Computer Science

Optimization strategies of correlated subquery for distributed database

MAO Si-yu, ZHANG Li-jun, ZHANG Xiao-fang, GAO Jin-tao, LI Zhan-huai

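  1. School of Computer Science, Northwestern Polytechnical University, Xi’an 710129
  • Received: 2016-06-24 Online: 2016-09-25 Published: 2016-11-29
  • Corresponding author: ZHANG Li-jun, male, lecturer; research interests: database theory and data management technology. E-mail: zhanglijun@nwpu.edu.cn.
  • Supported by:

    Fundamental Research Funds for the Central Universities (3102015JSJ0004)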

Optimization strategies of correlated subquery for distributed database

MAO Si-yu, ZHANG Li-jun, ZHANG Xiao-fang, GAO Jin-tao, LI Zhan-huai   

  1. School of Computer, Northwestern Polytechnical University, Xi’an 710129, China
  • Received: 2016-06-24 Online: 2016-09-25 Published: 2016-11-29

Abstract:

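A subquery is a query statement that appears as a query condition of another statement, and a correlated subquery is a subquery whose condition depends on the parent query. A correlated subquery must be evaluated repeatedly, which requires many disk accesses and, especially in a distributed environment, also incurs substantial communication overhead, resulting in poor execution efficiency. Building on an analysis of existing optimization strategies for correlated subqueries and taking the characteristics of distributed systems into account, we apply strategies such as subquery unnesting, useless-subtree pruning, and aggregate-function elimination to a distributed relational database system, and use them in the open-source distributed relational database OceanBase to optimize correlated subqueries with the EXISTS predicate. Experiments show that these strategies markedly improve the query performance of correlated subqueries.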

Keywords: distributed database, correlated subquery, subquery optimization

Abstract:

A query that appears inside another query as a filter condition is called a subquery; if the filter condition of the subquery depends on its parent query, it is called a correlated subquery. In general, a query containing a correlated subquery is expensive to execute because the subquery is evaluated repeatedly, which causes multiple disk accesses and, in a distributed system, extra communication. Based on an investigation of classical optimization strategies for correlated subqueries, and taking the characteristics of distributed systems into account, we adopt subquery pull-up, useless-subtree removal, and aggregate-function elimination to optimize correlated subqueries in a distributed database system. We implement these strategies in the distributed relational database OceanBase for correlated subqueries with the EXISTS predicate. Experimental results show that these strategies can significantly improve the performance of correlated subqueries.

Key words: distributed database, correlated subquery, subquery optimization
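
As an illustration of what pulling up (unnesting) a correlated EXISTS subquery means, the following minimal sketch compares a correlated form with a join-based rewrite. It assumes a hypothetical two-table schema (`customers`, `orders`) and uses SQLite through Python's standard `sqlite3` module purely for convenience; it is not OceanBase and does not reproduce the authors' implementation. It only shows that the two query shapes return the same rows.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO orders VALUES (10, 1, 50), (11, 1, 70), (12, 3, 20);
""")

# Correlated form: the inner query refers to c.id from the parent query,
# so conceptually it is re-evaluated for every row of customers.
correlated = """
SELECT c.name FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
ORDER BY c.name
"""

# Unnested form: the subquery becomes a join against the distinct join keys,
# so orders is read once. DISTINCT keeps the semi-join semantics, i.e. a
# customer with several orders still appears only once.
unnested = """
SELECT c.name FROM customers c
JOIN (SELECT DISTINCT customer_id FROM orders) o ON o.customer_id = c.id
ORDER BY c.name
"""

correlated_rows = conn.execute(correlated).fetchall()
unnested_rows = conn.execute(unnested).fetchall()
print(correlated_rows)  # [('alice',), ('carol',)]
print(unnested_rows)    # [('alice',), ('carol',)]
```

In a distributed setting, the attraction of the second form is that the inner table can be scanned and shipped once rather than being probed per outer row, which is the communication overhead the abstract refers to.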