区块链系统与数据管理

查询优化器连接顺序评估

  • 陈婷 ,
  • 项兆坤 ,
  • 徐金凯 ,
  • 张蓉
展开
  • 华东师范大学 数据科学与工程学院, 上海 200062

收稿日期: 2022-07-11

  网络出版日期: 2022-09-26

基金资助

国家科学自然基金 (62072179); CCF-华为数据库创新研究计划

Benchmarking join order selection of query optimizers

  • Ting CHEN ,
  • Zhaokun XIANG ,
  • Jinkai XU ,
  • Rong ZHANG
Expand
  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2022-07-11

  Online published: 2022-09-26

摘要

连接顺序选择问题, 即从连接顺序搜索空间中选出性能最优的连接顺序, 是关键的查询优化问题. 然而, 连接顺序的选择存在庞大的搜索空间, 导致其成为难点问题, 优化器往往无法确保找到最佳的连接顺序. 虽然目前存在许多连接顺序选择策略, 但是, 现有的评测基准不适用于评估各种连接顺序选择策略的优劣. 为了有效地评估优化器在连接顺序选择方面的优化效果, 本文基于确定性数据生成方法, 采用适用于不同连接形状的连接模板生成算法和基于结果导向的参数实例化方法, 生成评测场景的数据与负载, 实现了通用的优化器连接顺序选择评估工具. 通过对OceanBase和PostgreSQL进行评测, 表明本文所提出的工具能够全面且有效地评估查询优化器的连接顺序选择功能的性能.

本文引用格式

陈婷 , 项兆坤 , 徐金凯 , 张蓉 . 查询优化器连接顺序评估[J]. 华东师范大学学报(自然科学版), 2022 , 2022(5) : 48 -60 . DOI: 10.3969/j.issn.1000-5641.2022.05.005

Abstract

Join order selection, i.e., the determination of the cheapest join order from available alternatives, is one of the most critical tasks in query optimization. The enormous search space of a join order makes it difficult to find an optimal join order in an efficient manner. Although there are many optimization algorithms for join order selection, existing benchmarks are unsuitable for evaluating these join order selection strategies because they cannot configure the depths of the joins or cover all join styles. To effectively evaluate the quality of join order selection algorithms used in an optimizer, a generic evaluation tool for join order selection is implemented in this study. The tool takes the primary key-based deterministic data generation method for portable application scenario migration, a join order sampling algorithm to reduce the investigated join spaces, and a result-guided parameter instantiation algorithm to support a valid query generation. We applied the tool on OceanBase and PostgreSQL, and the experiment results show its effectiveness in evaluating the performance of join order selection in query optimizers in a generic and efficient manner.

参考文献

1 VANCE B. Join-order optimization with cartesian products [D]. Beaverton: Oregon Graduate Institute of Science and Technology, 1998.
2 IBARAKI T, KAMEDA T. On the optimal nesting order for computing N-relational joins . ACM Transactions on Database Systems, 1984, 9 (3): 482- 502.
3 NEUMANN T. Query simplification: Graceful degradation for join-order optimization [C]// Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. 2009: 403-414.
4 IOANNIDIS Y E, KANG Y C. Left-deep vs. bushy trees: An analysis of strategy spaces and its implications for query optimization [C]// Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data. 1991: 168-177.
5 SELINGER P G, ASTRAHAN M M, CHAMBERLIN D D, et al. Access path selection in a relational database management system [M]// Readings in Artificial Intelligence and Databases. San Francisco: Morgan Kaufmann, 1989: 511-522.
6 CHANDE S V, SINHA M. Genetic optimization for the join ordering problem of database queries [C]// 2011 Annual IEEE India Conference. IEEE, 2011: 12506652.
7 FEGARAS L. A new heuristic for optimizing large queries [C]// International Conference on Database and Expert Systems Applications. 1998: 726-735.
8 WAAS F, PELLENKOFT A. Join order selection (good enough is easy) [C]// British National Conference on Databases. 2000: 51-67.
9 LANG H, NEUMANN T, KEMPER A, et al. Performance-optimal filtering: Bloom overtakes cuckoo at high throughput. Proceedings of the VLDB Endowment, 2019, 12 (5): 502- 515.
10 MARCUS R, PAPAEMMANOUIL O. Deep reinforcement learning for join order enumeration [C]// Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management. 2018: 3.
11 KRISHNAN S, YANG Z H, GOLDBERG K, et al. Learning to optimize join queries with deep reinforcement learning [EB/OL]. (2019-01-10)[2022-07-01]. https://arxiv.org/pdf/1808.03196.pdf.
12 TRUMMER I, WANG J X, WEI Z Y, et al. SkinnerDB: Regret-bounded query evaluation via reinforcement learning. ACM Transactions on Database Systems, 2021, 46 (3): 9.
13 YU X, LI G L, CHAI C L, et al. Reinforcement learning with Tree-LSTM for join order selection [C]// 2020 IEEE 36th International Conference on Data Engineering. IEEE, 2020: 1297-1308.
14 WANG X Y, QU C B, WU W Y, et al. Are we ready for learned cardinality estimation?. Proceedings of the VLDB Endowment, 2021, 14 (9): 1640- 1654.
15 O’NEIL P, O’NEIL B, CHEN X D. Star schema benchmark revision 3 [EB/OL]. (2009-06-05)[2022-07-05]. https://www.cs.umb.edu/~poneil/StarSchemaB.PDF.
16 Transaction Processing Performance Council. TPC benchmarkTM H [EB/OL]. (2017-09-21)[2022-07-03]. https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf.
17 NAMBIAR R O, POESS M. The making of TPC-DS [C]// Proceedings of the VLDB Endowment. 2006: 1049-1058.
18 LEIS V, GUBICHEV A, MIRCHEV A, et al. How good are query optimizers, really?. Proceedings of the VLDB Endowment, 2015, 9 (3): 204- 215.
19 KIPF A, KIPF T, RADKE B, et al. Learned cardinalities: Estimating correlated joins with deep learning [EB/OL]. (2018-12-18) [2022-07-03]. https://arxiv.org/pdf/1809.00677.pdf.
20 YANG Z H, KAMSETTY A, LUAN S F, et al. NeuroCard: One cardinality estimator for all tables. Proceedings of the VLDB Endowment, 2021, 14 (1): 61- 73.
21 ZHU R, WU Z N, HAN Y X, et al. FLAT: Fast, lightweight and accurate method for cardinality estimation. Proceedings of the VLDB Endowment, 2021, 14 (9): 1489- 1502.
22 HAN Y X, WU Z N, WU P Z, et al. Cardinality estimation in DBMS: A comprehensive benchmark evaluation [EB/OL]. (2021-09-15)[2022-07-01]. https://arxiv.org/pdf/2109.05877.pdf.
23 NEGI P, MARCUS R, KIPF A, et al. Flow-Loss: Learning cardinality estimates that matter. Proceedings of the VLDB Endowment, 2021, 14 (11): 2019- 2032.
24 GU Z X, SOLIMAN M A, WAAS F M. Testing the accuracy of query optimizers [C]// Proceedings of the Fifth International Workshop on Testing Database Systems. 2012: 11.
25 QIN Y, SALEM K, GOEL A K. Towards adaptive costing of database access methods [C]// 2007 IEEE 23rd International Conference on Data Engineering Workshop. 2007: 469-477.
26 LI Z, PAPAEMMANOUIL O, CHERNIACK M. OptMark: A toolkit for benchmarking query optimizers [C]// Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 2016: 2155-2160.
27 MI K M, ZHANG C X, QIAN W N, et al. Artemis: An automatic test suite generator for large scale OLAP database [C]// International Symposium on Benchmarking, Measuring and Optimization. 2020: 74-89.
28 项兆坤, 陈婷, 苏仟, 等. 面向 OLAP 数据库查询处理功能的模糊测试工具. 华东师范大学学报 (自然科学版), 2021, (5): 74- 83.
29 SINGH A S, MASUKU M B. Sampling techniques & determination of sample size in applied statistics research: An overview [EB/OL]. (2014-11-15) [2022-06-27]. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.678.1300&rep=rep1&type=pdf.
文章导航

/