查询优化器连接顺序评估

doi:10.3969/j.issn.1000-5641.2022.05.005

摘要/Abstract

摘要：

连接顺序选择问题, 即从连接顺序搜索空间中选出性能最优的连接顺序, 是关键的查询优化问题. 然而, 连接顺序的选择存在庞大的搜索空间, 导致其成为难点问题, 优化器往往无法确保找到最佳的连接顺序. 虽然目前存在许多连接顺序选择策略, 但是, 现有的评测基准不适用于评估各种连接顺序选择策略的优劣. 为了有效地评估优化器在连接顺序选择方面的优化效果, 本文基于确定性数据生成方法, 采用适用于不同连接形状的连接模板生成算法和基于结果导向的参数实例化方法, 生成评测场景的数据与负载, 实现了通用的优化器连接顺序选择评估工具. 通过对OceanBase和PostgreSQL进行评测, 表明本文所提出的工具能够全面且有效地评估查询优化器的连接顺序选择功能的性能.

关键词: 分析型数据库, 查询处理, 查询优化, 连接顺序选择

Abstract:

Join order selection, i.e., the determination of the cheapest join order from available alternatives, is one of the most critical tasks in query optimization. The enormous search space of a join order makes it difficult to find an optimal join order in an efficient manner. Although there are many optimization algorithms for join order selection, existing benchmarks are unsuitable for evaluating these join order selection strategies because they cannot configure the depths of the joins or cover all join styles. To effectively evaluate the quality of join order selection algorithms used in an optimizer, a generic evaluation tool for join order selection is implemented in this study. The tool takes the primary key-based deterministic data generation method for portable application scenario migration, a join order sampling algorithm to reduce the investigated join spaces, and a result-guided parameter instantiation algorithm to support a valid query generation. We applied the tool on OceanBase and PostgreSQL, and the experiment results show its effectiveness in evaluating the performance of join order selection in query optimizers in a generic and efficient manner.

Key words: online analytical processing database, query processing, query optimization, join order selection

中图分类号:

TP392

陈婷, 项兆坤, 徐金凯, 张蓉. 查询优化器连接顺序评估[J]. 华东师范大学学报（自然科学版）, 2022, 2022(5): 48-60.

Ting CHEN, Zhaokun XIANG, Jinkai XU, Rong ZHANG. Benchmarking join order selection of query optimizers[J]. Journal of East China Normal University(Natural Science), 2022, 2022(5): 48-60.

图/表 8

图1

图2

图3

表1

表2

图4

图6

图5

参考文献 29

1	VANCE B. Join-order optimization with cartesian products [D]. Beaverton: Oregon Graduate Institute of Science and Technology, 1998.
2	IBARAKI T, KAMEDA T. On the optimal nesting order for computing N-relational joins . ACM Transactions on Database Systems, 1984, 9 (3): 482- 502.
3	NEUMANN T. Query simplification: Graceful degradation for join-order optimization [C]// Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. 2009: 403-414.
4	IOANNIDIS Y E, KANG Y C. Left-deep vs. bushy trees: An analysis of strategy spaces and its implications for query optimization [C]// Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data. 1991: 168-177.
5	SELINGER P G, ASTRAHAN M M, CHAMBERLIN D D, et al. Access path selection in a relational database management system [M]// Readings in Artificial Intelligence and Databases. San Francisco: Morgan Kaufmann, 1989: 511-522.
6	CHANDE S V, SINHA M. Genetic optimization for the join ordering problem of database queries [C]// 2011 Annual IEEE India Conference. IEEE, 2011: 12506652.
7	FEGARAS L. A new heuristic for optimizing large queries [C]// International Conference on Database and Expert Systems Applications. 1998: 726-735.
8	WAAS F, PELLENKOFT A. Join order selection (good enough is easy) [C]// British National Conference on Databases. 2000: 51-67.
9	LANG H, NEUMANN T, KEMPER A, et al. Performance-optimal filtering: Bloom overtakes cuckoo at high throughput. Proceedings of the VLDB Endowment, 2019, 12 (5): 502- 515.
10	MARCUS R, PAPAEMMANOUIL O. Deep reinforcement learning for join order enumeration [C]// Proceedings of the First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management. 2018: 3.
11	KRISHNAN S, YANG Z H, GOLDBERG K, et al. Learning to optimize join queries with deep reinforcement learning [EB/OL]. (2019-01-10)[2022-07-01]. https://arxiv.org/pdf/1808.03196.pdf.
12	TRUMMER I, WANG J X, WEI Z Y, et al. SkinnerDB: Regret-bounded query evaluation via reinforcement learning. ACM Transactions on Database Systems, 2021, 46 (3): 9.
13	YU X, LI G L, CHAI C L, et al. Reinforcement learning with Tree-LSTM for join order selection [C]// 2020 IEEE 36th International Conference on Data Engineering. IEEE, 2020: 1297-1308.
14	WANG X Y, QU C B, WU W Y, et al. Are we ready for learned cardinality estimation?. Proceedings of the VLDB Endowment, 2021, 14 (9): 1640- 1654.
15	O’NEIL P, O’NEIL B, CHEN X D. Star schema benchmark revision 3 [EB/OL]. (2009-06-05)[2022-07-05]. https://www.cs.umb.edu/~poneil/StarSchemaB.PDF.
16	Transaction Processing Performance Council. TPC benchmark^TM H [EB/OL]. (2017-09-21)[2022-07-03]. https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf.
17	NAMBIAR R O, POESS M. The making of TPC-DS [C]// Proceedings of the VLDB Endowment. 2006: 1049-1058.
18	LEIS V, GUBICHEV A, MIRCHEV A, et al. How good are query optimizers, really?. Proceedings of the VLDB Endowment, 2015, 9 (3): 204- 215.
19	KIPF A, KIPF T, RADKE B, et al. Learned cardinalities: Estimating correlated joins with deep learning [EB/OL]. (2018-12-18) [2022-07-03]. https://arxiv.org/pdf/1809.00677.pdf.
20	YANG Z H, KAMSETTY A, LUAN S F, et al. NeuroCard: One cardinality estimator for all tables. Proceedings of the VLDB Endowment, 2021, 14 (1): 61- 73.
21	ZHU R, WU Z N, HAN Y X, et al. FLAT: Fast, lightweight and accurate method for cardinality estimation. Proceedings of the VLDB Endowment, 2021, 14 (9): 1489- 1502.
22	HAN Y X, WU Z N, WU P Z, et al. Cardinality estimation in DBMS: A comprehensive benchmark evaluation [EB/OL]. (2021-09-15)[2022-07-01]. https://arxiv.org/pdf/2109.05877.pdf.
23	NEGI P, MARCUS R, KIPF A, et al. Flow-Loss: Learning cardinality estimates that matter. Proceedings of the VLDB Endowment, 2021, 14 (11): 2019- 2032.
24	GU Z X, SOLIMAN M A, WAAS F M. Testing the accuracy of query optimizers [C]// Proceedings of the Fifth International Workshop on Testing Database Systems. 2012: 11.
25	QIN Y, SALEM K, GOEL A K. Towards adaptive costing of database access methods [C]// 2007 IEEE 23rd International Conference on Data Engineering Workshop. 2007: 469-477.
26	LI Z, PAPAEMMANOUIL O, CHERNIACK M. OptMark: A toolkit for benchmarking query optimizers [C]// Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 2016: 2155-2160.
27	MI K M, ZHANG C X, QIAN W N, et al. Artemis: An automatic test suite generator for large scale OLAP database [C]// International Symposium on Benchmarking, Measuring and Optimization. 2020: 74-89.
28	项兆坤, 陈婷, 苏仟, 等. 面向 OLAP 数据库查询处理功能的模糊测试工具. 华东师范大学学报 (自然科学版), 2021, (5): 74- 83.
29	SINGH A S, MASUKU M B. Sampling techniques & determination of sample size in applied statistics research: An overview [EB/OL]. (2014-11-15) [2022-06-27]. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.678.1300&rep=rep1&type=pdf.

表数目/个	只考虑左深树的搜索空间/个	考虑浓密树的搜索空间/个
1	1	1
2	2	2
3	6	12
4	24	120
5	120	1680
6	720	30240
7	5040	665280
8	40320	17297280

表数量/个	链式		星型		树型		环型
表数量/个	OceanBase	PostgreSQL	OceanBase	PostgreSQL	OceanBase	PostgreSQL	OceanBase	PostgreSQL
3	0.59	0.60					0.35	0.40
4	0.41	0.47	0.18	0.14	0.36	0.38	0.20	0.42
5	0.20	0.40	0.23	0.25	0.21	0.32	0.14	0.32
6	0.16	0.46	0.18	0.18	0.22	0.45	0.13	0.40
7	0.12	0.55	0.13	0.16	0.13	0.41	0.15	0.56
8	0.13	0.75	0.12	0.11	0.08	0.74	0.11	0.65
9	0.08	0.80	0.14	0.13	0.12	0.56	0.09	0.84

[1]	简炜, 胡梓锐, 张蓉. OLAP查询基数预估能力评估[J]. 华东师范大学学报（自然科学版）, 2024, 2024(5): 141-151.
[2]	张帅, 胡卉芪, 徐尧强, 周烜. 面向异常检测的时序数据库查询优化[J]. 华东师范大学学报（自然科学版）, 2023, 2023(2): 119-131.
[3]	刘梦男, 许建秋. 基于查询代价的两级轨迹数据划分算法[J]. 华东师范大学学报（自然科学版）, 2022, 2022(5): 184-194.
[4]	项兆坤, 陈婷, 苏仟, 张蓉. 面向OLAP数据库查询处理功能的模糊测试工具[J]. 华东师范大学学报（自然科学版）, 2021, 2021(5): 74-83.
[5]	方创新, 宋浩, 林煜明, 周娅. CPU-GPU异构环境下的大规模商品知识查询处理[J]. 华东师范大学学报（自然科学版）, 2021, 2021(5): 157-168.
[6]	张敬伟, 丁志均, 杨青, 张会兵, 张海涛, 周娅. 异构Redis集群大规模评论数据存储负载均衡设计[J]. 华东师范大学学报(自然科学版), 2017, 2017(5): 20-29.
[7]	毛思语, 张利军, 张小芳, 高锦涛, 李战怀. 面向分布式数据库的相关子查询优化策略[J]. 华东师范大学学报(自然科学版), 2016, 2016(5): 56-66.