教育数据管理

OLAP查询基数预估能力评估

  • 简炜 ,
  • 胡梓锐 ,
  • 张蓉
展开
  • 华东师范大学 数据科学与工程学院, 上海 200062

收稿日期: 2024-07-04

  录用日期: 2024-07-28

  网络出版日期: 2024-09-23

基金资助

国家自然科学基金(62072179); OceanBase联合实验室项目

Online analytical processing query cardinality estimation capability evaluation

  • Wei JIAN ,
  • Zirui HU ,
  • Rong ZHANG
Expand
  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2024-07-04

  Accepted date: 2024-07-28

  Online published: 2024-09-23

摘要

查询优化可以显著提升联机分析处理 (online analytical processing, OLAP) 数据库系统对海量教育数据的分析效率, 为智能教学系统提供快速、精准的数据支持. 优化器主要包含基数预估、空间枚举和代价模型3个模块. 其中, 基数预估决定代价模型的结果, 并指导查询计划的选择. 因此, 优化器的基数预估模块评估对OLAP数据库系统优化起到积极的推动作用. 本文设计并实现了一套基于主键驱动的、构造多样化数据分布和数据关联关系的有效负载生成工具, 包含自定义关联关系的数据生成技术、基于有限状态机的负载模版生成技术和目标基数驱动的参数实例化技术. 并在3个数据库OceanBase、TiDB和PostgreSQL上进行了实验, 分析了3个数据库优化器存在的问题, 并给出了建议.

本文引用格式

简炜 , 胡梓锐 , 张蓉 . OLAP查询基数预估能力评估[J]. 华东师范大学学报(自然科学版), 2024 , 2024(5) : 141 -151 . DOI: 10.3969/j.issn.1000-5641.2024.05.013

Abstract

Query optimization can significantly enhance the analysis efficiency of online analytical processing (OLAP) database systems for massive educational data, providing fast and accurate data support for intelligent educational systems. The optimizer mainly consists of three modules: cardinality estimation, space enumeration, and cost models. Specifically, cardinality estimation determines the results of the cost model and guides the selection of query plans. Therefore, the evaluation of the cardinality estimation module of the optimizer plays a crucial role in the optimization of OLAP database systems. This study designs and implements an effective workload generation tool based on primary key-driven diversified data distribution and data relationship construction. The tool includes data generation technology with custom relationships, workload template generation technology based on finite state machines, and parameter instantiation technology driven by target cardinality. Experiments were conducted on three databases: OceanBase, TiDB, and PostgreSQL, analyzing the issues of their optimizers and providing suggestions.

参考文献

1 KOCAKO? I D, ERDEM S.. Business intelligence applications in retail business: OLAP, data mining & reporting services. Journal of Information & Knowledge Management, 2010, 9 (2): 171- 181.
2 LAAK K J, ARU J. AI and personalized learning: Bridging the gap with modern educational goals [EB/OL]. (2024-04-03)[2024-07-30]. https://doi.org/10.48550/arXiv.2404.02798.
3 ALRAWASHDEH G S, FYFFE S, AZEVEDO R F L, et al.. Exploring the impact of personalized and adaptive learning technologies on reading literacy: A global meta-analysis. Educational Research Review, 2024, 42, 100587.
4 LIU Y. The application of OLAP and Web technology in the evaluation of higher educational quality and the design of management system [C]// International Conference on Educational Technology and Administration. Cham: Springer International Publishing, 2022: 223-232.
5 PLATTNER H. A common database approach for OLTP and OLAP using an in-memory column database [C]// Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. 2009: 1-2.
6 LEHMAN T J, CAREY M J. Query processing in main memory database management systems [C]// Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data. 1986: 239-250.
7 LAN H, BAO Z, PENG Y.. A survey on advancing the dbms query optimizer: Cardinality estimation, cost model, and plan enumeration. Data Science and Engineering, 2021, (6): 86- 101.
8 SELINGER P G, ASTRAHAN M M, CHAMBERLIN D D, et al. Access path selection in a relational database management system [C]// Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data. 1979: 23-34.
9 VANCE B, MAIER D.. Rapid bushy join-order optimization with cartesian products. ACM SIGMOD Record, 1996, 25 (2): 35- 46.
10 MOERKOTTE G, NEUMANN T. Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products [C]// Proceedings of the 32nd International Conference on Very Large Data Bases. 2006: 930-941.
11 MOERKOTTE G, NEUMANN T. Dynamic programming strikes back [C]// Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008: 539-552.
12 CHAUDHURI S. An overview of query optimization in relational systems [C]// Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 1998: 34-43.
13 TANG X, WU S, ZHANG D, et al.. Detecting logic bugs of join optimizations in dbms. Proceedings of the ACM on Management of Data, 2023, 1 (1): 55.
14 LEIS V, RADKE B, GUBICHEV A, et al.. Query optimization through the looking glass, and what we found running the join order benchmark. The VLDB Journal, 2018, 27, 643- 668.
15 HAN Y, WU Z, WU P, et al. Cardinality estimation in dbms: A comprehensive benchmark evaluation [EB/OL]. (2021-09-13)[2024-07-30]. https://doi.org/10.48550/arXiv.2109.05877.
16 WU Z, NEGI P, ALIZADEH M, et al.. FactorJoin: A new cardinality estimation framework for join queries. Proceedings of the ACM on Management of Data, 2023, 1 (1): 41.
17 BONCZ P, NEUMANN T, ERLING O. TPC-H analyzed: Hidden messages and lessons learned from an influential benchmark [C]// Technology Conference on Performance Evaluation and Benchmarking. Cham: Springer International Publishing, 2013: 61-76.
18 BARATA M, BERNARDINO J, FURTADO P.. An overview of decision support benchmarks: TPC-DS, TPC-H and SSB. New Contributions in Information Systems and Technologies, 2015, (1): 619- 628.
19 NAMBIAR R O, POESS M.. The making of TPC-DS. VLDB, 2006, (6): 1049- 1058.
20 AXTELL R L.. Zipf distribution of US firm sizes. Science, 2001, 293 (5536): 1818- 1820.
21 项兆坤, 陈婷, 苏仟, 等.. 面向OLAP数据库查询处理功能的模糊测试工具. 华东师范大学学报(自然科学版), 2021, (5): 74- 83.
22 ZHANG L, CHAI C, ZHOU X, et al. LearnedSQLGen: Constraint-aware SQL generation using reinforcement learning [C]// Proceedings of the 2022 International Conference on Management of Data. 2022: 945-958.
23 陈婷, 项兆坤, 徐金凯, 等.. 查询优化器连接顺序评估. 华东师范大学学报(自然科学版), 2022, (5): 48- 60.
24 MIRJALILI S. Evolutionary Algorithms and Neural Networks [M]. Cham: Springer International Publishing, 2019.
文章导航

/