Educational Data Management

Online analytical processing query cardinality estimation capability evaluation

  • Wei JIAN ,
  • Zirui HU ,
  • Rong ZHANG
Expand
  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2024-07-04

  Accepted date: 2024-07-28

  Online published: 2024-09-23

Abstract

Query optimization can significantly enhance the analysis efficiency of online analytical processing (OLAP) database systems for massive educational data, providing fast and accurate data support for intelligent educational systems. The optimizer mainly consists of three modules: cardinality estimation, space enumeration, and cost models. Specifically, cardinality estimation determines the results of the cost model and guides the selection of query plans. Therefore, the evaluation of the cardinality estimation module of the optimizer plays a crucial role in the optimization of OLAP database systems. This study designs and implements an effective workload generation tool based on primary key-driven diversified data distribution and data relationship construction. The tool includes data generation technology with custom relationships, workload template generation technology based on finite state machines, and parameter instantiation technology driven by target cardinality. Experiments were conducted on three databases: OceanBase, TiDB, and PostgreSQL, analyzing the issues of their optimizers and providing suggestions.

Cite this article

Wei JIAN , Zirui HU , Rong ZHANG . Online analytical processing query cardinality estimation capability evaluation[J]. Journal of East China Normal University(Natural Science), 2024 , 2024(5) : 141 -151 . DOI: 10.3969/j.issn.1000-5641.2024.05.013

References

1 KOCAKO? I D, ERDEM S.. Business intelligence applications in retail business: OLAP, data mining & reporting services. Journal of Information & Knowledge Management, 2010, 9 (2): 171- 181.
2 LAAK K J, ARU J. AI and personalized learning: Bridging the gap with modern educational goals [EB/OL]. (2024-04-03)[2024-07-30]. https://doi.org/10.48550/arXiv.2404.02798.
3 ALRAWASHDEH G S, FYFFE S, AZEVEDO R F L, et al.. Exploring the impact of personalized and adaptive learning technologies on reading literacy: A global meta-analysis. Educational Research Review, 2024, 42, 100587.
4 LIU Y. The application of OLAP and Web technology in the evaluation of higher educational quality and the design of management system [C]// International Conference on Educational Technology and Administration. Cham: Springer International Publishing, 2022: 223-232.
5 PLATTNER H. A common database approach for OLTP and OLAP using an in-memory column database [C]// Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. 2009: 1-2.
6 LEHMAN T J, CAREY M J. Query processing in main memory database management systems [C]// Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data. 1986: 239-250.
7 LAN H, BAO Z, PENG Y.. A survey on advancing the dbms query optimizer: Cardinality estimation, cost model, and plan enumeration. Data Science and Engineering, 2021, (6): 86- 101.
8 SELINGER P G, ASTRAHAN M M, CHAMBERLIN D D, et al. Access path selection in a relational database management system [C]// Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data. 1979: 23-34.
9 VANCE B, MAIER D.. Rapid bushy join-order optimization with cartesian products. ACM SIGMOD Record, 1996, 25 (2): 35- 46.
10 MOERKOTTE G, NEUMANN T. Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products [C]// Proceedings of the 32nd International Conference on Very Large Data Bases. 2006: 930-941.
11 MOERKOTTE G, NEUMANN T. Dynamic programming strikes back [C]// Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008: 539-552.
12 CHAUDHURI S. An overview of query optimization in relational systems [C]// Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 1998: 34-43.
13 TANG X, WU S, ZHANG D, et al.. Detecting logic bugs of join optimizations in dbms. Proceedings of the ACM on Management of Data, 2023, 1 (1): 55.
14 LEIS V, RADKE B, GUBICHEV A, et al.. Query optimization through the looking glass, and what we found running the join order benchmark. The VLDB Journal, 2018, 27, 643- 668.
15 HAN Y, WU Z, WU P, et al. Cardinality estimation in dbms: A comprehensive benchmark evaluation [EB/OL]. (2021-09-13)[2024-07-30]. https://doi.org/10.48550/arXiv.2109.05877.
16 WU Z, NEGI P, ALIZADEH M, et al.. FactorJoin: A new cardinality estimation framework for join queries. Proceedings of the ACM on Management of Data, 2023, 1 (1): 41.
17 BONCZ P, NEUMANN T, ERLING O. TPC-H analyzed: Hidden messages and lessons learned from an influential benchmark [C]// Technology Conference on Performance Evaluation and Benchmarking. Cham: Springer International Publishing, 2013: 61-76.
18 BARATA M, BERNARDINO J, FURTADO P.. An overview of decision support benchmarks: TPC-DS, TPC-H and SSB. New Contributions in Information Systems and Technologies, 2015, (1): 619- 628.
19 NAMBIAR R O, POESS M.. The making of TPC-DS. VLDB, 2006, (6): 1049- 1058.
20 AXTELL R L.. Zipf distribution of US firm sizes. Science, 2001, 293 (5536): 1818- 1820.
21 项兆坤, 陈婷, 苏仟, 等.. 面向OLAP数据库查询处理功能的模糊测试工具. 华东师范大学学报(自然科学版), 2021, (5): 74- 83.
22 ZHANG L, CHAI C, ZHOU X, et al. LearnedSQLGen: Constraint-aware SQL generation using reinforcement learning [C]// Proceedings of the 2022 International Conference on Management of Data. 2022: 945-958.
23 陈婷, 项兆坤, 徐金凯, 等.. 查询优化器连接顺序评估. 华东师范大学学报(自然科学版), 2022, (5): 48- 60.
24 MIRJALILI S. Evolutionary Algorithms and Neural Networks [M]. Cham: Springer International Publishing, 2019.
Outlines

/