Journal of East China Normal University (Natural Science) ›› 2024, Vol. 2024 ›› Issue (5): 141-151. doi: 10.3969/j.issn.1000-5641.2024.05.013

• Educational Data Management •

Online analytical processing query cardinality estimation capability evaluation

Wei JIAN (简炜), Zirui HU (胡梓锐), Rong ZHANG (张蓉)*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2024-07-04 Accepted: 2024-07-28 Online: 2024-09-25 Published: 2024-09-23
  • Contact: Rong ZHANG E-mail: rzhang@dase.ecnu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62072179); OceanBase Joint Laboratory Project

Abstract:

Query optimization can significantly improve the efficiency with which online analytical processing (OLAP) database systems analyze massive educational data, providing fast and accurate data support for intelligent teaching systems. An optimizer consists mainly of three modules: cardinality estimation, space enumeration, and the cost model. Of these, cardinality estimation determines the output of the cost model and thereby guides the selection of query plans. Evaluating the optimizer's cardinality estimation module therefore helps drive the optimization of OLAP database systems. This paper designs and implements an effective, primary-key-driven workload generation tool that constructs diverse data distributions and data relationships. The tool comprises three techniques: data generation with custom relationships, workload template generation based on finite state machines, and parameter instantiation driven by target cardinality. Experiments on three databases (OceanBase, TiDB, and PostgreSQL) are used to analyze the problems in their optimizers and to offer suggestions.

Key words: OLAP database, query optimization, cardinality estimation
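
As a rough illustration of what "parameter instantiation driven by target cardinality" can mean, the sketch below picks a parameter p for a single predicate of the form `col <= p` so that the number of matching rows is as close as possible to a requested cardinality. It is not the authors' tool or algorithm: the function name `instantiate_parameter`, the single-column range predicate, and the in-memory list of column values are all assumptions made for this example.

```python
from bisect import bisect_left, bisect_right


def instantiate_parameter(values, target_cardinality):
    """Choose p so that the predicate `col <= p` matches close to the target.

    Hypothetical helper for illustration only. Returns (p, actual_cardinality),
    where actual_cardinality is the number of values satisfying `col <= p`.
    """
    if not values:
        raise ValueError("column is empty")
    ordered = sorted(values)
    # Clamp the target to the achievable range [1, number of rows].
    k = max(1, min(target_cardinality, len(ordered)))
    # Candidate 1: the k-th smallest value; duplicates may overshoot k.
    upper = ordered[k - 1]
    upper_count = bisect_right(ordered, upper)
    # Candidate 2: the largest value strictly below `upper`; this undershoots k.
    lower_count = bisect_left(ordered, upper)
    if lower_count > 0 and (k - lower_count) < (upper_count - k):
        return ordered[lower_count - 1], lower_count
    return upper, upper_count


if __name__ == "__main__":
    column = [5, 1, 3, 3, 9, 7]
    p, actual = instantiate_parameter(column, 4)
    print(f"WHERE col <= {p} matches {actual} rows")
```

With duplicate values an exact match is not always possible, which is why the function also reports the cardinality actually achieved; a generator could compare that value with an optimizer's estimate for the same predicate.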

CLC number: