Journal of East China Normal University (Natural Science) ›› 2017, Vol. 2017 ›› Issue (5): 30-39. doi: 10.3969/j.issn.1000-5641.2017.05.004

• Data Management •

Design and implementation of Smart materialization for column-store in CLAIMS

ZHANG Han, ZHOU Min-qi

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2017-06-19 Online: 2017-09-25 Published: 2017-09-25
  • Corresponding author: ZHOU Min-qi, male, associate professor; research interests: peer-to-peer computing, cloud computing, distributed data management, and in-memory database management systems. E-mail: mqzhou@sei.ecnu.edu.cn
  • About the author: ZHANG Han, male, master's student; research interest: in-memory database systems. E-mail: chxiaoyifeng1992@gmail.com
  • Supported by:
    National Natural Science Foundation of China (61672233)

Abstract: Materialization is an indispensable operation in column-store database queries, and the materialization strategy and technique play a crucial role during query execution. Designing a materialization strategy tailored to column-store databases is therefore particularly important. Early materialization produces tuples that contain attributes irrelevant to the query, whereas late materialization may fail to improve performance for queries with high selectivity and may access some columns more than once. To address these shortcomings, this paper proposes a strategy distinct from both: Smart materialization. We introduce a structure called a projection into the logical query plan. A projection is generated from the attributes that the user selects as needed by queries, and it amounts to a physical partitioning of the full table; at the start of a query it reduces the amount of data loaded directly into memory and avoids extra overhead. While building the logical query plan, Smart materialization uses the projection as the unit of the scan operation to partition data by column, predicts the columns the next query will need from the correlation of column accesses within a set of statements, and adds those columns to the most suitable projection for materialization. We verify the effectiveness of the strategy with the TPC-H data set on CLAIMS, a distributed in-memory database.

Key words: projection, Smart materialization, data compression
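
The abstract describes predicting the columns the next query will need from column-access correlation and placing them in the most suitable projection, but it does not give the algorithm. The following is a minimal Python sketch of one way that idea could work, not the CLAIMS implementation. The function names, the pair-count scoring, the `min_support` threshold, and the projection names `p_price` and `p_order` are all assumptions made for illustration; only the TPC-H `lineitem` column names are real.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(workload):
    """Count how often each unordered pair of columns appears in the same query.

    `workload` is a list of queries, each given as the list of columns it reads
    (an assumed representation, not one taken from the paper).
    """
    counts = Counter()
    for columns in workload:
        for a, b in combinations(sorted(set(columns)), 2):
            counts[(a, b)] += 1
    return counts


def predict_next_columns(counts, current_columns, min_support=2):
    """Rank columns outside the current query by their total co-access count
    with the current query's columns; keep those scoring >= min_support."""
    current = set(current_columns)
    scores = Counter()
    for (a, b), n in counts.items():
        if a in current and b not in current:
            scores[b] += n
        elif b in current and a not in current:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [col for col, score in ranked if score >= min_support]


def best_projection(projections, column, counts):
    """Pick the projection whose columns are most often read together with
    `column`. Ties go to the alphabetically first projection name."""
    def affinity(name):
        return sum(
            counts[tuple(sorted((column, c)))]
            for c in projections[name]
            if c != column
        )
    return max(sorted(projections), key=affinity)


# Hypothetical workload over TPC-H lineitem columns.
workload = [
    ["l_shipdate", "l_discount", "l_extendedprice"],
    ["l_shipdate", "l_discount", "l_quantity"],
    ["l_shipdate", "l_extendedprice", "l_discount"],
    ["l_orderkey", "l_quantity"],
]
projections = {
    "p_price": {"l_shipdate", "l_extendedprice"},
    "p_order": {"l_orderkey", "l_quantity"},
}

counts = co_access_counts(workload)
for column in predict_next_columns(counts, ["l_shipdate", "l_extendedprice"]):
    target = best_projection(projections, column, counts)
    projections[target].add(column)  # materialize the column with that projection
```

In this toy workload, `l_discount` is read together with `l_shipdate` and `l_extendedprice` often enough to be predicted, and it is added to `p_price` because that projection holds the columns it is most frequently read with. A real system would also need to weigh storage cost and selectivity, which this sketch ignores.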

CLC number: