Journal of East China Normal University (Natural Science), 2018, Vol. 2018, Issue (5): 67-78. doi: 10.3969/j.issn.1000-5641.2018.05.006

• High-Performance Database Management •

The designs and implementations of columnar storage in Cedar

YU Wen-qian, HU Shuang, HU Hui-qi

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2018-07-09 Online: 2018-09-25 Published: 2018-09-26
  • Corresponding author: HU Hui-qi, male, assistant researcher; research interest: databases. E-mail: hqhu@dase.ecnu.edu.cn
  • About the author: YU Wen-qian, male, master's student; research interest: distributed database systems. E-mail: wqyu_cs@163.com
  • Supported by:
    National Natural Science Foundation of China (61702189); Shanghai Sailing Program (17YF1427800)

Abstract: As data volumes and analytical demands grow, database query performance for On-Line Analytical Processing (OLAP) applications has become increasingly important. Cedar is a distributed relational database built on a read-write decoupled architecture. Because Cedar mainly targets On-Line Transaction Processing (OLTP) workloads, its performance on analytical processing workloads is insufficient. Many studies have shown that column storage can effectively improve I/O (Input/Output) efficiency and thereby improve analytical processing performance. This paper proposes a column-based storage mechanism for Cedar, analyzes the scenarios to which it applies, and adapts Cedar's data scan and batch update methods to the mechanism. Experimental results show that the mechanism substantially improves Cedar's analytical processing performance while keeping the impact on transaction processing performance within 10%.

Key words: distributed database, column-based storage, On-Line Analytical Processing (OLAP)

CLC number: