Journal of East China Normal University (Natural Science) ›› 2023, Vol. 2023 ›› Issue (5): 40-50. doi: 10.3969/j.issn.1000-5641.2023.05.004

• Database Systems •

Separate management strategies for Part metadata under the storage-computing separation architecture

Danqi LIU, Peng CAI*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2023-06-30 Accepted: 2023-07-24 Online: 2023-09-25 Published: 2023-09-15
  • Contact: Peng CAI E-mail: pcai@dase.ecnu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61972149, U22B2020)

Abstract:

To address three shortcomings of ClickHouse, namely underutilized hardware resources, a lack of elasticity, and slow node startup, this paper proposes a management strategy, under the storage-computing separation architecture, for the metadata that describe the data itself (Part metadata). Part metadata are the most important component of the metadata. To manage data on remote shared storage effectively, all Part metadata files are collected and merged; the merged metadata are then stored in a distributed key-value database through key-value mapping, serialization, and deserialization. In addition, a synchronization strategy is designed to keep the data on remote shared storage consistent with the metadata in the distributed key-value database. Based on the Part metadata management strategy and the associated synchronization strategy, a management system for Part metadata was implemented, which resolves the slow node startup problem in ClickHouse and supports efficient dynamic scaling of nodes.

Key words: database systems, storage-computing separation architecture, Part metadata management
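
The pipeline summarized in the abstract (merge the metadata files of a Part, map the Part to a key, serialize the merged record, store it in a key-value database, and deserialize it on read) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `<table>/<part_name>` key layout, JSON as the serialization format, the file names and part name, and the `InMemoryKV` class, which merely stands in for a distributed key-value database.

```python
import json
from typing import Dict, List


def part_key(table: str, part_name: str) -> str:
    """Map one Part to one key (hypothetical layout: '<table>/<part_name>')."""
    return f"{table}/{part_name}"


def serialize_part(files: Dict[str, str]) -> bytes:
    """Merge all metadata files of one Part into a single value.

    `files` maps a file name to its text content; JSON is an assumed format.
    """
    return json.dumps(files, sort_keys=True).encode("utf-8")


def deserialize_part(value: bytes) -> Dict[str, str]:
    """Recover the per-file metadata from a stored value."""
    return json.loads(value.decode("utf-8"))


class InMemoryKV:
    """Stand-in for a distributed key-value database (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]

    def scan(self, prefix: str) -> List[str]:
        """Return all keys under a prefix, e.g. every Part of one table."""
        return sorted(k for k in self._data if k.startswith(prefix))


# Illustrative metadata files for one Part (names and contents are made up).
files = {"columns.txt": "id UInt64", "count.txt": "42"}

kv = InMemoryKV()
kv.put(part_key("hits", "all_1_1_0"), serialize_part(files))

# A starting node could list a table's Parts with one prefix scan
# instead of reading many small files from remote shared storage.
keys = kv.scan("hits/")
restored = deserialize_part(kv.get(keys[0]))
```

Storing one merged value per Part means a node needs one lookup per Part rather than one read per metadata file, which is the kind of saving the abstract attributes to the approach; the consistency between stored metadata and remote data would still have to be maintained by a separate synchronization step, which this sketch does not model.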

CLC number: