Journal of East China Normal University (Natural Science) ›› 2023, Vol. 2023 ›› Issue (5): 26-39. doi: 10.3969/j.issn.1000-5641.2023.05.003

• Database Systems

Hybrid granular buffer management scheme for storage and computing separation architecture

Wenjuan MEI, Peng CAI*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2023-07-01 Online: 2023-09-25 Published: 2023-09-15
  • Contact: Peng CAI


Storage-compute separation has emerged as an architecture for improving the performance and efficiency of large-scale data processing. It has notable performance bottlenecks, however, primarily the low access efficiency of object storage and significant network overhead. Object storage also stores small files inefficiently, and ClickHouse, a MergeTree-based database, generates a large number of small files when storing data. To address these challenges, this paper introduces HG-Buffer (hybrid granularity buffer), an SSD (solid-state drive)-based cache management scheme that optimizes storage-compute separation for ClickHouse on S3 and mitigates the small-file problem in object storage. HG-Buffer aims to minimize network transmission overhead and improve access efficiency. It places an SSD caching layer between the compute and storage layers and organizes the SSD buffer at two granularities: an object buffer and a block buffer. The object buffer caches data at the granularity of object storage, while the block buffer caches data at the granularity the system actually accesses; each block is a subset of an object. Using statistics on data hotness, HG-Buffer adaptively chooses where to place data, improving SSD space utilization and system performance. Experimental evaluations on ClickHouse and S3 demonstrate the effectiveness and robustness of HG-Buffer.
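
The two-granularity idea can be illustrated with a minimal Python sketch. The abstract does not specify the admission or eviction policy, so everything concrete here is an assumption made for illustration: the class name `HybridGranularityBuffer`, the `fetch_object` callback standing in for a read from object storage, the `hot_fraction` threshold used as a hotness signal, and LRU eviction. It is not the authors' implementation.

```python
from collections import OrderedDict


class HybridGranularityBuffer:
    """Toy cache with an object-granularity and a block-granularity region.

    Hypothetical sketch: the hotness rule (fraction of an object's blocks
    touched so far) and LRU eviction are assumptions, not the paper's policy.
    """

    def __init__(self, capacity_bytes, fetch_object, hot_fraction=0.5):
        self.capacity = capacity_bytes
        self.fetch_object = fetch_object  # assumed callback: key -> list of bytes blocks
        self.hot_fraction = hot_fraction  # assumed promotion threshold
        self.objects = OrderedDict()      # key -> list of blocks, in LRU order
        self.blocks = OrderedDict()       # (key, idx) -> block, in LRU order
        self.touched = {}                 # key -> set of block indices read so far
        self.used = 0                     # bytes currently held in the cache
        self.remote_fetches = 0           # number of reads from object storage

    def read(self, key, idx):
        self.touched.setdefault(key, set()).add(idx)
        if key in self.objects:                    # object-buffer hit
            self.objects.move_to_end(key)
            return self.objects[key][idx]
        if (key, idx) in self.blocks:              # block-buffer hit
            self.blocks.move_to_end((key, idx))
            return self.blocks[(key, idx)]
        blocks = self.fetch_object(key)            # miss: read from object storage
        self.remote_fetches += 1
        if len(self.touched[key]) / len(blocks) >= self.hot_fraction:
            self._admit_object(key, blocks)        # most of the object is hot
        else:
            self._admit_block(key, idx, blocks[idx])  # keep only the hot block
        return blocks[idx]

    def _admit_object(self, key, blocks):
        # Drop block-level copies that the whole-object copy now covers.
        for k in [k for k in self.blocks if k[0] == key]:
            self.used -= len(self.blocks.pop(k))
        self.objects[key] = blocks
        self.used += sum(len(b) for b in blocks)
        self._evict()

    def _admit_block(self, key, idx, data):
        self.blocks[(key, idx)] = data
        self.used += len(data)
        self._evict()

    def _evict(self):
        # Evict least recently used blocks first, then whole objects.
        while self.used > self.capacity and (self.blocks or self.objects):
            if self.blocks:
                _, data = self.blocks.popitem(last=False)
                self.used -= len(data)
            else:
                _, blocks = self.objects.popitem(last=False)
                self.used -= sum(len(b) for b in blocks)
```

In this sketch a cold object occupies cache space only for the blocks that were actually read, while an object whose blocks are mostly hot is kept whole so that later reads of any of its blocks avoid the network.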

Key words: storage-computing separation, hybrid granular cache management, solid-state drive (SSD) cache
