Journal of East China Normal University (Natural Science) ›› 2023, Vol. 2023 ›› Issue (5): 90-99. doi: 10.3969/j.issn.1000-5641.2023.05.008

• Data Learning Systems •

Acceleration technique for heterogeneous operators based on openGauss

Xiansen CHEN, Chen XU*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received:2023-06-30 Online:2023-09-25 Published:2023-09-15
  • Contact: Chen XU E-mail:cxu@dase.ecnu.edu.cn
  • Supported by:
    Natural Science Foundation of Shanghai (23ZR1419900)

Abstract:

The high parallelism and throughput of graphics processing units (GPUs) can improve the performance of on-line analytical processing (OLAP) queries in databases. However, openGauss currently cannot exploit heterogeneous computing hardware such as GPUs. This study therefore explores how to use GPUs to accelerate OLAP processing in openGauss, focusing on how to implement and optimize GPU acceleration modules for the system. To address the difference in execution granularity between openGauss and PostgreSQL, we propose a CPU (central processing unit)-GPU collaborative parallel scheme based on chunked reading and key-based distribution. The scheme shortens the I/O (input/output) time of the GPU Scan operator, which reduces the GPU's idle waiting time, and it runs multiple instances of GPU Join to support multi-GPU environments. To address the architectural differences between openGauss and PostgreSQL, we propose a heterogeneous operator acceleration technique compatible with the vectorized engine: we implement a custom operator framework that can be embedded in the vectorized execution engine and, on top of it, a vectorized GPU Scan operator that processes openGauss columnar data. A prototype system is implemented to verify the effectiveness of the proposed scheme.

Key words: heterogeneous database, graphics processing unit, vectorized engine, on-line analytical processing

CLC number: