Journal of East China Normal University (Natural Science) ›› 2022, Vol. 2022 ›› Issue (5): 26-35. doi: 10.3969/j.issn.1000-5641.2022.05.003

• Blockchain System and Data Management •

Optimization of HTAP data synchronization based on query frequency

Yongjin TANG, Jiabo SUN, Peng CAI*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2022-07-11 Online: 2022-09-25 Published: 2022-09-26
  • Contact: Peng CAI E-mail: pcai@dase.ecnu.edu.cn

Abstract:

A hybrid transaction analytical processing (HTAP) system must support transaction processing and analytical queries at the same time. To eliminate interference between these two workloads, HTAP systems typically assign each one its own copy of the data, handle online transaction processing (OLTP) and online analytical processing (OLAP) requests separately, and synchronize the copies through log replay. HTAP systems aim to synchronize OLTP data to the OLAP side so that queries see fresher data, and the speed at which logs are sent and replayed is a key factor in data freshness. Building on a table-grouping-based parallel log replay method and on the characteristics of HTAP workloads, this paper proposes a log sending and replay method based on query frequency on the OLAP side. While preserving data consistency, the method raises the processing priority of logs for frequently queried tables, which yields efficient log sending and replay and makes the data of frequently queried tables visible first, thereby ensuring data freshness in the HTAP system.

Key words: hybrid transaction analytical processing, data freshness, log replay, conflict detection
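
The core idea in the abstract, replaying the logs of frequently queried tables first, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `FrequencyAwareReplayer` and its methods are hypothetical names, and the sketch assumes that every log record touches a single table and that tables are independent, so it omits the table grouping and conflict detection that the paper relies on for consistency.

```python
import heapq
from collections import Counter, defaultdict, deque


class FrequencyAwareReplayer:
    """Toy scheduler (hypothetical): replay logs of hot tables first."""

    def __init__(self):
        # table name -> number of OLAP queries observed on that table
        self.query_count = Counter()
        # table name -> pending log records, kept in commit (LSN) order
        self.pending = defaultdict(deque)

    def record_query(self, table):
        """Register one OLAP query against `table`."""
        self.query_count[table] += 1

    def append_log(self, table, lsn, payload):
        """Queue a log record received from the OLTP side."""
        self.pending[table].append((lsn, payload))

    def replay_order(self):
        """Yield (table, lsn, payload), most frequently queried table first.

        Records of the same table are always yielded in arrival order,
        so per-table ordering is preserved. Ties between tables are
        broken by table name.
        """
        heap = [(-self.query_count[t], t) for t, q in self.pending.items() if q]
        heapq.heapify(heap)
        while heap:
            _, table = heapq.heappop(heap)
            queue = self.pending[table]
            while queue:
                lsn, payload = queue.popleft()
                yield table, lsn, payload


replayer = FrequencyAwareReplayer()
replayer.append_log("orders", 1, "insert o1")
replayer.append_log("stock", 2, "update s7")
replayer.append_log("orders", 3, "insert o2")
replayer.record_query("stock")
replayer.record_query("stock")
replayer.record_query("orders")
replayed = list(replayer.replay_order())
```

In this example `stock` has been queried twice and `orders` once, so the `stock` record is replayed before both `orders` records, while the two `orders` records keep their original order. A real system would also have to handle transactions that span several tables, which is where conflict detection comes in.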

CLC Number: