Blockchain Systems and Data Management

Optimization of HTAP data synchronization based on query frequency

  • Yongjin TANG,
  • Jiabo SUN,
  • Peng CAI
  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received: 2022-07-11

  Published online: 2022-09-26

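Cite this article

Yongjin TANG, Jiabo SUN, Peng CAI. Optimization of HTAP data synchronization based on query frequency [J]. Journal of East China Normal University (Natural Science), 2022, 2022(5): 26-35. DOI: 10.3969/j.issn.1000-5641.2022.05.003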

Abstract

A hybrid transaction analytical processing (HTAP) system must support transaction processing and analytical queries at the same time. To eliminate interference between these two workloads, HTAP systems typically assign a separate data replica to each, handling online transaction processing (OLTP) and online analytical processing (OLAP) requests separately and synchronizing the replicas through log replay. An HTAP system aims to synchronize OLTP data to the OLAP side so that queries see fresher data, and the speed at which logs are sent and replayed is a key factor affecting data freshness. Building on a parallel log replay method based on table grouping, and taking the characteristics of HTAP workloads into account, this paper proposes a log sending and replay method based on query frequency on the OLAP side. While preserving data consistency, the method raises the processing priority of logs for frequently queried tables. It achieves efficient log sending and replay and makes the data of frequently queried tables visible first, thereby keeping the data in the HTAP system fresh.
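
The idea of prioritizing logs by query frequency can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `replay_order`, the `(lsn, table, op)` log format, the group names, and the query counts are all hypothetical, and the sketch assumes that table groups are independent of one another, so reordering whole groups does not violate consistency as long as the order inside each group is kept.

```python
from collections import Counter, defaultdict, deque


def replay_order(log, query_counts, table_group):
    """Reorder log entries so groups with frequently queried tables come first.

    log          -- list of (lsn, table, op) tuples in commit order (assumed format)
    query_counts -- table name -> number of recent OLAP queries (assumed input)
    table_group  -- table name -> group id (assumed to be precomputed)
    """
    # Bucket entries by group, preserving commit order within each group.
    per_group = defaultdict(deque)
    for lsn, table, op in log:
        per_group[table_group[table]].append((lsn, table, op))

    # A group's priority is the total query count of the tables it contains.
    freq = Counter()
    for table, group in table_group.items():
        freq[group] += query_counts.get(table, 0)

    # Emit groups from most to least queried; ties are broken by group id.
    ordered = []
    for group in sorted(per_group, key=lambda g: (-freq[g], g)):
        ordered.extend(per_group[group])
    return ordered


# Hypothetical grouping and query statistics, for illustration only.
table_group = {"orders": "g1", "order_line": "g1", "stock": "g2", "history": "g3"}
query_counts = {"stock": 50, "orders": 5, "order_line": 3}
log = [
    (1, "history", "insert"),
    (2, "orders", "insert"),
    (3, "stock", "update"),
    (4, "order_line", "insert"),
    (5, "stock", "update"),
]
result = replay_order(log, query_counts, table_group)
```

In this toy input the `stock` group is queried most often, so its entries move to the front, while entries inside every group keep their original relative order. A real system would apply the same priority when sending logs as well as when replaying them, and would replay independent groups in parallel rather than in one sequence.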
