Blockchain System and Data Management

Optimization of HTAP data synchronization based on query frequency

  • Yongjin TANG,
  • Jiabo SUN,
  • Peng CAI
  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2022-07-11

Online published: 2022-09-26

Abstract

A hybrid transactional/analytical processing (HTAP) system must support transaction processing and analytical queries concurrently. To prevent the two workloads from interfering with each other, HTAP systems typically give each its own copy of the data, handle online transaction processing (OLTP) and online analytical processing (OLAP) requests separately, and synchronize the copies by log replay. An HTAP system aims to synchronize OLTP data to the OLAP side efficiently so that analytical queries see fresher data. The speed at which the logs of queried tables are sent and replayed is a key factor in data freshness. Building on a table-grouping-based parallel log replay method and the characteristics of HTAP workloads, this paper proposes a log sending and replay method driven by query frequency on the OLAP side. While preserving data consistency, the method raises the processing priority of logs for frequently queried tables, so that logs are sent and replayed efficiently and the data of frequently queried tables becomes visible first, thereby improving the data freshness of the HTAP system.
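
The prioritization idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `LogEntry` and `replay_order`, the table names, and the query counts are all hypothetical, and the sketch assumes that log entries of different tables can be reordered freely. A real system must also respect cross-table transactional dependencies to stay consistent, which this example does not model. Pending log entries are grouped by table, tables with a higher OLAP-side query count are replayed first, and entries within a table keep their log sequence number (LSN) order.

```python
import heapq
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    lsn: int      # log sequence number assigned on the OLTP side (assumed)
    table: str    # table the change belongs to
    payload: str  # opaque change record


def replay_order(entries, query_counts):
    """Return entries in a frequency-prioritized replay order.

    Tables queried more often on the OLAP side are replayed first;
    within one table, entries keep their LSN order.
    """
    per_table = defaultdict(list)
    for entry in entries:
        per_table[entry.table].append(entry)

    # Max-heap on query frequency (negated); the table name breaks ties.
    heap = [(-query_counts.get(table, 0), table) for table in per_table]
    heapq.heapify(heap)

    ordered = []
    while heap:
        _, table = heapq.heappop(heap)
        ordered.extend(sorted(per_table[table], key=lambda e: e.lsn))
    return ordered


# Hypothetical pending log and OLAP-side query counts.
log = [
    LogEntry(1, "orders", "insert o1"),
    LogEntry(2, "customer", "update c7"),
    LogEntry(3, "lineitem", "insert l1"),
    LogEntry(4, "orders", "update o1"),
]
counts = {"lineitem": 50, "orders": 20, "customer": 1}

for e in replay_order(log, counts):
    print(e.lsn, e.table)
```

In this sketch the log of the most frequently queried table is applied before the others, which is the sense in which its data becomes fresh first. Tables with no recorded queries default to a count of zero and are replayed last.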

Cite this article

Yongjin TANG, Jiabo SUN, Peng CAI. Optimization of HTAP data synchronization based on query frequency[J]. Journal of East China Normal University (Natural Science), 2022, 2022(5): 26-35. DOI: 10.3969/j.issn.1000-5641.2022.05.003
