Journal of East China Normal University (Natural Science)

• Computer Science

非阻塞事务型实时数据注入技术研究与实现

余 楷, 李志方, 周敏奇, 周傲英   

  1. 华东师范大学 数据科学与工程研究院, 上海 200062
  • 收稿日期:2016-06-27 出版日期:2016-09-25 发布日期:2016-11-29
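  • Corresponding author: ZHOU Min-qi, male, associate professor. Research interests: peer-to-peer computing, cloud computing, distributed data management, and in-memory data management systems. E-mail: mqzhou@sei.ecnu.edu.cn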
  • Supported by:

    Key Program of the National Natural Science Foundation of China (61332006), Shanghai Municipal Fund (13ZR1413200)

Research and implementation of transactional real-time data ingestion technology without blocking

YU Kai, LI Zhi-fang, ZHOU Min-qi, ZHOU Ao-ying   

  1. Institute for Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received:2016-06-27 Online:2016-09-25 Published:2016-11-29

摘要:

伴随着大数据时代来临, 传统数据库系统已逐渐无法应对海量数据处理带来的挑战, 而分布式数据库系统得到了越来越多的部署和应用. 分布式数据库系统部署数据于多台机器上, 利用大规模并行计算技术实现了对海量数据的存储、管理和分析. 但针对金融领域严苛的事务型实时数据注入需求, 现有分布式数据库系统对其支持有限, 其主要原因在于利用锁和两阶段提交等方式实现分布式事务处理, 无法做到非阻塞式数据注入, 极大地影响了数据注入的性能. 华东师范大学数据科学与工程研究院自主研发的分布式内存数据库系统-----CLAIMS, 已能提供面向关系型数据集的实时数据分析服务, 但尚不能支持实时数据注入. 针对上述实时数据注入的问题, 本文重点分析了现有数据注入技术和基于分布式事务处理的实现方式, 设计了面向元数据的集中式事务处理策略, 利用无锁编程技术, 实现了支持分布式事务的高性能实时数据注入框架, 并通过热备机制实现系统的高可用性. 上述框架在 CLAIMS 系统中的实现, 经充分实验表明: 该框架能够实现高通量的事务型实时数据注入, 同时支持低延时的实时数据查询.

关键词: 分布式数据库, 实时数据注入, 事务, CLAIMS

Abstract:

With the advent of the big data era, traditional database systems increasingly struggle to meet the challenges of massive data processing, and distributed database systems are being deployed ever more widely. A distributed database system spreads data across multiple machines and uses massively parallel computing to store, manage, and analyze large data volumes. However, existing distributed database systems offer only limited support for the strict transactional real-time data ingestion requirements of the financial sector. The main reason is that they implement distributed transaction processing with locks and two-phase commit, which rules out non-blocking ingestion and severely degrades ingestion performance. CLAIMS is a distributed in-memory database system designed and implemented by the Institute for Data Science and Engineering of ECNU. It provides real-time data analysis over relational data sets but does not yet support real-time data ingestion. To address this problem, we analyze existing data ingestion techniques and implementations based on distributed transaction processing, and design a centralized, metadata-oriented transaction processing strategy. Using lock-free programming, we implement a high-performance real-time data ingestion framework that supports distributed transactions, and we achieve high availability through a hot-standby mechanism. Experiments with the implementation of this framework in CLAIMS show that it achieves high-throughput transactional real-time data ingestion while supporting low-latency real-time query processing.

Key words: distributed database system, real-time data ingestion, transaction processing, CLAIMS