Journal of East China Normal University (Natural Science) ›› 2018, Vol. 2018 ›› Issue (5): 79-90, 119. doi: 10.3969/j.issn.1000-5641.2018.05.007

• High-Performance Database Management •

Primary key management in distributed log-structured database systems

HUANG Jian-wei, ZHANG Zhao, QIAN Wei-ning

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2018-07-04 Online: 2018-09-25 Published: 2018-09-26
  • Corresponding author: ZHANG Zhao, female, associate professor; research interests: blockchain systems, and massive data management and analysis. E-mail: zhzhang@dase.ecnu.edu.cn
  • About the author: HUANG Jian-wei, male, master's student; research interest: distributed databases. E-mail: jwhuang@stu.ecnu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61432006, 61332006, U1401256); National 863 Program of China (2016YFB1000905, 2018YFB1003400)

Abstract: Write-intensive workloads (e.g., e-commerce flash sales and data streams generated by social network users) are now common in applications such as e-commerce, social networking, and the mobile Internet, which has made log-structured storage a popular back-end storage technique in modern database systems. Log-structured storage generally supports only append operations, so efficient primary key management (primary key generation and update) can substantially improve the performance of database append operations. In a distributed and concurrent environment, implementing primary key management also faces the challenges of enforcing primary key uniqueness, maintaining keys transactionally, and delivering high performance. In light of the characteristics of log-structured storage, this paper studies how to implement efficient primary key management in distributed log-structured database systems. First, we propose two concurrency control models for write-after-read (WAR) operations, that is, operations that read first and then write; second, we apply these two models to design several efficient primary key management algorithms; finally, we implement the proposed methods in CEDAR, our distributed log-structured database system, and verify their efficiency through a series of experiments.

Key words: log-structured, distributed database, primary key management, concurrent environment

CLC Number: