Primary key management in distributed log-structured database systems

  • HUANG Jian-wei ,
  • ZHANG Zhao ,
  • QIAN Wei-ning
Expand
  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2018-07-04

  Online published: 2018-09-26

Abstract

At present, there are a large number of writing-intensive loads (e.g., secondkilling of e-commerce, social user-generated data streams) in many applications such as e-commerce, social networking, mobile Internet and so on, which makes log-structured storage a popular technique for back-end storage of modern database systems. However, log-structured storage only supports the append operation, efficient primary key management (primary key generation and update) functions can improve the performance of database append operations. In the distributed and concurrent environment, implementing primary key maintenance faces challenges, such as primary key unique constraints, transactional maintenance, and high-performance requirements. In light of the characteristics of log-structured storage, this paper explores how to implement efficient primary key management in distributed log-structured database systems. First, we propose two kinds of concurrency control models for WAR (Write After Read) operations; second, we adopt these two models to design efficient primary key management algorithms; and finally, we integrate these algorithms into our distributed log-structured database, CEDAR, and verify the effectiveness of the proposed methods by a series of experiments.

Cite this article

HUANG Jian-wei , ZHANG Zhao , QIAN Wei-ning . Primary key management in distributed log-structured database systems[J]. Journal of East China Normal University(Natural Science), 2018 , 2018(5) : 79 -90,119 . DOI: 10.3969/j.issn.1000-5641.2018.05.007

References

[1] ROSENBLUM M, OUSTERHOUT J K. The design and implementation of a log-structured file system[J]. ACM Transaction on Computer Systems, 1992, 10(1):26-52.
[2] CHANG F, DEAN J, GHEMAWAT S, et al. Bigtable:A distributed storage system for structured data[C]//Symposium on Operating Systems Design and Implementation. USENIX Association, 2006:205-218.
[3] Hbase.[EB/OL].[2018-07-02]. http://hbase.apache.org/.
[4] Cassandra.[EB/OL].[2018-07-02]. http://cassandra.apache.org/.
[5] CDEAR.[EB/OL].[2018-07-02]. https://github.com/daseECNU/Cedar/.
[6] O'NEIL P, CHENG E, GAWLICK D, et al. The log-structured merge-tree (LSM-tree)[J]. Acta Informatica, 1996, 33(4):351-385.
[7] Mysql Cluster.[EB/OL].[2018-07-02]. https://www.mysql.com/products/cluster/.
[8] STONEBRAKER M, WEISBERG A. The voltdb main memory DBMS[J]. IEEE Data Eng Bull, 2013, 36(2):21-27.
[9] OceanBase.[EB/OL].[2018-07-02]. https://github.com/alibaba/oceanbase/.
[10] GARCIA-MOLINA H, ULLMAN J D, WIDOM J D. Database System Implementation[M]. London:Prentice Hall, 1999:576-601.
[11] The Open Group. Distributed TP:The XA Specification[EB/OL].[2018-07-02]. https://publications.opengroup.org/c193.
[12] KUNG H T. On optimistic methods for concurrency control[C]//International Conference on Very Large Data Bases. IEEE, 1981:351-351.
[13] CHOI H J, JEONG B S. A timestamp-based optimistic concurrency control for handling mobile transactions[C]//International Conference on Computational Science and Its Applications. Berlin:Springer-Verlag, 2006:796-805.
[14] ESWARAN K P, GRAY J N, LORIE R A, et al. The notions of consistency and predicate locks in a database system[J]. Readings in Artificial Intelligence & Databases, 1989, 19(11):523-532.
[15] Sysbench.[EB/OL].[2018-07-02]. https://dev.mysql.com/downloads/benchmarks.html.
Outlines

/