Distributed database products have proliferated in recent years, but a distributed database is more complex than a single-node database. To make such a system usable, designers must adopt a consistency protocol that guarantees two key properties of a distributed database system: availability and consistency. To guarantee consistency, the protocol determines a global execution order for the update operations of concurrent transactions and coordinates local and global state so that they continually converge to agreement. To guarantee availability, the protocol keeps multiple replicas consistent so that the system can switch seamlessly between primary and backup nodes. Distributed consistency protocols are therefore an essential foundation for building distributed database systems. This paper reviews the classic distributed consistency protocols in detail, describes how they are applied in several common distributed database systems, and compares them in terms of read and write operations, node types, and network communication.