Implementation and optimization of a distributed consistency algorithm based on Paxos

ZHU Chao-fan; GUO Jin-wei; CAI Peng

doi:10.3969/j.issn.1000-5641.2019.05.014

Journal of East China Normal University (Natural Science), 2019, Vol. 2019, Issue 5: 168-177

Column: Data Management Technology in the New Era

School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

ZHU Chao-fan, male, master's student; research interest: distributed databases. E-mail: chaofanzhu2@163.com.

Received date: 2019-07-28

Online published: 2019-10-11

Funding

National Key Research and Development Program of China (2018YFB1003303); National Natural Science Foundation of China (61432006)

Cite this article

ZHU Chao-fan, GUO Jin-wei, CAI Peng. Implementation and optimization of a distributed consistency algorithm based on Paxos[J]. Journal of East China Normal University (Natural Science), 2019, 2019(5): 168-177. DOI: 10.3969/j.issn.1000-5641.2019.05.014

Abstract

As the Internet continues to develop and enterprises become increasingly information-driven, vast amounts of data must be processed in a timely manner. Network environments, however, are unstable: data loss and node crashes occur easily and can have serious consequences. Building fault-tolerant distributed storage systems has therefore become increasingly popular. To guarantee high availability and consistency, such systems need a distributed consistency algorithm. To improve system performance under unstable networks, traditional Paxos-based distributed systems allow holes in the log. However, when a node enters the recovery state, these systems typically need a large number of network interactions to fill the log holes, which greatly increases node recovery time and thereby reduces system availability. To address the excessive cost of filling log holes during node recovery, this paper redesigns the log entry structure and optimizes the data recovery process. Experimental simulation verifies the effectiveness of the improved Paxos-based consistency algorithm.

Key words: distributed storage systems; consistency; log replication; node recovery
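To make the notion of a log hole concrete, the following is a minimal sketch of hole detection and a naive hole-by-hole recovery. It is not the paper's redesigned log entry structure or recovery process, which the abstract does not detail. The names `find_holes` and `recover`, the dictionary-based log, and the use of a single up-to-date peer in place of a Paxos quorum are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def find_holes(log: Dict[int, str], last_index: Optional[int] = None) -> List[int]:
    """Return the indices in 1..last_index that have no entry in the log.

    Hypothetical helper: the log is modelled as a dict from index to command.
    """
    if last_index is None:
        last_index = max(log, default=0)
    return [i for i in range(1, last_index + 1) if i not in log]


def recover(log: Dict[int, str], peer_log: Dict[int, str]) -> int:
    """Fill each hole with one fetch from a peer and return the fetch count.

    Simplifying assumption: one up-to-date peer stands in for the quorum
    interaction a real Paxos-based system would need per log position.
    """
    fetches = 0
    for i in find_holes(log, max(peer_log, default=0)):
        if i in peer_log:
            log[i] = peer_log[i]
            fetches += 1
    return fetches


# A recovering node whose log is missing positions 3, 5 and 6.
local = {1: "set x=1", 2: "set y=2", 4: "set x=3", 7: "del y"}
peer = {
    1: "set x=1", 2: "set y=2", 3: "set z=0", 4: "set x=3",
    5: "del z", 6: "set y=5", 7: "del y",
}

holes = find_holes(local)
fetches = recover(local, peer)
print(holes, fetches)
```

In this naive scheme the number of fetches grows with the number of holes, which is the kind of recovery cost the abstract identifies as the problem to be reduced.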

