With the rapid development of the Internet and the arrival of the Big Data era, the limitations of traditional databases have become increasingly apparent, and distributed database systems that support massive data storage and highly concurrent access have grown popular. Against this background, Alibaba Group developed OceanBase, a distributed database system for massive data storage that supports two deployment modes: single cluster and multiple clusters. The availability of the multi-cluster mode, however, is too low to meet the requirements of critical applications: it does not support automatic switching between the master and slave clusters when a failure occurs, and it cannot guarantee strong log synchronization between the master and slave clusters, so inconsistent logs can arise during switching. To address these problems, we analyze the high-availability solutions of traditional databases and, taking into account the characteristics of the OceanBase architecture and drawing on the ideas of the Raft algorithm, design and implement a distributed election module based on log timestamps, an automatic cluster switching module, and a strong log synchronization module based on the QUORUM strategy. Experimental results show that these modules improve the overall availability of the system.
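The two ideas named above, election by log timestamp and quorum-based log synchronization, can be illustrated with a minimal sketch. It assumes a simple majority quorum and a single integer timestamp per cluster. The names `Cluster`, `majority`, `elect_master`, and `is_committed` are hypothetical and are not taken from OceanBase or from the modules described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical view of one cluster as seen by the election logic."""
    name: str
    last_log_timestamp: int  # timestamp of the newest log entry this cluster holds
    reachable: bool = True


def majority(total: int) -> int:
    """Smallest number of members that forms a majority quorum."""
    return total // 2 + 1


def elect_master(clusters):
    """Return the reachable cluster with the newest log, or None without a quorum."""
    alive = [c for c in clusters if c.reachable]
    if len(alive) < majority(len(clusters)):
        return None  # too few reachable clusters, so no election can succeed
    # Assumption: ties are broken by name only to keep the sketch deterministic.
    return max(alive, key=lambda c: (c.last_log_timestamp, c.name))


def is_committed(ack_count: int, total: int) -> bool:
    """Treat a log entry as committed once a majority has persisted it.

    ack_count is assumed to include the master's own copy of the entry.
    """
    return ack_count >= majority(total)
```

Choosing the cluster with the newest log means that, under this assumption, any entry already acknowledged by a majority is held by at least one reachable candidate whenever a quorum can be formed.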