Journal of East China Normal University (Natural Science), 2020, Vol. 2020, Issue (4): 98-107. doi: 10.3969/j.issn.1000-5641.201921011

• Computer Science •

Lineage-driven automated fault-tolerance testing for distributed databases

沈靖 (SHEN Jing), 蔡鹏 (CAI Peng)

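  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062
  • Received: 2019-08-22  Published: 2020-07-20
  • Corresponding author: CAI Peng, male, associate professor and master's supervisor; research interests: high-performance transaction processing and high-availability mechanisms. E-mail: pcai@dase.ecnu.edu.cn
  • Funding:
    National Key R&D Program of China (2018YFB1003404); National Natural Science Foundation of China (61972149)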

Lineage-driven distributed database system with an automated fault injection test tool

SHEN Jing, CAI Peng   

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2019-08-22  Published: 2020-07-20

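Abstract: In large-scale distributed databases, software and hardware failures such as network partitions, message loss, and node crashes are unavoidable. To improve the reliability of a distributed database and verify the correctness of its fault-tolerance protocols, the database should undergo regular fault injection testing, in which faults are deliberately triggered while the system is running. However, the space of possible fault combinations is too large to enumerate. Existing testing methods fall into two classes. The first injects random combinations of faults; it is simple to implement but cannot guarantee that every fault combination has been explored. The second designs fault combinations by analyzing the system's architecture with expert knowledge; its results are more thorough, but it does not generalize to other systems. Taking Lineage-Driven Fault Injection (LDFI) as a prototype, we implemented, on top of a distributed database, an automated fault injection testing tool that is both complete and generally applicable. Experimental results show that, with fewer test cases, the tool finds bugs caused by compound fault combinations that random fault injection fails to find, thereby improving the trustworthiness of the database.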

Keywords: distributed, data lineage, automation, fault injection
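To make the claim that the fault-combination space is too large to enumerate concrete, the following Python sketch counts fault sets under a simplified model that is an assumption, not taken from the paper: n independent injection points (for example, individual messages that may be dropped or nodes that may crash), each either faulty or not, with at most k simultaneous faults and fault timing ignored. It also gives the expected number of uniform random trials needed to hit one specific k-fault combination, which is the weakness of random injection noted above.

```python
from math import comb


def fault_space_size(n_points: int, max_faults: int) -> int:
    """Count fault sets with at most max_faults faults among n_points points,
    including the fault-free run (k = 0).

    Simplified model (an assumption): each injection point is either faulty or
    not, points are independent, and fault timing is ignored.
    """
    return sum(comb(n_points, k) for k in range(max_faults + 1))


def expected_random_trials(n_points: int, k: int) -> int:
    """Expected number of uniform random k-fault trials (drawn with replacement)
    until one specific k-fault combination is drawn.

    Each trial hits it with probability 1 / C(n, k), so the waiting time is
    geometric with mean C(n, k).
    """
    return comb(n_points, k)


print(fault_space_size(100, 3))        # 166751
print(expected_random_trials(100, 3))  # 161700
```

With a hypothetical 100 injection points and up to 3 faults there are already 166,751 fault sets, and random injection needs about 161,700 three-fault trials on average to hit one particular triple. Letting faults also vary in timing, as real systems do, would enlarge the space further.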

Abstract: Failures are unavoidable in distributed database systems. To improve the fault tolerance of distributed database systems and verify the correctness of their fault-tolerance protocols, the system should periodically undergo fault injection testing, in which faults are deliberately triggered during system operation. However, the scale and complexity of distributed database systems make it difficult to fully enumerate inputs and impractical to explain all the behaviors that occur in the system. One commonly used test method combines faults at random; it is simple but not complete. The other is guided by expert knowledge and is not universally applicable. Accordingly, we adopted and revised a research prototype, lineage-driven fault injection (LDFI), which is both complete and universally applicable, and implemented it as an automated fault injection tool in Cedar. Experiments showed that, with fewer test cases, lineage-driven fault injection detects system bugs caused by complex fault combinations that random fault injection cannot detect, improving the credibility of the database.

Key words: distributed, data lineage, automation, fault injection
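The abstracts do not detail how LDFI chooses which faults to inject. In the LDFI approach that the tool is modeled on, the lineage of a successful outcome records each alternative way (a "support") in which the outcome was derived; a fault set can break the outcome only if it removes at least one event from every support, a condition LDFI encodes as a boolean formula and solves with a SAT solver. The Python sketch below swaps the SAT solver for a brute-force search over increasing sizes, which is only practical for tiny examples, and keeps only minimal fault sets. The function name, event names, and the two-replica example are hypothetical and do not describe the Cedar implementation.

```python
from __future__ import annotations

from itertools import combinations


def minimal_fault_hypotheses(
    supports: list[frozenset[str]], max_faults: int
) -> list[frozenset[str]]:
    """Return every minimal fault set of size <= max_faults that hits all supports.

    Each support is the set of events (e.g. message deliveries) that together
    derived the good outcome. Brute-force stand-in for LDFI's SAT step.
    """
    events = sorted(set().union(*supports))
    found: list[frozenset[str]] = []
    for size in range(1, max_faults + 1):
        for candidate in combinations(events, size):
            faults = frozenset(candidate)
            # Skip supersets of an already found set: they are not minimal.
            if any(prev <= faults for prev in found):
                continue
            # A fault set can break the outcome only if it cuts every support.
            if all(faults & support for support in supports):
                found.append(faults)
    return found


# Hypothetical lineage: a write is durable if either replica A or replica B
# received it from primary P and acknowledged it.
supports = [
    frozenset({"msg:P->A", "ack:A->P"}),
    frozenset({"msg:P->B", "ack:B->P"}),
]
for hypothesis in minimal_fault_hypotheses(supports, max_faults=2):
    print(sorted(hypothesis))
```

Each returned set is a hypothesis to inject in the next test run. If the outcome still holds under that fault set, the run exposes a new support, which is added to the lineage before the hypotheses are recomputed. Because only fault sets that could falsify every known derivation are tried, this loop can reach a bug-triggering combination with fewer test cases than random injection, which is the effect the abstract reports.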
