[1] BAILIS P, KINGSBURY K. The network is reliable [J]. Communications of the ACM, 2014, 57(9): 48-55. DOI: 10.1145/2643130. [2] DEAN J. Designs, lessons and advice from building large distributed systems [R/OL]. The 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware.(2009-10-10)[2019-08-07]. http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf. [3] ALSBERG P A, DAY J D. A principle for resilient sharing of distributed resources [C]//Proceedings of the 2nd International Conference on Software Engineering. IEEE, 1976: 562-570. [4] STONEBRAKER M. Concurrency control and consistency of multiple copies of data in distributed INGRES [J]. IEEE Transactions on Software Engineering, 1979(3): 188-194. DOI: 10.1109/TSE.1979.234180. [5] GARCIA-MOLINA H. Elections in a distributed computing system [J]. IEEE Transactions on Computers, 1982(1): 48-59. [6] LEESATAPORNWONGSA T, LUKMAN J F, LU S, et al. TaxDC: A taxonomy of non-deterministic concurrency bugs in datacenter distributed systems [J]. ACM SIGPLAN Notices, 2016, 51(4): 517-530. DOI: 10.1145/2954679.2872374. [7] GAO Y, DOU W S, QIN F, et al. An empirical study on crash recovery bugs in large-scale distributed systems [C]//Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 2018: 539-550. [8] FONSECA P, ZHANG K, WANG X, et al. An empirical study on the correctness of formally verified distributed systems [C]//Proceedings of the 12th European Conference on Computer Systems. ACM, 2017: 328-343. [9] GANESAN A, ALAGAPPAN R, ARPACI-DUSSEAU A C, et al. Redundancy does not imply fault tolerance: Analysis of distributed storage reactions to single errors and corruptions [J]. ACM Transactions on Storage, 2017, 13(3): Article 20. DOI: 10.1145/3125497. [10] DAWSON S, JAHANIAN F, MITTON T. ORCHESTRA: A fault injection environment for distributed systems [C]// Proceedings of the 26th International Symposium on Fault Tolerant Computing. IEEE, 1996: 404-414. [11] GUNAWI H S, DO T, JOSHI P, et al. FATE and DESTINI: A framework for cloud recovery testing [C]// Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation. ACM, 2011: 238-252. [12] KANAWATI G A, KANAWATI N A, ABRAHAM J A. FERRARI: A flexible software-based fault and error injection system [J]. IEEE Transactions on Computers, 1995, 44(2): 248-260. DOI: 10.1109/12.364536. [13] LEESATAPORNWONGSA T, HAO M Z, JOSHI P, et al. SAMC: Semantic-aware model checking for fast discovery of deep bugs in cloud systems [C]// Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation. USENIX Association. 2014: 399-414. [14] BASIRI A, BEHNAM N, DE ROOIJ R, et al. Chaos engineering [J]. IEEE Software, 2016, 33(3): 35-41. DOI: 10.1109/MS.2016.60. [15] ALVARO P, ROSEN J, HELLERSTEIN J M. Lineage-driven fault injection [C]//Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015: 331-346. [16] ALVARO P, MARCZAK W R, CONWAY N, et al. Dedalus: Datalog in time and space [C]//International Datalog 2.0 Workshop Datalog 2.0 2010: Datalog Reloaded. Berlin: Springer, 2010: 262-281. [17] BUNEMAN P, KHANNA S, TAN W C. Why and where: A characterization of data provenance [C]//International Conference on Database Theory. Berlin: Springer, 2001: 316-330. [18] CUI Y, WIDOM J, WIENER J L. Tracing the lineage of view data in awarehousing environment [J]. ACM Transactions on Database Systems (TODS), 2000, 25(2): 179-227. DOI: 10.1145/357775.357777. [19] ALVARO P, CONWAY N, HELLERSTEIN J M, et al. Consistency analysis in bloom: A CALM and collected approach [C]// 5th Biennial Conference on Innovative Data Systems Research (CIDR ’11). 2011: 249-260. [20] LUKMAN J F, KE H, STUARDO C A, et al. FlyMC: Highly scalable testing of complex interleavings in distributed systems [C]//Proceedings of the 14th EuroSys Conference 2019. ACM, 2019: Article number 20. [21] MAJUMDAR R, NIKSIC F. Why is random testing effective for partition tolerance bugs? [J]. Proceedings of the ACM on Programming Languages, 2018, 2(POPL): Article number 46. DOI: 10.1145/3158134. [22] LAMPORT L. The part-time parliament [J]. ACM Transactions on Computer Systems (TOCS), 1998, 16(2): 133-169. DOI: 10.1145/279227.279229. [23] ONGARO D, OUSTERHOUT J. In search of an understandable consensus algorithm [C]// Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference. USENIX Association, 2014: 305-320. |