Journal of East China Normal University(Natural Science) >
Research and implementation of transactional real-time data ingestion technology without blocking
Received date: 2016-06-27
Online published: 2016-11-29
With the advent of big data era, traditional database systems are facing difficulties in satisfying the new challenges brought by massive data processing, while distributed database systems have been deployed widely in real applications. Distributed database systems partitioned and the dispatched the data across machines under a designed scheme and analyzed all the massive data in massive parallel manner. In facing of the requirements of the transactional real-time data ingestion from financial field, distributed database systems are ineffective and inefficient due to their implementation of the distributed transaction processing based on the lock and two-phase commit, which lead to the impossibility of non-blocking data ingestion. CLAIMS is a distributed in-memory database system designed and implemented by Institute for Data Science and Engineering of ECNU. It supports real-time data analysis towards relational data set but is incapable of real-time data ingestion. To address these problems, we analyzed data ingestion technology and distributed transaction processing algorithms first, and proposed to mimic the transactional data ingestion in the distributed environment with the centralized transaction processing based on meta data, and eventually achieved the real-time data ingestion with high availability and without blocking. The experiment results with the implementation of the proposed algorithms in CLAIMS proved that the proposed framework could achieve high throughput transactional real-time data ingestion as well as low latency real-time
query processing.
YU Kai , LI Zhi-fang , ZHOU Min-qi , ZHOU Ao-ying . Research and implementation of transactional real-time data ingestion technology without blocking[J]. Journal of East China Normal University(Natural Science), 2016 , 2016(5) : 131 -143 . DOI: 10.3969/j.issn.1000-5641.2016.05.015
[ 1 ] DEAN J, GHEMAWAT S. MapReduce: Simplified data processing on large clusters [J]. Communications of the ACM, 2008, 51(1): 107-113.
[ 2 ] ZAHARIA M, CHOWDHURY M, FRANKLIN M J, et al. Spark: Cluster computing with working sets [C]//Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. Berkeley: USENIX Association, 2010: 10.
[ 3 ] SHVACHKO K, KUANG H, RADIA S, et al. The hadoop distributed file system [C]//Proceedings of IEEE Conference on MSST. 2010: 1-10.
[ 4 ] 胡健, 和轶东. SAP内存计算------HANA [M]. 北京: 清华大学出版社, 2013.
[ 5 ] FARBER F, CHA S K, PRIMSCH J, et al. SAP HANA database: Data management for modern business applications [J]. ACM Sigmod Record, 2012, 40(4): 45-51.
[ 6 ] GLIGOR G, TEODORU S. Oracle exalytics: Engineered for speed-of-thought analytics [J]. Database Systems Journal, 2011, 2(4): 3-8.
[ 7 ] WANG L, ZHOU M Q, ZHANG Z J, et al. Elastic pipelining in in-memory DataBase cluster [R]. 2016.
[ 8 ] TRAVERSO M. Presto: Interacting with petabytes of data at Facebook [EB/OL].(2013-11-07)[2016-06-10]. http://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920.
[ 9 ] ARMBRUST M, XIN R S, LIAN C, et al. Spark SQL: Relational data processing in spark [C]//Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015: 1383-1394.
[10] YANG F, TSCHETTER E, LEAUTE X, et al. Druid: A real-time analytical data store [C]//Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014.
[11] GARCIA-MOLINA H, ULLMAN J D, WIDOM J. Database System Implementation [M]. Upper Saddle River, NJ: Prentice Hall, 2000.
[12] NAGA P N. Real-time Analytics at Massive Scale with Pinot [EB/OL]. [2016-06-10]. https://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot.
[13] KREPS J, NARKHEDE N, RAO J, et al. Kafka: A distributed messaging system for log processing [C]//Proceedings of the NetDB. 2011: 1-7.
[14] LAMB A, FULLER M, VARADARAJAN R, et al. The vertica analytic database: C-store 7 years later [C]//Proceedings of the VLDB Endowment. 2012: 1790-1801.
[15] CHANG L, WANG Z, MA T, et al. Hawq: A massively parallel processing sql engine in hadoop [C]//Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014.
[16] STONEBRAKER M, WEISBERG A. The VoltDB main memory DBMS [J]. IEEE Data Eng Bull, 2013: 21-27.
[17] BRYANT R E, O′HALLARON D R. 深入理解计算机系统[M], 北京:机械工业出版社,2013.
[18] ESWARAN K P, GRAY J N, LORIE R A, et al. The notions of consistency and predicate locks in a database system [J]. Communications of the ACM, 1976, 19(11): 624-633.
[19] STONEBRAKER M. One Size Fits None-(Everything You Learned in Your DBMS Class is Wrong) [R/OL]. (2013-05-30)[2016-07-01]. http://slideshot.epfl.ch/talks/166.
[20] WEIKUM G, VOSSEN G. Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery [M]. San Francisco: Morgan Kaufmann Publishers, 2002.
[21] DIACONU C, FREEDMAN C, ISMERT E, et al. Hekaton: SQL server’s memory-optimized OLTP engine [C]//Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 2013.
[22] MICHAEL M M. High performance dynamic lock-free hash tables and list-based sets [C]//Proceedings of the 14th Annual ACM Symposium on Parallel Algorithms and Architectures. 2002: 73-82.
[23] LAMPSON BW, STURGIS H E. Crash Recovery in a Distributed Data Storage System [R]. Palo Alto, California: Xerox Palo Alto Research Center, 1979.
[24] SKEEN D. Nonblocking commit protocols [C]//Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data. 1981.
[25] HAN J, HAIHONG E, LE G, et al. Survey on NoSQL database [C]//Proceedings of the 2011 6th International Conference on Pervasive Computing and Applications. 2011: 363-366.
[26] O’NEIL E J, O’NEIL P E, WEIKUM G. The LRU-K page replacement algorithm for database disk buffering [C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. 1993: 297-306.
/
〈 |
|
〉 |