Journal of East China Normal University(Natural Science)

On the distributed consensus protocol in high-availability database systems

CHU Jia-jia, GUO Jin-wei, LIU Bo-zhong, ZHANG Chen-dong, QIAN Wei-ning

2016, 2016 (5): 1-9. doi: 10.3969/j.issn.1000-5641.2016.05.001

Abstract ( 929 )

HTML ( 18 )

PDF (476KB) ( 819 )

Save

Availability and consistency are the two important characteristics of the distributed database systems, which need to be guaranteed by the distributed consensus
protocol. Ensuring consistency requires a consensus protocol to determine a global execution sequence for concurrent transaction updates, and to coordinate the consistency between local states and the global state continuously. For the implementation of the availability, we need to coordinate the consistency between the multiple backups to achieve the seamless switch between the main and backup nodes. Visible, distributed consistency protocol is the basis for the realization of high availability database system. Distributed consensus protocol is the base of high availability database systems. This paper summarizes
the classical consistency protocols and the main applications in high-availability systems of the distributed consistency protocols. Meanwhile, the implementation costs and limitations of the consistency protocols are analyzed and evaluated.

References | Related Articles | Metrics

Compilation techniques for high throughput transaction processing

WANG Dong-hui, ZHU Tao, QIAN Wei-ning

2016, 2016 (5): 10-17. doi: 10.3969/j.issn.1000-5641.2016.05.002

Abstract ( 829 )

HTML ( 77 )

PDF (391KB) ( 808 )

Save

Because of the problem of low utilization of CPU in memory database, the present research work is focused on improving the execution efficiency and concurrency control by transaction compilation technology to improve the performance of database. This article mainly introduces the following aspects of the memory database transaction compilation technology. First, this paper introduces the general process of transaction processing and analyzes the factors that limit the performance of the system. Second, we analyze the compilation techniques, including Just-in-time Compilation, Dependence Theory and Transaction Chopping. Third, we analyze the database and show how to improve the performance combined with the introduction of typical memory database, such as VoltDB, Hekaton and so on. Finally, the research prospects are given.

References | Related Articles | Metrics

Key techniques and challenges of transaction commit in main-memory database systems

HU Shuang, ZHOU Huan, QIAN Wei-ning

2016, 2016 (5): 18-26. doi: 10.3969/j.issn.1000-5641.2016.05.003

Abstract ( 661 )

HTML ( 9 )

PDF (434KB) ( 765 )

Save

Since its debut in the 1990s, ARIES, as the traditional transaction commit, has been widely adopted by the mainstream commercial or open source database systems. With the performance of applications increasingly improved, transaction commit based on the traditional hardwares has become the first bottleneck for performance of systems. However, the developments of high-performance hardwares, such as memory and multiple CPU cores, offer a new opportunity for the optimization of transaction commit. In this paper, we analyze and summarize the existing bottlenecks of traditional transaction commit in detail. Furthermore, key techniques of transaction commit based on new hardwares are discussed and concluded, including their application status, advantages and disadvantages. Finally, challenges and future developments for optimization of transaction commit are discussed.

References | Related Articles | Metrics

Fault-tolerance in distributed in-memory database systems

ZHAO Zhen-hui, HUANG Cheng-shen, ZHOU Min-qi, ZHOU Ao-ying

2016, 2016 (5): 27-35. doi: 10.3969/j.issn.1000-5641.2016.05.004

Abstract ( 866 )

HTML ( 11 )

PDF (767KB) ( 692 )

Save

In the big data era, distributed system has been widely deployed and applied in various fields. Nevertheless, the more nodes involved, the higher probability of system failures may occur. It is important to introduce fault-tolerance mechanism for distributed systems to achieve even higher performance, higher reliability and higher availability. CLAIMS system is an in-memory database system for real-time data analysis, which is mainly used for financial applications. It provides near real time query task and analytic task. This paper mainly discuss fault-tolerance mechanism in CLAIMS. Achieve lease-based quick system failure detection (Fail-fast). Achieve restart of affected analytic task after detecting failure (Fail-over). Achieve in-memory state recovery of abnormal node. Experiment indicate that the algorithm presented in this paper can achieve fault-tolerance in CLAIMS.

References | Related Articles | Metrics

Distributed secondary index based on LSM Tree

LONG Fei, WENG Hai-xing, GAO Ming, ZHANG Zhao

2016, 2016 (5): 36-44. doi: 10.3969/j.issn.1000-5641.2016.05.005

Abstract ( 1239 )

HTML ( 29 )

PDF (746KB) ( 1360 )

Save

In recent years, Log-Structured-Merge Tree has been widely used in NoSQL systems. This is mainly because it has proposed two algorithms: update delayed and batch write, convert random write to batch write, reducing the cost of moving the disk arm therefore the write performance of database has been enhanced greatly. However, the read performance of database has also been affected negatively. The essential difference between LSM Tree and B Tree makes NoSQL not suitable for using B Tree as index structure directly. This paper implements a distributed secondary index based on LSM Tree, and proposes a bulk loading method in this read and write separation architecture. We also do lots of works on the optimization of index query plan to avoid repeatly query parsing IO so that the performance of index read has been greatly improved.

References | Related Articles | Metrics

Data correlation-based partition approach for distributed deputy database

WANG Min, PENG Cheng-chen, LI Rong-rong, PENG Yu-wei

2016, 2016 (5): 45-55. doi: 10.3969/j.issn.1000-5641.2016.05.006

Abstract ( 751 )

HTML ( 11 )

PDF (1068KB) ( 680 )

Save

Object deputy database (ODDB) is an advanced database system with strong ability of complex information processing. With the rapid development of data, distributed storage becomes more and more important to ODDB. However, there exist correlations between objects in ODDB, which makes the traditional data portioning method of distributed storage unsuitable. To solve this problem, we propose a data correlation-based partition approach for ODDB. Firstly, we cluster correlated objects according to the deputy tree, and each object cluster is considered as a heap file in storage. Secondly, on the basis of schema feature and semantic feature, we divide object clusters into k subsets using k-means, each subset is stored on one of the storage nodes. Finally, we compare our method with random distributed storage, the results show that our approach is obviously better in query efficiency.

References | Related Articles | Metrics

Optimization strategies of correlated subquery for distributed database

MAO Si-yu, ZHANG Li-jun, ZHANG Xiao-fang, GAO Jin-tao, LI Zhan-huai

2016, 2016 (5): 56-66. doi: 10.3969/j.issn.1000-5641.2016.05.007

Abstract ( 825 )

HTML ( 11 )

PDF (698KB) ( 737 )

Save

A query which occurs in another query as a filter is called subquery, and if the filtering condition of a subquery depends on its parent query, it is called correlated
subquery. Generally, the execution cost of query with correlated subquery is high due to that subquery would be executed multiply, which leads to multiple disk access and extra communications in distributed system. Based on the investigation of the classical optimization strategies of correlated subquery, and according to the characteristics of distributed system, we adopt pulling up subquery, removing useless tree and eliminating aggregation function to optimize correlated subquery in distributed database system. And we implement these strategies in the distributed relational database OceanBase for the correlated subquery predicate EXIST. Experiment results show that these strategies can significantly improve the performance of a correlated subquery.

References | Related Articles | Metrics

A join algorithm based on bloom filter in OceanBase

MAO Xiao-xiao, DUAN Hui-chao, GAO Ming

2016, 2016 (5): 67-74. doi: 10.3969/j.issn.1000-5641.2016.05.008

Abstract ( 1025 )

HTML ( 14 )

PDF (459KB) ( 887 )

Save

In the era of big data, the movement of “de-IOE” campaign and the development of activities such as Double 11 have put forward higher request of the performance of distributed database. OceanBase is an open sourced distributed database implemented by Alibaba. It supports for cross-table relational query of massive data but the performance for complex queries remains to be improved. The network transmission overheads caused by join operator seriously influenced the performance of distributed database. This paper proposes a join algorithm based on bloom filter. It filters the data of the right table
by constructing a bloom filter on the join column of the left table. The key point of this algorithm is that it reduces the overhead of unnecessary data transmission and the consumption of memory resources by data processing. We implement this algorithm in OceanBase and the experiment results show that the algorithm can greatly improve the efficiency of join operator.

References | Related Articles | Metrics

Implementation of Semi-Join algorithm in a distributed system

QIAN Zhao-ming, WANG Lei, YU Sheng-jun, GONG Xue-qing

2016, 2016 (5): 75-80. doi: 10.3969/j.issn.1000-5641.2016.05.009

Abstract ( 856 )

HTML ( 17 )

PDF (945KB) ( 1173 )

Save

As the scope of application of the new distributed system is becoming wider, the application is no longer satisfied with using primary key access to read the data, and how to efficiently achieve such complex operations as Join in these systems has become a research hot topic. This paper introduces how to realize the Join operation in the distributed systems based on the Semi-Join algorithm, and puts forward two ways to get the data in right table, and the performance of the algorithm is also analyzed through experiments.

References | Related Articles | Metrics

Distributed and scalable stream join algorithm

WANG Xiao-tong, FANG Jun-hua, ZHANG Rong

2016, 2016 (5): 81-88. doi: 10.3969/j.issn.1000-5641.2016.05.010

Abstract ( 798 )

HTML ( 12 )

PDF (964KB) ( 769 )

Save

Join-Matrix is a high-performance model for stream join processing in a parallel shared-nothing environment, which supports arbitrary join operations and is
resilient to data skew for taking random tuple distribution as its routing policy. To evenly distribute workload and minimize network communication cost, designing an efficient partitioning policy on the matrix is particularly essential. In this paper, we propose a novel stream join operator that continuously adjust its partitioning scheme to real-time data dynamics. Specifically, based on the sample statistics of streams and rated load of each physical machine, a lightweight scheme generator produces a partitioning scheme; then the corresponding solutions for state relocation are generated by a migration plan generator to minimize migration cost while ensuring result correctness.Our experiments on different kinds of data sets demonstrate that our operator outperforms the static-of-the-art strategies in resource utilization, throughput and system latency.

References | Related Articles | Metrics

Research on OLAP query processing technology for asymmetric in-memory computing platform

ZHANG Yan-song, ZHANG Yu, ZHOU Xuan, WANG Shan

2016, 2016 (5): 89-102. doi: 10.3969/j.issn.1000-5641.2016.05.011

Abstract ( 804 )

HTML ( 9 )

PDF (2725KB) ( 893 )

Save

This paper proposes an OLAP query processing technology for nowadays and future asymmetric in-memory computing platform. Asymmetric in-memory computing platform means that computer equips with different computing feature processors and different memory access devices so that the OLAP processing model needs to be optimized for different computing features and implementation designs to enable the different processing stages to adapt to the characteristics of corresponding storage and computing hardware for higher hardware utilization and performance. This paper proposes the 3-stage OLAP
computing model, which divides the traditional iterative processing model into computing intensive and data intensive workloads to be assigned to general purpose processor with full fledged functions and coprocessor with powerful parallel processing capacity. The data transmission overhead between different storage and computing devices is also minimized. The experimental results show that the 3-stage OLAP computing model based on workload partitioning can be adaptive to CPU-Phi asymmetric computing platform, the acceleration on OLAP query processing can be achieved by accelerating computing intensive workload by computing intensive hardware.

References | Related Articles | Metrics

A RDBMS-based graph computing platform

JIANG Kui, CHEN Liang

2016, 2016 (5): 103-111. doi: 10.3969/j.issn.1000-5641.2016.05.012

Abstract ( 856 )

HTML ( 10 )

PDF (981KB) ( 704 )

Save

This paper proposes a new RDBMS-based (relational database management system) graph computing platform. In this platform, graph data is represented in native data structures, achieving the same representation as in native graph computing systems. On top of this native representation, graph algorithms are expressed as SQL (Structured Query Language) statements, which are executed by the underlying relational database systems. Experimental results show that this new graph computing platform leverages mature SQL technologies on query optimization and execution, thereby providing superior performance in many aspects. Its current performance limitations, on the other hand, will be overcome by future evolution and optimization of relational database systems.

References | Related Articles | Metrics

GraphHP: A hybrid platform for iterative graph processing

SU Jing, SUO Bo, CHEN Qun, PAN Wei, LI Zhan-huai

2016, 2016 (5): 112-120. doi: 10.3969/j.issn.1000-5641.2016.05.013

Abstract ( 795 )

HTML ( 10 )

PDF (958KB) ( 902 )

Save

BSP (Bulk Synchronous Parallel) computing model is an important foundation for the establishment of a large-scale iterative graph processing distributed system. Existing platforms (e.g., Pregel, Giraph, and Hama) have achieved a high scalability, but the high frequency synchronization and communication load between the hosts have seriously affected the efficiency of parallel computing. In order to solve this key problem, this paper proposes a hybrid model based on GraphHP (Graph Hybrid Processing). It not only inherits the BSP programming interface with the vertex as the center, but also can significantly reduce the synchronization and communication load. By establishing the hybrid execution model between the interior and the interval partition of the graph, the GraphHP realizes the pseudo super step iteration calculation, and separates the internal computation from the distributed synchronization and communication. This hybrid execution model does not need heavy scheduling algorithm or the serial algorithm can effectively reduce the synchronization and communication load. Finally, this paper evaluates the implementation of the classic BSP application in the GraphHP platform, and the experiment shows that it is more efficient than the existing BSP platform. Although the GraphHP platform proposed in this paper is based on Hama, it is easy to migrate to other BSP platforms.

References | Related Articles | Metrics

Sorting algorithm analysis of distributed data based on Map/Reduce

YU Sheng-jun, GONG Xue-qing, ZHU jun, QIAN Wei-ning

2016, 2016 (5): 121-130. doi: 10.3969/j.issn.1000-5641.2016.05.014

Abstract ( 883 )

HTML ( 8 )

PDF (1549KB) ( 944 )

Save

Distributed system has been widely applied in recent years to tackle the storage and calculation of big data. Sorting of large-scale dataset in the distributed system has become the fundamental problem to affect a varieties of application performances which is not only concerning about the selection of sorting algorithm at each node, but also about the development of distributed algorithms to coordinate at each node. This paper summarizes the common distributed sorting algorithms which are applied in the distributed system. Analysis has been conducted to the implementation process, cost model and applicable field of each algorithm. And the analysis results have been verified by experiments. This work can help developers choose and optimize the big data sorting
algorithm in distributed environments.

References | Related Articles | Metrics

Research and implementation of transactional real-time data ingestion technology without blocking

YU Kai, LI Zhi-fang, ZHOU Min-qi, ZHOU Ao-ying

2016, 2016 (5): 131-143. doi: 10.3969/j.issn.1000-5641.2016.05.015

Abstract ( 846 )

HTML ( 9 )

PDF (846KB) ( 814 )

Save

With the advent of big data era, traditional database systems are facing difficulties in satisfying the new challenges brought by massive data processing, while distributed database systems have been deployed widely in real applications. Distributed database systems partitioned and the dispatched the data across machines under a designed scheme and analyzed all the massive data in massive parallel manner. In facing of the requirements of the transactional real-time data ingestion from financial field, distributed database systems are ineffective and inefficient due to their implementation of the distributed transaction processing based on the lock and two-phase commit, which lead to the impossibility of non-blocking data ingestion. CLAIMS is a distributed in-memory database system designed and implemented by Institute for Data Science and Engineering of ECNU. It supports real-time data analysis towards relational data set but is incapable of real-time data ingestion. To address these problems, we analyzed data ingestion technology and distributed transaction processing algorithms first, and proposed to mimic the transactional data ingestion in the distributed environment with the centralized transaction processing based on meta data, and eventually achieved the real-time data ingestion with high availability and without blocking. The experiment results with the implementation of the proposed algorithms in CLAIMS proved that the proposed framework could achieve high throughput transactional real-time data ingestion as well as low latency real-time
query processing.

References | Related Articles | Metrics

Designs and implementations of stored procedure in OceanBase

ZHU Jun, LIU Bai-zhong, YU Sheng-jun, GONG Xue-qing, ZHOU Min-qi

2016, 2016 (5): 144-152. doi: 10.3969/j.issn.1000-5641.2016.05.016

Abstract ( 1069 )

HTML ( 24 )

PDF (763KB) ( 799 )

Save

As an extension of standard SQL(Structured Query Language), the stored procedure is an important feature in modern databases. OceanBase is a new type of
distributed database system which supports massive data processing, but the open-sourced version OceanBase does not support stored procedure, which influences its adoption in enterprises. In this paper, we analyze the principle of stored procedure and the query processing mechinism of OceanBase in detail. Then, the complete design and implementation of stored procedure which supports PL/SQL are presented.

References | Related Articles | Metrics

DBugHelper: A Debug assistant tool for distributed systems

ZHANG Yan-fei, ZHANG Chun-xi, LI Yu-ming, ZHANG Rong

2016, 2016 (5): 153-164. doi: 10.3969/j.issn.1000-5641.2016.05.017

Abstract ( 744 )

HTML ( 6 )

PDF (634KB) ( 680 )

Save

Development of large-scale distributed systems has experienced a long developing period. During the whole development cycle, debug is one of the most important steps. We meet the challenges of finding all the bugs and the corresponding solutions fixing bugs in a short time. Bug reports record bug histories and solutions, which provide a way to understand bug features and help to find solutions for new bugs. After we analyze the bug reports and fixed solutions, we find that there are strong correlation and similarity among many large-scale distributed systems. Thus the developing and fixing scheme of bugs may have similar characteristics. Then existed fixing solutions of bugs can be used to assist fixing new bugs. In this paper, we propose DBugHelper, a debug helping tool which
can be applied to boost the development of large-scale distributed systems and provide a more effective way to fix bugs. In DBugHelper, the existed bug reports are processed offline, and the latest bug report is represented as a query vector. We query the bug report history database and find the similar bugs with their solutions. In such way, we suppose to shorten the whole system development period.

References | Related Articles | Metrics

Network request service mechanisms in a scalable database management system

XIAO Bing, GUO Jin-wei, QIAN Wei-ning

2016, 2016 (5): 165-172. doi: 10.3969/j.issn.1000-5641.2016.05.018

Abstract ( 726 )

HTML ( 7 )

PDF (582KB) ( 659 )

Save

Network request service mechanism involves the transmitting and processing of the requests. It plays an important role for interactive components to cooperate in a distributed database management system. Taking scalable database management systems as background, this paper introduces the inside network service model and the network request service mechanisms. On the basis of the primary implementation in database systems, we analyze different demands from different requests in terms of transmitting and processing. The service distribution of each mechanism and associative analysis are presented based on OceanBase.

References | Related Articles | Metrics

Table of Content