
Table of Contents

    25 September 2014, Volume 2014, Issue 5
    Article
    Fault tolerance in NoSQL systems: Principles and system analysis
    KONG Chao, QIAN Wei-Ning, ZHOU Ao-Ying
    2014, 2014 (5):  1-16.  doi: 10.3969/j.issn.1000-5641.2014.05.001
    Abstract ( 1403 )   PDF (2484KB) ( 1723 )
    NoSQL data management systems have been widely used in Web data management and processing applications for their high scalability and fault tolerance. Fault tolerance is achieved by using new consistency models and data replication in clustered systems. In this paper, the mechanisms and implementation details of five representative NoSQL systems, i.e., Bigtable, HBase, Dynamo, Cassandra, and PNUTS, were discussed and analyzed, after a general introduction to the principles of consistency preservation and fault-tolerant processing. Furthermore, the impact of these technologies on system availability, performance, and workload balance was analyzed. Finally, their influence on the design of in-memory database management systems was discussed.
    Survey of uniform resource management and scheduling in clusters
    LI Yong-Feng, ZHOU Min-Qi, HU Hua-Liang
    2014, 2014 (5):  17-30.  doi: 10.3969/j.issn.1000-5641.2014.05.002
    Abstract ( 2120 )   PDF (1403KB) ( 5163 )
    With the rapid development of the Internet and the coming of big data, the resource management system has emerged as a thin resource-sharing layer that enables cluster sharing across diverse cluster computing frameworks, by giving frameworks a common interface for accessing cluster resources. Powering both large Internet services and a growing number of data-intensive scientific applications, cluster computing frameworks will continue to emerge, and no framework will be optimal for all applications. Therefore, multiplexing a cluster between frameworks makes a significant difference. Deploying and running multiple frameworks in the same cluster improves utilization and allows applications to share access to large datasets that may be costly to replicate across clusters. This paper illustrates the current major techniques of resource management and scheduling in clusters, including the resource representation model, the resource allocation model, and scheduling policies. Finally, current prominent solutions, which have been developed and used by many companies, are demonstrated, and we then summarize and contrast these solutions used in recent years.
    Key techniques and challenges of designing new OLTP database systems
    REN Kun, LI Zhan-Huai
    2014, 2014 (5):  31-42.  doi: 10.3969/j.issn.1000-5641.2014.05.003
    Abstract ( 1483 )   PDF (588KB) ( 1691 )
    Traditional database systems were designed for the hardware environment of the 1970s. However, in the era of “Cloud Computing” and “Big Data”, online transaction processing (OLTP) requires database systems to provide higher transaction throughput and better scalability. Meanwhile, developments in computer hardware, in particular large memory and multiple CPU cores, offer new opportunities for database system evolution. Therefore, researching and designing new distributed database systems becomes more and more crucial. This paper studied the key techniques and challenges of designing new OLTP database systems.
    Development of parallel computing models in the big data era
    PAN Wei, LI Zhan-Huai
    2014, 2014 (5):  43-54.  doi: 10.3969/j.issn.1000-5641.2014.05.004
    Abstract ( 1898 )   PDF (459KB) ( 5387 )
    In the era of big data, changing constraints bring the development of parallel computing both opportunities and challenges. This paper reviewed the new progress and changes in parallel computing. Considering the effects of hardware environments, computing patterns, and application requirements on parallel computing, the relevant research on batch-oriented, streaming-oriented, graph-oriented, and in-memory parallel computing models is summarized. Finally, future development trends are evaluated.
    Research on large-scale graph data management with in-memory computing techniques
    YUAN Pei-Sen, SHU Xin, SHA Chao-Feng, XU Huan-Liang
    2014, 2014 (5):  55-71.  doi: 10.3969/j.issn.1000-5641.2014.05.005
    Abstract ( 2017 )   PDF (2216KB) ( 5731 )
    Graph is an important data model that can describe structural information with dependency relationships, such as transportation networks, social networks, and webpage hyperlinks. Management of big graphs brings challenges to traditional techniques; however, distributed clusters provide platforms and techniques for this problem. Nowadays, the performance/price ratio of memory is improving rapidly, and the demand for high-performance, in-memory computing for massive data management is growing. The storage and evaluation of massive graphs require a high-performance platform. In this context, it is significant to study graph data management with in-memory techniques. This paper surveys the key techniques of massive graph data management, studies graph data management with in-memory computing techniques, and finally gives a summary.
    Survey of architecture and system-level optimization for non-volatile main memory
    SUN Guang-Yu, SHU Ji-Wu, WANG Peng
    2014, 2014 (5):  72-81.  doi: 10.3969/j.issn.1000-5641.2014.05.006
    Abstract ( 1279 )   PDF (1220KB) ( 2239 )
    With the rapid improvement of modern computing applications, there are increasing requirements on the capacity, performance, and power consumption of the memory system for both “computing-intensive” and “data-intensive” applications. However, main memory based on traditional DRAM technology cannot fully satisfy these requirements because DRAM technology improves more slowly than CMOS technology. Moreover, the situation becomes even worse since the performance gap between HDD-based storage and DRAM-based main memory keeps increasing at the same time. Recently, the substantial progress of various non-volatile memory technologies has provided an opportunity to mitigate this problem. This paper presents a survey of recent architecture- and system-level research on non-volatile main memory. It shows that different types of non-volatile main memory can help improve the performance and reduce the power consumption of the memory system significantly.
    Survey of mainmemory database availability
    JIANG Ze-Yuan, LIU Hui-Lin, WU Gang, WANG Guo-Ren
    2014, 2014 (5):  82-88.  doi: 10.3969/j.issn.1000-5641.2014.05.007
    Abstract ( 1297 )   PDF (712KB) ( 2841 )
    With the development of hardware technology, the cost of main memory is decreasing, which makes it possible for a DBMS (Database Management System) to put all of its data into main memory. Compared with a traditional DRDB (Disk-Resident Database), an MMDB (Main-Memory Database) provides much faster data access, higher application throughput, and stronger concurrent access capability, and it meets the demand for timely response. However, due to the volatility of main memory, an MMDB differs from a DRDB in system availability. The survey focuses on the main strategies for improving the availability of MMDBs, including fast recovery, redundant backup, and fault tolerance mechanisms.
    Distributed computing system for communication data management
    CHAO Ping-Fu, ZHENG Zhi-Ling, FANG Jun-Hua, ZHANG Rong
    2014, 2014 (5):  89-102.  doi: 10.3969/j.issn.1000-5641.2014.05.008
    Abstract ( 1321 )   PDF (2911KB) ( 1854 )
    In this article, a communication data management platform built on open-source software over a clustered system with in-memory computing requirements is introduced, in order to efficiently support real-time processing as well as online queries over massive data volumes. In particular, we first give a brief analysis of popular distributed and in-memory techniques, providing candidate techniques and tests for choosing the appropriate ones for our task. Then we design and implement an online communication data processing and query platform. Finally, we use in-memory techniques to optimize platform performance. The experimental results indicate that the in-memory distributed computing system not only outperforms the disk-based system in both query response time and real-time processing speed, but also improves resource utilization and data throughput.
    Consistency and availability in OceanBase
    ZHOU Huan, FAN Qiu-Shi, HU Hua-Liang
    2014, 2014 (5):  103-116.  doi: 10.3969/j.issn.1000-5641.2014.05.009
    Abstract ( 2170 )   PDF (3235KB) ( 2126 )
    OceanBase, as a distributed database, not only supports cross-table relational queries and cross-row transactions, but also ensures consistency and availability. Based on a study of traditional database architectures and distributed database architectures, this article analyzed how consistency and availability are achieved in the OceanBase architecture, and finally discussed the implementations of the three architectures evolved from OceanBase.
    Research on in-memory data warehouse cluster technologies
    ZHANG Yan-Song, WANG Shan, ZHOU Hui
    2014, 2014 (5):  117-132.  doi: 10.3969/j.issn.1000-5641.2014.05.010
    Abstract ( 1147 )   PDF (4711KB) ( 2339 )
    With the development of hardware integration techniques, multi-core processors and big memory have become mainstream configurations, and in-memory computing has come to be the emerging high performance data analytical platform. In-memory data warehouse cluster technologies target high performance analytical computing, and they will be the basic platform for real-time big data analytical processing. This paper briefly introduces the research work on in-memory data warehouse clusters by the high performance database research group of Renmin University, including the development of the column-distribution and column-computing-service oriented ScaMMDB cluster, the horizontal-partition and parallel-computing oriented ScaMMDBII, and the reverse star schema distribution and cluster vector computing oriented MiNT-OLAP-Cluster technologies. The critical issues and technical challenges are also presented in this paper. Finally, we give a prospective discussion of future technologies for the coming in-memory data warehouse cluster requirements.
    Simulator for hybrid memory architecture
    LIU Dong, ZHANG Jin-Bao, LIAO Xiao-Fei, JIN Hai
    2014, 2014 (5):  133-140.  doi: 10.3969/j.issn.1000-5641.2014.05.011
    Abstract ( 2403 )   PDF (1508KB) ( 5258 )
    This paper proposed a method for building a simulator for hybrid memory architecture based on gem5. The method first adds a hybrid memory controller between the memory bus and the memory model, then introduces the non-volatile memory model of NVMain and hooks it up to the newly added hybrid memory controller along with the native DRAM model of gem5. Experimental results show that this method achieves the goal of building a simulator for a hybrid memory architecture.
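    As a conceptual illustration of the routing idea described above (a toy Python sketch, not the actual gem5 or NVMain interfaces, which are C++ SimObjects; all names and latencies here are assumed), such a hybrid memory controller reduces to an address-range dispatcher in front of two timing models:

        # Toy sketch of a hybrid memory controller that routes requests
        # by physical address range; latencies are assumed values.
        DRAM_READ_NS, NVM_READ_NS = 50, 150

        class HybridMemoryController:
            def __init__(self, dram_bytes):
                self.dram_limit = dram_bytes   # addresses below map to DRAM

            def route(self, addr):
                # Decide which backing memory model serves this address.
                return "dram" if addr < self.dram_limit else "nvm"

            def read_latency_ns(self, addr):
                return DRAM_READ_NS if self.route(addr) == "dram" else NVM_READ_NS

        ctrl = HybridMemoryController(dram_bytes=1 << 30)   # 1 GiB DRAM region
        print(ctrl.route(0x100), ctrl.read_latency_ns(1 << 32))   # dram 150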
    The architecture of OceanBase relational database system
    YANG Zhen-Kun
    2014, 2014 (5):  141-148.  doi: 10.3969/j.issn.1000-5641.2014.05.012
    Abstract ( 3699 )   PDF (1727KB) ( 2490 )
    A traditional RDBMS is essentially a single-machine system and usually employs high-end servers and highly reliable storage devices for performance and reliability, which makes it ill-suited to today’s Internet applications with their demands for high scalability, high performance, high availability, and low cost. OceanBase is an open-source, shared-nothing, distributed relational database system developed from scratch at Alibaba. Built on top of computer clusters consisting of inexpensive commodity PC servers, OceanBase meets the requirements of modern Internet applications quite well. OceanBase has been widely used in the production systems of Taobao, Tmall, and Alipay for years.
    Memory transaction engine of OceanBase
    LI Kai, HAN Fu-Sheng
    2014, 2014 (5):  147-163.  doi: 10.3969/j.issn.1000-5641.2014.05.013
    Abstract ( 3049 )   PDF (2582KB) ( 3873 )
    OceanBase is a distributed, scalable relational database. Its storage architecture separates static baseline data from dynamic incremental data. Its memory transaction engine, namely the MemTable, provides dynamic data storage, writes, and queries; clients write data into this in-memory data structure. The MemTable and some transaction management structures together form the in-memory database engine, which achieves the transaction ACID properties. Multi-version concurrency control techniques prevent reads and writes from blocking each other, allowing read-only transactions to meet the “snapshot isolation” level, while multi-write concurrency control is provided by classic row-lock technology to meet the “read committed” transaction isolation level.
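    The concurrency behavior described above can be sketched in a few lines (a minimal toy model in Python with illustrative names; OceanBase’s MemTable is a C++ structure with far more machinery):

        import threading

        class MemTable:
            # Toy multi-version row store: each row keeps a (version, value)
            # history; readers take the newest version <= their snapshot and
            # never block behind writers, while writers serialize per row
            # through a classic row lock.
            def __init__(self):
                self.rows = {}       # key -> list of (commit_version, value)
                self.locks = {}      # key -> per-row write lock
                self.version = 0
                self.clock = threading.Lock()

            def read(self, key, snapshot):
                for ver, val in reversed(self.rows.get(key, [])):
                    if ver <= snapshot:
                        return val
                return None

            def write(self, key, value):
                with self.locks.setdefault(key, threading.Lock()):
                    with self.clock:
                        self.version += 1
                        commit = self.version
                    self.rows.setdefault(key, []).append((commit, value))

        t = MemTable()
        t.write("row1", "a")
        snapshot = t.version          # read-only transaction starts here
        t.write("row1", "b")          # later write does not block the reader
        print(t.read("row1", snapshot))   # prints: a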
    Scalable distributed storage of OceanBase
    HUANG Gui, ZHUANG Ming-Qiang
    2014, 2014 (5):  164-172.  doi: 10.3969/j.issn.1000-5641.2014.05.014
    Abstract ( 1977 )   PDF (1629KB) ( 2683 )
    OceanBase is a distributed relational database whose purpose is to store vast amounts of structured data on high-growth, low-cost servers to achieve highly available, highly scalable, and cost-effective services. OceanBase uses a hybrid storage mode of memory and external storage: the incremental (update) data is stored in memory, and the baseline (read-only) data is stored externally (usually on disk). The baseline data is divided into slices of roughly equal size, called tablets, which are stored on many data servers using a distributed B+ tree; a daily merge mechanism keeps merging the incremental data into the baseline. This article describes the basic structure and distribution methods of OceanBase baseline data storage, as well as the daily merge mechanism; in addition, we introduce the specific design and implementation of the OceanBase baseline data storage format.
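    The separation of baseline and incremental data can be illustrated with a toy daily merge (a sketch under assumed, simplified data structures, not OceanBase’s tablet and B+-tree implementation):

        # Fold an in-memory delta of updates into a sorted, read-only
        # baseline to produce the next day's baseline (illustrative only).
        def daily_merge(baseline, delta):
            """baseline: sorted list of (key, value); delta: dict key -> value,
            where value None marks a delete. Returns the new sorted baseline."""
            merged = dict(baseline)
            for key, value in delta.items():
                if value is None:
                    merged.pop(key, None)   # apply delete
                else:
                    merged[key] = value     # apply insert/update
            return sorted(merged.items())

        old = [(1, "a"), (2, "b"), (3, "c")]
        print(daily_merge(old, {2: "B", 3: None, 4: "d"}))
        # [(1, 'a'), (2, 'B'), (4, 'd')]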
    High availability solution of OceanBase
    YANG Chuan-Hui
    2014, 2014 (5):  173-179.  doi: 10.3969/j.issn.1000-5641.2014.05.015
    Abstract ( 2989 )   PDF (1451KB) ( 3437 )
    Shared storage or master-slave replication is used in traditional RDBMSs to achieve high availability. The first solution relies on highly available hardware and is therefore costly, while the second cannot meet the requirements of strong consistency and high availability concurrently. OceanBase combines cloud computing and database technology. Its high availability solution is based on the Paxos protocol and is built on top of commodity machines. It meets the requirements of both strong consistency and high availability at low cost.
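    The availability argument rests on the majority-quorum property of Paxos: any two majorities intersect, so a committed log entry survives minority failures. A minimal sketch of that arithmetic (illustrative only; real Paxos also needs proposal numbers, two phases, and leader election):

        def is_committed(acks, replicas):
            # An entry is durable once a strict majority of replicas ack it:
            # any later majority must overlap the one that accepted it.
            return acks > replicas // 2

        print(is_committed(acks=2, replicas=3))   # True: tolerates 1 failure
        print(is_committed(acks=2, replicas=5))   # False: needs 3 of 5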
    Join algorithms towards in-memory computing
    ZHANG Lei, FANG Zhu-He, ZHOU Min-Qi, HUANG Lan
    2014, 2014 (5):  180-191.  doi: 10.3969/j.issn.1000-5641.2014.05.016
    Abstract ( 1464 )   PDF (1178KB) ( 2277 )
    The development of memory and CPU technology marks the coming of the main-memory computing age. This paper systematically reviewed main-memory join algorithms and analyzed in detail the advantages and disadvantages of existing join algorithms along two dimensions, looking ahead to future research directions. Finally, some research work on main-memory join algorithms in our Claims prototype system was introduced.
    In-memory index: Performance enhancement techniques leveraging processors
    DONG Shao-Chan, ZHOU Min-Qi, ZHANG Rong, ZHOU Ao-Ying
    2014, 2014 (5):  192-206.  doi: 10.3969/j.issn.1000-5641.2014.05.017
    Abstract ( 1303 )   PDF (1068KB) ( 2654 )
    As main memory capacities grow larger and larger, the in-memory era has arrived, and in-memory databases have taken the place of traditional disk-based databases to provide efficient data management. In this paper, we analyzed the fundamental elements of in-memory index design, summarized and evaluated the existing index structures, and pointed out future opportunities and challenges based on the development trends of current applications. Finally, we introduced our ongoing distributed in-memory index studies on the Cluster-Aware In-Memory System (CLAIMS).
    Fault tolerance recovery techniques in large distributed parallel computing systems
    ZHANG Xin-Zhou, ZHOU Min-Qi
    2014, 2014 (5):  207-215.  doi: 10.3969/j.issn.1000-5641.2014.05.018
    Abstract ( 1257 )   PDF (397KB) ( 2371 )
    Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components cooperating or collaborating on a computation. Unfortunately, any of this vast number of components can fail at any time, resulting in potentially erroneous output. In order to improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to these kinds of system faults. This survey provides an overview of these various fault tolerance techniques.
    In-memory cluster computing: Interactive data analysis
    HUANG Lan, SUN Ke, CHEN Xiao-Zhu, ZHOU Min-Qi
    2014, 2014 (5):  216-227.  doi: 10.3969/j.issn.1000-5641.2014.05.019
    Abstract ( 1452 )   PDF (2056KB) ( 2421 )
    This paper discussed the management and analysis of data for decision support, which is defined as one of the three categories of big data. In this big data era, business intelligence creates tremendously large market value, while enhancements in computer hardware further stimulate the emergence of new data analysis applications that require interactive data analysis. Based on a detailed analysis of the typical applications, we find that in-memory cluster computing systems will be the future trend for interactive data analysis. In the environment of in-memory cluster computing systems, network communication has become the main bottleneck compared to memory data access and disk I/O. Hence, we outline further research topics within in-memory cluster computing: the system topology of distributed shared-nothing in-memory computing systems, considering the characteristics of memory (e.g., volatility, the memory wall) as well as the communication bottleneck; data placement and index strategies for heterogeneous, multi-level caches; a parallel computing framework of multiple granularities over multi-core, multi-processor, and multi-computer environments; data consistency in distributed data management; and data compression and processing mechanisms over column-wise data storage.
    LCDJ: Locality conscious join algorithm for in-memory cluster computing
    ZHANG Lei, ZHOU Min-Qi, WANG Li
    2014, 2014 (5):  228-239.  doi: 10.3969/j.issn.1000-5641.2014.05.020
    Abstract ( 1406 )   PDF (1665KB) ( 1572 )
    Equi-join is one of the most important operators in database systems, and hash join is an efficient algorithm for it. In a distributed in-memory database system, data tables are partitioned across multiple nodes. Before the local join, hash join requires the two input tables to be repartitioned on the join attributes under the same hash function, to make sure that tuples from the two tables with the same join values are transferred to the same node. Since data processing in memory is much faster than the network, data repartitioning occupies a large amount of time and becomes the bottleneck of equi-join in a distributed in-memory database. This paper proposes a novel equi-join algorithm, which takes full advantage of in-memory computing and reduces the volume of data to be transferred. The algorithm first accumulates accurate statistics on the join attributes of the two tables, then builds a cost model to measure the cost of different schedule strategies and generates the optimized schedule strategy. Furthermore, the degree of parallelism and computing load balance are taken into consideration in our cost model. The proposed algorithm is implemented on our in-memory distributed prototype system Claims. Extensive experiments confirm that the algorithm effectively reduces the cost of network communication, improves query response time by a huge margin, and achieves higher performance than Hive and Shark.
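    For reference, the baseline hash-repartition equi-join whose network cost the paper attacks can be sketched as follows (plain Python; the partition count, table layouts, and key positions are assumed for illustration):

        # Both inputs are re-partitioned on the join key with the same hash
        # function, so matching tuples meet on the same node; each node then
        # runs a local hash join.
        def repartition(table, key_index, num_nodes):
            parts = [[] for _ in range(num_nodes)]
            for row in table:
                parts[hash(row[key_index]) % num_nodes].append(row)
            return parts

        def local_hash_join(build, probe, bkey, pkey):
            ht = {}
            for row in build:
                ht.setdefault(row[bkey], []).append(row)
            return [b + p for p in probe for b in ht.get(p[pkey], [])]

        R = [(1, "r1"), (2, "r2")]
        S = [(1, "s1"), (1, "s2"), (3, "s3")]
        out = []
        for rp, sp in zip(repartition(R, 0, 2), repartition(S, 0, 2)):
            out.extend(local_hash_join(rp, sp, 0, 0))
        print(out)   # [(1, 'r1', 1, 's1'), (1, 'r1', 1, 's2')]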
    Co-OLAP: Research on cooperated OLAP with star schema benchmark on hybrid CPU&GPU platform
    ZHANG Yu, ZHANG Yan-Song, ZHANG Bing, CHEN Hong, WANG Shan
    2014, 2014 (5):  240-251.  doi: 10.3969/j.issn.1000-5641.2014.05.021
    Abstract ( 1353 )   PDF (2631KB) ( 1392 )
    Nowadays GPUs have powerful parallel computing capability, even the moderate GPUs on moderate servers. In contrast to recent research efforts, a moderate server may be equipped with several high-end CPUs and a moderate GPU, which can provide additional computing power instead of more powerful CPU computing. In this paper, we focus on Co-OLAP (Cooperated OLAP) processing on a moderate workstation to illustrate how to make a moderate GPU cooperate with powerful CPUs and how to distribute data and computation between the balanced computing platforms to create a simple and efficient Co-OLAP model. According to a real-world configuration, we propose a maximal high performance data distribution model based on RAM size, GPU device memory size, dataset schema, and the specially designed AIR (array index referencing) algorithm. The Co-OLAP model distributes the dataset into host- and device-memory resident datasets; the OLAP is likewise divided into CPU and GPU adaptive computing to minimize data movement between CPU and GPU memories. The experimental results show that two six-core Xeon CPUs slightly outperform one NVIDIA Quadro 5000 GPU with 352 CUDA cores on an SF=20 SSB dataset, and that the Co-OLAP model can assign balanced workloads and make each platform simple and efficient.
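    The AIR idea named above can be illustrated in a few lines: when dimension keys are dense integers, a star-join dimension lookup degenerates into a direct array access instead of a hash probe (a toy Python sketch with assumed table contents, not the paper’s GPU code):

        # Dimension table stored as an array whose index equals the key;
        # the fact table's foreign key is used directly as an array index.
        dim_region = ["ASIA", "EUROPE", "AMERICA"]        # key == array index
        fact = [(0, 10.0), (2, 20.0), (0, 30.0)]          # (region_key, revenue)

        totals = {}
        for region_key, revenue in fact:
            name = dim_region[region_key]                 # AIR: O(1) array access
            totals[name] = totals.get(name, 0.0) + revenue
        print(totals)   # {'ASIA': 40.0, 'AMERICA': 20.0}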
    Batch processing for main memory data management
    ZHOU Hui, XUE Zhong-Bin
    2014, 2014 (5):  252-262.  doi: 10.3969/j.issn.1000-5641.2014.05.022
    Abstract ( 1513 )   PDF (1591KB) ( 1584 )
    Main-memory data processing is substantially faster than disk-based data processing. When developing a traditional disk-resident DBMS, various optimization techniques are required to ensure that query response time meets the requirements of general applications. This is less necessary for a main-memory DBMS, whose response time usually goes far beyond the requirements of most applications due to the superior speed of main memory. As a result, throughput becomes a more important concern for system design. The central idea of batch processing is to achieve improved throughput by trading off response time. Therefore, we believe that batch processing will play an important role in main-memory-centered data processing. This paper attempts to provide some insight on how to apply the idea of batch processing to speed up a main-memory DBMS. A case study on in-memory moving object management is used to demonstrate the effectiveness of batch processing.
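    The throughput-for-latency trade can be made concrete with a toy write buffer (a sketch with assumed names, not the paper’s system):

        # Buffer incoming updates and apply them in one pass: each item
        # waits a little longer, but per-operation overhead is amortized.
        class BatchedStore:
            def __init__(self, store, batch_size=1000):
                self.store, self.batch, self.batch_size = store, [], batch_size

            def put(self, key, value):
                self.batch.append((key, value))
                if len(self.batch) >= self.batch_size:
                    self.flush()

            def flush(self):
                self.store.update(self.batch)   # one bulk apply, not many small ones
                self.batch.clear()

        db = {}
        s = BatchedStore(db, batch_size=2)
        s.put("a", 1); s.put("b", 2)   # second put triggers the bulk flush
        print(db)                      # {'a': 1, 'b': 2}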
    Equi-join optimization on Spark
    BIAN Hao-Qiong, CHEN Yue-Guo, DU Xiao-Yong, GAO Yan-Jie
    2014, 2014 (5):  261-270.  doi: 10.3969/j.issn.1000-5641.2014.05.023
    Abstract ( 2902 )   PDF (833KB) ( 1518 )
    Equi-join is one of the most common and costly operations in data analysis. The implementations of equi-join on Spark are different from those in parallel databases. Equi-join algorithms based on data pre-partitioning, which are commonly used in parallel databases, can hardly be implemented on Spark. The currently common equi-join algorithms on Spark, such as broadcast join and repartition join, are not efficient. How to improve equi-join performance on Spark has therefore become key to big data analysis on Spark. This work combines the advantages of semi-join and partition join and provides an optimized equi-join algorithm based on the features of Spark. Cost analysis and experiments indicate that this algorithm is 12 times faster than the algorithms used in state-of-the-art data analysis systems on Spark.
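    Of the two common strategies mentioned, broadcast join is the easiest to sketch: ship the small table to every worker so the large table never crosses the network (plain Python rather than the Spark API; the table layouts are assumed):

        # Map-side join: build a lookup table from the small relation once,
        # then stream the large relation through it with no shuffle.
        def broadcast_join(big, small, big_key=0, small_key=0):
            lookup = {}
            for row in small:                 # this dict is what gets broadcast
                lookup.setdefault(row[small_key], []).append(row)
            return [b + s for b in big for s in lookup.get(b[big_key], [])]

        orders = [(1, "o1"), (2, "o2"), (1, "o3")]
        users = [(1, "alice"), (2, "bob")]
        print(broadcast_join(orders, users))
        # [(1, 'o1', 1, 'alice'), (2, 'o2', 2, 'bob'), (1, 'o3', 1, 'alice')]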
    A nested query strategy for massive distributed databases
    PEI Ou-Ya, LIU Wen-Jie, LI Zhan-Huai, TIAN Zheng
    2014, 2014 (5):  271-280.  doi: 10.3969/j.issn.1000-5641.2014.05.024
    Abstract ( 1504 )   PDF (1646KB) ( 2101 )
    NoSQL databases have very good read/write performance and scalability in big data analysis and processing, but they cannot support complete SQL queries or transactions across tables and rows, which limits their application in financial businesses built on traditional relational databases. OceanBase, a distributed database, combines the advantages of relational and non-relational databases, supporting relational queries and transactions across tables and rows. However, at present OceanBase only supports simple, non-nested queries, which cannot meet the needs of financial businesses. After studying the OceanBase architecture and query strategy, a new strategy based on BloomFilter and HashMap is proposed in this paper. Experiments show that the strategy improves on the existing query strategy and enhances query performance.
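    The flavor of the proposed strategy can be sketched as a Bloom-filter pre-filter in front of a hash-map lookup (a toy Python model; the sizes, hash choices, and names are assumptions, not OceanBase’s implementation):

        import hashlib

        class BloomFilter:
            # Compact membership filter: false positives possible, no negatives.
            def __init__(self, bits=1024, hashes=3):
                self.bits, self.hashes, self.bitmap = bits, hashes, 0

            def _positions(self, item):
                for i in range(self.hashes):
                    h = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
                    yield int(h, 16) % self.bits

            def add(self, item):
                for p in self._positions(item):
                    self.bitmap |= 1 << p

            def may_contain(self, item):
                return all(self.bitmap & (1 << p) for p in self._positions(item))

        inner = {1: "inner-row-1", 2: "inner-row-2", 3: "inner-row-3"}
        bf = BloomFilter()
        for k in inner:
            bf.add(k)

        outer = [(1, "x"), (4, "y")]
        survivors = [row for row in outer if bf.may_contain(row[0])]
        # Only survivors pay for the exact hash-map lookup; (4, 'y') is
        # dropped early unless a false positive occurs.
        print([(row, inner[row[0]]) for row in survivors if row[0] in inner])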
    Study on stored procedure implementation oriented to OceanBase
    ZHU Tao, ZHOU Min-Qi, ZHANG Zhao
    2014, 2014 (5):  281-289.  doi: 10.3969/j.issn.1000-5641.2014.05.025
    Abstract ( 1211 )   PDF (900KB) ( 1596 )
    A stored procedure is a precompiled subroutine stored in the database server, which improves the efficiency of applications’ database access. This paper discussed the implementation of stored procedures based on both static and dynamic languages. In addition, we gave a preliminary design for implementing stored procedures in OceanBase.
    Implementation of database schema design in distributed environment
    PANG Tian-Ze, ZHANG Chen-Dong, GAO Ming, GONG Xue-Qing
    2014, 2014 (5):  290-300.  doi: 10.3969/j.issn.1000-5641.2014.05.026
    Abstract ( 1424 )   PDF (2321KB) ( 1850 )
    Recently, we have witnessed an exponential increase in the amount of data. This results in the problem that a centralized database is hard to scale up to massive business requirements. A distributed database (DDB) is an alternative that can scale to large applications by distributing data to multi-node servers. Many enterprises have successfully implemented distributed databases, such as Google Spanner and Taobao OceanBase. In traditional database design theory, the normal forms reduce operational anomalies and data redundancy and ensure data integrity. However, a schema design that strictly follows the normal forms leads to an inefficient distributed database system because of the large number of distributed relational operations. Fortunately, denormalization can significantly improve query efficiency by reducing the number of relations and the amount of distributed relational operations. OceanBase, a distributed database implemented by Taobao, has high performance for OLTP rather than OLAP. In this paper, we introduce how to utilize denormalization to design the schema for OceanBase and to improve its OLAP performance. Finally, we illustrate the efficiency and effectiveness of the denormalization design for OceanBase in an empirical study using the TPC-H benchmark.
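    As a toy illustration of the denormalization argument (an assumed schema, not the paper’s TPC-H design): embedding customer attributes into the orders relation removes the cross-node join from the read path at the price of redundancy:

        customers = {1: ("alice", "ASIA"), 2: ("bob", "EUROPE")}
        orders = [(101, 1, 99.0), (102, 2, 25.0)]   # (id, cust_id, total)

        # A normalized query would join orders with customers across nodes;
        # the denormalized table embeds the customer columns at load time.
        orders_denorm = [
            (oid, cid, total, *customers[cid]) for oid, cid, total in orders
        ]
        print(orders_denorm[0])   # (101, 1, 99.0, 'alice', 'ASIA')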
    OceanBase schema design for OLAP application
    GU Ling, WENG Hai-Xing, HU Hua-Liang, ZHAO Qiong
    2014, 2014 (5):  301-310.  doi: 10.3969/j.issn.1000-5641.2014.05.027
    Abstract ( 1870 )   PDF (3186KB) ( 1665 )
    As the big data era arrives, demands on the scalability and query efficiency of databases are high, and user requirements are becoming more and more complicated. OceanBase, developed by Alibaba, is a relational distributed database equipped with the features of scalability, low cost, and availability. In addition, it is used in a wide range of applications, including OLTP and OLAP. However, the newest released version of OceanBase supports only the primary key index, not secondary indexes. Besides, OceanBase has no parallelism for joins, which affects query efficiency enormously. Therefore, OceanBase schema design is necessary to make good use of the primary key index and to decrease the number of joins. This paper studies the TPC-H application as an OLAP example to analyze relational database schema design and to propose an OceanBase schema design. At last, we verify the efficiency of the schema design through experiments.
    Application of inmemory data management technology in genealogy information system
    ZHANG Wen-Jie, PENG Zhi-Yong, PENG Yu-Wei
    2014, 2014 (5):  311-319.  doi: 10.3969/j.issn.1000-5641.2014.05.028
    Abstract ( 993 )   PDF (712KB) ( 1802 )
    In this paper, a new genealogy information system was designed and implemented. It provides data input, data service, and data output functions. The new genealogy information system is based on a distributed structure, and its distributed nodes employ in-memory data management technology. Every distributed node initializes the hot data and creates indexes based on user requests in main-memory column stores. The system implements data synchronization between disk and memory, as well as data synchronization between distributed nodes and the data center node, with transaction logs.
    How to evaluate in-memory databases objectively
    KANG Qiang-Qiang, JIN Che-Qing, ZHANG Zhao, HU Hua-Liang, ZHOU Ao-Ying
    2014, 2014 (5):  320-329.  doi: 10.3969/j.issn.1000-5641.2014.05.029
    Abstract ( 1428 )   PDF (1829KB) ( 1936 )
    Hardware technology has continued to develop over the past decade, and the price of memory has become so low that many computer systems tend to deploy huge amounts of memory. To exploit this benefit, researchers have developed several in-memory databases (IMDBs) that execute workloads after preloading all data into memory. The bloom of various in-memory databases shows the necessity of testing and evaluating their performance objectively and fairly. Although existing database benchmarks, including the Wisconsin benchmark, the TPC series, and so on, have shown great success during the development of database technologies, such work cannot be applied straightforwardly due to the lack of consideration of several unique characteristics of in-memory databases. In this article, we propose a novel benchmark, called InMemBench, to test and evaluate the performance of an in-memory database objectively and fairly. Different from traditional database benchmarks, we take special consideration of startup, data organization, and data compression. Moreover, we conduct extensive experiments to illustrate the effectiveness and efficiency of our proposal.
    Benchmarking continuous queries over social streams
    LI Ye, XIA Fan, QIAN Wei-Ning
    2014, 2014 (5):  330-339.  doi: 10.3969/j.issn.1000-5641.2014.05.030
    Abstract ( 1267 )   PDF (684KB) ( 1921 )
    Continuous query processing over social streams has a wide range of applications. However, the problem has not been intensively studied. This paper built a model of the continuous query problem over social streams. Data characteristics, types and distributions of workloads, and performance measurements were introduced. Furthermore, a benchmark for this problem was presented. This work is important for system selection and for the comparison of technologies for social stream processing.
    Design and implementation of a database benchmark visualization tool VisualDBBench and its application to main-memory databases
    LI Liang, WU Gang, LIU Hui-Lin, WANG Guo-Ren
    2014, 2014 (5):  340-350.  doi: 10.3969/j.issn.1000-5641.2014.05.031
    Abstract ( 1574 )   PDF (4135KB) ( 1862 )
    From the perspective of developing automated database testing tools, this paper studies the TPC Benchmark C and TPC Benchmark H standard specifications in depth. On this basis, the test model is presented, and the architecture and major classes of our automated database testing tool, VisualDBBench, are described. Furthermore, we verified that main-memory databases have advantages over traditional disk-based databases.