Chinese core journal in comprehensive science and technology (Peking University core list)
Source journal of the Chinese Science Citation Database (CSCD)
Indexed by Chemical Abstracts (CA, USA)
Indexed by Mathematical Reviews (MR, USA)
Indexed by Referativny Zhurnal (Abstract Journal, Russia)
Special topic in this journal: Data Management Techniques in the New Era
Efficient data loading for log-structured data stores
DING Guo-hao, XU Chen, QIAN Wei-ning
Journal of East China Normal University (Natural Science), 2019, 2019(5): 143-158. DOI: 10.3969/j.issn.1000-5641.2019.05.012
With the rapid development of Internet technology in recent years, the number of users and the volume of data processed by Internet companies and traditional financial institutions are growing rapidly. Traditionally, businesses have tackled this scalability problem by adding servers and sharding their databases; however, this leads to significant manual maintenance expense and hardware overhead. To reduce this overhead and the problems caused by sharding, businesses commonly replace legacy systems with new database systems. In this context, new databases based on log-structured merge-tree (LSM-tree) storage, such as OceanBase, are being widely adopted; the data blocks stored on disk in such systems are globally ordered. When switching from a traditional database to a new one, a large amount of data must be transferred, and database nodes may fail during a long-running load. To reduce the total time for loading and failure recovery, we propose a data loading method that supports load balancing and efficient fault tolerance. To balance the load, we pre-calculate the number of partitions from the file size and the target system's default block size rather than using a pre-determined partition count. In addition, because data exported from a sharded database is usually sorted, we determine the split points between partitions by selecting a subset of sampling blocks and drawing samples from each block at equal intervals, avoiding the high overhead of global sampling and of random or head-of-block sampling. To speed up recovery, we propose replica-based partial recovery in place of restart-based complete reloading; this method exploits the multiple replicas of an LSM-tree system to reduce the amount of data that must be reloaded.
Experimental results show that pre-calculating the number of partitions, sampling only a subset of blocks, and sampling at equal intervals all accelerate data loading relative to a pre-determined partition count, global block sampling, and random or head-of-block sampling, respectively. We also demonstrate the efficiency of replica-based partial recovery compared to restart-based complete reloading.
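The partitioning idea in the abstract above can be sketched briefly. This is a hypothetical illustration, not the paper's implementation: the partition count is derived from the input size and a target block size, and split keys are chosen by equal-interval sampling over an already-sorted key sample. All function names and constants are illustrative.

```python
# Illustrative sketch: derive the partition count from file size, then pick
# split keys by equal-interval sampling from sorted input (as the abstract
# describes), instead of global or random sampling.

def plan_partitions(file_size, block_size):
    """Pre-calculate the partition count from the input size (ceiling division)."""
    return max(1, -(-file_size // block_size))

def split_points(sorted_keys, num_partitions, num_samples=64):
    """Pick split keys at equal intervals from a sorted key sample."""
    step = max(1, len(sorted_keys) // num_samples)
    sample = sorted_keys[::step]                 # equal-interval sampling
    stride = max(1, len(sample) // num_partitions)
    return sample[stride::stride][:num_partitions - 1]

keys = list(range(0, 10_000, 7))                 # stand-in for exported, sorted keys
parts = plan_partitions(file_size=1_000_000, block_size=250_000)
splits = split_points(keys, parts)
assert len(splits) == parts - 1                  # one split key between each pair
assert splits == sorted(splits)                  # splits preserve the sort order
```

Because the exported data is already sorted, the equal-interval sample is itself sorted, so the chosen splits yield roughly balanced partitions without a global scan.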
Implementation of LevelDB-based secondary index on two-dimensional data
LIU Zi-hao, HU Hui-qi, XU Rui, ZHOU Xuan
Journal of East China Normal University (Natural Science), 2019, 2019(5): 159-167. DOI: 10.3969/j.issn.1000-5641.2019.05.013
With the growth of spatial data generated by scientific research, especially two-dimensional data, and the ongoing development of NoSQL database technology, more and more spatial data is now stored in NoSQL databases. LevelDB is an open-source key-value NoSQL database that is widely used because its LSM-based architecture offers excellent write performance. Because of the limitations of the key-value structure, however, it cannot index spatial data effectively. To address this problem, a secondary index based on LevelDB and the R-tree is proposed to support indexing and neighbor queries over two-dimensional spatial data. Experimental results show that the structure has good usability.
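To make the secondary-index idea concrete, here is a deliberately simplified sketch. The paper builds an R-tree on top of LevelDB; in this toy version a plain dict stands in for the key-value store and a coarse grid stands in for the R-tree, purely to show how a spatial secondary index maps regions back to primary keys. Every name here is invented for illustration.

```python
# Toy secondary index for 2D points over a key-value store: the primary
# store maps key -> point, and the index maps a grid cell -> primary keys,
# so a neighbor query only touches nearby cells instead of scanning all keys.

CELL = 10.0  # grid cell width; an illustrative tuning knob

def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))

store = {}      # primary data: key -> (x, y)
index = {}      # secondary index: grid cell -> set of primary keys

def put(key, x, y):
    store[key] = (x, y)
    index.setdefault(cell_of(x, y), set()).add(key)

def neighbors(x, y):
    """Collect candidate keys from the query cell and its 8 surrounding cells."""
    cx, cy = cell_of(x, y)
    out = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out |= index.get((cx + dx, cy + dy), set())
    return out

put("a", 3.0, 4.0)
put("b", 12.0, 4.0)
put("c", 55.0, 60.0)
assert neighbors(5.0, 5.0) == {"a", "b"}   # distant point "c" is pruned
```

An R-tree replaces the fixed grid with adaptive bounding rectangles, which handles skewed data far better, but the index-to-primary-key mapping works the same way.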
Implementation and optimization of a distributed consistency algorithm based on Paxos
ZHU Chao-fan, GUO Jin-wei, CAI Peng
Journal of East China Normal University (Natural Science), 2019, 2019(5): 168-177. DOI: 10.3969/j.issn.1000-5641.2019.05.014
With the ongoing development of the Internet, the degree of informatization in enterprises is continuously increasing, and more and more data needs to be processed in a timely manner. In this context, unstable network environments may lead to data loss and node downtime, with potentially serious consequences; building distributed fault-tolerant storage systems is therefore becoming increasingly popular. To ensure high availability and consistency across such a system, a distributed consistency algorithm must be introduced. To improve performance in unstable networks, traditional Paxos-based distributed systems allow holes to exist in the log. However, when a node enters the recovery state, these systems typically require a large amount of network interaction to fill the holes in the log; this greatly increases node recovery time and thereby affects system availability. To reduce the complexity of recovering a node whose log contains holes, this paper redesigns the log entry structure and optimizes the data recovery process. The effectiveness of the improved Paxos-based consistency algorithm is verified through simulation experiments.
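The "hole" problem described above can be sketched in a few lines. This is a hedged illustration of the general idea, not the paper's redesigned log structure: a Paxos log that allows out-of-order commits is represented as a sparse dict, and recovery fetches only the missing slots from a peer replica instead of replaying the whole log.

```python
# Minimal sketch: find the missing slots ("holes") in a sparse log and fill
# only those from a peer replica, rather than performing a full reload.

def find_holes(log, last_index):
    """Return the missing slot indices in a sparse log dict."""
    return [i for i in range(1, last_index + 1) if i not in log]

def recover(log, last_index, peer_log):
    """Fill only the holes from a peer replica's log."""
    for i in find_holes(log, last_index):
        if i in peer_log:
            log[i] = peer_log[i]
    return log

local = {1: "set x=1", 3: "set y=2", 5: "set z=3"}   # slots 2 and 4 are holes
peer = {i: f"entry-{i}" for i in range(1, 6)}         # a fully caught-up peer
assert find_holes(local, 5) == [2, 4]
recover(local, 5, peer)
assert find_holes(local, 5) == []
```

The cost of this style of recovery is proportional to the number of holes, not the log length, which is why reducing the round trips needed per hole matters for availability.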
Implementation and optimization of GPU-based relational streaming processing systems
HUANG Hao, LI Zhi-fang, WANG Jia-lun, WENG Chu-liang
Journal of East China Normal University (Natural Science), 2019, 2019(5): 178-189. DOI: 10.3969/j.issn.1000-5641.2019.05.015
State-of-the-art CPU-based streaming processing systems support complex queries on large-scale datasets. Limited by CPU computational capability, however, these systems face a performance trade-off between throughput and response time and cannot achieve the best of both. In this paper, we propose a GPU-based streaming processing system, named Serval, which co-utilizes CPU and GPU resources and efficiently processes streaming queries through micro-batching. Serval adopts a pipeline model and uses a streaming execution cache to optimize throughput and response time on large-scale datasets. To meet the demands of various scenarios, Serval implements multiple tuning policies by scaling the micro-batch size dynamically. Experiments show that a single-server Serval outperforms a three-server distributed Spark Streaming deployment by 3.87x in throughput while averaging 91% of its response time, reflecting the efficiency of the optimization.
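One way to picture the dynamic micro-batch scaling mentioned above is a simple feedback controller: grow the batch when measured latency is under budget (favoring throughput), shrink it when the budget is exceeded. The policy, constants, and function below are illustrative assumptions; the abstract does not describe Serval's actual controller.

```python
# Hedged sketch of a micro-batch tuning policy: multiplicative grow/shrink
# of the batch size driven by an observed-latency budget, clamped to bounds.

def tune_batch(batch_size, observed_ms, budget_ms,
               grow=1.5, shrink=0.5, lo=32, hi=65536):
    if observed_ms > budget_ms:
        batch_size = int(batch_size * shrink)   # latency too high: shrink
    else:
        batch_size = int(batch_size * grow)     # headroom left: grow
    return max(lo, min(hi, batch_size))

size = 1024
size = tune_batch(size, observed_ms=40, budget_ms=100)   # under budget: grows
size = tune_batch(size, observed_ms=140, budget_ms=100)  # over budget: shrinks
assert 32 <= size <= 65536
```

Larger batches amortize GPU kernel-launch and transfer overhead (throughput), while smaller batches bound queueing delay (response time), which is the trade-off such a policy navigates.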
Woodpecker+: Customized workload performance evaluation based on data characteristics
ZHANG Tao, ZHANG Xiao-lei, LI Yu-ming, ZHANG Chun-xi, ZHANG Rong
Journal of East China Normal University (Natural Science), 2019, 2019(5): 190-202. DOI: 10.3969/j.issn.1000-5641.2019.05.016
There are a number of performance testing tools, such as Sysbench and OLTPBench, that can be used to benchmark database performance. However, because standard benchmark workloads are fixed and do not always represent users' application scenarios, they cannot accurately characterize system performance. Moreover, if users must implement a test workload in a high-level programming language separately for each application, this introduces a substantial amount of repetitive work and makes testing inefficient. To address these issues, this paper designs and implements a tool for user-defined performance test workloads. Its main benefits can be summarized as follows: it is easy to use and extensible; it provides a test definition language (TDL) for efficiently constructing test cases; and it offers flexible control over the mixed execution of transactions and data access distributions, lightweight and fine-grained collection and analysis of statistics, and support for multiple mainstream DBMSs as well as other databases that provide database access interfaces. We highlight the tool's features through a detailed set of customized workload experiments on mainstream DBMSs.
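To give a flavor of what a declarative workload definition in the spirit of a TDL might look like, here is an invented sketch: transactions are declared with mix weights and then expanded into an execution schedule. The spec format, field names, and SQL stubs are all assumptions for illustration, not the tool's actual TDL syntax.

```python
# Illustrative sketch: a declarative workload spec (transaction mix with
# weights) expanded into a randomized execution schedule honoring the mix.

import random

workload = {
    "transactions": {
        "new_order": {"weight": 45, "sql": "INSERT INTO orders ..."},
        "payment":   {"weight": 43, "sql": "UPDATE accounts ..."},
        "stock":     {"weight": 12, "sql": "SELECT qty FROM stock ..."},
    },
    "total": 1000,
}

def schedule(spec, seed=7):
    """Draw a transaction sequence according to the declared weights."""
    rng = random.Random(seed)
    names = list(spec["transactions"])
    weights = [spec["transactions"][n]["weight"] for n in names]
    return rng.choices(names, weights=weights, k=spec["total"])

plan = schedule(workload)
assert len(plan) == 1000
assert set(plan) <= {"new_order", "payment", "stock"}
```

Keeping the mix declarative is what removes the per-application reimplementation work the abstract criticizes: changing the workload means editing weights, not rewriting driver code.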