Given challenges with poor query performance for databases using LSM-trees, the present research explores the use of index and cache technologies to improve the query performance of LSM-trees. First, the paper introduces the basic structure of an LSM-tree and analyzes the factors that affect query performance. Second, we analyze current query optimization technologies for LSM-trees, including index optimization technology and cache optimization technology. Third, we analyze how index and cache, in particular, can improve the query performance of databases using LSM-trees and summarize existing research in this area. Finally, we present possible avenues for further research.
In the era of big data, the single-write multi-read process with separate storage and computing architectures can no longer meet the demands for efficient reading and writing of massive datasets. Multiple computing nodes providing write services concurrently can also cause cache inconsistencies. Some studies have proposed a global ordered transaction log to detect conflicts and maintain data consistency for the whole system using broadcast and playback of the transaction log. However, this scheme has poor scalability because it maintains the global write log at each write node. To solve this problem, this paper proposes a partition-based concurrency control scheme, which reduces the transaction log maintained by each write node by partitioning, and effectively improves the system’s overall expansion ability.
Query processing, including optimization and execution, is one of the most critical functionalities of modern relational database management systems (DBMS). The complexity of query processing functionalities, however, leads to high testing costs. It hinders rapid iterations during the development process and can lead to severe errors when deployed in production environments. In this paper, we propose a tool to better serve the testing and evaluation of DBMS query processing functionalities; the tool uses a fuzzing approach to generate random data that is highly associated with primary keys and generates valid complex analytical queries. The tool constructs constrained optimization problems to efficiently compute the exact cardinalities of operators in queries and furnish the results. We launched small-scale testing of our method on different versions of TiDB and demonstrated that the tool can effectively detect bugs in different versions of TiDB.
As a decentralized distributed ledger, blockchain technology is widely used to share data between untrusted parties. Compared with traditional databases that have been refined over many years, blockchains cannot support rich queries, are limited to single query interfaces, and suffer from slow response. Simple organizational structures and discrete storage limits are the main barriers that limit the expression of transaction data. In order to make up for the shortcomings of existing blockchain technology and achieve efficient application development, users can build abstract models, encapsulate easy-to-use interfaces, and improve the efficiency of queries. We also propose a general data management middleware for blockchain, which has the following characteristics: ① Support for custom construction of data models and the flexibility to add new abstractions to transaction data; ② Provide multiple data access interfaces to support rich queries and use optimization methods such as synchronous caching mechanisms to improve query efficiency; ③ Design advance hash calculation and asynchronous batch processing strategies to optimize transaction latency and throughput. We integrated the proposed data management middleware with the open source blockchain CITA and verified its ease of use and efficiency through experiments.
Blockchain system adopts full replication data storage mechanism, which retains a complete copy of the whole block chain for each node. The scalability of the system is poor. Due to the existence of Byzantine nodes in the blockchain system, the shard scheme used in the traditional distributed system cannot be directly applied in the blockchain system. In this paper, the storage consumption of each block is reduced from O(n) to O(1) by combining erasure code and Byzantine fault-tolerant algorithm, and the scalability of the system is enhanced. This paper proposes a method to partition block data, which can reduce the storage redundancy and affect the query efficiency less. A coding block storage method without network communication is proposed to reduce the system storage and communication overhead. In addition, a dynamic recoding method for entry and exit of blockchain nodes is proposed, which not only ensures the reliability of the system, but also reduces the system recoding overhead. Finally, the system is implemented on the open source blockchain system CITA, and through sufficient experiments, it is proved that the system has improved scalability, availability and storage efficiency.
With the advent of the big data era, the financial industry has been generating increasing volumn of data, exerting pressure on database systems. LevelDB is a key-value database, developed by Google, based on the LSM-tree architecture. It offers fast writing and a small footprint, and is widely used in the financial industry. In this paper, we propose a design method for the L0layer, based on non-volatile memory and machine learning, with the aim of addressing the shortcomings of the LSM-tree architecture, including write pause, write amplification, and unfriendly reading. The proposed solution can slow down or even solve the aforementioned problems; the experimental results demonstrate that the design can achieve better read and write performance.