Distributed secondary index based on LSM Tree

  • LONG Fei ,
  • WENG Hai-xing ,
  • GAO Ming ,
  • ZHANG Zhao
Expand
  • Institute for Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2016-06-27

  Online published: 2016-11-29

Abstract

In recent years, Log-Structured-Merge Tree has been widely used in NoSQL systems. This is mainly because it has proposed two algorithms: update delayed and batch write, convert random write to batch write, reducing the cost of moving the disk arm therefore the write performance of database has been enhanced greatly. However, the read performance of database has also been affected negatively. The essential difference between LSM Tree and B Tree makes NoSQL not suitable for using B Tree as index structure directly. This paper implements a distributed secondary index based on LSM Tree, and proposes a bulk loading method in this read and write separation architecture. We also do lots of works on the optimization of index query plan to avoid repeatly query parsing IO so that the performance of index read has been greatly improved.

Key words: Secondary Index; LSM Tree; NoSQL

Cite this article

LONG Fei , WENG Hai-xing , GAO Ming , ZHANG Zhao . Distributed secondary index based on LSM Tree[J]. Journal of East China Normal University(Natural Science), 2016 , 2016(5) : 36 -44 . DOI: 10.3969/j.issn.1000-5641.2016.05.005

References

[1] APACHE ORG. Apache HBase[EB/OL]. [2016-07-07]. https://hbase.apache.org/.
[2] LAKSHMAN A, MALIK P. Cassandra: A decentralized structured storage system[J]. ACM SIGOPS Operating Systems Review, 2010, 44(2): 35-40.
[3] O’NEIL P, CHENG E, GAWLICK D, et al. The log-structured merge-tree (LSM-tree)[J]. Acta Informatica, 1996, 33(4): 351-385.
[4] HUAWEI. Secondary index in HBase[EB/OL]. [2016-07-07]. https://github.com/Huawei-Hadoop/hindex.
[5] CORBETT J C, DEAN J, EPSTEIN M, ET A L. Spanner: Google’s globally distributed database[J]. ACM Transactions on Computer Systems (TOCS), 2013, 31(3): 8.
[6] CHEN G, VO H T, WU S, et al. A framework for supporting DBMS-like indexes in the cloud[J]. Proceedings of The Vldb Endowment, 2011, 4(11): 702-713.
[7] 翁海星, 宫学庆, 朱燕超,等. 集群环境下分布式索引的实现[J]. 计算机应用,2016, 36(1): 1-7.
[8] CHANG F, DEAN J, GHEMAWAT S, et al. Bigtable: A distributed storage system for structured data[J]. ACM Transactions on Computer Systems, 2008, 26(2): 4.
[9] TAN W, TATA S, TANG Y, et al. Diff-index: differentiated index in distributed log-structured data stores[C]. Extending Database Technology, 2014: 700-711.
[10] 阳振坤. OceanBase~关系数据库架构[J]. 华东师范大学学报(自然科学版), 2014(5): 141-148.
[11] 孟必平, 王腾蛟, 李红燕, 等. 分片位图索引:一种适用于云数据管理的辅助索引机制[J]. 计算机学报, 2012, 35(11):2306-2316.
[12] 黄贵, 庄明强. OceanBase分布式存储引擎[J]. 华东师范大学学报(自然科学版),2014 (5): 164-172.
[13] ALIBABA INC. OceanBase[Z/OL].[2016-07-07]. https://github.com/alibaba/oceanbase/tree/master/oceanbase 0.4.
[14] 杨传辉.大规模分布式存储系统: 原理解析与架构实战[M].北京:机械工业出版社, 2013.
[15] COOPER B F, SILBERSTEIN A, TAM E, et al. Benchmarking cloud serving systems with YCSB[C]// Proceedings of the 1st ACM Symposium on Cloud Computing. ACM, 2010: 143-154.

Outlines

/