华东师范大学学报(自然科学版)

• 计算机科学 • 上一篇    下一篇

基于LSM Tree的分布式索引实现

隆 飞, 翁海星,  高 明,  张 召   

  1. 华东师范大学 数据科学与工程研究院, 上海  200062
  • 收稿日期:2016-06-27 出版日期:2016-09-25 发布日期:2016-11-29
  • 通讯作者: 张 召, 女, 副教授, 硕士生导师, 研究方向为数据库. E-mail: zhzhang@sei.ecnu.edu.cn.
  • 基金资助:

    国家 863 计划项目(2015AA015307); 国家自然科学基金(U1401256, 61402180, 61402177); CCF-腾讯联合研究基金(AGR20150114); 上海市自然科学研究基金(14ZR1412600)

Distributed secondary index based on LSM Tree

LONG Fei, WENG Hai-xing, GAO Ming, ZHANG Zhao   

  1. Institute for Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received:2016-06-27 Online:2016-09-25 Published:2016-11-29

摘要:

近年来 Log-Structured-Merge(LSM) Tree 在 NoSQL 系统中得到了广泛地应用. 主要是因为 LSM Tree 架构提出了延迟更新和批量写入的算法, 将随机写转换为批量写, 减少了磁盘臂的移动开销, 从而大大地提升了数据库的写入性能. 然而, 读性能却也因此受到影响. LSM Tree 和 B Tree 之间的本质区别使得 NoSQL 系统不适宜直接引用 B Tree 作为辅助索引结构. 本文实现了 LSM Tree 下的一种分布式辅助索引结构, 提出针对这种读写分离架构的索引批量加载策略, 并对 LSM Tree 的查询计划树进行了缓冲优化, 避免了重复的查询解析, 使得索引读的性能得到了相应的提升.

关键词: 辅助索引; , 日志结构合并树, NoSQL

Abstract:

In recent years, Log-Structured-Merge Tree has been widely used in NoSQL systems. This is mainly because it has proposed two algorithms: update delayed and batch write, convert random write to batch write, reducing the cost of moving the disk arm therefore the write performance of database has been enhanced greatly. However, the read performance of database has also been affected negatively. The essential difference between LSM Tree and B Tree makes NoSQL not suitable for using B Tree as index structure directly. This paper implements a distributed secondary index based on LSM Tree, and proposes a bulk loading method in this read and write separation architecture. We also do lots of works on the optimization of index query plan to avoid repeatly query parsing IO so that the performance of index read has been greatly improved.

Key words: Secondary Index, LSM Tree, NoSQL