Journal of East China Normal University (Natural Science) ›› 2023, Vol. 2023 ›› Issue (2): 73-81. doi: 10.3969/j.issn.1000-5641.2023.02.009

• Computer Science •

A memory allocation strategy for learned index based on huge pages

Jialin GUAN1, Yan ZHU1, Tingliang WU1, Yan CHEN2,*, Jingwei ZHANG1

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin, Guangxi 541004, China
  • Received:2021-09-09 Online:2023-03-25 Published:2023-03-23
  • Contact: Yan CHEN E-mail:chenyan@guat.edu.cn
  • Funding:
    National Natural Science Foundation of China (U1711263); Guangxi Natural Science Foundation (2018GXNSFAA281199, 2020GXNSFAA159117); Project for Enhancing the Basic Ability of Young and Middle-aged Teachers in Guangxi Universities (2018KY0651)

Abstract:

In the era of big data, the continuous growth of data poses significant challenges for fast data access, so designing an efficient index structure is of great importance. ALEX (updatable adaptive learned index) is a learned index that replaces the traditional B-tree index structure with machine learning models. Although it offers good time and space performance, it suffers from frequent page faults. To address this problem and further improve the performance of ALEX, a memory pre-allocation strategy based on huge pages is proposed on top of ALEX; it reduces the page fault rate and improves overall performance. Memory is pre-allocated in the allocation phase, and its release is delayed in the reclamation phase. Experiments on the Longitudes dataset show that the strategy is effective.

Key words: learned index, huge pages, data access
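The abstract names two mechanisms, pre-allocation on huge pages and delayed release, without giving implementation details. As a rough illustration only, the following C++ sketch shows one plausible reading of that idea on Linux: a pool reserves a region up front with `mmap`, asking for huge pages via `MAP_HUGETLB` and falling back to ordinary pages with a `MADV_HUGEPAGE` hint, and a released block is kept on a free list for reuse instead of being returned to the operating system. The class name `HugePagePool`, its methods, the 2 MiB page size, and the LIFO reuse order are all assumptions made for this sketch; they are not taken from the paper or from the ALEX code base.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical fixed-size block pool; illustrative only, not the paper's design.
class HugePagePool {
 public:
  // Pre-allocation: reserve room for `block_count` blocks of `block_size`
  // bytes in a single mapping, so later allocations need no system call.
  // (Overflow of block_size * block_count is not checked in this sketch.)
  HugePagePool(std::size_t block_size, std::size_t block_count)
      : block_size_(block_size) {
    if (block_size == 0 || block_count == 0) {
      throw std::invalid_argument("pool must hold at least one block");
    }
    free_.reserve(block_count);

    constexpr std::size_t kHugePage = std::size_t{2} << 20;  // assumed 2 MiB
    // Round the region up to a whole number of (assumed) huge pages.
    bytes_ = (block_size * block_count + kHugePage - 1) / kHugePage * kHugePage;

    const int prot = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void* p = MAP_FAILED;
#ifdef MAP_HUGETLB
    // Explicit huge pages; fails if none are reserved on the system.
    p = mmap(nullptr, bytes_, prot, flags | MAP_HUGETLB, -1, 0);
    huge_ = (p != MAP_FAILED);
#endif
    if (p == MAP_FAILED) {
      // Fallback: ordinary pages, with a transparent-huge-page hint if available.
      p = mmap(nullptr, bytes_, prot, flags, -1, 0);
#ifdef MADV_HUGEPAGE
      if (p != MAP_FAILED) madvise(p, bytes_, MADV_HUGEPAGE);
#endif
    }
    if (p == MAP_FAILED) throw std::bad_alloc();

    base_ = static_cast<char*>(p);
    // Push blocks in reverse so the lowest address is handed out first.
    for (std::size_t i = block_count; i > 0; --i) {
      free_.push_back(base_ + (i - 1) * block_size_);
    }
  }

  HugePagePool(const HugePagePool&) = delete;
  HugePagePool& operator=(const HugePagePool&) = delete;

  // The mapping is returned to the OS only when the pool itself is destroyed.
  ~HugePagePool() { munmap(base_, bytes_); }

  // Hand out a pre-allocated block, or nullptr if the pool is exhausted.
  void* allocate() {
    if (free_.empty()) return nullptr;
    char* p = free_.back();
    free_.pop_back();
    return p;
  }

  // Delayed release: the block goes back on the free list for reuse
  // rather than being unmapped immediately.
  void release(void* p) {
    char* c = static_cast<char*>(p);
    assert(c >= base_ && c < base_ + bytes_);
    free_.push_back(c);
  }

  std::size_t available() const { return free_.size(); }
  bool uses_explicit_huge_pages() const { return huge_; }

 private:
  std::size_t block_size_;
  std::size_t bytes_ = 0;
  char* base_ = nullptr;
  bool huge_ = false;
  std::vector<char*> free_;
};
```

In this sketch an index node would call `allocate()` instead of `new` and `release()` instead of `delete`. The page faults that remain are those taken when each page of the region is first touched, and with huge pages far fewer pages cover the same range. Whether explicit huge pages are actually used depends on system configuration, which is why the fallback path exists.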

CLC Number: