Journal of East China Normal University (Natural Science), 2023, Vol. 2023, Issue (5): 65-76. doi: 10.3969/j.issn.1000-5641.2023.05.006

• Data Learning Systems •

FeaDB: In-memory based multi-version online feature store

Ge GAO, Huiqi HU*

  1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2023-06-30 Accepted: 2023-07-26 Online: 2023-09-25 Published: 2023-09-15
  • Contact: Huiqi HU E-mail: hqhu@dase.ecnu.edu.cn
  • Supported by:
    Natural Science Foundation of Shanghai (23ZR1418300)

Abstract:

Feature management is an important part of building artificial intelligence (AI) data pipelines. A feature store must serve valid versions of features during both model training and inference. To meet this requirement, it must guarantee real-time feature updates and version management so that it can cooperate with upstream feature ingestion and supply data to the model serving system. In online prediction tasks for AI-assisted decision-making, the model serving system must respond to decision requests in real time to provide a good user experience, so real-time feature retrieval faces the challenge of achieving lower latency. To address this challenge, we developed FeaDB, an in-memory, multi-version online feature store. FeaDB models features as time series and provides feature versioning semantics, covering version management from feature production to feature consumption. It uses append-only writes to sustain real-time feature ingestion performance, and a version-based index to reduce read latency. To further reduce feature consumption latency, we propose a version snapshot mechanism; experiments show that snapshot reads improve the performance of lookup and range lookup over feature-set versions.

Key words: database for artificial intelligence, multi-version store, online feature store
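The three mechanisms named in the abstract (append-only writes, a version-based index, and snapshot reads) can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not FeaDB's actual design or API: the class `ToyFeatureStore`, its method names, the integer version numbers, and the sample feature names are all hypothetical, and the sketch assumes versions for a given key arrive in strictly increasing order.

```python
from bisect import bisect_right


class ToyFeatureStore:
    """Hypothetical in-memory multi-version store; illustrative only."""

    def __init__(self):
        # Append-only log of (entity, feature, version, value) records.
        self.log = []
        # Version index: (entity, feature) -> (sorted versions, log offsets).
        self.index = {}
        # Materialized snapshots: version -> {(entity, feature): value}.
        self.snapshots = {}

    def append(self, entity, feature, version, value):
        """Append a new version; existing records are never overwritten."""
        versions, offsets = self.index.setdefault((entity, feature), ([], []))
        if versions and version <= versions[-1]:
            raise ValueError("versions must be strictly increasing per key")
        self.log.append((entity, feature, version, value))
        versions.append(version)
        offsets.append(len(self.log) - 1)

    def _offset(self, key, version):
        """Binary-search the index for the newest record with version <= `version`."""
        entry = self.index.get(key)
        if entry is None:
            return -1
        versions, offsets = entry
        pos = bisect_right(versions, version)
        return offsets[pos - 1] if pos > 0 else -1

    def read(self, entity, feature, version):
        """Versioned read through the index; returns None if nothing is visible."""
        off = self._offset((entity, feature), version)
        return self.log[off][3] if off >= 0 else None

    def snapshot(self, version):
        """Materialize every key's value visible at `version`.

        Assumes no further writes at or below `version` arrive afterwards;
        otherwise the snapshot would be stale.
        """
        snap = {}
        for key in self.index:
            off = self._offset(key, version)
            if off >= 0:
                snap[key] = self.log[off][3]
        self.snapshots[version] = snap
        return snap

    def read_snapshot(self, entity, features, version):
        """Serve a feature set from a snapshot with plain dictionary lookups."""
        snap = self.snapshots[version]
        return {f: snap.get((entity, f)) for f in features}


store = ToyFeatureStore()
store.append("user_1", "clicks_7d", 1, 10)
store.append("user_1", "clicks_7d", 3, 12)
store.append("user_1", "avg_basket", 2, 35.5)
store.snapshot(2)
feature_set = store.read_snapshot("user_1", ["clicks_7d", "avg_basket"], 2)
```

In this sketch, an indexed read costs one binary search per feature, while a snapshot read replaces each search with a dictionary lookup after a one-time materialization cost. That trade-off is one plausible reading of why snapshot reads could reduce feature-set retrieval latency; the paper's own mechanism may differ.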

CLC number: