System for Learning from Data

FeaDB: In-memory based multi-version online feature store

  • Ge GAO
  • Huiqi HU
  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received date: 2023-06-30

Accepted date: 2023-07-26

Online published: 2023-09-20

Abstract

Feature management plays an important role in the AI (artificial intelligence) pipeline. Feature stores are designed to provide effective feature versioning across the model training and inference stages. They must support real-time feature updates and version management so that they can cooperate with upstream data ingestion tasks and power the model serving system. In AI-powered online decision augmentation applications, the model serving system must respond to requests in real time to provide a better user experience, so feature stores face the challenge of low-latency online feature retrieval. To address this challenge, we developed FeaDB, an in-memory, multi-version online feature store that adopts a time series model and provides feature versioning semantics to manage features automatically from ingestion to serving. In addition, FeaDB uses append-write operations to ensure ingestion performance and an optimized version index to speed up reads. We also propose a snapshot mechanism, and experiments show that snapshot reads improve the performance of both lookups and range lookups.
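The abstract names three mechanisms (append-write ingestion, a version index, and snapshot reads for lookup and range lookup) without describing how they interact. The Python sketch below illustrates one way they can fit together. It assumes a per-key version chain kept in non-decreasing event-timestamp order, a global commit counter that doubles as the snapshot number, and plain binary search standing in for the version index. `ToyFeatureStore`, `VersionChain`, and all method names are hypothetical; this is not FeaDB's actual data structure or API.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class VersionChain:
    """Append-only versions of one feature key (hypothetical layout)."""
    event_ts: List[int] = field(default_factory=list)    # feature (event) timestamps
    commit_seq: List[int] = field(default_factory=list)  # global write order
    values: List[Any] = field(default_factory=list)


class ToyFeatureStore:
    """Illustrative multi-version feature store; not FeaDB's real design."""

    def __init__(self) -> None:
        self._chains: Dict[str, VersionChain] = {}
        self._seq = 0  # global commit counter, incremented on every write

    def append(self, key: str, ts: int, value: Any) -> None:
        # Append-write: a new version is added at the tail, never overwritten.
        chain = self._chains.setdefault(key, VersionChain())
        if chain.event_ts and ts < chain.event_ts[-1]:
            raise ValueError("this sketch assumes non-decreasing timestamps per key")
        self._seq += 1
        chain.event_ts.append(ts)
        chain.commit_seq.append(self._seq)
        chain.values.append(value)

    def snapshot(self) -> int:
        # A snapshot is just the current commit number; later writes stay invisible.
        return self._seq

    def _visible(self, chain: VersionChain, snap: Optional[int]) -> int:
        # commit_seq grows along the chain, so the versions visible to a
        # snapshot always form a prefix; return that prefix length.
        if snap is None:
            return len(chain.values)
        return bisect.bisect_right(chain.commit_seq, snap)

    def lookup(self, key: str, ts: int, snap: Optional[int] = None) -> Optional[Any]:
        # Point lookup: latest version with event time <= ts, within the snapshot.
        chain = self._chains.get(key)
        if chain is None:
            return None
        hi = self._visible(chain, snap)
        i = bisect.bisect_right(chain.event_ts, ts, 0, hi)
        return chain.values[i - 1] if i > 0 else None

    def range_lookup(self, key: str, start: int, end: int,
                     snap: Optional[int] = None) -> List[Tuple[int, Any]]:
        # Range lookup: all versions with start <= event time <= end, within the snapshot.
        chain = self._chains.get(key)
        if chain is None:
            return []
        hi = self._visible(chain, snap)
        lo_i = bisect.bisect_left(chain.event_ts, start, 0, hi)
        hi_i = bisect.bisect_right(chain.event_ts, end, 0, hi)
        return list(zip(chain.event_ts[lo_i:hi_i], chain.values[lo_i:hi_i]))


if __name__ == "__main__":
    store = ToyFeatureStore()
    store.append("user:1:clicks", 100, 3)
    store.append("user:1:clicks", 200, 5)
    snap = store.snapshot()
    store.append("user:1:clicks", 300, 9)
    print(store.lookup("user:1:clicks", 350))                   # 9
    print(store.lookup("user:1:clicks", 350, snap))             # 5
    print(store.range_lookup("user:1:clicks", 100, 300, snap))  # [(100, 3), (200, 5)]
```

Because writes only append, taking a snapshot in this sketch copies nothing: readers simply bound their binary searches by the prefix of each chain committed before the snapshot number. The sketch deliberately omits concurrency control and reclamation of old versions.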

Cite this article

Ge GAO, Huiqi HU. FeaDB: In-memory based multi-version online feature store[J]. Journal of East China Normal University (Natural Science), 2023, 2023(5): 65-76. DOI: 10.3969/j.issn.1000-5641.2023.05.006
