Educational Data Management

Heterogeneous data generation tools for online education scenarios

  • Wei ZHOU,
  • Ke WANG,
  • Huiqi HU
  • School of Data Science and Engineering, East China Normal University, Shanghai 200062, China

Received: 2024-07-03

Published online: 2024-09-23

Funding

National Key Research and Development Program of China (2023YFC3341202)


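Abstract

In digital education applications, developers of platforms such as online classrooms who pursue data-driven optimization face two obstacles: privacy concerns and the limited size of existing datasets. To address this, we construct a heterogeneous data model tailored to the characteristics of education and implement a corresponding data generation tool, E-Tools, that simulates data interactions in complex educational scenarios. Experiments show that the tool sustains a generation rate of 64 to 74 $\mathrm{MB \cdot s^{-1}}$ across a range of data sizes, i.e., it scales linearly with data volume, which validates the proposed model and the tool's ability to generate large data volumes. We also design a heterogeneous query workload that reflects students' learning behaviors, to support performance evaluation and optimization of education platforms.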
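To make the reported rate concrete: if throughput stays constant (which is what linear scaling implies), the time to generate a dataset of size $D$ at rate $r$ is simply $t = D / r$. The sketch below is plain arithmetic on the 64 to 74 MB/s figures, not part of E-Tools; the function names and the example dataset sizes are hypothetical, and sizes are taken in MB to match the reported unit.

```python
def generation_time_s(size_mb: float, rate_mb_per_s: float) -> float:
    """Seconds needed to write size_mb megabytes at a constant rate (MB/s)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_per_s


# Throughput bounds reported in the abstract (MB/s).
LOW_RATE, HIGH_RATE = 64.0, 74.0


def time_range_s(size_mb: float) -> tuple[float, float]:
    """(fastest, slowest) generation time for a dataset of size_mb megabytes."""
    # Fastest time uses the highest rate; slowest time uses the lowest rate.
    return generation_time_s(size_mb, HIGH_RATE), generation_time_s(size_mb, LOW_RATE)


if __name__ == "__main__":
    # Hypothetical target sizes in MB; under linear scaling, 10x the data takes 10x the time.
    for size_mb in (1_000, 10_000, 100_000):
        fast, slow = time_range_s(size_mb)
        print(f"{size_mb:>7} MB: {fast:8.1f} s to {slow:8.1f} s")
```

For example, a 10,000 MB dataset would take roughly 135 to 156 seconds under this assumption. If the measured rate instead fell as size grew, the tool would be scaling sub-linearly, which is what the constant 64 to 74 MB/s range rules out.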

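Cite this article

ZHOU Wei, WANG Ke, HU Huiqi. Heterogeneous data generation tools for online education scenarios [J]. Journal of East China Normal University (Natural Science), 2024(5): 114-127. DOI: 10.3969/j.issn.1000-5641.2024.05.011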

