Computer Science

Funding

National Natural Science Foundation of China (11771276, 61731009); Special Project on Artificial Intelligence of the Shanghai "Science and Technology Innovation Action Plan" (20511100200); Science and Technology Commission of Shanghai Municipality (14DZ2260800)

Recognition of classroom learning behaviors based on the fusion of human pose estimation and object detection

  • Zejie WANG ,
  • Chaomin SHEN ,
  • Chun ZHAO ,
  • Xinmei LIU ,
  • Jie CHEN
Expand
  • 1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
    2. Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai 200241, China
    3. Information Technology Service, East China Normal University, Shanghai 200062, China
    4. Department of Education Information Technology, East China Normal University, Shanghai 200062, China

Received date: 2020-11-04

  Online published: 2022-03-28

Cite this article

WANG Zejie, SHEN Chaomin, ZHAO Chun, LIU Xinmei, CHEN Jie. Recognition of classroom learning behaviors based on the fusion of human pose estimation and object detection [J]. Journal of East China Normal University (Natural Science), 2022, 2022(2): 55-66. DOI: 10.3969/j.issn.1000-5641.2022.02.007

Abstract

As a result of ongoing advances in artificial intelligence technology, the potential of learning analytics in teaching evaluation and educational data mining is gradually being recognized. In classrooms, artificial intelligence technology can automate student behavior analysis, so that teachers can efficiently and intuitively grasp students' behavioral engagement in learning; the resulting data can, moreover, support subsequent improvements in instructional design and the implementation of teaching interventions. The main contributions are as follows: (1) a classroom student behavior dataset is constructed, providing a data basis for subsequent research; (2) a behavior detection method and a feasible, high-precision behavior recognition model are proposed, in which the global human-pose features extracted by the OpenPose algorithm are fused with the local features of interactive objects extracted by the YOLO v3 algorithm to identify and analyze student behavior, improving recognition accuracy; (3) the model structure is improved, and the model is compressed and optimized, reducing computation and time costs. Four behaviors closely related to learning engagement are recognized: sitting upright, turning sideways, bowing the head, and raising a hand. The proposed detection and recognition method achieves an accuracy of 95.45% on the validation set, and the recognition speed and accuracy for common behaviors, such as playing with a mobile phone and writing, are greatly improved relative to the original model.
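The fusion described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the keypoint layout follows the OpenPose 18-point (COCO) convention, while the interactive-object classes, normalization scheme, and feature ordering are assumptions for illustration; the fused vector would feed a trained classifier.

```python
# Hedged sketch: concatenate a global pose feature (OpenPose-style
# keypoints) with a local object feature (YOLO-style detections).

def pose_feature(keypoints):
    """Flatten 18 (x, y, score) keypoints into a global feature vector.
    Coordinates are offset by the neck joint (OpenPose/COCO index 1)
    so the feature is translation-invariant."""
    neck_x, neck_y, _ = keypoints[1]
    feat = []
    for x, y, score in keypoints:
        feat.extend([x - neck_x, y - neck_y, score])
    return feat

def object_feature(detections, classes=("phone", "book", "pen")):
    """Local feature: max detection confidence per interactive object
    class, from (class_name, confidence) pairs; the class list here is
    a placeholder assumption."""
    conf = {c: 0.0 for c in classes}
    for name, score in detections:
        if name in conf:
            conf[name] = max(conf[name], score)
    return [conf[c] for c in classes]

def fused_feature(keypoints, detections):
    """Concatenate global pose and local object cues into one vector,
    on which a behavior classifier would be trained."""
    return pose_feature(keypoints) + object_feature(detections)

# Toy example: 18 dummy keypoints plus a detected phone.
kp = [(100 + i, 200 + i, 0.9) for i in range(18)]
det = [("phone", 0.83), ("chair", 0.70)]
vec = fused_feature(kp, det)
print(len(vec))  # 18 keypoints x 3 values + 3 object classes = 57
```

A detection of "phone" raising the corresponding feature component is what lets an otherwise ambiguous "bowing the head" pose be separated into, e.g., reading versus playing with a mobile phone.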

References

1 KUH G D. Assessing what really matters to student learning inside the national survey of student engagement [J]. Change, 2001, 33(3): 10-17.
2 CAO Z, SIMON T, WEI S E, et al. Realtime multi-person 2D pose estimation using part affinity fields [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2017: 1302-1310.
3 REDMON J, FARHADI A. YOLOv3: An incremental improvement [EB/OL]. (2018-04-08)[2021-10-26]. https://arxiv.org/pdf/1804.02767.pdf.
4 GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2014: 580-587.
5 GIRSHICK R. Fast R-CNN [EB/OL]. (2015-9-27)[2021-10-26].https://arxiv.org/pdf/1504.08083.pdf.
6 REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
7 REDMON J, FARHADI A. YOLO9000: Better, faster, stronger [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 6517-6525.
8 SANEIRO M, SANTOS O C, SALMERON-MAJADAS S, et al. Towards emotion detection in educational scenarios from facial expressions and body movements through multimodal approaches [J]. The Scientific World Journal, 2014: 484873.
9 LUCEY P, COHN J F, KANADE T, et al. The extended cohn-kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression [C]//IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops. 2010: 94-101.
10 LEI F, WEI Y, HU J, et al. Student action recognition based on multiple features [C]//2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData). 2019: 428-432.
11 LI P, WANG Q, ZENG H, et al. Local Log-Euclidean multivariate Gaussian descriptor and its application to image classification [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(4): 803-817.
12 LOWE D G. Distinctive image features from scale-invariant keypoints [J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
13 LIN C R, XU W L, LI Y. Research on classroom student behavior recognition based on multimodal data [J]. Modern Computer, 2020(6): 70-76.
14 LI X, WANG M, ZENG W, et al. A students’ action recognition database in smart classroom [C]//2019 14th International Conference on Computer Science & Education (ICCSE). 2019: 523-527.
15 SUN B, ZHAO K, XIAO Y, et al. BNU-LCSAD: A video database for classroom student action recognition [C]//Optoelectronic Imaging and Multimedia Technology VI. 2019: 111871V.
16 TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks [C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 4489-4497.
17 SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. (2014-04-10)[2021-10-26].https://arxiv.org/pdf/1409.1556.pdf.
18 HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
19 IOFFE S, SZEGEDY C. Batch normalization: Accelerating deep network training by reducing internal covariate shift [C]//International Conference on Machine Learning. 2015: 448-456.
20 WANG H, SCHMID C. Action recognition with improved trajectories [C]//Proceedings of the IEEE International Conference on Computer Vision. 2013: 3551-3558.