Recognition of classroom learning behaviors based on the fusion of human pose estimation and object detection

Zejie WANG; Chaomin SHEN; Chun ZHAO; Xinmei LIU; Jie CHEN

doi:10.3969/j.issn.1000-5641.2022.02.007

Journal of East China Normal University (Natural Science), 2022, Vol. 2022, Issue 2: 55-66

Computer Science

1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
2. Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai 200241, China
3. Information Technology Service, East China Normal University, Shanghai 200062, China
4. Department of Education Information Technology, East China Normal University, Shanghai 200062, China

Received date: 2020-11-04

Online published: 2022-03-28

Abstract

With ongoing advances in artificial intelligence, the potential of learning analytics for teaching evaluation and educational data mining is increasingly recognized. In the classroom, artificial intelligence can automate the analysis of student behavior, giving teachers an effective and intuitive view of students' behavioral engagement in learning and providing data to support subsequent improvements in learning design and the implementation of teaching interventions. This research makes three contributions: (1) it constructs a classroom student behavior dataset that provides a basis for subsequent research; (2) it proposes a behavior detection method and a set of feasible, high-precision behavior recognition models, in which the global features of human posture extracted by the OpenPose algorithm are combined with the local features of interacting objects extracted by the YOLO v3 algorithm to identify and analyze student behavior and improve recognition accuracy; and (3) it improves the model structure and compresses and optimizes the model to reduce the consumption of computing power and time. Four behaviors closely related to learning engagement are recognized: listening, turning sideways, bowing the head, and raising a hand. The detection and recognition method achieves an accuracy of 95.45% on the validation set. For common behaviors such as playing with a mobile phone and writing, both recognition speed and accuracy are greatly improved compared with the original model.
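
The fusion idea in the abstract, global pose features combined with local object features, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the feature layout, the object classes, or how a detected object is associated with a student. The sketch assumes that the pose estimator yields per-joint `(x, y, confidence)` triples in pixel coordinates, that the detector yields `(label, score, box)` tuples, and that an object belongs to a student when its box center falls inside the student's bounding box. The names `OBJECT_CLASSES`, `normalize_keypoints`, `object_features`, and `fuse_features` are hypothetical.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float, float]    # (x, y, confidence), pixel coordinates
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[str, float, Box]       # (class label, score, box)

# Hypothetical object classes of interest; the abstract does not list them.
OBJECT_CLASSES = ("cell phone", "pen")


def normalize_keypoints(keypoints: Sequence[Keypoint], person_box: Box) -> List[float]:
    """Express each joint relative to the person box (global pose features)."""
    x0, y0, x1, y1 = person_box
    width = max(x1 - x0, 1e-6)   # guard against degenerate boxes
    height = max(y1 - y0, 1e-6)
    features: List[float] = []
    for x, y, conf in keypoints:
        if conf <= 0.0:
            # Assumed convention: an undetected joint contributes zeros.
            features.extend([0.0, 0.0, 0.0])
        else:
            features.extend([(x - x0) / width, (y - y0) / height, conf])
    return features


def object_features(detections: Sequence[Detection], person_box: Box,
                    classes: Sequence[str] = OBJECT_CLASSES) -> List[float]:
    """Best detection score per class among objects centered inside the person box."""
    x0, y0, x1, y1 = person_box
    best = {label: 0.0 for label in classes}
    for label, score, (bx0, by0, bx1, by1) in detections:
        cx, cy = (bx0 + bx1) / 2.0, (by0 + by1) / 2.0
        if label in best and x0 <= cx <= x1 and y0 <= cy <= y1:
            best[label] = max(best[label], score)
    return [best[label] for label in classes]


def fuse_features(keypoints: Sequence[Keypoint], person_box: Box,
                  detections: Sequence[Detection]) -> List[float]:
    """Concatenate pose and object features into one vector for a classifier."""
    return normalize_keypoints(keypoints, person_box) + object_features(detections, person_box)


# Toy input: two joints (one undetected) and two phone detections,
# only the first of which lies inside the student's box.
keypoints = [(150.0, 120.0, 0.9), (0.0, 0.0, 0.0)]
person_box = (100.0, 100.0, 200.0, 300.0)
detections = [
    ("cell phone", 0.80, (140.0, 200.0, 160.0, 220.0)),
    ("cell phone", 0.95, (400.0, 200.0, 420.0, 220.0)),
]
vector = fuse_features(keypoints, person_box, detections)
```

The resulting vector has three entries per joint followed by one entry per object class, so a downstream behavior classifier would see both posture and nearby objects. Simple concatenation is only one possible fusion strategy and is chosen here for clarity.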

Keywords: learning behavior recognition; pose estimation; object detection; computer vision; deep learning

Cite this article

Zejie WANG, Chaomin SHEN, Chun ZHAO, Xinmei LIU, Jie CHEN. Recognition of classroom learning behaviors based on the fusion of human pose estimation and object detection[J]. Journal of East China Normal University (Natural Science), 2022, 2022(2): 55-66. DOI: 10.3969/j.issn.1000-5641.2022.02.007
