Computer Science

An improved method for hand gesture estimation based on depth image pre-rotation

  • XU Zhengze,
  • ZHANG Wenjun
  • 1. Shanghai Film Academy, Shanghai University, Shanghai 200072, China;
    2. School of Communication, East China Normal University, Shanghai 200241, China

Received date: 2019-04-28

Online published: 2020-07-20

Abstract

Hand gesture estimation from depth images is considerably more difficult than human pose estimation, in part because existing algorithms cannot recognize different appearances of the same hand gesture after rotation. In this paper, an improved approach to hand gesture estimation based on in-plane image pre-rotation is proposed. First, a convolutional neural network (CNN) is trained on datasets automatically labeled with the optimal rotation angle. Then, prior to hand gesture recognition, the hand depth image is rotated in-plane by the angle predicted by the trained CNN model. Finally, depth pixels are classified by a random decision forest (RDF) and clustered to generate the hand joint positions. Experiments show that this method reduces the error between the predicted and ground-truth hand joint positions, and that the accuracy of gesture estimation improves by about 4.69% over the baseline.
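The pipeline described in the abstract (angle prediction, pre-rotation, per-pixel RDF classification, clustering) can be summarized in a short Python sketch. This is an illustrative sketch only, not the authors' implementation: the CNN interface, the per-pixel features, the mean-shift bandwidth, and all function names are assumptions introduced for clarity, since the abstract does not specify them.

# Illustrative sketch only (not the authors' code): a CNN predicts an in-plane
# rotation angle, the depth image is pre-rotated by that angle, an RDF labels
# each depth pixel with a hand part, and the pixels of each part are clustered
# to propose joint positions. Features, bandwidth, and model interfaces are
# assumptions made for this example.
import numpy as np
from scipy.ndimage import rotate
from sklearn.cluster import MeanShift
from sklearn.ensemble import RandomForestClassifier

def predict_rotation_angle(depth_image, cnn_model):
    # Placeholder for the trained CNN that predicts the optimal in-plane angle.
    return float(cnn_model.predict(depth_image[None, ..., None])[0])

def pixel_features(depth_image):
    # Toy per-pixel features: depth value plus pixel coordinates.
    h, w = depth_image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.stack([depth_image.ravel(), ys.ravel(), xs.ravel()], axis=1)

def estimate_joints(depth_image, cnn_model, rdf: RandomForestClassifier, n_parts):
    # 1. Pre-rotate the depth image by the CNN-predicted angle.
    angle = predict_rotation_angle(depth_image, cnn_model)
    rotated = rotate(depth_image, angle, reshape=False, order=0)

    # 2. Classify every depth pixel into a hand part with the RDF.
    part_labels = rdf.predict(pixel_features(rotated)).reshape(rotated.shape)

    # 3. Cluster the pixels of each part; the densest cluster gives the joint.
    joints = {}
    for part in range(n_parts):
        ys, xs = np.nonzero(part_labels == part)
        if len(ys) == 0:
            continue
        ms = MeanShift(bandwidth=10).fit(np.stack([ys, xs], axis=1))
        largest = np.argmax(np.bincount(ms.labels_))
        joints[part] = ms.cluster_centers_[largest]
    return joints

In a setup like this, estimate_joints would be called once per depth frame with a trained angle-prediction CNN and a trained RDF; the returned dictionary maps each hand-part index to an estimated joint position in image coordinates.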

Cite this article

XU Zhengze, ZHANG Wenjun. An improved method for hand gesture estimation based on depth image pre-rotation [J]. Journal of East China Normal University (Natural Science), 2020, 2020(4): 124-133. DOI: 10.3969/j.issn.1000-5641.201921004.
