Computer Science

Momentum-updated representation with reconstruction constraint for limited-view 3D object recognition

  • Ruibo CUI,
  • Feng WANG
  • School of Computer Science and Technology, East China Normal University, Shanghai 200062, China

Received date: 2022-06-18

Online published: 2023-11-23

Abstract

We propose a neural network training framework, momentum-updated representation with reconstruction constraint, for three-dimensional (3D) object recognition from two-dimensional (2D) images without angle labels. First, self-supervised learning is employed to address the lack of angle labels. Second, momentum updating based on a dynamic queue is used to keep the object representation stable. Third, a reconstruction constraint is applied to the learning process through an auto-encoder module, which enables the representation to capture more semantic information about the objects. Finally, a dynamic queue reduction strategy is applied during training to handle the imbalanced data distribution. Experiments on two popular multi-view datasets, ModelNet and ShapeNet, demonstrate that the proposed method outperforms existing methods.
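The abstract does not give the update rule or the queue schedule, so the following is only a minimal sketch of the general idea of momentum updating with a feature queue, not the authors' implementation. It assumes an exponential-moving-average update with a hypothetical coefficient `m`, and a first-in-first-out queue whose capacity can be shrunk; the names `momentum_update`, `FeatureQueue`, and `resize` are illustrative.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Move each key parameter a small step toward its query counterpart.

    Assumed rule (exponential moving average): k <- m * k + (1 - m) * q.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class FeatureQueue:
    """FIFO store of past features; the oldest entries are evicted first."""

    def __init__(self, max_size):
        self.items = deque(maxlen=max_size)

    def enqueue(self, features):
        # Appending beyond capacity silently drops the oldest entries.
        self.items.extend(features)

    def resize(self, new_size):
        # Hypothetical stand-in for queue reduction: keep the newest entries.
        newest = list(self.items)[-new_size:] if new_size > 0 else []
        self.items = deque(newest, maxlen=new_size)


# Toy parameters: the key vector moves 10% of the way toward the query vector.
key = momentum_update([0.0, 0.0], [1.0, -1.0], m=0.9)

queue = FeatureQueue(max_size=4)
queue.enqueue([[0.1], [0.2], [0.3], [0.4], [0.5]])  # [0.1] is evicted
queue.resize(2)                                      # keeps [0.4] and [0.5]
```

A coefficient close to 1 makes the key parameters change slowly, which is the usual reason such an update keeps stored representations consistent across training steps. How and when the queue should shrink for imbalanced classes is specific to the paper and is not modeled here.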

Cite this article

Ruibo CUI, Feng WANG. Momentum-updated representation with reconstruction constraint for limited-view 3D object recognition[J]. Journal of East China Normal University (Natural Science), 2023, 2023(6): 61-72. DOI: 10.3969/j.issn.1000-5641.2023.06.006
