Momentum-updated representation with reconstruction constraint for limited-view 3D object recognition

Ruibo CUI; Feng WANG

doi:10.3969/j.issn.1000-5641.2023.06.006

Journal of East China Normal University (Natural Science)

2023, Vol. 2023, Issue 6: 61-72

Computer Science

School of Computer Science and Technology, East China Normal University, Shanghai 200062, China

Received date: 2022-06-18

Online published: 2023-11-23

Abstract

We propose a neural network training framework called momentum-updated representation with reconstruction constraint for 3D (three-dimensional) object recognition using 2D (two-dimensional) images without angle labels. First, self-supervised learning is employed to address the lack of angle labels. Second, we use momentum updating based on a dynamic queue to maintain the stability of the object representation. Furthermore, the reconstruction constraint is applied to the learning process with an auto-encoder module, which enables the representation to capture more semantic information of the objects. Finally, during training, a dynamic queue reduction strategy is proposed for handling the imbalanced data distribution. Experiments on two popular multi-view datasets, ModelNet and ShapeNet, demonstrate that the proposed method outperforms existing methods.
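As a rough illustration of the momentum-updating idea mentioned above, the following is a minimal sketch of a generic momentum (exponential moving average) parameter update together with a feature queue whose capacity can be reduced. It is not the authors' implementation: the names `momentum_update`, `FeatureQueue`, and `shrink`, the coefficient `m`, and the choice to keep the most recent entries when shrinking are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Return momentum-updated parameters: k <- m * k + (1 - m) * q.

    Hypothetical helper: parameters are plain lists of floats here,
    standing in for the weights of two encoders.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class FeatureQueue:
    """FIFO queue of features with an adjustable capacity (illustrative only)."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest items when full
        self.items = deque(maxlen=capacity)

    def enqueue(self, features):
        # Append new features; the oldest ones fall out once capacity is hit
        self.items.extend(features)

    def shrink(self, new_capacity):
        # Assumed reduction rule: keep only the most recent entries
        self.items = deque(self.items, maxlen=new_capacity)


# Usage: one momentum step and a queue that is later reduced
key = momentum_update([1.0, 2.0], [0.0, 0.0], m=0.5)
queue = FeatureQueue(capacity=3)
queue.enqueue([1, 2, 3, 4])
queue.shrink(2)
```

With a coefficient close to 1, the updated parameters change slowly, which is the usual motivation for momentum updating when a stable representation is wanted.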

Keywords: multi-view object recognition; self-supervised learning; auto-encoder

Cite this article

Ruibo CUI, Feng WANG. Momentum-updated representation with reconstruction constraint for limited-view 3D object recognition[J]. Journal of East China Normal University (Natural Science), 2023, 2023(6): 61-72. DOI: 10.3969/j.issn.1000-5641.2023.06.006
