Journal of East China Normal University (Natural Science)
Momentum-updated representation with reconstruction constraint for limited-view 3D object recognition
Received date: 2022-06-18
Online published: 2023-11-23
We propose a neural network training framework, momentum-updated representation with reconstruction constraint, for recognizing three-dimensional (3D) objects from two-dimensional (2D) images without angle labels. First, self-supervised learning is employed to address the lack of angle labels. Second, momentum updating based on a dynamic queue is used to keep the object representation stable. Third, a reconstruction constraint is imposed on the learning process through an auto-encoder module, which enables the representation to capture more semantic information about the objects. Finally, a dynamic queue reduction strategy is applied during training to handle imbalanced data distributions. Experiments on two popular multi-view datasets, ModelNet and ShapeNet, demonstrate that the proposed method outperforms existing methods.
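The abstract does not give the update rule or the queue mechanics, so the following is only a minimal sketch of what momentum updating with a shrinkable queue could look like. It assumes an exponential-moving-average update in the style of momentum contrast and a first-in-first-out queue. The names `momentum_update`, `FeatureQueue`, and `shrink`, and the coefficient `m`, are hypothetical and not taken from the paper.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Move the key parameters slightly toward the query parameters.

    Assumed rule (exponential moving average): k <- m * k + (1 - m) * q.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class FeatureQueue:
    """Hypothetical FIFO queue of past representations with adjustable size."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def enqueue(self, features):
        # Once the queue is full, the oldest entries are dropped automatically.
        self.items.extend(features)

    def shrink(self, new_capacity):
        # Keep only the most recent entries; how and when the paper reduces
        # its queue is not stated in the abstract, so this is an assumption.
        recent = list(self.items)[-new_capacity:]
        self.items = deque(recent, maxlen=new_capacity)
```

A large `m` means the slowly updated parameters change little per step, which is one common way to keep queued representations consistent with newly computed ones.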
Ruibo CUI, Feng WANG. Momentum-updated representation with reconstruction constraint for limited-view 3D object recognition[J]. Journal of East China Normal University (Natural Science), 2023, 2023(6): 61-72. DOI: 10.3969/j.issn.1000-5641.2023.06.006