Surface-height- and uncertainty-based depth estimation for Mono3D

Yinshuai JI; Jinhua XU

doi:10.3969/j.issn.1000-5641.2025.01.006

Journal of East China Normal University (Natural Science)

2025, Vol. 2025, Issue 1: 72-81

Computer Science


School of Computer Science and Technology, East China Normal University, Shanghai 200062, China

Received date: 2023-11-25

Online published: 2025-01-20


Abstract

Monocular three-dimensional (3D) object detection is a fundamental but challenging task in autonomous driving and robotic navigation. Directly predicting object depth from a single image is an inherently ill-posed problem. Geometry projection is a powerful depth estimation method that infers an object's depth from its physical height and its projected height in the image plane. However, errors in the estimated heights are amplified in the resulting depth. In this study, the physical and projected heights of object surface points (rather than the height of the object itself) were estimated to obtain several depth candidates. The uncertainties of the height estimates were also predicted, and the final object depth was obtained by combining the depth candidates according to these uncertainties. Experiments demonstrated the effectiveness of the depth estimation method, which achieved state-of-the-art (SOTA) results on the monocular 3D object detection task of the KITTI dataset.

Keywords: monocular 3D object detection (Mono3D); depth estimation; geometry projection; autonomous driving
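
The geometry projection idea in the abstract can be illustrated with a minimal sketch. It assumes a pinhole camera, where depth = focal length × physical height / projected height. The fusion step uses an inverse-variance weighted mean, which is an assumption made for illustration; the abstract does not state the exact combination rule. The function names and all numeric values are hypothetical.

```python
def depth_from_heights(focal_px, physical_h_m, projected_h_px):
    """Pinhole relation: depth (m) = focal length (px) * physical height (m) / projected height (px)."""
    return focal_px * physical_h_m / projected_h_px


def fuse_depths(depths, sigmas):
    """Combine depth candidates with inverse-variance weights.

    This weighting is an assumed, commonly used rule; it is not taken from the paper.
    """
    weights = [1.0 / (s * s) for s in sigmas]
    return sum(w * d for w, d in zip(weights, depths)) / sum(weights)


# Hypothetical values for three surface points of one object.
focal_px = 700.0
physical_h = [1.5, 1.2, 0.9]      # metres
projected_h = [35.0, 28.0, 20.0]  # pixels
sigmas = [1.0, 1.0, 2.0]          # assumed depth uncertainties, metres

# One depth candidate per surface point: roughly 30 m, 30 m and 31.5 m.
candidates = [
    depth_from_heights(focal_px, H, h) for H, h in zip(physical_h, projected_h)
]

# The less certain third candidate contributes least to the fused depth.
fused = fuse_depths(candidates, sigmas)

# Error amplification: a 1-pixel error in projected height (35 -> 34 px)
# shifts the depth of the first point by close to 0.9 m.
shift = depth_from_heights(focal_px, 1.5, 34.0) - candidates[0]

print(candidates, fused, shift)
```

Using several surface points gives several independent candidates, so a single poor height estimate has less influence on the final depth than it would with one object-level height.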

Cite this article

Yinshuai JI, Jinhua XU. Surface-height- and uncertainty-based depth estimation for Mono3D[J]. Journal of East China Normal University (Natural Science), 2025, 2025(1): 72-81. DOI: 10.3969/j.issn.1000-5641.2025.01.006

