Research on Computer Vision 3D Reconstruction and Interactive Perception Technology for the Metaverse
DOI: https://doi.org/10.62381/ACS.FSSD2025.25
Author(s)
Hongyi Li
Affiliation(s)
Sino-French Institute of Nuclear Engineering & Technology, Sun Yat-sen University, Zhuhai, China
Abstract
As an emerging field in which the virtual and the real are deeply integrated, the metaverse places higher demands on computer vision 3D reconstruction and interactive perception technologies. This paper examines computer vision 3D reconstruction for the metaverse, including the principles and pipelines of reconstruction based on multi-view geometry, deep learning, and other methods, and analyzes their advantages and challenges in constructing metaverse scenes. It also studies interactive perception, covering key technologies such as gesture recognition, pose estimation, and eye tracking, and their roles in enabling natural interaction in the metaverse. Finally, it discusses the integration of 3D reconstruction with interactive perception, along with future development trends and application prospects in the metaverse, aiming to provide theoretical support and technical references for the further development of metaverse-related technologies.
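As an illustration of the multi-view-geometry reconstruction pipeline the abstract refers to, the following minimal Python/OpenCV sketch triangulates sparse 3D points from two views. It is not code from the paper: the intrinsic matrix K and the matched pixel arrays pts1/pts2 (N x 2 float arrays, typically produced by a feature matcher such as SIFT) are assumed inputs.

import numpy as np
import cv2

def triangulate_two_views(pts1, pts2, K):
    """Sketch: recover relative pose and sparse 3D points from two views.

    pts1, pts2: N x 2 float arrays of matched pixel coordinates (assumed given).
    K: 3 x 3 camera intrinsic matrix (assumed known, e.g. from calibration).
    """
    # Estimate the essential matrix; RANSAC rejects mismatched correspondences.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    # Decompose E into the second camera's rotation R and translation t.
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # Projection matrices: first camera at the origin, second at (R, t).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    # Triangulate to homogeneous coordinates, then dehomogenize to N x 3.
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (X_h[:3] / X_h[3]).T

A full structure-from-motion system (as in Schonberger & Frahm, 2016) repeats this two-view step across many images and refines all poses and points jointly with bundle adjustment; the sketch shows only the core geometric computation.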
Keywords
Metaverse; Computer Vision; Three-Dimensional Reconstruction; Interactive Perception