Level-S2fM: Structure from Motion on Neural Level Set of Implicit Surfaces

1Wuhan University, 2NC State University

Level-S2fM incrementally recovers the 3D structure and camera poses on neural level sets, driven by 2D correspondences and rendering. Meanwhile, the top-down regularization of the neural level sets helps to filter out outliers.

Abstract

This paper presents a neural incremental Structure-from-Motion (SfM) approach, Level-S2fM. In our formulation, we aim to simultaneously learn coordinate MLPs for the implicit surfaces and the radiance fields, and to estimate the camera poses and scene geometry, relying mainly on keypoint correspondences established by SIFT.

Our formulation faces new challenges for the optimization of coordinate MLPs because of the two-view and few-view configurations that are inevitable at the beginning of an incremental SfM pipeline. However, we found that the strong inductive biases conveyed by the 2D correspondences make it feasible and promising to avoid those challenges by exploiting the relationship between the ray sampling schemes used in volumetric rendering and the sphere tracing used to find the zero-level set of implicit surfaces.
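To make the sphere-tracing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes an analytic sphere SDF (`sphere_sdf`, a hypothetical stand-in for the learned coordinate MLP) and a hypothetical helper `sphere_trace` that marches along a camera ray by the current SDF value until it reaches the zero-level set. The step limit, tolerance, and maximum depth are illustrative choices.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to a sphere (stand-in for a learned SDF)."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-6, t_max=100.0):
    """March along a ray, stepping by the SDF value at each point.

    Returns the ray depth t where the zero-level set is reached,
    or None if the ray leaves the scene or the step budget runs out.
    """
    # Normalize the direction so that t measures metric distance.
    norm = math.sqrt(sum(d * d for d in direction))
    d = tuple(x / norm for x in direction)
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * di for o, di in zip(origin, d))
        dist = sdf(p)
        if abs(dist) < eps:
            return t          # on the surface (zero-level set)
        t += dist             # the SDF value is a safe step size
        if t > t_max:
            return None       # ray escaped without hitting the surface
    return None

# A ray along +z from the origin meets the sphere; a ray along +y misses it.
hit_depth = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)
miss_depth = sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf)
```

In this toy setup the depth returned for a keypoint's ray gives a single 3D point on the surface, which is the kind of quantity that ray sampling for volumetric rendering can be centered on.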

Based on this, we revisit the pipeline of incremental SfM and renew its key components, namely two-view geometry initialization, camera pose registration, 3D point triangulation, and Bundle Adjustment, from the novel perspective of neural implicit surfaces. Because the coordinate MLPs unify the scene geometry in small MLP networks, our Level-S2fM treats the zero-level set of the implicit surface as an informative top-down regularization to manage the reconstructed 3D points, reject outlier correspondences by querying the SDF, and adjust the estimated geometry by NBA (Neural BA), finally yielding promising results of 3D reconstruction. Furthermore, our Level-S2fM alleviates the requirement of camera poses for neural 3D reconstruction.
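Rejecting outliers by querying the SDF can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's code: `sphere_sdf` is an analytic stand-in for the trained SDF network, `filter_points_by_sdf` is a hypothetical helper, and the threshold of 0.05 is an arbitrary illustrative value. The idea shown is only that a reconstructed 3D point whose absolute SDF value is large lies far from the zero-level set and can be flagged as an outlier.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance to a sphere (stand-in for the trained SDF network)."""
    return math.dist(p, center) - radius

def filter_points_by_sdf(points, sdf, threshold=0.05):
    """Split 3D points into inliers and outliers by distance to the surface.

    A point is kept when |sdf(point)| <= threshold, i.e. when it lies
    close to the zero-level set. The threshold is a hypothetical value.
    """
    inliers, outliers = [], []
    for p in points:
        if abs(sdf(p)) <= threshold:
            inliers.append(p)
        else:
            outliers.append(p)
    return inliers, outliers

# Toy "triangulated" points: three near the sphere surface, one far away.
points = [
    (0.0, 0.0, 2.0),    # exactly on the surface
    (0.0, 0.0, 2.02),   # slightly inside the surface
    (1.0, 0.0, 3.0),    # exactly on the surface
    (0.0, 0.0, 0.0),    # far from the surface, e.g. from a wrong match
]
inliers, outliers = filter_points_by_sdf(points, sphere_sdf)
```

With a learned SDF, the same query would be a forward pass through the coordinate MLP, so the check costs one network evaluation per point.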

Method

SDF-Based Triangulation

Neural Bundle Adjustment

Comparison

Sparse point clouds: ours and COLMAP

Our point cloud is shown in red, while the point cloud from COLMAP is shown in green. The textured mesh is derived from the mesh extracted from our trained SDF. During training, the global zero-level set acts as a top-down regularization that readily removes outliers and averages out the errors from matches.


Related Links

Level-S2fM provides a perspective for rethinking the Structure-from-Motion problem with neural fields and neural rendering. We also note that several excellent works introduced different solutions around the same time as ours. We sincerely hope this technology can be further optimized and developed in the future.

NoPe-NeRF incorporates monocular depth priors to help the joint optimization of camera poses and NeRF. We take this as a new perspective to revisit global SfM.

L2G-NeRF improves on BARF by introducing a warp neural field together with local and global alignment.

BibTeX

@article{xiao2022level,
      title={Level-S$^2$fM: Structure from Motion on Neural Level Set of Implicit Surfaces},
      author={Xiao, Yuxi and Xue, Nan and Wu, Tianfu and Xia, Gui-Song},
      journal={arXiv preprint arXiv:2211.12018},
      year={2022}
    }