MoDA: Modeling Deformable 3D Objects from Casual Videos

IJCV 2024
¹Nanyang Technological University, ²A*STAR

MoDA models the shape, texture and motion of deformable 3D objects.

Abstract

In this paper, we focus on the challenges of modeling deformable 3D objects from casual videos. With the popularity of neural radiance fields (NeRF), many works extend NeRF to dynamic scenes by combining a canonical NeRF with a deformation model that transforms 3D points between the observation space and the canonical space. Recent works rely on linear blend skinning (LBS) for this canonical-observation transformation. However, a linearly weighted combination of rigid transformation matrices is not guaranteed to be rigid; in fact, unexpected scale and shear factors often appear. In practice, using LBS as the deformation model frequently leads to skin-collapsing artifacts for bending or twisting motions.
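
To make the rigidity problem concrete, here is a small sketch in plain Python (not code from MoDA; the helper names rot_z, lbs_blend and lbs_twist are our own). Two bones influence a point with equal weights: one is unrotated, the other is twisted about the z-axis by an angle θ. Averaging the two rotation matrices entry-wise scales the xy-plane by cos(θ/2), so the determinant of the blended matrix drops to cos²(θ/2) and, at a 180° twist, the point collapses onto the twist axis (the classic "candy-wrapper" form of skin collapse).

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z-axis by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def lbs_blend(mats, weights):
    """Linear blend skinning: entry-wise weighted sum of bone matrices."""
    return [[sum(w * m[i][j] for m, w in zip(mats, weights)) for j in range(3)]
            for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix (exactly 1 for any rotation)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def apply(m, p):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * p[j] for j in range(3)) for i in range(3)]

def lbs_twist(theta, point=(1.0, 0.0, 0.0)):
    """Blend an unrotated bone and a bone twisted by theta with weights 0.5/0.5.

    Returns (distance of the skinned point from the z-axis, det of the blend).
    """
    blended = lbs_blend([rot_z(0.0), rot_z(theta)], [0.5, 0.5])
    q = apply(blended, point)
    return math.hypot(q[0], q[1]), det3(blended)

for deg in (0, 90, 180):
    radius, det = lbs_twist(math.radians(deg))
    print(f"twist {deg:3d} deg: radius {radius:.3f}, det {det:.3f}")
```

The printed radius shrinks from 1.000 at 0° to 0.707 at 90° and 0.000 at 180°, even though every bone transform being blended is a pure rotation.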

To solve this problem, we propose neural dual quaternion blend skinning (NeuDBS) to achieve 3D point deformation; it performs rigid transformations without skin-collapsing artifacts. In addition, we introduce a texture filtering approach for texture rendering that effectively minimizes the impact of noisy colors outside the target deformable objects. Extensive experiments on real and synthetic datasets show that our approach reconstructs 3D models of humans and animals with better qualitative and quantitative results than state-of-the-art methods.
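
For contrast, the following is a minimal sketch of classic dual quaternion blend skinning (Kavan et al.), the blending scheme underlying NeuDBS. It is an illustration under stated assumptions, not MoDA's implementation: the learned ("neural") components are omitted, and the bone transforms and skinning weights are fixed by hand. Each rigid bone transform is encoded as a unit dual quaternion, the dual quaternions are averaged with the skinning weights, and the result is renormalized by the norm of its real part, so the blended transform is always a rotation plus a translation. The function names are hypothetical.

```python
import math

def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def qconj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def z_rotation(theta):
    """Unit quaternion for a rotation of theta radians about the z-axis."""
    return (math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))

def rigid_to_dq(rot_q, t):
    """Unit dual quaternion (real, dual) for 'rotate by rot_q, then translate by t'."""
    dual = tuple(0.5 * c for c in qmul((0.0, *t), rot_q))
    return rot_q, dual

def dqb(dqs, weights):
    """Blend dual quaternions linearly, then renormalize by the real part's norm."""
    pivot = dqs[0][0]
    real, dual = [0.0] * 4, [0.0] * 4
    for (r, d), w in zip(dqs, weights):
        # q and -q encode the same rotation; align signs with the first bone
        s = w if sum(p * c for p, c in zip(pivot, r)) >= 0 else -w
        for k in range(4):
            real[k] += s * r[k]
            dual[k] += s * d[k]
    norm = math.sqrt(sum(c * c for c in real))
    return tuple(c / norm for c in real), tuple(c / norm for c in dual)

def apply_dq(dq, p):
    """Apply a unit dual quaternion to a 3D point: rotate, then translate."""
    real, dual = dq
    rotated = qmul(qmul(real, (0.0, *p)), qconj(real))[1:]
    half_t = qmul(dual, qconj(real))[1:]  # vector part of dual * conj(real) is t / 2
    return tuple(r + 2.0 * h for r, h in zip(rotated, half_t))

bones = [rigid_to_dq(z_rotation(0.0), (0.0, 0.0, 0.0)),
         rigid_to_dq(z_rotation(math.pi), (0.0, 0.0, 0.0))]
print(apply_dq(dqb(bones, [0.5, 0.5]), (1.0, 0.0, 0.0)))
```

With equal weights on an unrotated bone and a bone twisted 180° about z, the blend is a 90° rotation, so the point (1, 0, 0) maps to approximately (0, 1, 0) and keeps its distance to the twist axis instead of collapsing onto it.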


Highlights

Propose neural dual quaternion blend skinning (NeuDBS) as the deformation model in place of LBS, resolving the skin-collapsing artifacts of LBS.

Introduce a texture filtering approach for texture rendering that effectively minimizes the impact of noisy colors outside target deformable objects.

Formulate the 2D-3D matching as an optimal transport problem, which helps refine inaccurate segmentation obtained from an off-the-shelf method and predict a consistent 3D shape.

Reconstruction

We compare the reconstruction results of MoDA and BANMo; the skin-collapsing artifacts of BANMo are marked with red circles.

More reconstruction results are available for Casual-adult (10 videos), Casual-human (10 videos), Casual-cat (11 videos) and AMA (swing and samba, 16 videos in total).

Optimal transport for 2D-3D matching

By registering 2D pixels across different frames with optimal transport, we can refine the inaccurate segmentation and predict a consistent 3D shape of the cat.
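
The page does not spell out MoDA's cost function, marginals or solver, so the sketch below only illustrates the general idea of casting 2D-3D matching as optimal transport. It assumes a cosine-distance cost between hypothetical pixel features and canonical 3D point features, uniform marginals, and entropy-regularized transport solved with Sinkhorn iterations; this is a common choice, not necessarily the one used in MoDA. The names cosine_cost and sinkhorn, and the feature values, are illustrative.

```python
import math

def cosine_cost(xs, ys):
    """Cost matrix C[i][j] = 1 - cos(x_i, y_j) between two sets of feature vectors."""
    def unit(v):
        n = math.sqrt(sum(c * c for c in v))
        return [c / n for c in v]
    xs, ys = [unit(x) for x in xs], [unit(y) for y in ys]
    return [[1.0 - sum(a * b for a, b in zip(x, y)) for y in ys] for x in xs]

def sinkhorn(cost, eps=0.05, n_iters=200):
    """Entropy-regularized optimal transport with uniform marginals.

    Returns a transport plan P (n x m) whose rows sum to about 1/n and columns to 1/m.
    """
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # alternately rescale rows and columns to match the two marginals
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical features for three pixels and three canonical 3D points.
pixel_feats = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
point_feats = [(0.9, 0.1), (0.1, 0.9), (-1.0, 0.05)]
plan = sinkhorn(cosine_cost(pixel_feats, point_feats))
matches = [max(range(len(row)), key=row.__getitem__) for row in plan]
print(matches)
```

Unlike an independent nearest-neighbour lookup per pixel, the marginal constraints force the mass of every pixel and every 3D point to be used, which discourages many pixels from collapsing onto the same few points.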

Ablation study on texture filtering

We show the effectiveness of the texture filtering approach by adding it to both MoDA and BANMo.

Motion re-targeting

We compare the motion re-targeting results of MoDA and BANMo.

BibTeX

@article{song2024moda,
  title={{MoDA}: Modeling Deformable {3D} Objects from Casual Videos},
  author={Song, Chaoyue and Wei, Jiacheng and Chen, Tianyi and Chen, Yiwen and Foo, Chuan-Sheng and Liu, Fayao and Lin, Guosheng},
  journal={International Journal of Computer Vision},
  pages={1--20},
  year={2024},
  publisher={Springer}
}