In this paper, we address the challenge of reconstructing general articulated 3D objects from a single video. Existing works employing dynamic neural radiance fields have advanced the modeling of articulated objects like humans and animals from videos, but face challenges with piece-wise rigid general articulated objects due to limitations in their deformation models.
To tackle this, we propose Quasi-Rigid Blend Skinning, a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation at the joints. Our approach combines three key components: 1) an enhanced bone rigging system for improved component modeling, 2) quasi-sparse skinning weights that boost part rigidity and reconstruction fidelity, and 3) geodesic point assignment for precise motion and seamless deformation. Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects, as demonstrated on both real and synthetic datasets.
We model an articulated 3D object from a single video using a shape and appearance model based on a canonical Neural Radiance Field (NeRF) and a deformation model that transforms 3D points between the observation space and the canonical space. Instead of linear blend skinning or dual quaternion blend skinning, which are designed for modeling human or animal motion, we propose Quasi-Rigid Blend Skinning (QRBS) with learned quasi-sparse skinning weights as our deformation model, accurately transforming 3D points from the observation space to the canonical space. We visualize the three bones of the glasses in the canonical space; the colors of the skinning weights indicate the bone assigned to each point.
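To make the deformation model concrete, below is a minimal, illustrative Python/NumPy sketch of a blend-skinning-style warp with sharpened (quasi-sparse) weights. The function names, the softmax-over-distance weighting, and the temperature parameter are assumptions made for illustration only; the QRBS described in the paper learns its skinning weights and uses geodesic point assignment rather than Euclidean distances to bone centers.

import numpy as np

def quasi_sparse_weights(points, bone_centers, temperature=0.05):
    # Illustrative (hypothetical) skinning weights: a sharpened softmax over
    # negative point-to-bone distances, so each point is dominated by one bone
    # (quasi-sparse) while points near joints keep a soft blend.
    # points: (N, 3), bone_centers: (B, 3) -> weights: (N, B)
    d = np.linalg.norm(points[:, None, :] - bone_centers[None, :, :], axis=-1)
    logits = -d / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def blend_skinning_warp(points, weights, rotations, translations):
    # Generic blend-skinning warp from observation space to canonical space:
    # each bone b applies a rigid transform (R_b, t_b); the warped point is the
    # weight-blended combination of the per-bone rigidly transformed points.
    # points: (N, 3), weights: (N, B), rotations: (B, 3, 3), translations: (B, 3)
    per_bone = np.einsum('bij,nj->nbi', rotations, points) + translations[None]  # (N, B, 3)
    return np.einsum('nb,nbi->ni', weights, per_bone)                            # (N, 3)

# Toy usage with B = 3 bones (e.g., the two temples and the frame of a pair of glasses).
pts = np.random.rand(100, 3)
centers = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]])
R = np.stack([np.eye(3)] * 3)   # identity rotations for this toy example
t = np.zeros((3, 3))
canonical_pts = blend_skinning_warp(pts, quasi_sparse_weights(pts, centers), R, t)

Lowering the temperature concentrates each point's weight mass on a single bone, which captures the intuition behind per-part rigidity, while points near joints retain soft blending for smooth deformation.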
We compare the reconstruction results of BANMo, MoDA, PPR, and our method on different videos.
We demonstrate the effectiveness of our proposed Quasi-Rigid Blend Skinning (QRBS) by comparing it with other deformation models such as Displacement, Real-NVP, and rigid skinning. We also compare different rigging strategies, specifically rigging on bones (ours) versus rigging on joints.
Modeling general articulated objects:
A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation
Self-supervised Neural Articulated Shape and Appearance Models
PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects
Modeling articulated objects (humans, animals, etc) from videos:
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
@inproceedings{song2024reacto,
title={REACTO: Reconstructing Articulated Objects from a Single Video},
author={Song, Chaoyue and Wei, Jiacheng and Foo, Chuan Sheng and Lin, Guosheng and Liu, Fayao},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5384--5395},
year={2024}
}