In this paper, we address the challenge of reconstructing general articulated 3D objects from a single video. Existing works employing dynamic neural radiance fields have advanced the modeling of articulated objects like humans and animals from videos, but face challenges with piece-wise rigid general articulated objects due to limitations in their deformation models.
To tackle this, we propose Quasi-Rigid Blend Skinning, a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation at the joints. Our approach combines three key components: 1) an enhanced bone rigging system for improved component modeling, 2) quasi-sparse skinning weights that boost part rigidity and reconstruction fidelity, and 3) geodesic point assignment for precise motion and seamless deformation. Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects, as demonstrated on both real and synthetic datasets.
We model an articulated 3D object from a single video using a shape and appearance model based on a canonical Neural Radiance Field (NeRF) and a deformation model that transforms 3D points between the observation space and the canonical space. Instead of linear blend skinning or dual quaternion blend skinning, which are designed for modeling human or animal motion, we propose Quasi-Rigid Blend Skinning (QRBS) with learned quasi-sparse skinning weights as our deformation model, accurately transforming 3D points from the observation space to the canonical space. We visualize the three bones of the glasses in the canonical space; the colors of the skinning weights indicate the bone assigned to each point.
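To make the deformation model concrete, below is a minimal, illustrative Python/NumPy sketch of a blend-skinning-style warp with sharpened (quasi-sparse) weights. The function names, the softmax-over-distance weighting, and the temperature parameter are assumptions made for illustration only; the QRBS described in the paper learns its skinning weights and uses geodesic point assignment rather than Euclidean distances to bone centers.

import numpy as np

def quasi_sparse_weights(points, bone_centers, temperature=0.05):
    # Illustrative (hypothetical) skinning weights: a sharpened softmax over
    # negative point-to-bone distances, so each point is dominated by one bone
    # (quasi-sparse) while points near joints keep a soft blend.
    # points: (N, 3), bone_centers: (B, 3) -> weights: (N, B)
    d = np.linalg.norm(points[:, None, :] - bone_centers[None, :, :], axis=-1)
    logits = -d / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def blend_skinning_warp(points, weights, rotations, translations):
    # Generic blend-skinning warp from observation space to canonical space:
    # each bone b applies a rigid transform (R_b, t_b); the warped point is the
    # weight-blended combination of the per-bone rigidly transformed points.
    # points: (N, 3), weights: (N, B), rotations: (B, 3, 3), translations: (B, 3)
    per_bone = np.einsum('bij,nj->nbi', rotations, points) + translations[None]  # (N, B, 3)
    return np.einsum('nb,nbi->ni', weights, per_bone)                            # (N, 3)

# Toy usage with B = 3 bones (e.g., the two temples and the frame of a pair of glasses).
pts = np.random.rand(100, 3)
centers = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]])
R = np.stack([np.eye(3)] * 3)   # identity rotations for this toy example
t = np.zeros((3, 3))
canonical_pts = blend_skinning_warp(pts, quasi_sparse_weights(pts, centers), R, t)

Lowering the temperature concentrates each point's weight mass on a single bone, which captures the intuition behind per-part rigidity, while points near joints retain soft blending for smooth deformation.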
We compare the reconstruction results of BANMo, MoDA, PPR, and our method on different videos.
We demonstrate the effectiveness of our proposed Quasi-Rigid Blend Skinning (QRBS) by comparing it with other deformation models such as Displacement, Real-NVP, and rigid skinning. We also compare different rigging strategies, specifically rigging on bones (ours) versus rigging on joints.
Modeling general articulated objects:
A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation
Self-supervised Neural Articulated Shape and Appearance Models
PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects
Modeling articulated objects (humans, animals, etc) from videos:
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
@inproceedings{song2024reacto,
title={REACTO: Reconstructing Articulated Objects from a Single Video},
author={Song, Chaoyue and Wei, Jiacheng and Foo, Chuan Sheng and Lin, Guosheng and Liu, Fayao},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5384--5395},
year={2024}
}