Unsupervised 3D Pose Transfer with Cross Consistency and Dual Reconstruction
TPAMI 2023
Chaoyue Song
Jiacheng Wei
Ruibo Li
Fayao Liu
Guosheng Lin

[Paper]
[GitHub]


Abstract

The goal of 3D pose transfer is to transfer the pose from the source mesh to the target mesh while preserving the identity information (e.g., face, body shape) of the target mesh. Deep learning-based methods have improved the efficiency and performance of 3D pose transfer, but most of them are trained under ground-truth supervision, which is of limited availability in real-world scenarios. In this work, we present X-DualNet, a simple yet effective approach that enables unsupervised 3D pose transfer. In X-DualNet, we introduce a generator G that contains correspondence learning and pose transfer modules to achieve 3D pose transfer. We learn the shape correspondence by solving an optimal transport problem without any keypoint annotations and generate high-quality meshes with our elastic instance normalization (ElaIN) in the pose transfer module. With G as the basic component, we propose a cross consistency learning scheme and a dual reconstruction objective to learn pose transfer without supervision. In addition, we adopt an as-rigid-as-possible (ARAP) deformer in the training process to fine-tune the body shape of the generated results. Extensive experiments on human and animal data demonstrate that our framework achieves performance comparable to state-of-the-art supervised approaches.


Method

In X-DualNet, we introduce a generator G that contains correspondence learning and pose transfer modules to achieve 3D pose transfer. We learn the shape correspondence by solving an optimal transport problem without any keypoint annotations and generate high-quality meshes with our proposed elastic instance normalization (ElaIN) in the pose transfer module. With the generator G as the basic component, we propose a cross consistency learning scheme and a dual reconstruction objective to learn pose transfer without supervision. In addition, we adopt an as-rigid-as-possible (ARAP) deformer to fine-tune the body shape of the generated results.
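
To make the correspondence step concrete, the sketch below computes a soft vertex correspondence with Sinkhorn iterations, the standard solver for the entropy-regularized optimal transport problem mentioned above. This is a minimal illustrative sketch, not the repository's code: the function name, the cosine-distance cost, and the hyperparameters (eps, iters) are assumptions chosen for readability; see [GitHub] for the exact implementation.

import torch

def sinkhorn_correspondence(feat_src, feat_tgt, eps=0.05, iters=50):
    # feat_src: (N, C) per-vertex features of the pose (source) mesh.
    # feat_tgt: (M, C) per-vertex features of the identity (target) mesh.
    # Returns an (M, N) matrix whose rows softly match each target vertex
    # to source vertices.
    f_s = torch.nn.functional.normalize(feat_src, dim=-1)
    f_t = torch.nn.functional.normalize(feat_tgt, dim=-1)
    cost = 1.0 - f_t @ f_s.t()                    # (M, N) cosine-distance cost

    # Uniform marginals: every vertex carries equal mass.
    mu = cost.new_full((cost.shape[0],), 1.0 / cost.shape[0])
    nu = cost.new_full((cost.shape[1],), 1.0 / cost.shape[1])

    K = torch.exp(-cost / eps)                    # Gibbs kernel of entropic OT
    u = torch.ones_like(mu)
    for _ in range(iters):                        # Sinkhorn fixed-point updates
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    plan = u[:, None] * K * v[None, :]            # transport plan

    # Row-normalize so each target vertex receives a convex combination of
    # source vertices, i.e., a soft correspondence without keypoint labels.
    return plan / plan.sum(dim=1, keepdim=True)

The resulting matrix can warp the source vertex positions onto the target topology (e.g., warped = plan @ src_vertices); per the pipeline above, the pose transfer module with ElaIN then refines this coarse warped mesh into the final high-quality output.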



Comparison with other methods on human meshes

The identity and pose meshes are from SMPL. Our method and 3D-CoreNet generate better results than NPT and the proposed unsupervised baseline: the surfaces of meshes generated by NPT are not always smooth, and the proposed unsupervised baseline does not preserve body shapes well.



Comparison with other methods on animal meshes

The identity and pose meshes are from SMAL. Our method and 3D-CoreNet produce satisfactory results on the animal data, whereas NPT produces many artifacts and fails to transfer the pose successfully. The proposed baseline sometimes fails to preserve shape identity well (e.g., the tail).



Ablation studies

We study the effectiveness of ElaIN, the ARAP deformer, and the backward correspondence loss in our model on human data.

When we replace our ElaIN with SPAdaIN, the mesh surface shows clear artifacts and is not smooth.

When we do not add the ARAP deformer to the training loop, the generated results do not preserve the body shape well, as shown in the green bounding boxes: in the third column, the body shape is sunken on the left and right sides. When we remove the backward correspondence loss during training, the pose transfer results are inaccurate, as shown in the red bounding boxes: in the fourth column, the right arm is farther from the body than in the other results.

Paper

C. Song, J. Wei, R. Li, F. Liu, G. Lin.
Unsupervised 3D Pose Transfer with Cross Consistency and Dual Reconstruction.
TPAMI, 2023.
(hosted on arXiv)

[BibTeX]


Related projects

3D Pose Transfer with Correspondence Learning and Mesh Refinement. NeurIPS 2021.
Neural Pose Transfer by Spatially Adaptive Instance Normalization. CVPR 2020.



Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.