CAP4D introduces a unified framework for generating photorealistic, animatable 4D portrait avatars from any number of reference images, even a single one. By combining Morphable Multi-View Diffusion Models (MMDMs) with 3D Gaussian splatting, CAP4D enables real-time rendering and animation with state-of-the-art realism and identity consistency.
Key Highlights
- Morphable Multi-View Diffusion Model (MMDM): Learns pose and expression-controlled view synthesis, producing self-consistent multi-view images from one or more references.
- Stochastic I/O Conditioning: Allows scaling from 1 to 100 reference images, ensuring cross-view and temporal consistency without retraining.
- 4D Gaussian Representation: Trains an animatable avatar using 3D Gaussian splatting, achieving fine-grained, expression-dependent deformations.
- Real-Time Rendering: Generates avatars that can be animated and viewed interactively, enabling live applications.
- Superior Quality: Outperforms DiffusionRig, FlashAvatar, GaussianAvatars, and Portrait4D-v2 in realism, identity preservation, and expression fidelity.
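The stochastic I/O conditioning idea above can be illustrated with a toy sketch: at each denoising iteration, a fixed number of model slots is filled with a random mix of clean reference views (used as conditioning) and noisy target views (to be denoised), so a fixed-size network can effectively draw on arbitrarily many references over repeated iterations. The helper below is a hypothetical illustration of that slot-sampling step, not the authors' implementation.

```python
import random

def sample_io_slots(reference_views, target_views, num_slots, rng=random):
    """Toy sketch of stochastic I/O conditioning (hypothetical helper).

    Each call fills up to `num_slots` model slots with a random mix of
    clean reference views (conditioning inputs) and noisy target views
    (outputs to denoise). Repeating this across denoising iterations
    lets a fixed-capacity diffusion model consume any number of
    reference images without retraining.
    """
    # Choose how many slots hold conditioning images (at least one of each kind).
    n_cond = rng.randint(1, num_slots - 1)
    cond = rng.sample(reference_views, min(n_cond, len(reference_views)))
    # Remaining slots hold target views to be denoised this iteration.
    gen = rng.sample(target_views, min(num_slots - len(cond), len(target_views)))
    return cond, gen
```

Because the partition is resampled every iteration, each target view is eventually denoised alongside many different reference subsets, which is what encourages cross-view consistency in the generated set.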
Why It Matters
CAP4D bridges the gap between single-image generation and multi-view reconstruction, making 4D avatar creation scalable, photorealistic, and efficient. This breakthrough has implications for AR/VR, telepresence, digital content creation, and virtual humans, offering a pathway toward realistic, controllable avatars driven by minimal input data.
Explore More
- Paper: arXiv:2412.12093
- Project Page: https://felixtaubner.github.io/cap4d
- Related LearnOpenCV Blog Posts:
  - FramePack: https://learnopencv.com/framepack-video-diffusion/
  - Video Generative Models: https://learnopencv.com/video-generation-models/
The post CAP4D: 4D Avatars with Morphable Multi-View Diffusion Models appeared first on OpenCV.