
CAP4D: 4D Avatars with Morphable Multi-View Diffusion Models

CAP4D introduces a unified framework for generating photorealistic, animatable 4D portrait avatars from any number of reference images, including just a single one. By combining Morphable Multi-View Diffusion Models (MMDMs) with 3D Gaussian splatting, CAP4D enables real-time rendering and animation with state-of-the-art realism and identity consistency.

Key Highlights

  • Morphable Multi-View Diffusion Model (MMDM): Learns pose- and expression-controlled view synthesis, producing self-consistent multi-view images from one or more references.
  • Stochastic I/O Conditioning: Allows scaling from 1 to 100 reference images, ensuring cross-view and temporal consistency without retraining.
  • 4D Gaussian Representation: Trains an animatable avatar using 3D Gaussian splatting, achieving fine-grained, expression-dependent deformations.
  • Real-Time Rendering: Generates avatars that can be animated and viewed interactively, enabling live applications.
  • Superior Quality: Outperforms DiffusionRig, FlashAvatar, GaussianAvatars, and Portrait4D-v2 in realism, identity preservation, and expression fidelity.
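The stochastic I/O conditioning idea above can be illustrated with a minimal sketch: at each diffusion step, the available views are randomly partitioned into conditioning inputs and generation targets, which is what lets a fixed-size model consume anywhere from 1 to 100 references. The function name and signature below are illustrative, not CAP4D's actual API.

```python
import random

def stochastic_io_partition(num_views, max_inputs, rng=None):
    """Randomly split view indices into conditioning inputs and generation
    targets for one denoising step (illustrative sketch, not the paper's code).

    Ensures at least one input view and at least one target view, so the
    same model handles any number of references without retraining.
    """
    rng = rng or random.Random()
    indices = list(range(num_views))
    rng.shuffle(indices)
    # Pick how many views act as conditioning inputs this step.
    k = rng.randint(1, min(max_inputs, num_views - 1))
    inputs, targets = indices[:k], indices[k:]
    return sorted(inputs), sorted(targets)

# Example: 8 available views, at most 4 used as conditioning inputs.
inputs, targets = stochastic_io_partition(8, 4, rng=random.Random(0))
```

Resampling the partition at every step means each view alternates between conditioning and generation roles, which encourages cross-view consistency in the generated set.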

Why It Matters

 CAP4D bridges the gap between single-image generation and multi-view reconstruction, making 4D avatar creation scalable, photorealistic, and efficient. This breakthrough has implications for AR/VR, telepresence, digital content creation, and virtual humans, offering a pathway toward realistic, controllable avatars driven by minimal input data.

The post CAP4D: 4D Avatars with Morphable Multi-View Diffusion Models appeared first on OpenCV.