HuMo for Wan


About this model

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

✨ Key Features

HuMo is a unified, human-centric video generation framework designed to produce high-quality, fine-grained, and controllable human videos from multimodal inputs, including text, images, and audio. It supports strong text-prompt following, consistent subject preservation, and synchronized audio-driven motion.

VideoGen from Text-Image - Customize character appearance, clothing, makeup, props, and scenes using text prompts combined with reference images.
VideoGen from Text-Audio - Generate audio-synchronized videos solely from text and audio inputs, removing the need for image references and enabling greater creative freedom.
VideoGen from Text-Image-Audio - Achieve the highest level of customization and control by combining text, image, and audio guidance.
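The three modes above differ only in which conditioning inputs accompany the text prompt. The sketch below is hypothetical: `HumoInputs` and `select_mode` are illustrative names, not part of HuMo's actual code or API. It assumes every mode requires a text prompt and that text alone is not one of the listed modes. It maps a request to the matching mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HumoInputs:
    """Hypothetical container for one generation request (not HuMo's real API)."""
    text: str
    image_path: Optional[str] = None  # reference image for appearance/subject
    audio_path: Optional[str] = None  # audio track that drives motion/lip sync


def select_mode(inputs: HumoInputs) -> str:
    """Map the supplied modalities to one of the three modes listed above."""
    if not inputs.text:
        # Assumption: every listed mode starts from a text prompt.
        raise ValueError("all modes require a text prompt")
    has_image = inputs.image_path is not None
    has_audio = inputs.audio_path is not None
    if has_image and has_audio:
        return "text-image-audio"  # full customization and control
    if has_image:
        return "text-image"        # appearance from the reference image
    if has_audio:
        return "text-audio"        # no image reference needed
    # Text alone is not one of the three modes described on this page.
    raise ValueError("provide a reference image, an audio track, or both")
```

For example, `select_mode(HumoInputs("a woman singing on stage", audio_path="song.wav"))` returns `"text-audio"`.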

Examples and models from the following sources are reuploaded here for your convenience:

Compatible with both 480P and 720P resolutions; inference at 720P yields much better quality.
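The sketch below gives a rough sense of what choosing 720P over 480P costs. It assumes output sizes of 832×480 and 1280×720, which are the common Wan 2.1 presets and are not stated on this page. It compares pixels per frame. Because latent tokens grow with pixel count and attention cost grows faster than linearly in tokens, this ratio is only a lower bound on the extra compute.

```python
# Assumed (width, height) presets; borrowed from common Wan 2.1 settings,
# not confirmed by this model page.
RESOLUTION_PRESETS = {
    "480P": (832, 480),
    "720P": (1280, 720),
}


def pixels_per_frame(preset: str) -> int:
    """Number of output pixels in a single frame for the given preset."""
    width, height = RESOLUTION_PRESETS[preset]
    return width * height


def relative_cost(high: str = "720P", low: str = "480P") -> float:
    """Pixel ratio between two presets: a rough lower bound on extra compute."""
    return pixels_per_frame(high) / pixels_per_frame(low)
```

With these assumed sizes, `relative_cost()` is about 2.3. Under these assumptions, each 720P frame carries roughly 2.3 times as many pixels as a 480P frame.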

Model Info


Type Checkpoint

Base Wan Video 14B t2v

Version HuMo 14B fp16

Creator Cyph3r

Rating 0.0

Downloads 271

Gallery 28 Images


Related Models

Juggernaut XL

Pony Diffusion V6 XL

CyberRealistic Pony

CyberRealistic

epiCRealism XL

Nova Anime XL

LUSTIFY! [NSFW checkpoint]

Realism By Stable Yogi (Pony)