ComfyUI Flash Head Workflow: Ultrafast Head Lip-Sync

Video Introduction:

Click here to try the workflow online:

(Note: Some nodes are built by Runninghub. If you download the workflow and run it offline, it may not work.)

Open Source Address:

(Workflows can be downloaded via the links below: open a link and use the download button in the top-right corner. Because of limited VRAM on my local machine, I have not been able to test them locally. If you are not familiar with running ComfyUI locally, it is best to run them online. The FlashHead node is built on Runninghub (RH).)

Workflow: AA--Ultra-Fast Digital Human FlashHead

Experience Link:

Workflow: AA--Emotion Control Digital Human - Ultra-Fast FlashHead + Index Voice Cloning (8 Emotion Controls)

Experience Link:

Workflow: AA--Preset Voice Ultra-Fast Digital Human - FlashHead + QwenTTS - One Image, 9 Voices

Experience Link:

Workflow: AA--Fully Automatic Ultra-Fast Digital Human - FlashHead + Qwen Sound Design - Auto-Prompt from One Image - Digital Human Card Pull!

Experience Link:

### Introduction to Flash Head Digital Human Workflows

Flash Head is a digital human generation project that runs on ComfyUI and is built for speed. It generates video very quickly by driving only the head region for lip-sync, at the cost of motion in the rest of the body.

#### Core Features:

* Ultimate Speed: At 512p resolution, generating a 5-second video takes only about 30 seconds.

* Two Models: Pro and Light versions are available. The Light version is three times faster than Pro but trades away some quality, which makes it suitable for quick validation.

* Image Requirement: The input must be a close-up of the face; otherwise, the model cannot recognize the head and lips.
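To make the speed figures above concrete, here is a rough back-of-the-envelope sketch. It assumes generation time scales linearly with clip length, which the source does not confirm, and it treats "three times faster" as a 3x throughput factor relative to whichever model the 30-second figure was measured on. The function name and parameters are illustrative and not part of any Flash Head node.

```python
def estimate_generation_seconds(
    video_seconds: float,
    seconds_per_video_second: float = 30.0 / 5.0,  # ~30 s for a 5 s clip at 512p
    speedup: float = 1.0,  # assumption: 3.0 models "three times faster"
) -> float:
    """Estimate wall-clock generation time, assuming linear scaling."""
    if video_seconds < 0 or speedup <= 0:
        raise ValueError("video_seconds must be >= 0 and speedup must be > 0")
    return video_seconds * seconds_per_video_second / speedup


baseline = estimate_generation_seconds(5)               # the quoted figure
faster = estimate_generation_seconds(5, speedup=3.0)    # hypothetical 3x model
print(f"baseline: {baseline:.0f} s, 3x faster: {faster:.0f} s")
```

Under these assumptions, each second of output costs about six seconds of compute, so treat the numbers as a planning estimate rather than a benchmark.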

#### Main Workflows:

The following workflows cover different application scenarios:

1. Basic Workflow

* The simplest version, containing only 6 core nodes.

2. Voice Cloning Digital Human

* Allows you to upload an image and reference audio to clone the voice and drive the digital human.

3. Voice Preset Digital Human

* Similar to cloning, but uses preset voices built into the workflow, so you do not need to upload reference audio.

4. Sound Design Digital Human

* Fully Automatic Workflow: You only need to upload an image. The model analyzes the image via a VQA prompting node, automatically generates a voice prompt, and then a TTS model designs and generates the sound based on that prompt.
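The data flow of the fully automatic workflow (image, then VQA voice prompt, then TTS audio, then head animation) can be sketched as three chained stages. This is only an illustration of the order of operations. The function names, the `PipelineResult` type, and the placeholder stages are all hypothetical and do not correspond to actual ComfyUI or Runninghub node names.

```python
from typing import Callable, NamedTuple


class PipelineResult(NamedTuple):
    voice_prompt: str
    audio: bytes
    video: bytes


def run_sound_design_pipeline(
    image: bytes,
    describe_voice: Callable[[bytes], str],         # stands in for the VQA prompting node
    design_and_speak: Callable[[str], bytes],       # stands in for the TTS sound-design model
    animate_head: Callable[[bytes, bytes], bytes],  # stands in for the FlashHead node
) -> PipelineResult:
    """Chain the three stages; the image is the only user-supplied input."""
    voice_prompt = describe_voice(image)
    audio = design_and_speak(voice_prompt)
    video = animate_head(image, audio)
    return PipelineResult(voice_prompt, audio, video)


# Placeholder stages so the sketch runs without any models installed.
result = run_sound_design_pipeline(
    image=b"portrait",
    describe_voice=lambda img: "calm, low-pitched narrator",
    design_and_speak=lambda prompt: f"audio<{prompt}>".encode(),
    animate_head=lambda img, audio: img + b"+" + audio,
)
print(result.voice_prompt)
```

The other three workflows fit the same shape: the basic workflow skips the first two stages and takes audio directly, while the cloning and preset variants replace the VQA step with a reference clip or a built-in voice.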

#### Summary:

Overall, the Flash Head workflows perform well in scenarios that demand maximum speed, such as real-time interaction and rapid prototyping, and are worth trying out. However, their generation quality and stability still lag behind more mature solutions such as Infinite Talk, so for now they are not recommended for production work.