LTX-2 ControlNet in ComfyUI | Depth-Controlled Video Workflow

Precise structural control, synchronized audio and video, and sharp output.

Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning — you still choose inputs, prompts, and settings.

Open preloaded workflow on RunComfy

Why RunComfy first
- Fewer missing-node surprises — run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout — useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON — the zip follows the same runnable workflow you can open on RunComfy.

When downloading for local ComfyUI makes sense — you want full control over models on disk, batch scripting, or offline runs.

How to use (local ComfyUI)
1. Load inputs (images/video/audio) in the marked loader nodes.
2. Set prompts, resolution, and seeds; start with a short test run.
3. Export from the Save / Write nodes shown in the graph.
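
For batch scripting, the same three steps can be driven from a script instead of the UI. The sketch below assumes a local ComfyUI server on its default address (`http://127.0.0.1:8188`) and a workflow exported in API format, where each key is a node ID; the file name `ltx2_controlnet_api.json` and the `client_id` value are hypothetical placeholders.

```python
import json
import urllib.request


def build_prompt_payload(workflow: dict, client_id: str = "ltx2-test") -> bytes:
    """Wrap an API-format workflow in the JSON body sent to POST /prompt."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """Queue the workflow on a running ComfyUI server and return its JSON reply."""
    request = urllib.request.Request(
        f"{server}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Usage (requires a running server; the file name is a placeholder):
# with open("ltx2_controlnet_api.json", encoding="utf-8") as f:
#     print(queue_workflow(json.load(f)))
```

Outputs still land wherever the Save / Write nodes in the graph point, so the script only replaces the "queue" click, not the export settings.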

Expectations — First run may pull large weights; cloud runs may require a free RunComfy account.

Overview

This LTX-2 workflow generates video guided by explicit structural conditions such as depth maps, canny edges, and human poses. ControlNet-style IC LoRA conditioning enforces spatial and motion constraints across all frames, while audio and visuals are generated in sync in a unified latent space. The workflow supports text-to-video, image-to-video, and video-to-video pipelines, so creators can control scene structure, movement, and continuity. Its two-stage architecture generates at a lower base resolution and then upscales, which keeps memory usage down while refining the final output.

Key nodes in the ComfyUI LTX-2 ControlNet workflow

LTXVAddGuide (#132)
Merges text conditioning and IC LoRA controls into the AV latent; this node is the core of LTX-2 ControlNet guidance. Only a few controls matter: choose the control LoRA that matches your path (depth, canny, or pose) and, when available, set image_strength, which tunes how tightly the model follows the guides. The LTXVideo extension provides the reference implementation and node behavior.
LTXVImgToVideoInplace (#149, #155)
Injects a first-frame image into the AV latent for consistent scene initialization. Use strength to balance faithfulness to the first frame against freedom to evolve: keep it lower for more motion and higher for a tighter anchor. Bypass the node when you want a purely text- or control-driven opening.
LTXVScheduler (#95)
Drives the denoising trajectory for the unified latent so that audio and video converge together. Increase steps for complex scenes and fine detail; reduce them for drafts and quick iteration. Schedule settings interact with guidance strength, so avoid extreme values when guidance is strong.
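
The draft-versus-final trade-off can be captured as a pair of presets applied to an API-format workflow. This is a minimal sketch: the node ID `"95"` comes from the listing above, the input name `steps` is assumed from the description, and the step counts are placeholders rather than recommended values.

```python
import copy


def with_overrides(workflow: dict, overrides: dict) -> dict:
    """Return a copy of an API-format workflow with selected node inputs replaced."""
    patched = copy.deepcopy(workflow)
    for node_id, new_inputs in overrides.items():
        if node_id not in patched:
            raise KeyError(f"node {node_id} not found in workflow")
        patched[node_id]["inputs"].update(new_inputs)
    return patched


# Placeholder step counts, not tuned recommendations.
DRAFT = {"95": {"steps": 8}}
FINAL = {"95": {"steps": 30}}

# Tiny stand-in for the real exported workflow (assumed structure).
workflow = {"95": {"class_type": "LTXVScheduler", "inputs": {"steps": 20}}}
draft = with_overrides(workflow, DRAFT)
final = with_overrides(workflow, FINAL)
```

Copying before patching leaves the loaded workflow untouched, so the same base graph can be reused for several presets in one session.
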
LTXVLatentUpsampler (#112)
Performs the second-stage latent upscaling with the LTX-2 x2 spatial upscaler, improving sharpness with minimal VRAM growth. Run it after the first pass instead of raising the base resolution; this keeps iterations responsive.
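
To see what the two-stage approach implies for resolution, the arithmetic can be sketched as follows: pick the final size, halve it for the first pass, and snap to a size the model accepts. The multiple-of-32 constraint is an assumption made for illustration; check the requirements of the model build in use.

```python
def first_pass_size(target_w: int, target_h: int,
                    upscale: int = 2, multiple: int = 32) -> tuple[int, int]:
    """Base resolution for stage one, given the desired size after upscaling.

    `multiple` is an assumed divisibility constraint, not a documented value.
    """
    def snap(value: int) -> int:
        # Divide by the upscale factor, then round down to the nearest multiple.
        return max(multiple, (value // upscale) // multiple * multiple)

    return snap(target_w), snap(target_h)


base_w, base_h = first_pass_size(1920, 1080)
final_w, final_h = base_w * 2, base_h * 2
print(base_w, base_h, final_w, final_h)  # 960 512 1920 1024
```

Because the first pass covers a quarter of the final pixel count, most of the denoising work happens at the cheaper resolution.
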
DWPreprocessor (#158)
Generates clean human pose keypoints for the pose-control path. Verify detections with the preview; if hands or small limbs are noisy, scale inputs to a moderate max dimension before preprocessing. The node is provided by the ControlNet auxiliary suite.
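
Scaling to a max dimension while keeping the aspect ratio is simple arithmetic. The sketch below computes the new frame size only; the default of 1024 is a placeholder for "moderate", not a value taken from the workflow.

```python
def scale_to_max_dimension(width: int, height: int,
                           max_dim: int = 1024) -> tuple[int, int]:
    """Resize so the longest side equals max_dim, preserving aspect ratio."""
    longest = max(width, height)
    new_w = max(1, round(width * max_dim / longest))
    new_h = max(1, round(height * max_dim / longest))
    return new_w, new_h


print(scale_to_max_dimension(3840, 2160))  # (1024, 576)
print(scale_to_max_dimension(640, 360))    # (1024, 576)
```

The function scales in both directions, so small sources are enlarged as well as large ones reduced; clamp it to downscale only if enlarging low-resolution footage makes detections worse.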