ERNIE‑Image
About this model
ERNIE-Image is an open text-to-image model from the ERNIE-Image team at Baidu. Built on a single-stream Diffusion Transformer (DiT) with 8B parameters in a latent diffusion model (LDM) framework, it ships with a lightweight Prompt Enhancer that expands brief inputs into richer, more structured prompts to better unlock the model's capabilities. With only 8B DiT parameters, ERNIE-Image achieves state-of-the-art performance among open-weight text-to-image models. It is built for controllability as well as visual appeal: accurate content depiction matters as much as aesthetics. In practice, it excels at complex instruction following, precise text rendering, and structured image generation, areas where many existing open-weight models still fall short.
Key Features
• Competitive performance at compact scale: With only 8B DiT parameters, ERNIE-Image remains competitive with substantially larger models and achieves leading performance among open-weight models on several challenging benchmarks.
• Precise text rendering: ERNIE-Image handles dense, long-form, and layout-sensitive text especially well, producing readable and faithful results in Chinese, English, and other languages.
• Robust instruction following: The model reliably handles complex prompts, multi-object relations, and knowledge-intensive descriptions, making it well suited for tasks that demand fine-grained control.
• Structured visual generation: ERNIE-Image is especially effective on images with a clear layout or narrative structure, such as posters, manga/anime storyboards, multi-panel compositions, and cohesive multi-element visuals.
• Broad stylistic range: Beyond clean graphic design and illustration-style outputs, the model supports realistic photography and distinctive stylized aesthetics, including softer, more cinematic, film-like tones.
• Easy to deploy and adapt: Thanks to its compact size, ERNIE-Image runs on consumer-grade hardware (24 GB VRAM), bringing high-quality image generation within reach for research and production use. The moderate parameter count also makes fine-tuning and adaptation more practical for researchers and developers.
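To make the 24 GB VRAM figure more concrete, here is a rough back-of-the-envelope estimate of how much memory the 8B DiT weights alone would occupy at a few common numeric precisions. This is only a sketch: the precisions listed are assumptions (the description above does not say which one the released weights use), and the estimate ignores the text encoder, VAE, Prompt Enhancer, activations, and framework overhead, all of which add to real usage.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * bytes_per_param / 1024**3


# 8B DiT parameters, as stated in the model description.
DIT_PARAMS = 8e9

# Assumed storage precisions (hypothetical; not confirmed by the model card).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

estimates = {
    name: weight_memory_gib(DIT_PARAMS, nbytes)
    for name, nbytes in BYTES_PER_PARAM.items()
}

for name, gib in estimates.items():
    print(f"{name}: {gib:.1f} GiB for DiT weights only")
```

Under these assumptions, 16-bit weights come to roughly 15 GiB, which leaves some headroom on a 24 GB card, while 32-bit weights (roughly 30 GiB) would not fit without offloading or quantization.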