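On November 26, 2025, Alibaba's Tongyi Lab open-sourced Z-Image, a powerful and efficient image generation model with 6B parameters.

Two versions have been announced so far:

Z-Image-Turbo: A distilled version of Z-Image that matches or exceeds leading models with only 8 function evaluations (Number of Function Evaluations, NFE). It achieves sub-second inference latency on an H800 GPU and runs comfortably on consumer GPUs with 16 GB of VRAM. It excels at photorealistic image generation, bilingual text rendering (Chinese and English), and instruction following.

Z-Image-Edit: A version fine-tuned from Z-Image specifically for image editing. It supports creative image-to-image generation and follows instructions closely, so it can make precise edits from natural-language prompts.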
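
To make the "8 function evaluations" figure concrete, here is a minimal sketch of how NFE is usually counted in an ODE-style sampler: every step calls the model once, so 8 steps cost 8 evaluations. The sampler Z-Image-Turbo actually uses is not described here, so `euler_sample` and the toy `velocity` function are hypothetical stand-ins, not the model's real code.

```python
def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and the number of times `velocity` was called.
    `velocity` stands in for the neural network (an assumption for illustration).
    """
    x = x0
    nfe = 0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)  # exactly one model call per step
        nfe += 1
    return x, nfe


# Toy "model": a constant velocity that carries x from 0.0 to 3.0 over t in [0, 1].
final_x, nfe = euler_sample(lambda x, t: 3.0, x0=0.0, num_steps=8)
print(f"final x = {final_x}, NFE = {nfe}")
```

Cutting the step count is what makes a distilled model fast: latency scales roughly with NFE, since each evaluation is a full forward pass through the network.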

Tongyi Lab (@Ali_TongyiLab)

1/10 We are pleased to introduce Z-Image, an efficient 6-billion-parameter foundation model for image generation. Through systematic optimization, it proves that top-tier performance is achievable without relying on enormous model sizes, delivering strong results in photorealistic generation and bilingual text rendering that are comparable to leading commercial models.

2/10 At just 6 billion parameters, Z-Image produces photorealistic images on par with those from models an order of magnitude larger. It can run smoothly on consumer-grade graphics cards with less than 16GB of VRAM, making advanced image generation technology accessible to a wider audience. We are publicly releasing two specialized models built on Z-Image: Z-Image-Turbo (released) for generation and Z-Image-Edit (to be released) for editing.
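
A quick back-of-the-envelope calculation shows why a 6-billion-parameter model is plausible on a 16GB card. The sketch below counts only the weights of the 6B model itself. The numeric precisions are assumptions, and any text encoder, VAE, and activation memory come on top and are not included.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


PARAMS = 6_000_000_000  # 6B parameters, as stated in the announcement

# Assumed storage precisions; the announcement does not say which one is used.
for name, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{name:>9}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

At 2 bytes per parameter the weights take about 11.2 GiB, which leaves some headroom under 16GB, while 4 bytes per parameter (about 22.4 GiB) would not fit.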

3/10 Architecture: The Z-Image model adopts a Single-Stream Diffusion Transformer architecture. This design unifies the processing of various conditional inputs (like text and image embeddings) with the noisy image latents into a single sequence, which is then fed into the Transformer backbone.
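
The single-stream idea can be illustrated with a toy example: condition tokens and noisy latent tokens are concatenated into one sequence, and one shared attention layer mixes all of them. This is only a sketch under strong assumptions. The embeddings are made up, the attention has no learned projections, and reading the prediction from the latent positions is an illustrative choice rather than a documented detail of Z-Image.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (no learned weights)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        # Every token scores every other token, regardless of modality.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


# Hypothetical toy embeddings, assumed already projected to a shared width of 2.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]                        # prompt embeddings
image_cond_tokens = [[0.5, 0.5]]                              # reference-image embeddings
noisy_latent_tokens = [[0.2, -0.1], [-0.3, 0.4], [0.0, 0.1]]  # noisy image latents

# Single stream: one concatenated sequence goes through one shared backbone.
sequence = text_tokens + image_cond_tokens + noisy_latent_tokens
mixed = self_attention(sequence)

# Assumption: only the latent positions are read out as the denoising prediction.
latent_out = mixed[-len(noisy_latent_tokens):]
print(len(sequence), "tokens in,", len(latent_out), "latent tokens out")
```

The contrast is with dual-stream designs, which keep separate weights per modality. A single stream shares every parameter across text, image conditions, and latents.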

4/10 Arena: According to the in-house Elo-based arena, Z-Image shows highly competitive performance against other leading models, while achieving state-of-the-art results among open-source models.
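
For readers unfamiliar with Elo-based arenas, the sketch below shows the standard Elo update after one pairwise comparison between two models. The in-house arena's exact formula, K-factor, and starting ratings are not published here, so the values used (`k=32`, ratings of 1500) are conventional assumptions.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=32.0):
    """Return new ratings after one match.

    score_a is 1.0 if A's image was preferred, 0.5 for a tie, 0.0 if B's was.
    k is an assumed K-factor, not a value taken from the Z-Image arena.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b - k * (score_a - exp_a)  # zero-sum: B loses what A gains
    return new_a, new_b


# Two equally rated models; a human voter prefers model A's image.
print(update(1500.0, 1500.0, 1.0))
```

Repeating this update over many human votes produces the ranking that an arena leaderboard reports.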

5/10 Efficient Photorealistic Quality: Z-Image-Turbo excels at producing images with photography-level realism, demonstrating fine control over details, lighting, and textures. It balances high fidelity with strong aesthetic quality in composition and overall mood. The generated images are not only realistic but also visually appealing.

6/10 Excellent Bilingual Text Rendering: Z-Image-Turbo can accurately render Chinese and English text while preserving facial realism and overall aesthetic composition, with results comparable to top-tier closed-source models. In poster design, it demonstrates strong compositional skills and a good sense of typography. It can render high-quality text even in challenging scenarios with small font sizes, delivering designs that are both textually precise and visually compelling.

7/10 Rich World Knowledge and Cultural Understanding: Z-Image possesses a vast understanding of world knowledge and diverse cultural concepts. This allows it to accurately generate a wide array of subjects, including famous landmarks, well-known characters, and specific real-world objects.

8/10 Deep Semantic Understanding with Prior Knowledge: The powerful prompt enhancer (PE) uses a structured reasoning chain to inject logic and common sense, enabling the model to handle complex tasks like the “chicken-and-rabbit problem” or visualizing classical Chinese poetry. In editing tasks, even when faced with ambiguous user instructions, the model can apply its reasoning capabilities to infer the underlying intent and ensure a logically coherent result.
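
The "chicken-and-rabbit problem" is a classic arithmetic puzzle: given a total number of heads and legs in a cage, work out how many chickens (2 legs) and rabbits (4 legs) there are. The sketch below solves the puzzle itself, to show the kind of reasoning a prompt would need resolved before an image can depict the right number of animals. It says nothing about how the prompt enhancer is implemented, and the function name is made up for illustration.

```python
def chickens_and_rabbits(heads, legs):
    """Solve c + r = heads and 2c + 4r = legs for non-negative integers.

    Returns (chickens, rabbits), or None if no valid solution exists.
    """
    extra_legs = legs - 2 * heads  # legs left over if every animal were a chicken
    if extra_legs < 0 or extra_legs % 2 != 0:
        return None
    rabbits = extra_legs // 2      # each rabbit contributes 2 extra legs
    chickens = heads - rabbits
    if chickens < 0:
        return None
    return chickens, rabbits


# The traditional statement of the puzzle: 35 heads and 94 legs.
print(chickens_and_rabbits(35, 94))  # (23, 12)
```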

9/10 Strong Instruction-Following and Creative Editing: Z-Image-Edit can precisely execute complex instructions, such as simultaneously changing and brightening the background. It can also modify text at specified locations and maintain character consistency during significant image transformations, demonstrating fine-grained control over image elements.

10/10 We invite the community’s active participation and feedback to help us build a generative AI ecosystem that is not only open and transparent but also more efficient, accessible, and sustainable.

GitHub: https://github.com/Tongyi-MAI/Z-Image
ModelScope: https://modelscope.ai/models/Tongyi-MAI/Z-Image-Turbo/summary
HuggingFace: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
Z-Image gallery: https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery
