Diffusers 0.34.0 adds Wan VACE video pipelines, Video2World, and enhanced torch.compile
AI Impact Summary
Diffusers 0.34.0 unlocks new controllable video generation with Wan VACE (1.3B and 14B variants) and the Cosmos Predict2 Video2World pipeline, plus long-video tooling (Hunyuan Video Framepack, F1 Framepack) and AccVideo/CausVid LoRAs for accelerated, high-quality video. It also expands image-generation options (Cosmos Predict2 Text2Image, Chroma, VisualCloze) and adds single-file loading for WanTransformer3DModel, covering the FusionX models and LoRAs built on Wan2.1-14B, enabling richer end-to-end workflows. The upgrade improves torch.compile compatibility, offloading, and 4-bit quantization workflows to reduce memory use and boost speed, but teams should validate LoRA loading and any PyTorch fullgraph prerequisites to avoid runtime issues in production.
Models affected
- Diffusers 0.34.0 (sdk, new)
- Wan VACE (model, new)
- Cosmos Predict2 (model, new)
- Cosmos Predict2 Video2World (model, new)
- LTX 0.9.7 (model, new)
- LTX Distilled 0.9.7 (model, new)
- Hunyuan Video Framepack (model, new)
- F1 Framepack (model, new)
- FusionX (model, new)
- Wan2.1-14B (model, new)
- WanTransformer3DModel (model, new)
- AccVideo (model, new)
- CausVid (model, new)
- Chroma (model, new)
- VisualCloze (model, new)
- FLUX.1-schnell (model, new)
- controlnet_aux (sdk, new)
- Wan2.1-T2V-14B-Diffusers (model, new)
- Wan14BT2VFusioniX (model, new)
Source text
📹 New video generation pipelines

Wan VACE

Wan VACE supports various generation techniques which achieve controllable video generation. It comes in two variants: a 1.3B model for fast iteration and prototyping, and a 14B model for high-quality generation. Some of the capabilities include:

- Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Bounding Box, etc.). Recommended library for preprocessing videos to obtain control videos: huggingface/controlnet_aux
- Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)
- Inpainting and Outpainting
- Subject to Video (faces, objects, characters, etc.)
- Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)

The code snippets available in this pull request demonstrate some examples of how videos can be generated with controllability signals; a minimal text-to-video sketch is also included at the end of this source text. Check out the docs to learn more.

Cosmos Predict2 Video2World

Cosmos-Predict2 is a key branch of the Cosmos World Foundation Models (WFMs) ecosystem for Physical AI, specializing in future-state prediction through advanced world modeling. It offers two powerful capabilities: text-to-image generation for creating high-quality images from text descriptions, and video-to-world generation for producing visual simulations from video inputs. The Video2World model comes in 2B and 14B variants (a usage sketch is included below). Check out the docs to learn more.

LTX 0.9.7 and Distilled

LTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks. Check out the docs to learn more.

Hunyuan Video Framepack and F1

Framepack is a novel method for enabling long video generation. There are two released variants of Hunyuan Video trained using this technique (a loading sketch is included below). Check out the docs to learn more.

FusionX

The FusionX family of models and LoRAs, built on top of Wan2.1-14B, should already be supported. To load the model, use from_single_file():

    import torch
    from diffusers import WanTransformer3DModel

    # The checkpoint path below is illustrative; point it at a FusionX
    # single-file checkpoint (e.g. from the Wan14BT2VFusioniX repo on the Hub).
    transformer = WanTransformer3DModel.from_single_file(
        "<path-or-url-to-FusionX-checkpoint>.safetensors",
        torch_dtype=torch.bfloat16,
    )
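For Wan VACE, here is a minimal text-to-video sketch. The checkpoint id Wan-AI/Wan2.1-VACE-1.3B-diffusers and the resolution/step settings are assumptions drawn from typical Wan examples, not prescribed by the release notes; control signals (e.g. a pose or depth video preprocessed with controlnet_aux) would be passed through the pipeline's conditioning arguments as described in the docs.

    import torch
    from diffusers import AutoencoderKLWan, WanVACEPipeline
    from diffusers.utils import export_to_video

    # Checkpoint id is an assumption; verify the exact repo name on the Hub.
    model_id = "Wan-AI/Wan2.1-VACE-1.3B-diffusers"

    # The Wan VAE is commonly kept in float32 for numerical stability.
    vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
    pipe = WanVACEPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    # Plain text-to-video; control videos and masks would be supplied via the
    # pipeline's conditioning arguments after preprocessing with controlnet_aux.
    output = pipe(
        prompt="A sleek robot arm sketching a city skyline on a whiteboard",
        negative_prompt="blurry, low quality",
        height=480,
        width=832,
        num_frames=81,
        num_inference_steps=30,
        guidance_scale=5.0,
    ).frames[0]
    export_to_video(output, "wan_vace_output.mp4", fps=16)

The 1.3B variant is the one to reach for during prompt iteration; swapping in the 14B checkpoint id trades speed for quality with the same call.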
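For Cosmos Predict2 Video2World, a sketch of image-conditioned generation follows, assuming the Cosmos2VideoToWorldPipeline class added in this release and the nvidia/Cosmos-Predict2-2B-Video2World checkpoint id (an assumption; the 14B variant loads the same way). Note that NVIDIA's Cosmos checkpoints are gated and ship with a guardrail safety checker, so Hub access and the guardrail dependency may be required.

    import torch
    from diffusers import Cosmos2VideoToWorldPipeline
    from diffusers.utils import export_to_video, load_image

    # Checkpoint id is an assumption; both 2B and 14B variants are released.
    model_id = "nvidia/Cosmos-Predict2-2B-Video2World"
    pipe = Cosmos2VideoToWorldPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    # Condition on a single frame; a short video clip can be used instead.
    image = load_image("path/to/first_frame.png")
    video = pipe(
        image=image,
        prompt="A robot arm picks up a red cube and places it on a shelf",
        negative_prompt="low quality, distorted",
        generator=torch.Generator().manual_seed(0),
    ).frames[0]
    export_to_video(video, "cosmos_video2world.mp4", fps=16)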
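And for Framepack, a loading sketch for the image-to-video variant is below. The component repos (lllyasviel/FramePackI2V_HY for the Framepack-trained transformer, lllyasviel/flux_redux_bfl for the SigLIP image encoder, hunyuanvideo-community/HunyuanVideo for the base pipeline) are assumptions drawn from the public FramePack release and should be verified on the Hub.

    import torch
    from transformers import SiglipImageProcessor, SiglipVisionModel
    from diffusers import (
        HunyuanVideoFramepackPipeline,
        HunyuanVideoFramepackTransformer3DModel,
    )
    from diffusers.utils import export_to_video, load_image

    # Framepack-trained transformer (repo id is an assumption).
    transformer = HunyuanVideoFramepackTransformer3DModel.from_pretrained(
        "lllyasviel/FramePackI2V_HY", torch_dtype=torch.bfloat16
    )
    # SigLIP image encoder used for image conditioning (also an assumption).
    feature_extractor = SiglipImageProcessor.from_pretrained(
        "lllyasviel/flux_redux_bfl", subfolder="feature_extractor"
    )
    image_encoder = SiglipVisionModel.from_pretrained(
        "lllyasviel/flux_redux_bfl", subfolder="image_encoder", torch_dtype=torch.float16
    )
    # The base HunyuanVideo repo supplies the remaining components.
    pipe = HunyuanVideoFramepackPipeline.from_pretrained(
        "hunyuanvideo-community/HunyuanVideo",
        transformer=transformer,
        feature_extractor=feature_extractor,
        image_encoder=image_encoder,
        torch_dtype=torch.float16,
    )
    pipe.vae.enable_tiling()  # keeps VAE memory manageable for long videos
    pipe.to("cuda")

    image = load_image("path/to/first_frame.png")
    frames = pipe(
        image=image,
        prompt="A timelapse of clouds rolling over a mountain ridge",
        num_frames=91,  # Framepack targets long generations; 91 is illustrative
        num_inference_steps=30,
        guidance_scale=9.0,
        generator=torch.Generator().manual_seed(0),
    ).frames[0]
    export_to_video(frames, "framepack_output.mp4", fps=30)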