Transformers 4.45.0: Dynamic Speculative Decoding becomes default for assisted generation
AI Impact Summary
Transformers 4.45.0 makes dynamic speculative decoding the default for assisted generation. In speculative decoding, a small draft model proposes several tokens at a time and the target model verifies them in a single forward pass; the dynamic variant adjusts how many tokens are drafted on the fly, based on the draft model's confidence, instead of using a fixed count. Users already running assisted generation get the faster path with no code changes, and reported speedups reach up to 2.7x on some model-task pairs. Teams should benchmark their own workloads against target models (e.g., OPT, Pythia, Llama variants, CodeGen, Flan-T5) and tune thresholds such as assistant_confidence_threshold to preserve accuracy while maximizing latency reductions.
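A minimal sketch of assisted generation with the confidence threshold mentioned above, assuming transformers >= 4.45.0. The model names here (distilgpt2 as the target, sshleifer/tiny-gpt2 as the draft) are placeholder choices that happen to share a GPT-2 tokenizer, not models named in the release; substitute your own target/draft pair.

```python
# Sketch: assisted generation with a draft (assistant) model.
# Assumes transformers >= 4.45.0; model names are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
target = AutoModelForCausalLM.from_pretrained("distilgpt2")
draft = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = target.generate(
    **inputs,
    assistant_model=draft,               # draft proposes tokens; target verifies them
    assistant_confidence_threshold=0.4,  # stop drafting when draft confidence drops
    max_new_tokens=16,
    pad_token_id=tokenizer.eos_token_id, # GPT-2 models define no pad token
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

Because the drafted tokens are checked against the target model's own distribution, greedy outputs match plain `target.generate(...)`; only latency changes, which is why benchmarking rather than re-validating outputs is usually sufficient.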
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info