Patch release v5.5.4 fixes Kimi-K2.5 tokenizer regression, DeepSpeed ZeRO-3 rotary IndexError, and Qwen2.5-VL RoPE scaling
AI Impact Summary
This patch resolves tokenizer regressions on Kimi-K2.5 and improves Mistral regex handling (_patch_mistral_regex), stabilizing tokenization workflows. It also fixes an IndexError raised under DeepSpeed ZeRO-3 when rotary kernels are active, and it corrects Qwen2.5-VL so that temporal RoPE scaling is no longer applied to still images. Together these fixes reduce training and inference instability for the affected models. Regression tests for training scenarios (GAS) were added to prevent recurrence.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info