Gemma4 on MLX (v0.20.8-rc0) — MoE + SWA performance optimizations
AI Impact Summary
The v0.20.8-rc0 release brings Gemma 4 to the MLX platform, combining a Mixture-of-Experts (MoE) architecture with sliding-window attention (SWA) prefill for improved performance. Key optimizations include memoizing the sliding-window prefill mask and applying softmax only to the selected experts in the Router.Forward pass, a targeted effort to improve Gemma 4's efficiency on the text-only runtime.
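The two optimizations can be sketched as follows. This is an illustrative NumPy sketch, not the release's actual code: the function names `sliding_window_mask` and `route` are hypothetical, and the real implementation lives in the runtime's Router.Forward pass. The ideas are (1) cache the sliding-window prefill mask so it is built once per (sequence length, window) pair, and (2) run softmax over only the top-k selected expert logits rather than the full expert dimension.

```python
import functools
import numpy as np

@functools.lru_cache(maxsize=None)
def sliding_window_mask(seq_len: int, window: int):
    # Memoized causal sliding-window mask: token i may attend to
    # tokens j with i - window < j <= i. Repeated prefill calls with
    # the same shape reuse the cached mask instead of rebuilding it.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def route(logits: np.ndarray, k: int):
    # Pick the top-k experts per token, then normalize with softmax
    # over ONLY those k logits (not all experts), saving work when
    # the expert count is much larger than k.
    topk_idx = np.argpartition(logits, -k, axis=-1)[..., -k:]
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    z = topk_logits - topk_logits.max(axis=-1, keepdims=True)
    weights = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return topk_idx, weights
```

For example, with four experts and k=2, `route` returns the two winning expert indices per token and mixing weights that sum to 1 over just those two experts.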
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info