LocalAI 3.12.0 Release: Multi-modal Realtime, Voxtral Backend, GPU Improvements
Action Required
LocalAI 3.12.0 adds support for a broader range of AI models and media types. Migrating to the new Voxtral backend is recommended for improved text-to-speech quality.
AI Impact Summary
LocalAI 3.12.0 introduces multi-modal real-time conversations (text, image, audio), a new Voxtral backend for high-quality text-to-speech, and improved GPU support via Diffusers. The release also includes optimizations for legacy CPUs, stability fixes, and enhancements to the UI and logging. The update mainly benefits applications that need real-time interaction across several media formats.
Affected Systems
- Date: 20 Feb 2026
- Change type: capability
- Severity: high