Sentence Transformers v5.4: Multimodal Embedding & Reranker Models Released
Action Required
Teams that need semantic search or retrieval across more than one data type should upgrade to v5.4, install the extras for each modality they plan to use, and confirm that their GPUs have enough VRAM for the model size they choose.
AI Impact Summary
Sentence Transformers v5.4 introduces multimodal embedding and reranker models based on the Qwen3-VL series. Text, images, audio, and video can now be encoded and compared through a single API, which enables use cases such as visual document retrieval and multimodal RAG pipelines. Users must install the extras for the modalities they intend to use and should plan for the GPU requirements: roughly 8 GB of VRAM for the 2B variants and about 20 GB for the 8B variants.
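The comparison step behind embedding-based retrieval can be illustrated with a minimal sketch. It assumes each item, whatever its modality, has already been turned into a vector in a shared space; in a real pipeline those vectors would come from the model's encode call. The vectors below are made up for illustration, and the helper names cosine and rank_by_cosine are hypothetical, not part of the Sentence Transformers API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up 3-dimensional vectors standing in for real embeddings.
# Assumption: the query is text and the documents are images, all
# embedded into the same space by one multimodal model.
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # points in an unrelated direction
    [0.9, 0.1, 0.0],  # nearly the same direction as the query
    [0.5, 0.5, 0.0],  # partially related
]

ranking = rank_by_cosine(query, docs)
best_index = ranking[0][0]
print(ranking)
```

Because every modality lands in one vector space, the same ranking function serves a text query against image documents or any other pairing. A reranker would then typically rescore only the top few candidates from this list.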
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high