Sentence Transformers v5.4: Multimodal Embedding & Reranker Models Released

Action Required

Organizations can now use multimodal models for semantic search and retrieval across text, images, audio, and video. Teams adopting v5.4 should install the extras for the modalities they need and confirm they have enough GPU memory before deploying.

AI Impact Summary

Sentence Transformers v5.4 introduces multimodal embedding and reranker models based on the Qwen3-VL series. The release lets users encode and compare text, images, audio, and video through a unified API, which enables use cases such as visual document retrieval and multimodal RAG pipelines. Users must install the extras for each modality they intend to use and should check the GPU requirements (8GB+ VRAM for the 8B variants).
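As a rough illustration of the unified API, the sketch below ranks candidate items against a text query using the library's `encode` and `similarity` methods. The model ID `Qwen/Qwen3-VL-Embedding-2B`, the helper names `load_model` and `rank_by_similarity`, and the assumption that a multimodal model accepts image paths or URLs as `encode` inputs are illustrative and not taken from the release notes.

```python
from __future__ import annotations

from typing import Any, Sequence


def load_model(name: str = "Qwen/Qwen3-VL-Embedding-2B") -> Any:
    """Load an embedding model. The default model ID is an assumption."""
    # Imported lazily so the ranking helper below can be used and tested
    # without the library, its extras, or a GPU being available.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(name)


def rank_by_similarity(
    model: Any, query: str, candidates: Sequence[Any]
) -> list[tuple[int, float]]:
    """Return (candidate_index, score) pairs, best match first.

    `candidates` may be texts or, assuming the model supports them,
    image paths or URLs; both go through the same encode() call.
    """
    query_emb = model.encode([query])
    cand_emb = model.encode(list(candidates))
    # similarity() yields a 1 x N score matrix; take the single query row.
    scores = model.similarity(query_emb, cand_emb)[0]
    pairs = [(i, float(s)) for i, s in enumerate(scores)]
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```

With the relevant extras installed, a call such as `rank_by_similarity(load_model(), "a red sports car", ["car.jpg", "cat.jpg"])` would return the candidate indices ordered by score; the file names here are hypothetical.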

Affected Systems

Sentence Transformers

Date: not specified
Change type: capability
Severity: high