SmolVLM introduces 256M and 500M vision-language models with ONNX/transformers support
AI Impact Summary
SmolVLM-256M and SmolVLM-500M extend the SmolVLM family downward to its smallest vision-language models yet, pairing a 93M-parameter SigLIP base encoder with reworked tokenization to preserve performance at reduced scale. The release ships loadable checkpoints for transformers, MLX, and ONNX, plus WebGPU demos, signaling easier cross-platform deployment and browser-based inference. Enterprises should expect lower per-inference cost and better edge suitability for document understanding, image captioning, and retrieval, but should still benchmark against the larger SmolVLM variants (2B/1.7B) for task-specific accuracy and latency.
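As a rough illustration of the transformers loading path, the sketch below builds the chat-style prompt SmolVLM's processor expects and runs generation. The checkpoint name `HuggingFaceTB/SmolVLM-256M-Instruct` and the exact message format are assumptions; verify both against the model card before use.

```python
def build_messages(prompt: str) -> list:
    """Build the chat-style message list the processor's chat template
    expects: one user turn with an image placeholder plus the text prompt.
    (Assumed format -- check the SmolVLM model card.)"""
    return [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt},
        ],
    }]

def caption(image, prompt: str = "Describe this image.") -> str:
    """Caption a PIL image with SmolVLM-256M via transformers.
    Imports are deferred so the module loads without transformers installed."""
    from transformers import AutoModelForVision2Seq, AutoProcessor

    checkpoint = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed checkpoint id
    processor = AutoProcessor.from_pretrained(checkpoint)
    model = AutoModelForVision2Seq.from_pretrained(checkpoint)

    # Render the chat template to a prompt string, then tokenize text + image.
    text = processor.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True
    )
    inputs = processor(text=text, images=[image], return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    return processor.batch_decode(out, skip_special_tokens=True)[0]
```

The same checkpoints are also published in ONNX and MLX formats for the non-Python runtimes mentioned above; only the loading API differs.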
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info