Vision Language Models update: April 2025 capabilities and models (LLaVA, LLaVA-NeXT, KOSMOS-2)
AI Impact Summary
The post documents an expanding ecosystem of vision-language models (VLMs), including LLaVA, KOSMOS-2, Fuyu-8B, and LLaVA-NeXT, along with tooling such as trl for fine-tuning and guidance on prompting. It highlights grounding capabilities and several evaluation suites (Vision Arena, the Open VLM Leaderboard, LMMS-Eval) that inform model selection and benchmarking. For engineering teams, this lowers the barrier to adding multimodal capabilities through zero-shot use or fine-tuning, and the post signals ongoing updates with more models and capabilities to come. Before promoting any Hub-hosted model to production, plan an evaluation and governance pipeline that compares latency, grounding quality, and licensing across candidates.
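As a concrete illustration of the zero-shot path mentioned above, here is a minimal inference sketch using the transformers LLaVA integration. The checkpoint name (llava-hf/llava-1.5-7b-hf), the prompt template, and the image URL are assumptions for illustration and are not taken from the post itself.

```python
# Minimal zero-shot VLM inference sketch.
# Assumptions: a recent transformers release with LLaVA support, torch, Pillow,
# and the Hub checkpoint "llava-hf/llava-1.5-7b-hf" (hypothetical choice here).
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this URL is a placeholder for illustration.
image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)
prompt = "USER: <image>\nDescribe this image in one sentence. ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```

The same loop can feed a simple evaluation pass: run a fixed image/prompt set through each candidate model, record latency and outputs, and review grounding quality and licensing before promotion.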
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info