Google releases PaliGemma – open vision-language model
AI Impact Summary
Google has released PaliGemma, a new family of vision-language models built on SigLIP-So400m and Gemma-2B, offering capabilities like image captioning, visual question answering, and object detection. The release includes pretrained, mix, and fine-tuned models available through the Hugging Face Hub, alongside a Transformers integration for streamlined development. The availability of different precisions (bfloat16, float16, float32) and resolutions (224x224, 448x448, 896x896) provides flexibility for various use cases and hardware constraints, though higher resolutions require more memory.
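The resolution and precision trade-offs can be sketched with some back-of-the-envelope arithmetic. The 14×14 patch size of SigLIP-So400m and the roughly 3B total parameter count are assumptions for illustration, not figures stated in the release:

```python
# Hedged sketch: estimate vision-token counts per image and raw weight
# memory for the listed PaliGemma precisions. PATCH and PARAMS are
# assumed values, not taken from the release notes.

PATCH = 14           # SigLIP-So400m patch edge in pixels (assumed)
PARAMS = 3e9         # rough total parameter count, SigLIP + Gemma-2B (assumed)

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

def image_tokens(resolution: int, patch: int = PATCH) -> int:
    """Number of vision tokens for a square input image."""
    return (resolution // patch) ** 2

for res in (224, 448, 896):
    print(f"{res}x{res}: {image_tokens(res)} image tokens")

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{PARAMS * nbytes / 1e9:.0f} GB of weights")
```

Under these assumptions, quadrupling the pixel count (224→448→896) quadruples the number of image tokens each time, which is why the higher-resolution checkpoints need substantially more memory and compute.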
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info