Google releases PaliGemma – open vision-language model
AI Impact Summary
Google has released PaliGemma, a new family of vision-language models built on SigLIP-So400m and Gemma-2B, offering capabilities like image captioning, visual question answering, and object detection. The release includes pretrained, mix, and fine-tuned models available through the Hugging Face Hub, alongside a Transformers integration for streamlined development. The availability of different precisions (bfloat16, float16, float32) and resolutions (224x224, 448x448, 896x896) provides flexibility for various use cases and hardware constraints, though higher resolutions require more memory.
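The resolution and precision trade-offs can be sketched with some back-of-the-envelope arithmetic. The 14×14 patch size of SigLIP-So400m and the roughly 3B total parameter count are assumptions for illustration, not figures stated in the release:

```python
# Hedged sketch: estimate vision-token counts per image and raw weight
# memory for the listed PaliGemma precisions. PATCH and PARAMS are
# assumed values, not taken from the release notes.

PATCH = 14           # SigLIP-So400m patch edge in pixels (assumed)
PARAMS = 3e9         # rough total parameter count, SigLIP + Gemma-2B (assumed)

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

def image_tokens(resolution: int, patch: int = PATCH) -> int:
    """Number of vision tokens for a square input image."""
    return (resolution // patch) ** 2

for res in (224, 448, 896):
    print(f"{res}x{res}: {image_tokens(res)} image tokens")

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{PARAMS * nbytes / 1e9:.0f} GB of weights")
```

Under these assumptions, quadrupling the pixel count (224→448→896) quadruples the number of image tokens each time, which is why the higher-resolution checkpoints need substantially more memory and compute.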
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info