Google Gemma open LLM family released: 2B/7B base and instruct variants with cloud and on-device deployment
AI Impact Summary
Google's Gemma release introduces a family of open LLMs in 2B and 7B sizes, each with base and instruction-tuned variants, an 8K-token context window, and the ability to run on consumer hardware without quantization. The release is tightly integrated with the Hugging Face ecosystem (Hub and Transformers 4.38) and supports deployment via Vertex AI, Google Kubernetes Engine (GKE), and Hugging Face Inference Endpoints using Text Generation Inference, covering both cloud and edge use. Early benchmarks position Gemma 7B as competitive with Mistral 7B, while Gemma 2B trails the top 2B models; the rollout also highlights practical deployment details such as 4-bit quantization options and CUDA graphs for inference speedups.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info