Google Gemma open LLM family released: 2B/7B base and instruct variants with cloud and on-device deployment
AI Impact Summary
Google's Gemma release introduces a family of open LLMs in 2B and 7B sizes, each with base and instruction-tuned variants, an 8K-token context window, and the ability to run on consumer hardware without quantization. The release is tightly integrated with the Hugging Face ecosystem (Hub and Transformers 4.38) and supports deployment via Vertex AI, Google Kubernetes Engine (GKE), and Hugging Face Inference Endpoints using Text Generation Inference, covering both cloud and edge use. Early benchmarks position Gemma 7B as competitive with Mistral 7B, while Gemma 2B trails the top 2B models; the rollout also highlights practical deployment details such as 4-bit quantization options and CUDA graphs for inference speedups.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info