Gemma 3n now available across the open-source ecosystem for on-device multimodal inference
AI Impact Summary
Gemma 3n has been released for on-device multimodal inference in two variants, gemma-3n-E2B and gemma-3n-E4B. Despite raw parameter counts of 5B and 8B respectively, both run in real time on modest hardware: as little as 2 GB of VRAM for E2B and 3 GB for E4B. The release spans the major open-source stacks (transformers, timm, MLX, llama.cpp, Ollama, transformers.js, and Google AI Edge) and accepts image, text, audio, and video inputs locally. Architectural highlights include the MobileNet-V5-300M vision encoder, a USM-based audio encoder, and the MatFormer design with Per-Layer Embeddings and KV Cache Sharing to reduce memory use and speed up long-context processing. Business impact: offline, on-device multimodal inference across common runtimes cuts cloud egress and latency, but organizations need hardware with at least 2–3 GB of VRAM and must verify compatibility with their target libraries.
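As a minimal sketch of what local usage via transformers might look like, the snippet below loads the E2B variant through the multimodal chat pipeline and runs a single image+text query. It assumes a recent transformers release with Gemma 3n support, the Hugging Face model id google/gemma-3n-E2B-it, and a placeholder image URL; none of these are confirmed by the summary above.

```python
# Minimal sketch: image+text inference with Gemma 3n via transformers.
# Assumptions (not confirmed by this summary): a transformers version with
# Gemma 3n support, and the model id "google/gemma-3n-E2B-it".
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",             # multimodal chat-style pipeline
    model="google/gemma-3n-E2B-it",   # E2B variant; smallest VRAM footprint
    device_map="auto",                # place weights on available GPU/CPU
    torch_dtype=torch.bfloat16,       # half-precision to fit modest VRAM
)

# Chat-style message mixing an image and a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # hypothetical URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])
```

The same model id should be loadable from the other supported runtimes (MLX, llama.cpp, Ollama, transformers.js) through their own loading conventions; only the transformers path is sketched here.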
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium