Falcon2-11B: Open-source 11B LLM and 11B VLM with multimodal, multilingual support
AI Impact Summary
Falcon2-11B introduces an 11B-parameter LLM and a companion 11B vision-language model (VLM), expanding Falcon's open-source multimodal capabilities. The LLM was trained on more than 5 trillion tokens from RefinedWeb and multilingual corpora, with a staged increase of the context length to 8192 tokens; distributed training ran on 1024 A100 GPUs using 3D parallelism combined with ZeRO and Flash-Attention 2. The VLM pairs the LLM with a CLIP ViT-L/14 vision encoder through a dedicated multimodal projector, trained on 558K image-caption pairs and 1.2M image-text instructions to enable image-based chat. With benchmark results competitive with larger Falcon variants and demonstrated code-generation ability (pass@1 of 29.59%), the release offers an open, cost-effective path for deploying multilingual, multimodal applications, affecting product roadmaps that rely on on-prem or cloud inference of LLM and vision-language features.
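For teams evaluating deployment, a minimal inference sketch follows. It assumes the weights are published on the Hugging Face Hub under the IDs `tiiuae/falcon-11B` (LLM) and `tiiuae/falcon-11B-vlm` (VLM); verify the exact IDs and prompt formats against the model cards before use.

```python
# A minimal text-generation sketch, assuming the LLM is published on the
# Hugging Face Hub as "tiiuae/falcon-11B" (an assumption to verify).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-11B"  # assumed Hub ID for the 11B LLM

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit the 11B weights
    device_map="auto",           # let accelerate place/shard the weights
)

prompt = "Summarize the benefits of multilingual language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# After the staged context-length extension, up to 8192 tokens are
# available for prompt plus generation.
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the VLM is described as a CLIP ViT-L/14 encoder joined to the LLM by a multimodal projector (a LLaVA-style design), the sketch below assumes it loads through transformers' LLaVA-NeXT classes; the Hub ID and prompt template are placeholders to confirm against the model card.

```python
# A hedged image-chat sketch for the VLM; the Hub ID, model classes, and
# prompt format here are assumptions based on the LLaVA-style architecture.
import requests
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

vlm_id = "tiiuae/falcon-11B-vlm"  # assumed Hub ID for the 11B VLM

processor = LlavaNextProcessor.from_pretrained(vlm_id)
vlm = LlavaNextForConditionalGeneration.from_pretrained(
    vlm_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Any RGB image works; this URL is a placeholder.
image = Image.open(requests.get("https://example.com/image.png", stream=True).raw)
prompt = "User: <image>\nDescribe this image in one sentence.\nFalcon:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(vlm.device)
output = vlm.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```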
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info