Open Arabic LLM Leaderboard 2 introduces native Arabic benchmarks and expanded evaluation
AI Impact Summary
Open Arabic LLM Leaderboard 2 replaces translated benchmarks with native Arabic tasks and expands coverage of Arabic-specific morphology, dialects, and safety considerations. The ecosystem now integrates OALL, the Balsam Index, the AraGen Leaderboard, and the SEAL Arabic leaderboard to provide a more transparent and reproducible benchmarking environment than isolated community submissions can offer. A silent bug in the AlGhafa task affected rankings, underscoring the need for robust, centralized validation. Expect model rankings to shift as the new suite emphasizes authentic Arabic evaluation and private-test metrics.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info