Open Arabic LLM Leaderboard 2 introduces native Arabic benchmarks and expanded evaluation

AI Impact Summary

Open Arabic LLM Leaderboard 2 replaces translated benchmarks with native Arabic tasks and expands coverage across Arabic-specific morphology, dialects, and safety considerations. The ecosystem now integrates OALL, Balsam Index, AraGen Leaderboard, and SEAL Arabic leaderboard to provide a more transparent, reproducible benchmarking environment beyond isolated community submissions. A silent bug in the AlGhafa task affected rankings, underscoring the need for robust centralized validation; expect model rankings to shift as the new suite emphasizes authentic Arabic evaluation and private-test metrics.

Affected Systems

Open Arabic LLM Leaderboard (OALL)
Balsam Index

Date: not specified
Change type: capability
Severity: info
