Hugging Face Evaluation on the Hub enables zero-shot evaluation of LLMs up to 66B parameters via AutoTrain
AI Impact Summary
Evaluation on the Hub now enables zero-shot evaluation of causal language models without writing code, using log-probability scoring over prompt-completion pairs. The capability supports models up to 66B parameters and includes tasks such as WinoBias, providing insight into bias and inverse-scaling trends across model sizes. This democratizes benchmarking and could accelerate research and procurement decisions by reducing infrastructure and scripting overhead.
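The log-probability scoring mentioned above works by summing per-token log-probabilities for each candidate completion and selecting the highest-scoring one. A minimal sketch of that mechanism, using a toy unigram table in place of a real causal LM (the `TOY_LOGPROBS` table and function names are illustrative assumptions, not part of the Evaluation on the Hub API):

```python
import math

# Toy unigram log-probabilities standing in for a real causal LM's
# per-token scores. In practice these come from the model's logits.
TOY_LOGPROBS = {
    "the": math.log(0.20), "nurse": math.log(0.05),
    "doctor": math.log(0.04), "she": math.log(0.04),
    "he": math.log(0.02), "said": math.log(0.02),
}
UNK = math.log(1e-6)  # fallback score for out-of-vocabulary tokens


def completion_score(prompt: str, completion: str) -> float:
    """Sum of per-token log-probs of the completion given the prompt.

    A real LM conditions each token on the full prefix; this unigram
    toy ignores context but preserves the shape of the computation.
    """
    tokens = completion.lower().split()
    return sum(TOY_LOGPROBS.get(t, UNK) for t in tokens)


def zero_shot_choice(prompt: str, completions: list[str]) -> str:
    """Pick the candidate completion with the highest total log-prob."""
    return max(completions, key=lambda c: completion_score(prompt, c))
```

For a WinoBias-style item, the harness would compare the scores the model assigns to each pronoun completion, e.g. `zero_shot_choice("The nurse said that", ["she was tired", "he was tired"])`; systematic gaps between such scores are what surface as bias measurements.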
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info