LiveCodeBench Leaderboard — Holistic, contamination-free evaluation of code LLMs
AI Impact Summary
LiveCodeBench introduces a time-aware, contamination-resistant benchmark for code LLMs, evaluating four scenarios: Code Generation, Self Repair, Code Execution, and Test Output Prediction. Problems are sourced from LeetCode, AtCoder, and CodeForces and annotated with release dates, so performance can be tracked over time to detect training-data leakage. Observed leaders (GPT-4-Turbo across most scenarios, Claude-3-Opus on test-output prediction, Mistral-Large on natural-language reasoning tasks) give teams a concrete basis for comparing models and prioritizing improvements for coding workloads.
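The release-date annotations make the contamination check mechanical: bucket problems by release month and compare a model's pass rate before and after its training cutoff. Below is a minimal sketch of that windowed scoring, assuming a hypothetical Result record rather than LiveCodeBench's actual data schema; a sharp drop in pass rate on post-cutoff months is the leakage signal.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass
class Result:
    """One (problem, model) evaluation outcome. Hypothetical record shape."""
    release_date: date  # when the problem was published on LeetCode/AtCoder/CodeForces
    passed: bool        # whether the model's solution passed the problem's tests

def pass_rate_by_month(results: list[Result]) -> dict[str, float]:
    """Bucket problems by release month and compute the pass rate per bucket.

    Comparing buckets before and after a model's training cutoff is the
    time-windowed evaluation LiveCodeBench's date annotations enable.
    """
    buckets: dict[str, list[bool]] = defaultdict(list)
    for r in results:
        buckets[r.release_date.strftime("%Y-%m")].append(r.passed)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}

if __name__ == "__main__":
    # Toy data for illustration only; not actual benchmark results.
    demo = [
        Result(date(2023, 5, 10), True),
        Result(date(2023, 5, 21), True),
        Result(date(2023, 9, 3), False),  # problem released after a presumed cutoff
        Result(date(2023, 9, 18), True),
    ]
    for month, rate in pass_rate_by_month(demo).items():
        print(f"{month}: pass rate = {rate:.2f}")
```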
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info