The Open Medical-LLM Leaderboard benchmarks healthcare LLMs on MedQA, MedMCQA, PubMedQA, and MMLU subsets
AI Impact Summary
The Open Medical-LLM Leaderboard establishes a standardized evaluation platform for healthcare LLMs, aggregating tasks such as MedQA, MedMCQA, PubMedQA, and MMLU subsets to compare clinical knowledge and reasoning. It highlights the relative strengths of models like GPT-4-base, Med-PaLM-2, and Gemini Pro, while noting that several open-source, ~7B-parameter models (Starling-LM-7B, gemma-7b, Mistral-7B-v0.1, Hermes-2-Pro-Mistral-7B) can be competitive on targeted datasets. The submission workflow requires safetensors conversion, Transformers AutoClasses compatibility, and public accessibility, which standardizes how models are prepared for evaluation and reduces integration risk. This platform enables technically informed buy/build decisions for medical QA deployment by revealing domain-specific strengths and gaps across datasets and medical domains.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info