Model performance across Medmarks.
Compare model win rates, detailed benchmark scores, size classes, and open-source availability across verifiable and open-ended medical tasks.
Open medical LLM benchmark suite for verifiable and open-ended evaluation.
We are grateful to Prime Intellect for their generous support in running proprietary model APIs through their Inference platform.
Thanks to FAL AI for providing a compute grant that helped support this research.
If you are a model developer or frontier lab and would like your model added to our leaderboard, please contact us!