Clinical text · benchmark

Medmarks Leaderboard

Open medical LLM benchmark suite for verifiable and open-ended evaluation.

Model performance across Medmarks.

Compare model win rates, detailed benchmark scores, size classes, and open-source availability across verifiable and open-ended medical tasks.

Support

We are grateful to Prime Intellect for their generous support in running proprietary model APIs through their Inference platform.

Thanks to FAL AI for providing a compute grant that helped support this research.

If you are a model developer or frontier lab, we'd love to add your model to our leaderboard. Please contact us!