Benchmarks

For more comparisons between our models, we recommend visiting our blog posts. When releasing a model, we may directly report results on public and private benchmarks for our models and for competing models.

You can also explore third-party performance metrics, such as:

| Benchmark | Description | Link |
| --- | --- | --- |
| Artificial Analysis | Compares AI models across quality, price, output speed, latency, context window, and more. | Visit |
| LMArena | Human-preference benchmark evaluating model output quality through direct comparisons. | Visit |
| Scale AI Leaderboard | Reports public and private benchmarks in coding, instruction following, math, and other domains. | Visit |
| OpenRouter Rankings | Ranks AI models based on general usage and popularity across different use cases and tasks. | Visit |
| CTO Bench | Evaluates AI models on real end-to-end coding tasks. | Visit |