Benchmarks
For more comparisons between our models, see our blog posts. When we release a model, we may report public and private benchmark results for it and for competing models.
You can also explore third-party performance metrics, such as:
| Benchmark Name | Description | Link |
|---|---|---|
| Artificial Analysis | Compares AI models across quality, price, output speed, latency, context window, and more. | Visit |
| LMArena Arena | Human-preference benchmark evaluating model output quality through direct comparisons. | Visit |
| Scale AI Leaderboard | Reports public and private benchmarks in coding, instruction following, math, and other domains. | Visit |
| OpenRouter Rankings | Ranks AI models based on general usage and popularity across different use cases and tasks. | Visit |
| CTO Bench | Evaluates AI models on real end-to-end coding tasks. | Visit |