| Rank | Model | Provider | Score (0-100) | Samples | Context | Price / 1M tokens (input / output) |
|---|---|---|---|---|---|---|
| 1 | claude-opus-4-7-thinking | Anthropic | 100.0 | 5.3K | 1M | ¥36 / ¥180 |
| 2 | claude-opus-4-7 | Anthropic | 98.8 | 4.9K | 1M | ¥36 / ¥180 |
| 3 | claude-opus-4-6-thinking | Anthropic | 97.5 | 7.9K | 1M | ¥36 / ¥180 |
| 4 | qwen3.7-max-20260517 | Alibaba | 96.3 | 1.5K | 1M | ¥18 / ¥54 |
| 5 | claude-opus-4-6 | Anthropic | 95.0 | 8.9K | 1M | ¥36 / ¥180 |
| 6 | glm-5.1 | Z.ai | 93.8 | 3.6K | 200K | ¥0 / ¥0 |
| 7 | claude-sonnet-4-6 | Anthropic | 92.5 | 11.1K | 1M | ¥21.6 / ¥108 |
| 8 | kimi-k2.6 | Moonshot | 91.3 | 4K | 262K | ¥6.84 / ¥28.8 |
| 9 | muse-spark | Meta | 90.0 | 1.6K | - | - |
| 10 | gemini-3.5-flash | Google | 88.8 | 2.2K | 1.05M | ¥10.8 / ¥64.8 |
| 11 | gpt-5.5-xhigh (codex-harness) | OpenAI | 87.5 | 4.1K | 400K | ¥9 / ¥72 |
| 12 | claude-opus-4-5-20251101-thinking-32k | Anthropic | 86.3 | 13.1K | 200K | ¥108 / ¥540 |
| 13 | qwen3.6-max-preview | Alibaba | 85.0 | 2.5K | 246K | ¥9.5 / ¥56.9 |
| 14 | gpt-5.5-high (codex-harness) | OpenAI | 83.8 | 4.3K | 400K | ¥9 / ¥72 |
| 15 | mimo-v2.5-pro | Xiaomi | 82.5 | 4.7K | 1.05M | ¥7.2 / ¥21.6 |
| 16 | claude-opus-4-5-20251101 | Anthropic | 81.3 | 15.3K | 200K | ¥36 / ¥180 |
| 17 | deepseek-v4-pro-thinking | DeepSeek | 80.0 | 4K | 1M | ¥3.13 / ¥6.26 |
| 18 | qwen3.6-plus | Alibaba | 78.8 | 6.1K | 1M | ¥3.6 / ¥21.6 |
| 19 | gpt-5.4-high (codex-harness) | OpenAI | 77.5 | 1.5K | 400K | ¥9 / ¥72 |
| 20 | gemini-3.1-pro-preview | Google | 76.3 | 10.3K | 1.05M | ¥14.4 / ¥86.4 |
| 21 | gpt-5.5 (codex-harness) | OpenAI | 75.0 | 4.1K | 400K | ¥9 / ¥72 |
| 22 | glm-4.7 | Z.ai | 73.8 | 4.9K | 205K | ¥0 / ¥0 |
| 23 | mimo-v2.5 | Xiaomi | 72.5 | 3.7K | 1.05M | ¥2.88 / ¥14.4 |
| 24 | gemini-3-pro | Google | 71.3 | 17.2K | 1.05M | ¥14.4 / ¥86.4 |
| 25 | gpt-5.4-medium (codex-harness) | OpenAI | 70.0 | 1.4K | 400K | ¥9 / ¥72 |
| 26 | gemini-3-flash | Google | 68.8 | 13.3K | 1.05M | ¥3.6 / ¥21.6 |
| 27 | glm-5 | Z.ai | 67.5 | 6.6K | 205K | ¥7.2 / ¥23 |
| 28 | mimo-v2-pro | Xiaomi | 66.3 | 6.8K | 1.05M | ¥7.2 / ¥21.6 |
| 29 | kimi-k2.5-thinking | Moonshot | 65.0 | 10.7K | 262K | ¥4.32 / ¥21.6 |
| 30 | kimi-k2.5-instant | Moonshot | 63.8 | 3.6K | 262K | ¥4.32 / ¥21.6 |
| 31 | gpt-5.3-codex (codex-harness) | OpenAI | 62.5 | 3K | 400K | ¥9 / ¥72 |
| 32 | gpt-5.2 | OpenAI | 61.3 | 1.5K | 400K | ¥12.6 / ¥101 |
| 33 | gpt-5.4-mini-high | OpenAI | 60.0 | 5.5K | 400K | ¥5.4 / ¥32.4 |
| 34 | minimax-m2.7 | MiniMax | 58.8 | 6.3K | 205K | ¥0 / ¥0 |
| 35 | grok-4.20-beta-0309-reasoning | xAI | 57.5 | 7.2K | 2M | ¥14.4 / ¥43.2 |
| 36 | gpt-5-medium | OpenAI | 56.3 | 3.8K | 400K | ¥9 / ¥72 |
| 37 | qwen3.5-397b-a17b | Alibaba | 55.0 | 9.7K | 262K | ¥3.1 / ¥18.6 |
| 38 | minimax-m2.1-preview | MiniMax | 53.8 | 9.3K | 205K | ¥0 / ¥0 |
| 39 | gpt-5.1-medium | OpenAI | 52.5 | 6.1K | 400K | ¥9 / ¥72 |
| 40 | gpt-5.4 | OpenAI | 51.3 | 239 | 1.05M | ¥18 / ¥108 |
| 41 | claude-sonnet-4-5-20250929-thinking-32k | Anthropic | 50.0 | 15.7K | 200K | ¥21.6 / ¥108 |
| 42 | gemini-3-flash (thinking-minimal) | Google | 48.8 | 16.4K | 1.05M | ¥3.6 / ¥21.6 |
| 43 | claude-sonnet-4-5-20250929 | Anthropic | 47.5 | 18.4K | 200K | ¥21.6 / ¥108 |
| 44 | claude-opus-4-1-20250805 | Anthropic | 46.3 | 8.6K | 200K | ¥108 / ¥540 |
| 45 | minimax-m2.5 | MiniMax | 45.0 | 7.8K | 205K | ¥0 / ¥0 |
| 46 | gemma-4-31b | Google | 43.8 | 3.4K | 262K | ¥3.24 / ¥7.2 |
| 47 | grok-4.3 | xAI | 42.5 | 3.5K | 1M | ¥9 / ¥18 |
| 48 | gpt-5.3-codex (codex-harness) | OpenAI | 41.3 | 3.5K | 400K | ¥9 / ¥72 |
| 49 | deepseek-v3.2-thinking | DeepSeek | 40.0 | 7.9K | 128K | ¥2.09 / ¥3.1 |
| 50 | hunyuan-hy3-preview | Tencent | 38.7 | 1.3K | 256K | ¥0 / ¥0 |
| 51 | qwen3.5-122b-a10b | Alibaba | 37.5 | 8.1K | 262K | ¥2.88 / ¥23 |
| 52 | gemma-4-26b-a4b | Google | 36.3 | 1.5K | 262K | ¥0.94 / ¥2.88 |
| 53 | qwen3.5-27b | Alibaba | 35.0 | 7.7K | 262K | ¥2.16 / ¥17.3 |
| 54 | glm-4.6 | Z.ai | 33.8 | 8.3K | 205K | ¥4.32 / ¥15.8 |
| 55 | gpt-5.1 | OpenAI | 32.5 | 12.9K | 400K | ¥9 / ¥72 |
| 56 | mimo-v2-flash (non-thinking) | Xiaomi | 31.3 | 6.7K | 262K | ¥0.72 / ¥2.16 |
| 57 | gpt-5.2-codex | OpenAI | 30.0 | 7.8K | 400K | ¥12.6 / ¥101 |
| 58 | deepseek-v3.2 | DeepSeek | 28.8 | 10.5K | 128K | ¥2.09 / ¥3.1 |
| 59 | kimi-k2-thinking-turbo | Moonshot | 27.5 | 15.3K | 262K | ¥17.3 / ¥72 |
| 60 | gpt-5.1-codex | OpenAI | 26.3 | 6.2K | 400K | ¥9 / ¥72 |
| 61 | claude-haiku-4-5-20251001 | Anthropic | 25.0 | 20.6K | 200K | ¥7.2 / ¥36 |
| 62 | minimax-m2 | MiniMax | 23.8 | 8.4K | 197K | ¥0 / ¥0 |
| 63 | mimo-v2-flash (thinking) | Xiaomi | 22.5 | 2.1K | 262K | ¥0.72 / ¥2.16 |
| 64 | deepseek-v3.2-exp | DeepSeek | 21.3 | 4.9K | 128K | ¥0 / ¥0 |
| 65 | qwen3-coder-480b-a35b-instruct | Alibaba | 20.0 | 15.2K | 262K | ¥6.2 / ¥24.8 |
| 66 | KAT-Coder-Pro-V1 | - | 18.8 | 1.9K | 256K | ¥0.22 / ¥8.64 |
| 67 | qwen3.5-35b-a3b | Alibaba | 17.5 | 1.8K | 262K | ¥1.8 / ¥14.4 |
| 68 | gemini-3.1-flash-lite-preview | Google | 16.3 | 9.3K | 1.05M | ¥1.8 / ¥10.8 |
| 69 | trinity-large-thinking | - | 15.0 | 1.3K | 262K | ¥1.8 / ¥6.48 |
| 70 | gpt-5.1-codex-mini | OpenAI | 13.8 | 1.4K | 400K | ¥1.8 / ¥14.4 |
| 71 | qwen3.5-flash | Alibaba | 12.5 | 1.6K | 1M | ¥1.24 / ¥12.4 |
| 72 | grok-4-1-fast-reasoning | xAI | 11.3 | 6.9K | 2M | ¥1.44 / ¥3.6 |
| 73 | mistral-large-3 | Mistral | 10.0 | 1K | 262K | ¥3.6 / ¥10.8 |
| 74 | grok-4.1-thinking | xAI | 8.8 | 1.2K | 200K | ¥14.4 / ¥72 |
| 75 | gemini-2.5-pro | Google | 7.5 | 3.3K | 1.05M | ¥9 / ¥72 |
| 76 | granite-4.1-8b | IBM | 6.3 | 1.7K | 131K | ¥0.36 / ¥0.72 |
| 77 | devstral-2 | Mistral | 5.0 | 1.6K | 262K | ¥2.88 / ¥14.4 |
| 78 | mercury-2 | Inception AI | 3.8 | 946 | 128K | ¥1.8 / ¥5.4 |
| 79 | grok-4-fast-reasoning | xAI | 2.5 | 933 | 2M | ¥1.44 / ¥3.6 |
| 80 | grok-code-fast-1 | xAI | 1.3 | 982 | 256K | ¥1.44 / ¥10.8 |
| 81 | devstral-medium-2507 | Mistral | 0.0 | 992 | 262K | ¥2.88 / ¥14.4 |

Top model analysis
claude-opus-4-7-thinking: why it ranks first
claude-opus-4-7-thinking ranks first with a score of 100.0 on the 0-100 scale, based on 5.3K samples. Treat it as the default candidate for this leaderboard, then weigh price, context length, and availability before committing.
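
The scores in the table fall in even steps of 1.25 from 100.0 down to 0.0, which is what a linear mapping from rank to percent would produce for 81 entries. The sketch below illustrates that mapping. It assumes the score is derived from rank alone, which the page does not state; `rank_to_percent` is a hypothetical helper, not something the leaderboard publishes.

```python
def rank_to_percent(rank: int, total: int) -> float:
    """Map rank 1..total linearly onto 100..0 (assumed formula, not confirmed by the page)."""
    if total < 2 or not 1 <= rank <= total:
        raise ValueError("rank must be within 1..total and total must be at least 2")
    return 100.0 * (total - rank) / (total - 1)


TOTAL = 81  # number of entries in the leaderboard table

for rank in (1, 3, 41, 81):
    # Unrounded values; the table shows them to one decimal place.
    print(rank, rank_to_percent(rank, TOTAL))
```

If this reading is right, a gap of 1.25 points means only "one place lower" and says nothing about how large the underlying quality difference is, so a 100.0 here should not be read as a perfect result.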
How to choose
Do not look only at rank #1
Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
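
The selection steps above can be sketched as a simple filter followed by a sort. This is a minimal illustration, not a recommended policy: the `Entry` type, the `shortlist` function, and all three thresholds are hypothetical, and the rows are a small subset copied from the table (sample counts such as 5.3K are rounded as shown there).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    model: str
    score: float
    samples: int
    context_tokens: Optional[int]  # None where the table shows "-"
    price_in: Optional[float]      # ¥ per 1M input tokens
    price_out: Optional[float]     # ¥ per 1M output tokens


# A few rows copied from the leaderboard table.
ENTRIES = [
    Entry("claude-opus-4-7-thinking", 100.0, 5300, 1_000_000, 36.0, 180.0),
    Entry("qwen3.7-max-20260517", 96.3, 1500, 1_000_000, 18.0, 54.0),
    Entry("claude-sonnet-4-6", 92.5, 11100, 1_000_000, 21.6, 108.0),
    Entry("kimi-k2.6", 91.3, 4000, 262_000, 6.84, 28.8),
    Entry("muse-spark", 90.0, 1600, None, None, None),
    Entry("deepseek-v4-pro-thinking", 80.0, 4000, 1_000_000, 3.13, 6.26),
]


def shortlist(entries: List[Entry], min_samples: int,
              min_context: int, max_output_price: float) -> List[Entry]:
    """Drop entries that miss any threshold, then order the rest by score."""
    keep = [
        e for e in entries
        if e.samples >= min_samples
        and e.context_tokens is not None and e.context_tokens >= min_context
        and e.price_out is not None and e.price_out <= max_output_price
    ]
    return sorted(keep, key=lambda e: e.score, reverse=True)


# Example thresholds (assumptions): 3K+ samples, 200K+ context, output <= ¥120 per 1M.
picks = shortlist(ENTRIES, min_samples=3000, min_context=200_000, max_output_price=120.0)
for e in picks:
    print(f"{e.model}: score {e.score}, ¥{e.price_in} / ¥{e.price_out}")
```

With these example thresholds the rank #1 model drops out on price and qwen3.7-max-20260517 drops out on sample size, which is the kind of trade-off the guidance above is pointing at. Access type and provider availability are not in the table, so they are left out of the sketch.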