| Rank | Model | Provider | Score (0-100) | Samples | Context | Price / 1M tokens (input / output) |
|---|---|---|---|---|---|---|
| 1 | claude-opus-4-7-thinking | Anthropic | 100.0 | 4.6K | 1M | ¥36 / ¥180 |
| 2 | claude-opus-4-7 | Anthropic | 98.4 | 4.3K | 1M | ¥36 / ¥180 |
| 3 | qwen3.7-max-20260517 | Alibaba | 96.9 | 1.3K | 1M | ¥18 / ¥54 |
| 4 | claude-opus-4-6-thinking | Anthropic | 95.3 | 6.7K | 1M | ¥36 / ¥180 |
| 5 | claude-opus-4-6 | Anthropic | 93.8 | 7.6K | 1M | ¥36 / ¥180 |
| 6 | glm-5.1 | Zai | 92.2 | 3.1K | 200K | ¥0 / ¥0 |
| 7 | claude-sonnet-4-6 | Anthropic | 90.6 | 9.5K | 1M | ¥21.6 / ¥108 |
| 8 | kimi-k2.6 | Moonshot | 89.1 | 3.5K | 262K | ¥6.84 / ¥28.8 |
| 9 | gemini-3.5-flash | Google | 87.5 | 1.9K | 1.05M | ¥10.8 / ¥64.8 |
| 10 | muse-spark | Meta | 85.9 | 1.4K | - | - |
| 11 | gpt-5.5-xhigh (codex-harness) | OpenAI | 84.4 | 3.6K | 400K | ¥9 / ¥72 |
| 12 | qwen3.6-max-preview | Alibaba | 82.8 | 2.2K | 246K | ¥9.5 / ¥56.9 |
| 13 | claude-opus-4-5-20251101-thinking-32k | Anthropic | 81.3 | 5.1K | 200K | ¥108 / ¥540 |
| 14 | gpt-5.5-high (codex-harness) | OpenAI | 79.7 | 3.8K | 400K | ¥9 / ¥72 |
| 15 | mimo-v2.5-pro | Xiaomi | 78.1 | 4.1K | 1.05M | ¥7.2 / ¥21.6 |
| 16 | claude-opus-4-5-20251101 | Anthropic | 76.6 | 6.9K | 200K | ¥36 / ¥180 |
| 17 | deepseek-v4-pro-thinking | DeepSeek | 75.0 | 3.4K | 1M | ¥3.13 / ¥6.26 |
| 18 | qwen3.6-plus | Alibaba | 73.4 | 5.3K | 1M | ¥3.6 / ¥21.6 |
| 19 | gpt-5.4-high (codex-harness) | OpenAI | 71.9 | 1.3K | 400K | ¥9 / ¥72 |
| 20 | glm-4.7 | Zai | 70.3 | 119 | 205K | ¥0 / ¥0 |
| 21 | mimo-v2.5 | Xiaomi | 68.8 | 3.3K | 1.05M | ¥2.88 / ¥14.4 |
| 22 | gemini-3.1-pro-preview | Google | 67.2 | 9K | 1.05M | ¥14.4 / ¥86.4 |
| 23 | gpt-5.5 (codex-harness) | OpenAI | 65.6 | 3.6K | 400K | ¥9 / ¥72 |
| 24 | gemini-3-flash | Google | 64.1 | 4K | 1.05M | ¥3.6 / ¥21.6 |
| 25 | gpt-5.4-medium (codex-harness) | OpenAI | 62.5 | 1.3K | 400K | ¥9 / ¥72 |
| 26 | mimo-v2-pro | Xiaomi | 60.9 | 6K | 1.05M | ¥7.2 / ¥21.6 |
| 27 | glm-5 | Zai | 59.4 | 5.8K | 205K | ¥7.2 / ¥23 |
| 28 | kimi-k2.5-thinking | Moonshot | 57.8 | 9.1K | 262K | ¥4.32 / ¥21.6 |
| 29 | gemini-3-pro | Google | 56.3 | 3.4K | 1.05M | ¥14.4 / ¥86.4 |
| 30 | gpt-5.3-codex (codex-harness) | OpenAI | 54.7 | 2.6K | 400K | ¥9 / ¥72 |
| 31 | kimi-k2.5-instant | Moonshot | 53.1 | 3K | 262K | ¥4.32 / ¥21.6 |
| 32 | minimax-m2.7 | MiniMax | 51.6 | 5.5K | 205K | ¥0 / ¥0 |
| 33 | gpt-5.4-mini-high | OpenAI | 50.0 | 4.8K | 400K | ¥5.4 / ¥32.4 |
| 34 | grok-4.20-beta-0309-reasoning | xAI | 48.4 | 6.3K | 2M | ¥14.4 / ¥43.2 |
| 35 | claude-sonnet-4-5-20250929 | Anthropic | 46.9 | 5.4K | 200K | ¥21.6 / ¥108 |
| 36 | qwen3.5-397b-a17b | Alibaba | 45.3 | 8.4K | 262K | ¥3.1 / ¥18.6 |
| 37 | claude-sonnet-4-5-20250929-thinking-32k | Anthropic | 43.8 | 4.4K | 200K | ¥21.6 / ¥108 |
| 38 | gemma-4-31b | Google | 42.2 | 3K | 262K | ¥3.24 / ¥7.2 |
| 39 | gemini-3-flash (thinking-minimal) | Google | 40.6 | 9.8K | 1.05M | ¥3.6 / ¥21.6 |
| 40 | grok-4.3 | xAI | 39.1 | 3K | 1M | ¥9 / ¥18 |
| 41 | minimax-m2.1-preview | MiniMax | 37.5 | 2.5K | 205K | ¥0 / ¥0 |
| 42 | minimax-m2.5 | MiniMax | 35.9 | 6.8K | 205K | ¥0 / ¥0 |
| 43 | gpt-5.4 | OpenAI | 34.4 | 216 | 1.05M | ¥18 / ¥108 |
| 44 | deepseek-v3.2-thinking | DeepSeek | 32.8 | 3.9K | 128K | ¥2.09 / ¥3.1 |
| 45 | hunyuan-hy3-preview | Tencent | 31.3 | 1.2K | 256K | ¥0 / ¥0 |
| 46 | gpt-5.3-codex (codex-harness) | OpenAI | 29.7 | 3.1K | 400K | ¥9 / ¥72 |
| 47 | qwen3.5-122b-a10b | Alibaba | 28.1 | 7.1K | 262K | ¥2.88 / ¥23 |
| 48 | gemma-4-26b-a4b | Google | 26.6 | 1.3K | 262K | ¥0.94 / ¥2.88 |
| 49 | qwen3.5-27b | Alibaba | 25.0 | 6.7K | 262K | ¥2.16 / ¥17.3 |
| 50 | deepseek-v3.2 | DeepSeek | 23.4 | 5.2K | 128K | ¥2.09 / ¥3.1 |
| 51 | gpt-5.2-codex | OpenAI | 21.9 | 4.6K | 400K | ¥12.6 / ¥101 |
| 52 | claude-haiku-4-5-20251001 | Anthropic | 20.3 | 8.9K | 200K | ¥7.2 / ¥36 |
| 53 | kimi-k2-thinking-turbo | Moonshot | 18.8 | 5.4K | 262K | ¥17.3 / ¥72 |
| 54 | mimo-v2-flash (non-thinking) | Xiaomi | 17.2 | 2.6K | 262K | ¥0.72 / ¥2.16 |
| 55 | gpt-5.1 | OpenAI | 15.6 | 2.9K | 400K | ¥9 / ¥72 |
| 56 | qwen3-coder-480b-a35b-instruct | Alibaba | 14.1 | 4.4K | 262K | ¥6.2 / ¥24.8 |
| 57 | gemini-3.1-flash-lite-preview | Google | 12.5 | 8.2K | 1.05M | ¥1.8 / ¥10.8 |
| 58 | mimo-v2-flash (thinking) | Xiaomi | 10.9 | 913 | 262K | ¥0.72 / ¥2.16 |
| 59 | trinity-large-thinking | - | 9.4 | 1.1K | 262K | ¥1.8 / ¥6.48 |
| 60 | qwen3.5-35b-a3b | Alibaba | 7.8 | 1.6K | 262K | ¥1.8 / ¥14.4 |
| 61 | grok-4-1-fast-reasoning | xAI | 6.3 | 1.5K | 2M | ¥1.44 / ¥3.6 |
| 62 | qwen3.5-flash | Alibaba | 4.7 | 1.4K | 1M | ¥1.24 / ¥12.4 |
| 63 | granite-4.1-8b | IBM | 3.1 | 1.5K | 131K | ¥0.36 / ¥0.72 |
| 64 | devstral-2 | Mistral | 1.6 | 225 | 262K | ¥2.88 / ¥14.4 |
| 65 | mercury-2 | Inception AI | 0.0 | 845 | 128K | ¥1.8 / ¥5.4 |
Top model analysis: why claude-opus-4-7-thinking ranks first
claude-opus-4-7-thinking ranks first with a score of 100.0 on the 0-100 scale, based on 4.6K samples. Treat it as the first candidate for this leaderboard, then weigh its price, context length, and availability against the models ranked just below it.
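To make the price comparison concrete, here is a minimal sketch of the per-request arithmetic, assuming the listed prices are charged linearly per 1M input and output tokens. The function name `request_cost_yuan` and the workload of 200,000 input tokens and 20,000 output tokens are hypothetical; only the ¥36 / ¥180 and ¥18 / ¥54 prices come from the leaderboard.

```python
def request_cost_yuan(input_tokens, output_tokens,
                      input_price_per_m, output_price_per_m):
    """Cost in yuan, assuming linear pricing per 1M tokens (an assumption)."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
INPUT_TOKENS = 200_000
OUTPUT_TOKENS = 20_000

# Prices taken from the leaderboard rows for rank 1 and rank 3.
opus_cost = request_cost_yuan(INPUT_TOKENS, OUTPUT_TOKENS, 36, 180)
qwen_cost = request_cost_yuan(INPUT_TOKENS, OUTPUT_TOKENS, 18, 54)

print(f"claude-opus-4-7-thinking: ¥{opus_cost:.2f}")  # ¥10.80
print(f"qwen3.7-max-20260517:     ¥{qwen_cost:.2f}")  # ¥4.68
```

Under these assumptions the rank-3 model costs less than half as much per request as the rank-1 model, which is the kind of trade-off worth checking once the score gap is known.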
How to choose: do not look only at rank #1
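Start with the leaderboard closest to your task. Among the top models, read each score together with its sample size, since a score backed by few samples (for example, glm-4.7 with 119) is less reliable than one backed by thousands. Then check price, context length, open or closed access, and provider availability.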
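The selection steps above can be sketched as a simple filter-then-sort. This is an illustration, not the site's own method: the `Entry` type, the `parse_count` and `shortlist` helpers, and the threshold values are all hypothetical, and only the five sample rows are copied from the leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    score: float
    samples: int
    context_tokens: int
    input_price: float   # yuan per 1M input tokens
    output_price: float  # yuan per 1M output tokens


def parse_count(text):
    """Convert leaderboard shorthand such as '4.6K' or '1.05M' to an integer."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return int(text)


def shortlist(entries, min_samples, min_context, max_output_price):
    """Keep entries that meet every threshold, best score first."""
    kept = [
        e for e in entries
        if e.samples >= min_samples
        and e.context_tokens >= min_context
        and e.output_price <= max_output_price
    ]
    return sorted(kept, key=lambda e: e.score, reverse=True)


# A few rows copied from the leaderboard.
ROWS = [
    Entry("claude-opus-4-7-thinking", 100.0, parse_count("4.6K"), parse_count("1M"), 36, 180),
    Entry("qwen3.7-max-20260517", 96.9, parse_count("1.3K"), parse_count("1M"), 18, 54),
    Entry("kimi-k2.6", 89.1, parse_count("3.5K"), parse_count("262K"), 6.84, 28.8),
    Entry("deepseek-v4-pro-thinking", 75.0, parse_count("3.4K"), parse_count("1M"), 3.13, 6.26),
    Entry("glm-4.7", 70.3, parse_count("119"), parse_count("205K"), 0, 0),
]

# Hypothetical requirements: at least 1,000 samples, at least 250K context,
# and at most ¥60 per 1M output tokens.
picks = shortlist(ROWS, min_samples=1_000, min_context=250_000, max_output_price=60)
for entry in picks:
    print(entry.model, entry.score)
```

With these thresholds the rank-1 model drops out on price and glm-4.7 drops out on sample size and context length, leaving qwen3.7-max-20260517, kimi-k2.6, and deepseek-v4-pro-thinking in score order. Access type and provider availability are not in the table, so they would need to be checked separately.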