Chat · Text · Hard Prompts English Leaderboard

Ranking for Text / Hard Prompts English, based on public preference data.

Models: 359
Published: 2026/05/27
Source: Arena public preference evaluation
Original leaderboard: Text / Hard Prompts English
Leaderboard dataset: LMArena latest parquet

Each entry below lists, one field per line: rank, model, provider, score (100-point scale), sample count, context length, and input/output price. A dash marks a missing value.

1
claude-opus-4-6-thinking
Anthropic
100.0
9.8K
1M
¥36 / ¥180Input/Output
2
claude-opus-4-6
Anthropic
99.7
11K
1M
¥36 / ¥180Input/Output
3
claude-opus-4-7-thinking
Anthropic
99.4
6.5K
1M
¥36 / ¥180Input/Output
4
claude-opus-4-7
Anthropic
99.2
6.9K
1M
¥36 / ¥180Input/Output
5
mimo-v2.5-pro
Xiaomi
98.9
4.9K
1.05M
¥7.2 / ¥21.6Input/Output
6
claude-sonnet-4-6
Anthropic
98.6
8.4K
1M
¥21.6 / ¥108Input/Output
7
glm-5.1
Zai
98.3
4.4K
200K
¥0 / ¥0Input/Output
8
gpt-5.4-high
Openai
98.0
8.5K
1.05M
¥18 / ¥108Input/Output
9
gpt-5.5-high
Openai
97.8
5.3K
1.05M
¥36 / ¥216Input/Output
10
qwen3.5-max-preview
Alibaba
97.5
6.2K
-
-
11
gemini-3.1-pro-preview
Google
97.2
13.2K
1.05M
¥14.4 / ¥86.4Input/Output
12
ernie-5.1
Baidu
96.9
4.6K
119K
¥5.4 / ¥21.6Input/Output
13
claude-opus-4-5-20251101
Anthropic
96.6
18.4K
200K
¥36 / ¥180Input/Output
14
claude-opus-4-5-20251101-thinking-32k
Anthropic
96.4
9.4K
200K
¥108 / ¥540Input/Output
15
gemini-3-pro
Google
96.1
10.7K
1.05M
¥14.4 / ¥86.4Input/Output
16
gemini-3.5-flash
Google
95.8
3K
1.05M
¥10.8 / ¥64.8Input/Output
17
kimi-k2.6
Moonshot
95.5
4.8K
262K
¥6.84 / ¥28.8Input/Output
18
muse-spark
Meta
95.3
3.8K
-
-
19
qwen3.7-max-preview
Alibaba
95.0
1.3K
1M
¥18 / ¥54Input/Output
20
gpt-5.5
Openai
94.7
5.6K
1.05M
¥36 / ¥216Input/Output
21
claude-sonnet-4-5-20250929-thinking-32k
Anthropic
94.4
21.3K
200K
¥21.6 / ¥108Input/Output
22
mimo-v2-pro
Xiaomi
94.1
6.7K
1.05M
¥7.2 / ¥21.6Input/Output
23
gpt-5.4
Openai
93.9
9.3K
1.05M
¥18 / ¥108Input/Output
24
claude-sonnet-4-5-20250929
Anthropic
93.6
21.6K
200K
¥21.6 / ¥108Input/Output
25
mimo-v2.5
Xiaomi
93.3
5.2K
1.05M
¥2.88 / ¥14.4Input/Output
26
deepseek-v4-pro-thinking
Deepseek
93.0
5K
1M
¥3.13 / ¥6.26Input/Output
27
longcat-flash-chat-2602-exp
Meituan
92.7
7.4K
128K
¥1.08 / ¥10.8Input/Output
28
amazon-nova-experimental-chat-26-02-10
Amazon
92.5
930
-
-
29
gemini-3-flash
Google
92.2
7.9K
1.05M
¥3.6 / ¥21.6Input/Output
30
kimi-k2.5-thinking
Moonshot
91.9
10.4K
262K
¥4.32 / ¥21.6Input/Output
31
gpt-5.1-high
Openai
91.6
10.4K
400K
¥9 / ¥72Input/Output
32
glm-5
Zai
91.3
6.3K
205K
¥7.2 / ¥23Input/Output
33
qwen3.5-397b-a17b
Alibaba
91.1
9.8K
262K
¥3.1 / ¥18.6Input/Output
34
gemma-4-31b
Google
90.8
1.5K
262K
¥3.24 / ¥7.2Input/Output
35
claude-opus-4-1-20250805-thinking-16k
Anthropic
90.5
12.5K
200K
¥108 / ¥540Input/Output
36
deepseek-v4-pro
Deepseek
90.2
5.4K
1M
¥3.13 / ¥6.26Input/Output
37
grok-4.20-beta-0309-reasoning
Xai
89.9
9K
2M
¥14.4 / ¥43.2Input/Output
38
glm-4.7
Zai
89.7
3.1K
205K
¥0 / ¥0Input/Output
39
qwen3.6-plus
Alibaba
89.4
5.8K
1M
¥3.6 / ¥21.6Input/Output
40
gemini-2.5-pro
Google
89.1
31.6K
1.05M
¥9 / ¥72Input/Output
41
claude-opus-4-1-20250805
Anthropic
88.8
19.4K
200K
¥108 / ¥540Input/Output
42
kimi-k2.5-instant
Moonshot
88.5
2.2K
262K
¥4.32 / ¥21.6Input/Output
43
amazon-nova-experimental-chat-12-10
Amazon
88.3
901
-
-
44
dola-seed-2.0-pro
Bytedance
88.0
11.2K
-
-
45
ernie-5.0-0110
Baidu
87.7
9.3K
128K
¥7.92 / ¥14.4Input/Output
46
qwen3.6-max-preview
Alibaba
87.4
1.4K
246K
¥9.5 / ¥56.9Input/Output
47
gpt-5.2-chat-latest-20260210
Openai
87.2
9.7K
400K
¥12.6 / ¥101Input/Output
48
deepseek-v3.2-thinking
Deepseek
86.9
10.2K
128K
¥2.09 / ¥3.1Input/Output
49
grok-4.20-multi-agent-beta-0309
Xai
86.6
8.8K
2M
¥14.4 / ¥43.2Input/Output
50
deepseek-v3.2-exp-thinking
Deepseek
86.3
2.4K
128K
¥0 / ¥0Input/Output
51
qwen3-max-preview
Alibaba
86.0
6.8K
262K
¥6.2 / ¥24.8Input/Output
52
longcat-flash-chat
Meituan
85.8
2.9K
128K
¥1.08 / ¥10.8Input/Output
53
deepseek-v4-flash
Deepseek
85.5
5.3K
1M
¥1.01 / ¥2.02Input/Output
54
glm-4.6
Zai
85.2
9.6K
205K
¥4.32 / ¥15.8Input/Output
55
gemma-4-26b-a4b
Google
84.9
1.5K
262K
¥0.94 / ¥2.88Input/Output
56
grok-4.20-beta1
Xai
84.6
7.4K
2M
¥14.4 / ¥43.2Input/Output
57
deepseek-v3.2
Deepseek
84.4
12K
128K
¥2.09 / ¥3.1Input/Output
58
grok-3-preview-02-24
Xai
84.1
6.3K
1M
¥9 / ¥18Input/Output
59
qwen3-vl-235b-a22b-instruct
Alibaba
83.8
3K
128K
¥2.16 / ¥8.64Input/Output
60
amazon-nova-experimental-chat-26-01-10
Amazon
83.5
884
-
-
61
grok-4.1
Xai
83.2
17.9K
200K
¥14.4 / ¥72Input/Output
62
deepseek-v3.2-exp
Deepseek
83.0
3.2K
128K
¥0 / ¥0Input/Output
63
mistral-large-3
Mistral
82.7
11.5K
262K
¥3.6 / ¥10.8Input/Output
64
gemini-3-flash (thinking-minimal)
Google
82.4
15.3K
1.05M
¥3.6 / ¥21.6Input/Output
65
qwen3-235b-a22b-instruct-2507
Alibaba
82.1
24.9K
128K
¥2.09 / ¥8.23Input/Output
66
gpt-5.1
Openai
81.8
11.3K
400K
¥9 / ¥72Input/Output
67
kimi-k2-thinking-turbo
Moonshot
81.6
16.4K
262K
¥17.3 / ¥72Input/Output
68
mimo-v2-flash (non-thinking)
Xiaomi
81.3
12.4K
262K
¥0.72 / ¥2.16Input/Output
69
deepseek-v4-flash-thinking
Deepseek
81.0
5.2K
1M
¥1.01 / ¥2.02Input/Output
70
grok-4.1-thinking
Xai
80.7
17.4K
200K
¥14.4 / ¥72Input/Output
71
qwen3-max-2025-09-23
Alibaba
80.4
2.5K
258K
¥6.19 / ¥24.7Input/Output
72
qwen3-next-80b-a3b-instruct
Alibaba
80.2
6.1K
131K
¥1.04 / ¥4.13Input/Output
73
mistral-medium-2508
Mistral
79.9
25K
262K
¥2.88 / ¥14.4Input/Output
74
deepseek-v3.1-terminus-thinking
Deepseek
79.6
840
128K
¥1.8 / ¥5.04Input/Output
75
mimo-v2-omni
Xiaomi
79.3
930
262K
¥2.88 / ¥14.4Input/Output
76
claude-haiku-4-5-20251001
Anthropic
79.1
22K
200K
¥7.2 / ¥36Input/Output
77
minimax-m2.7
Minimax
78.8
6.9K
205K
¥0 / ¥0Input/Output
78
ernie-5.0-preview-1203
Baidu
78.5
2.5K
128K
¥7.92 / ¥14.4Input/Output
79
qwen3-235b-a22b-thinking-2507
Alibaba
78.2
2K
131K
¥2.07 / ¥8.26Input/Output
80
qwen3.5-122b-a10b
Alibaba
77.9
8K
262K
¥2.88 / ¥23Input/Output
81
gpt-5.2-high
Openai
77.7
13K
400K
¥12.6 / ¥101Input/Output
82
amazon-nova-experimental-chat-11-10
Amazon
77.4
6.5K
-
-
83
grok-4-fast-chat
Xai
77.1
1.6K
2M
¥1.44 / ¥3.6Input/Output
84
gpt-5.5-instant
Openai
76.8
8.4K
400K
¥9 / ¥72Input/Output
85
glm-4.5
Zai
76.5
5.7K
131K
¥4.32 / ¥15.8Input/Output
86
deepseek-v3.1-thinking
Deepseek
76.3
2.5K
128K
¥1.44 / ¥5.04Input/Output
87
qwen3.5-27b
Alibaba
76.0
7.8K
262K
¥2.16 / ¥17.3Input/Output
88
gpt-5.4-mini-high
Openai
75.7
8.3K
400K
¥5.4 / ¥32.4Input/Output
89
minimax-m2.1-preview
Minimax
75.4
4.2K
205K
¥0 / ¥0Input/Output
90
chatgpt-4o-latest-20250326
Openai
75.1
20K
128K
¥18 / ¥72Input/Output
91
step-3.5-flash
Stepfun
74.9
9.4K
256K
¥0.69 / ¥2.07Input/Output
92
hunyuan-hy3-preview
Tencent
74.6
1.9K
256K
¥0 / ¥0Input/Output
93
hunyuan-vision-1.5-thinking
Tencent
74.3
583
-
-
94
gpt-5.2
Openai
74.0
13.7K
400K
¥12.6 / ¥101Input/Output
95
gpt-5-high
Openai
73.7
7.7K
400K
¥9 / ¥72Input/Output
96
gemini-2.5-flash
Google
73.5
31.3K
1.05M
¥2.16 / ¥18Input/Output
97
deepseek-v3.1
Deepseek
73.2
3.4K
128K
¥1.44 / ¥5.04Input/Output
98
deepseek-r1-0528
Deepseek
72.9
3.5K
164K
¥3.6 / ¥15.5Input/Output
99
ernie-5.0-preview-1022
Baidu
72.6
1.4K
128K
¥7.92 / ¥14.4Input/Output
100
mimo-v2-flash (thinking)
Xiaomi
72.3
2.8K
262K
¥0.72 / ¥2.16Input/Output
101
qwen3-vl-235b-a22b-thinking
Alibaba
72.1
2.1K
131K
¥2.06 / ¥8.26Input/Output
102
grok-4-fast-reasoning
Xai
71.8
5.2K
2M
¥1.44 / ¥3.6Input/Output
103
grok-4-0709
Xai
71.5
10.4K
256K
¥21.6 / ¥108Input/Output
104
amazon-nova-experimental-chat-10-20
Amazon
71.2
3K
-
-
105
grok-4.3
Xai
70.9
5.3K
1M
¥9 / ¥18Input/Output
106
grok-4-1-fast-reasoning
Xai
70.7
15.1K
2M
¥1.44 / ¥3.6Input/Output
107
deepseek-v3.1-terminus
Deepseek
70.4
1K
128K
¥1.8 / ¥5.04Input/Output
108
qwen3.5-35b-a3b
Alibaba
70.1
8.1K
262K
¥1.8 / ¥14.4Input/Output
109
gemini-2.5-flash-preview-09-2025
Google
69.8
8.8K
1M
¥2.16 / ¥18Input/Output
110
claude-opus-4-20250514-thinking-16k
Anthropic
69.6
8.4K
200K
¥108 / ¥540Input/Output
111
gpt-4.5-preview-2025-02-27
Openai
69.3
2.3K
8.19K
¥216 / ¥432Input/Output
112
qwen3.5-flash
Alibaba
69.0
8.7K
1M
¥1.24 / ¥12.4Input/Output
113
o3-2025-04-16
Openai
68.7
14K
200K
¥14.4 / ¥57.6Input/Output
114
qwen3-30b-a3b-instruct-2507
Alibaba
68.4
5.6K
262K
¥2.16 / ¥3.6Input/Output
115
gpt-5-chat
Openai
68.2
7.8K
400K
¥9 / ¥72Input/Output
116
gemini-3.1-flash-lite-preview
Google
67.9
10.7K
1.05M
¥1.8 / ¥10.8Input/Output
117
nvidia-nemotron-3-super-120b-a12b
Nvidia
67.6
2K
262K
¥1.44 / ¥5.76Input/Output
118
gpt-5.3-chat-latest
Openai
67.3
9.2K
128K
¥12.6 / ¥101Input/Output
119
hunyuan-t1-20250711
Tencent
67.0
931
131K
¥0 / ¥0Input/Output
120
qwen3-235b-a22b-no-thinking
Alibaba
66.8
8.6K
131K
¥2.07 / ¥8.26Input/Output
121
claude-sonnet-4-20250514-thinking-32k
Anthropic
66.5
8.1K
200K
¥21.6 / ¥108Input/Output
122
gpt-5.4-nano-high
Openai
66.2
8.2K
400K
¥1.44 / ¥9Input/Output
123
gpt-4.1-2025-04-14
Openai
65.9
11.9K
1.05M
¥14.4 / ¥57.6Input/Output
124
glm-4.6v
Zai
65.6
718
128K
¥2.16 / ¥6.48Input/Output
125
glm-4.5-air
Zai
65.4
7.7K
131K
¥0 / ¥0Input/Output
126
qwen3-next-80b-a3b-thinking
Alibaba
65.1
3.5K
131K
¥1.04 / ¥10.3Input/Output
127
gpt-5-mini-high
Openai
64.8
6.5K
400K
¥1.8 / ¥14.4Input/Output
128
minimax-m2.5
Minimax
64.5
10.6K
205K
¥0 / ¥0Input/Output
129
claude-opus-4-20250514
Anthropic
64.2
9.9K
200K
¥108 / ¥540Input/Output
130
qwen3-coder-480b-a35b-instruct
Alibaba
64.0
6K
262K
¥6.2 / ¥24.8Input/Output
131
kimi-k2-0905-preview
Moonshot
63.7
2.8K
262K
¥4.32 / ¥18Input/Output
132
ling-flash-2.0
Ant Group
63.4
1.8K
131K
¥1.01 / ¥4.1Input/Output
133
nova-2-lite
Amazon
63.1
3.2K
128K
¥2.38 / ¥19.8Input/Output
134
gemini-2.5-flash-lite-preview-09-2025-no-thinking
Google
62.8
12.5K
1.05M
¥0.72 / ¥2.88Input/Output
135
grok-3-mini-high
Xai
62.6
4.1K
128K
¥0 / ¥0Input/Output
136
o1-preview
Openai
62.3
4.9K
128K
¥108 / ¥432Input/Output
137
o3-mini-high
Openai
62.0
2.9K
200K
¥7.92 / ¥31.7Input/Output
138
deepseek-v3-0324
Deepseek
61.7
10.2K
75K
¥1.44 / ¥5.76Input/Output
139
mistral-medium-2505
Mistral
61.5
7.3K
262K
¥2.88 / ¥14.4Input/Output
140
mercury-2
Inception Ai
61.2
865
128K
¥1.8 / ¥5.4Input/Output
141
deepseek-r1
Deepseek
60.9
2.7K
164K
¥5.04 / ¥18Input/Output
142
ring-flash-2.0
Ant Group
60.6
1.9K
131K
¥1.01 / ¥4.1Input/Output
143
minimax-m2
Minimax
60.3
1.9K
197K
¥0 / ¥0Input/Output
144
step-3
Stepfun
60.1
1.6K
65.5K
¥1.8 / ¥4.68Input/Output
145
gemini-2.5-flash-lite-preview-06-17-thinking
Google
59.8
7.5K
65.5K
¥0.72 / ¥2.88Input/Output
146
grok-3-mini-beta
Xai
59.5
5.3K
1M
¥9 / ¥18Input/Output
147
o1-2024-12-17
Openai
59.2
4.2K
128K
¥108 / ¥432Input/Output
148
nvidia-nemotron-3-nano-30b-a3b-bf16
Nvidia
58.9
3.9K
131K
¥0 / ¥0Input/Output
149
intellect-3
-
58.7
1.4K
131K
¥1.44 / ¥7.92Input/Output
150
hunyuan-turbos-20250416
Tencent
58.4
2.2K
131K
¥0 / ¥0Input/Output
151
glm-4.7-flash
Zai
58.1
3.1K
200K
¥0 / ¥0Input/Output
152
qwen3-235b-a22b
Alibaba
57.8
5.8K
131K
¥2.07 / ¥8.26Input/Output
153
kimi-k2-0711-preview
Moonshot
57.5
6.4K
131K
¥4.32 / ¥18Input/Output
154
gpt-oss-120b
Openai
57.3
7.6K
131K
¥1.08 / ¥4.32Input/Output
155
o4-mini-2025-04-16
Openai
57.0
10.5K
200K
¥7.92 / ¥31.7Input/Output
156
trinity-large-preview
-
56.7
8.4K
262K
¥1.8 / ¥6.48Input/Output
157
claude-sonnet-4-20250514
Anthropic
56.4
9.3K
200K
¥21.6 / ¥108Input/Output
158
qwen2.5-max
Alibaba
56.1
6K
32K
¥11.5 / ¥46Input/Output
159
gpt-4.1-mini-2025-04-14
Openai
55.9
8.8K
1.05M
¥2.88 / ¥11.5Input/Output
160
amazon-nova-experimental-chat-10-09
Amazon
55.6
779
-
-
161
mistral-small-2506
Mistral
55.3
4.1K
262K
¥2.88 / ¥14.4Input/Output
162
trinity-large-thinking
-
55.0
7.6K
262K
¥1.8 / ¥6.48Input/Output
163
glm-4.5v
Zai
54.7
1.2K
64K
¥4.32 / ¥13Input/Output
164
gemini-2.0-flash-001
Google
54.5
8.5K
1.05M
¥1.08 / ¥4.32Input/Output
165
step-1o-turbo-202506
Stepfun
54.2
1.9K
-
-
166
o1-mini
Openai
53.9
8.4K
128K
¥7.92 / ¥31.7Input/Output
167
minimax-m1
Minimax
53.6
8.2K
1M
¥0.95 / ¥9.03Input/Output
168
gemma-3-27b-it
Google
53.4
9.9K
128K
¥2.15 / ¥2.15Input/Output
169
qwen3-32b
Alibaba
53.1
729
131K
¥2.07 / ¥8.26Input/Output
170
olmo-3.1-32b-instruct
Allenai
52.8
3K
200K
¥14.4 / ¥57.6Input/Output
171
o3-mini
Openai
52.5
11.3K
200K
¥7.92 / ¥31.7Input/Output
172
claude-3-7-sonnet-20250219-thinking-32k
Anthropic
52.2
7.8K
-
-
173
qwq-32b
Alibaba
52.0
5.1K
131K
¥2.07 / ¥6.2Input/Output
174
llama-3.3-nemotron-49b-super-v1
Nvidia
51.7
344
131K
¥0 / ¥0Input/Output
175
gpt-5-nano-high
Openai
51.4
2K
400K
¥0.36 / ¥2.88Input/Output
176
hunyuan-turbos-20250226
Tencent
51.1
367
131K
¥0 / ¥0Input/Output
177
nvidia-llama-3.3-nemotron-super-49b-v1.5
Nvidia
50.8
710
131K
¥2.88 / ¥2.88Input/Output
178
llama-3.1-nemotron-ultra-253b-v1
Nvidia
50.6
448
128K
¥4.32 / ¥13Input/Output
179
command-a-03-2025
Cohere
50.3
12.7K
256K
¥18 / ¥72Input/Output
180
qwen-plus-0125
Alibaba
50.0
964
1M
¥0.83 / ¥2.07Input/Output
181
olmo-3-32b-think
Allenai
49.7
1.5K
128K
¥2.16 / ¥3.24Input/Output
182
qwen3-30b-a3b
Alibaba
49.4
5.8K
128K
¥0.79 / ¥7.78Input/Output
183
gemini-2.0-flash-lite-preview-02-05
Google
49.2
4.1K
1.05M
¥0.54 / ¥2.16Input/Output
184
hunyuan-turbo-0110
Tencent
48.9
343
-
-
185
deepseek-v3
Deepseek
48.6
3.6K
128K
¥0 / ¥0Input/Output
186
yi-lightning
-
48.3
3.9K
12K
¥1.44 / ¥1.44Input/Output
187
granite-4.1-8b
Ibm
48.0
1.2K
131K
¥0.36 / ¥0.72Input/Output
188
claude-3-7-sonnet-20250219
Anthropic
47.8
8.8K
200K
¥21.6 / ¥108Input/Output
189
olmo-3.1-32b-think
Allenai
47.5
2.1K
200K
¥14.4 / ¥57.6Input/Output
190
qwen2.5-plus-1127
Alibaba
47.2
1.7K
-
-
191
gemma-3-12b-it
Google
46.9
700
128K
¥1.96 / ¥1.96Input/Output
192
mercury
Inception Ai
46.6
519
128K
¥1.8 / ¥5.4Input/Output
193
claude-3-5-sonnet-20241022
Anthropic
46.4
15.8K
200K
¥21.6 / ¥108Input/Output
194
step-2-16k-exp-202412
Stepfun
46.1
786
16.4K
¥37.5 / ¥118Input/Output
195
athene-v2-chat
-
45.8
4K
-
-
196
deepseek-v2.5-1210
Deepseek
45.5
1.1K
1M
¥1.01 / ¥2.02Input/Output
197
gemini-1.5-pro-002
Google
45.3
9.1K
-
-
198
molmo-2-8b
Allenai
45.0
218
-
-
199
mistral-small-3.1-24b-instruct-2503
Mistral
44.7
7.6K
262K
¥2.88 / ¥14.4Input/Output
200
llama-4-maverick-17b-128e-instruct
Meta
44.4
8.7K
1M
¥1.8 / ¥6.26Input/Output
201
hunyuan-large-2025-02-10
Tencent
44.1
601
-
-
202
gpt-4o-2024-05-13
Openai
43.9
19.6K
128K
¥36 / ¥108Input/Output
203
glm-4-plus-0111
Zai
43.6
955
128K
¥72 / ¥72Input/Output
204
llama-3.1-405b-instruct-bf16
Meta
43.3
6.8K
128K
¥0 / ¥0Input/Output
205
gpt-4.1-nano-2025-04-14
Openai
43.0
1.1K
1.05M
¥14.4 / ¥57.6Input/Output
206
gemma-3n-e4b-it
Google
42.7
4.6K
128K
¥0 / ¥0Input/Output
207
gpt-oss-20b
Openai
42.5
2.4K
131K
¥0.32 / ¥1.3Input/Output
208
llama-4-scout-17b-16e-instruct
Meta
42.2
6.7K
128K
¥1.44 / ¥5.62Input/Output
209
magistral-medium-2506
Mistral
41.9
2.8K
128K
¥14.4 / ¥36Input/Output
210
llama-3.1-405b-instruct-fp8
Meta
41.6
10K
128K
¥0 / ¥0Input/Output
211
llama-3.1-nemotron-70b-instruct
Nvidia
41.3
1.1K
128K
¥0 / ¥0Input/Output
212
claude-3-5-sonnet-20240620
Anthropic
41.1
14.3K
200K
¥21.6 / ¥108Input/Output
213
qwen-max-0919
Alibaba
40.8
2.5K
131K
¥2.48 / ¥9.91Input/Output
214
deepseek-v2.5
Deepseek
40.5
4.1K
1M
¥1.01 / ¥2.02Input/Output
215
grok-2-2024-08-13
Xai
40.2
10.5K
1M
¥9 / ¥18Input/Output
216
glm-4-plus
Zai
39.9
4K
128K
¥54 / ¥54Input/Output
217
qwen2.5-72b-instruct
Alibaba
39.7
6.2K
131K
¥4.13 / ¥12.4Input/Output
218
gpt-4o-mini-2024-07-18
Openai
39.4
11.4K
128K
¥1.08 / ¥4.32Input/Output
219
llama-3.3-70b-instruct
Meta
39.1
10.1K
128K
¥0 / ¥0Input/Output
220
gpt-4o-2024-08-06
Openai
38.8
8K
128K
¥18 / ¥72Input/Output
221
hunyuan-large-vision
Tencent
38.5
1.2K
-
-
222
hunyuan-standard-2025-02-10
Tencent
38.3
587
-
-
223
qwen2.5-coder-32b-instruct
Alibaba
38.0
759
131K
¥2.07 / ¥6.2Input/Output
224
mistral-large-2407
Mistral
37.7
7.8K
131K
¥14.4 / ¥43.2Input/Output
225
grok-2-mini-2024-08-13
Xai
37.4
8.7K
1M
¥9 / ¥18Input/Output
226
mistral-large-2411
Mistral
37.2
4.5K
128K
¥14.4 / ¥43.2Input/Output
227
ibm-granite-h-small
Ibm
36.9
1.6K
-
-
228
gpt-4-turbo-2024-04-09
Openai
36.6
17.9K
128K
¥72 / ¥216Input/Output
229
claude-3-5-haiku-20241022
Anthropic
36.3
13.4K
200K
¥5.76 / ¥28.8Input/Output
230
gemini-1.5-flash-002
Google
36.0
5.5K
2M
¥0.54 / ¥2.2Input/Output
231
llama-3.1-70b-instruct
Meta
35.8
9.1K
131K
¥2.88 / ¥2.88Input/Output
232
gemini-1.5-pro-001
Google
35.5
13.6K
-
-
233
amazon-nova-pro-v1.0
Amazon
35.2
4.1K
300K
¥5.76 / ¥23Input/Output
234
athene-70b-0725
-
34.9
3.6K
-
-
235
gpt-4-1106-preview
Openai
34.6
18.3K
8.19K
¥216 / ¥432Input/Output
236
gemma-3-4b-it
Google
34.4
764
128K
¥1.44 / ¥1.44Input/Output
237
gpt-4-0125-preview
Openai
34.1
16.8K
8.19K
¥216 / ¥432Input/Output
238
gemini-advanced-0514
Google
33.8
8.4K
-
-
239
claude-3-opus-20240229
Anthropic
33.5
34.3K
200K
¥108 / ¥540Input/Output
240
mistral-small-24b-instruct-2501
Mistral
33.2
2.4K
262K
¥2.88 / ¥14.4Input/Output
241
olmo-2-0325-32b-instruct
Allenai
33.0
538
-
-
242
llama-3.1-tulu-3-70b
Allenai
32.7
499
-
-
243
jamba-1.5-large
-
32.4
1.5K
256K
¥0 / ¥0Input/Output
244
llama-3-70b-instruct
Meta
32.1
29.6K
8.19K
¥3.67 / ¥5.33Input/Output
245
phi-4
Microsoft
31.8
3.8K
128K
¥0.9 / ¥3.6Input/Output
246
hunyuan-standard-256k
Tencent
31.6
370
-
-
247
amazon-nova-lite-v1.0
Amazon
31.3
3.1K
300K
¥0.43 / ¥1.73Input/Output
248
gemini-1.5-flash-001
Google
31.0
11K
2M
¥0.54 / ¥2.2Input/Output
249
glm-4-0520
Zai
30.7
1.8K
128K
¥108 / ¥108Input/Output
250
deepseek-coder-v2
Deepseek
30.4
2.7K
1M
¥1.01 / ¥2.02Input/Output
251
reka-core-20240904
-
30.2
1.3K
-
-
252
gemini-1.5-flash-8b-001
Google
29.9
5.6K
2M
¥0.54 / ¥2.2Input/Output
253
gpt-4-0314
Openai
29.6
9.8K
8.19K
¥216 / ¥432Input/Output
254
ministral-8b-2410
Mistral
29.3
703
128K
¥0.72 / ¥0.72Input/Output
255
qwen2-72b-instruct
Alibaba
29.1
6.5K
131K
¥4.13 / ¥12.4Input/Output
256
gemma-2-27b-it
Google
28.8
12.6K
8.19K
¥0.58 / ¥0.58Input/Output
257
llama-3.1-nemotron-51b-instruct
Nvidia
28.5
578
128K
¥0 / ¥0Input/Output
258
amazon-nova-micro-v1.0
Amazon
28.2
3.1K
128K
¥0.25 / ¥1.01Input/Output
259
claude-3-sonnet-20240229
Anthropic
27.9
19.2K
200K
¥21.6 / ¥108Input/Output
260
nemotron-4-340b-instruct
Nvidia
27.7
3.3K
-
-
261
gemma-2-9b-it-simpo
-
27.4
1.7K
8.19K
¥1.44 / ¥1.44Input/Output
262
c4ai-aya-expanse-32b
Cohere
27.1
4.3K
-
-
263
llama-3.1-8b-instruct
Meta
26.8
8.2K
131K
¥0.79 / ¥0.79Input/Output
264
command-r-plus-08-2024
Cohere
26.5
1.7K
128K
¥18 / ¥72Input/Output
265
internlm2_5-20b-chat
-
26.3
1.5K
-
-
266
gpt-4-0613
Openai
26.0
16.2K
8.19K
¥216 / ¥432Input/Output
267
reka-flash-20240904
-
25.7
1.4K
65.5K
¥0.72 / ¥1.44Input/Output
268
qwen1.5-110b-chat
Alibaba
25.4
4.6K
-
-
269
jamba-1.5-mini
-
25.1
1.5K
256K
¥0 / ¥0Input/Output
270
claude-3-haiku-20240307
Anthropic
24.9
20.7K
200K
¥1.8 / ¥9Input/Output
271
yi-1.5-34b-chat
-
24.6
3.9K
-
-
272
mistral-large-2402
Mistral
24.3
11.3K
262K
¥2.88 / ¥14.4Input/Output
273
llama-3.1-tulu-3-8b
Allenai
24.0
452
-
-
274
gemma-2-9b-it
Google
23.7
9.1K
8.19K
¥1.44 / ¥1.44Input/Output
275
command-r-plus
Cohere
23.5
14K
128K
¥18 / ¥72Input/Output
276
qwq-32b-preview
Alibaba
23.2
519
131K
¥2.07 / ¥6.2Input/Output
277
command-r-08-2024
Cohere
22.9
1.8K
128K
¥18 / ¥72Input/Output
278
llama-3-8b-instruct
Meta
22.6
19.9K
8.19K
¥0.29 / ¥0.29Input/Output
279
granite-3.1-8b-instruct
Ibm
22.3
517
-
-
280
mixtral-8x22b-instruct-v0.1
Mistral
22.1
9.5K
64K
¥14.4 / ¥43.2Input/Output
281
qwen1.5-72b-chat
Alibaba
21.8
7.6K
-
-
282
mistral-medium
Mistral
21.5
6.4K
262K
¥2.88 / ¥14.4Input/Output
283
granite-3.1-2b-instruct
Ibm
21.2
511
-
-
284
c4ai-aya-expanse-8b
Cohere
20.9
1.6K
-
-
285
reka-flash-21b-20240226-online
-
20.7
2.9K
-
-
286
qwen1.5-32b-chat
Alibaba
20.4
3.8K
-
-
287
reka-flash-21b-20240226
-
20.1
4.8K
-
-
288
phi-3-medium-4k-instruct
Microsoft
19.8
4.1K
4.1K
¥1.22 / ¥4.9Input/Output
289
llama-3.2-3b-instruct
Meta
19.6
1.4K
131K
¥0.22 / ¥0.35Input/Output
290
dbrx-instruct-preview
-
19.3
5.5K
-
-
291
mixtral-8x7b-instruct-v0.1
Mistral
19.0
13.4K
32K
¥5.04 / ¥5.04Input/Output
292
zephyr-orpo-141b-A35b-v0.1
-
18.7
820
200K
¥108 / ¥432Input/Output
293
starling-lm-7b-beta
-
18.4
2.9K
200K
¥5.4 / ¥18.7Input/Output
294
gemma-2-2b-it
Google
18.2
7.6K
128K
¥0 / ¥0Input/Output
295
phi-3-small-8k-instruct
Microsoft
17.9
3.1K
8.19K
¥1.08 / ¥4.32Input/Output
296
tulu-2-dpo-70b
-
17.6
1.2K
-
-
297
qwen1.5-14b-chat
Alibaba
17.3
3K
-
-
298
command-r
Cohere
17.0
9.3K
128K
¥18 / ¥72Input/Output
299
gpt-3.5-turbo-0125
Openai
16.8
12.4K
16.4K
¥3.6 / ¥10.8Input/Output
300
granite-3.0-8b-instruct
Ibm
16.5
900
-
-
301
yi-34b-chat
-
16.2
2.8K
-
-
302
gemini-pro-dev-api
Google
15.9
3.3K
1.05M
¥14.4 / ¥86.4Input/Output
303
phi-3-mini-4k-instruct-june-2024
Microsoft
15.6
2K
4.1K
¥0.94 / ¥3.74Input/Output
304
gpt-3.5-turbo-1106
Openai
15.4
3.2K
16.4K
¥7.2 / ¥14.4Input/Output
305
gemini-pro
Google
15.1
1.2K
1.05M
¥14.4 / ¥86.4Input/Output
306
phi-3-mini-4k-instruct
Microsoft
14.8
3.4K
4.1K
¥0.94 / ¥3.74Input/Output
307
wizardlm-70b
Microsoft
14.5
1.5K
-
-
308
openchat-3.5-0106
-
14.2
2.5K
-
-
309
starling-lm-7b-alpha
-
14.0
1.9K
200K
¥5.4 / ¥18.7Input/Output
310
llama-2-70b-chat
Meta
13.7
7K
-
-
311
gemma-1.1-7b-it
Google
13.4
4.2K
-
-
312
mistral-7b-instruct-v0.2
Mistral
13.1
3.7K
262K
¥2.88 / ¥14.4Input/Output
313
snowflake-arctic-instruct
-
12.8
6.5K
-
-
314
granite-3.0-2b-instruct
Ibm
12.6
974
-
-
315
llama-3.2-1b-instruct
Meta
12.3
1.4K
16.4K
¥0.07 / ¥0.08Input/Output
316
openhermes-2.5-mistral-7b
-
12.0
917
1M
¥36 / ¥180Input/Output
317
deepseek-llm-67b-chat
Deepseek
11.7
946
1M
¥1.01 / ¥2.02Input/Output
318
mpt-30b-chat
-
11.5
359
-
-
319
vicuna-33b
-
11.2
4K
-
-
320
codellama-70b-instruct
Meta
10.9
207
-
-
321
solar-10.7b-instruct-v1.0
-
10.6
765
128K
¥0 / ¥0Input/Output
322
qwen1.5-7b-chat
Alibaba
10.3
1K
-
-
323
llama-2-13b-chat
Meta
10.1
3.3K
-
-
324
openchat-3.5
-
9.8
1.4K
-
-
325
smollm2-1.7b-instruct
-
9.5
334
-
-
326
dolphin-2.2.1-mistral-7b
-
9.2
286
262K
¥2.88 / ¥14.4Input/Output
327
nous-hermes-2-mixtral-8x7b-dpo
-
8.9
929
1M
¥36 / ¥180Input/Output
328
gemma-7b-it
Google
8.7
1.6K
-
-
329
llama2-70b-steerlm-chat
Nvidia
8.4
645
-
-
330
phi-3-mini-128k-instruct
Microsoft
8.1
4.1K
128K
¥0.94 / ¥3.74Input/Output
331
zephyr-7b-beta
-
7.8
1.8K
-
-
332
codellama-34b-instruct
Meta
7.5
1.3K
-
-
333
zephyr-7b-alpha
-
7.3
278
-
-
334
qwen-14b-chat
Alibaba
7.0
830
32.8K
¥1.04 / ¥3.1Input/Output
335
vicuna-13b
-
6.7
3.2K
-
-
336
palm-2
Google
6.4
1.4K
-
-
337
wizardlm-13b
Microsoft
6.1
1.1K
-
-
338
llama-2-7b-chat
Meta
5.9
2.6K
128K
¥4.03 / ¥48Input/Output
339
gemma-1.1-2b-it
Google
5.6
1.9K
-
-
340
mistral-7b-instruct
Mistral
5.3
1.6K
262K
¥2.88 / ¥14.4Input/Output
341
guanaco-33b
-
5.0
434
200K
¥14.4 / ¥57.6Input/Output
342
stripedhyena-nous-7b
-
4.7
1K
-
-
343
olmo-7b-instruct
Allenai
4.5
1.1K
-
-
344
vicuna-7b
-
4.2
1.1K
-
-
345
gemma-2b-it
Google
3.9
832
-
-
346
qwen1.5-4b-chat
Alibaba
3.6
1.4K
-
-
347
chatglm3-6b
-
3.4
835
200K
¥5.4 / ¥18.7Input/Output
348
gpt4all-13b-snoozy
-
3.1
287
1M
¥36 / ¥216Input/Output
349
koala-13b
-
2.8
1.1K
-
-
350
chatglm2-6b
-
2.5
408
200K
¥5.4 / ¥18.7Input/Output
351
mpt-7b-chat
-
2.2
626
-
-
352
RWKV-4-Raven-14B
-
2.0
771
-
-
353
oasst-pythia-12b
-
1.7
1K
-
-
354
chatglm-6b
-
1.4
732
200K
¥5.4 / ¥18.7Input/Output
355
stablelm-tuned-alpha-7b
-
1.1
497
-
-
356
alpaca-13b
-
0.8
899
-
-
357
fastchat-t5-3b
-
0.6
690
-
-
358
dolly-v2-12b
-
0.3
523
-
-
359
llama-13b
Meta
0.0
371
-
-
Top model analysis

Why claude-opus-4-6-thinking ranks first

claude-opus-4-6-thinking ranks first with a score of 100.0 on the 100-point scale, based on 9.8K samples. Treat it as the first candidate for this leaderboard, then compare price, context length, and availability.

How to choose

Do not look only at rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
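
The selection steps above can be sketched as a small filter. This is a minimal illustration, not part of the leaderboard itself: the `Entry` type, the `shortlist` function, and the thresholds in the usage example are hypothetical, and the six rows are copied from the listing with the rounded figures (for example 9.8K, 1.05M) expanded to plain numbers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical container for this sketch)."""
    rank: int
    model: str
    score: float
    samples: int                     # rounded on the page, e.g. 9.8K -> 9800
    context_tokens: Optional[int]    # None where the page shows "-"
    price_in: Optional[float]        # yen, as listed; None where missing
    price_out: Optional[float]


# A handful of rows copied from the listing above.
ENTRIES = [
    Entry(1, "claude-opus-4-6-thinking", 100.0, 9800, 1_000_000, 36.0, 180.0),
    Entry(5, "mimo-v2.5-pro", 98.9, 4900, 1_050_000, 7.2, 21.6),
    Entry(10, "qwen3.5-max-preview", 97.5, 6200, None, None, None),
    Entry(12, "ernie-5.1", 96.9, 4600, 119_000, 5.4, 21.6),
    Entry(19, "qwen3.7-max-preview", 95.0, 1300, 1_000_000, 18.0, 54.0),
    Entry(26, "deepseek-v4-pro-thinking", 93.0, 5000, 1_000_000, 3.13, 6.26),
]


def shortlist(
    entries: List[Entry],
    min_samples: int,
    min_context: int,
    max_price_out: float,
) -> List[Entry]:
    """Keep rows that meet every constraint, best score first.

    Rows with a missing context length or price are dropped, because
    they cannot be checked against the constraints.
    """
    keep = [
        e for e in entries
        if e.samples >= min_samples
        and e.context_tokens is not None
        and e.context_tokens >= min_context
        and e.price_out is not None
        and e.price_out <= max_price_out
    ]
    return sorted(keep, key=lambda e: e.score, reverse=True)


if __name__ == "__main__":
    # Hypothetical requirements: at least 4,000 samples, a context of
    # 200K or more, and an output price of at most 50 yen.
    for e in shortlist(ENTRIES, 4000, 200_000, 50.0):
        print(e.rank, e.model, e.score)
    # prints:
    # 5 mimo-v2.5-pro 98.9
    # 26 deepseek-v4-pro-thinking 93.0
```

With these example thresholds the rank #1 model drops out on price, which is the point of the heading above: the top score is only the starting candidate.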

FAQ

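Which metrics matter on the Hard Prompts English leaderboard?

Look mainly at rank, the score on a 100-point scale, sample size, and source. The score gives a quick comparison of models within the same leaderboard; the sample size indicates how stable the result is.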
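
Because the score is meant for comparison within one leaderboard, it helps to see how it relates to rank. In this listing, the published values are consistent with a linear rescaling of rank across the 359 models (rank 1 maps to 100.0, rank 359 to 0.0). That is an observation from the figures on this page, not a documented formula, and the `rank_to_score` function below is a hypothetical name used only for this sketch.

```python
def rank_to_score(rank: int, total: int) -> float:
    """Map a rank to a 0-100 score by linear rescaling.

    Assumption for illustration: score = 100 * (total - rank) / (total - 1),
    rounded to one decimal. This matches the values shown in this listing
    but is not confirmed as the method used to produce them.
    """
    if total < 2:
        raise ValueError("need at least two models to rescale ranks")
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100.0 * (total - rank) / (total - 1), 1)


if __name__ == "__main__":
    total = 359  # number of models in this leaderboard
    for rank in (1, 2, 5, 180, 359):
        print(rank, rank_to_score(rank, total))
    # prints:
    # 1 100.0
    # 2 99.7
    # 5 98.9
    # 180 50.0
    # 359 0.0
```

If that reading is right, the gap between adjacent ranks is always about 0.28 points, so a small score difference says little about how far apart two models are in the underlying preference data. Sample size is the better guide to how much weight a position deserves.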

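Why can't different leaderboards be merged directly into one overall score?

Different leaderboards differ in tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, so that abilities such as writing, coding, and image work are not forced into one number.

How should I choose a Hard Prompts English model?

Start with the leaderboard closest to your task, then weigh price, context length, open or closed source, and provider availability. A high rank does not mean a model fits every budget or deployment setup.

How often is the leaderboard updated?

The page shows the most recent successfully collected public leaderboard data. It currently prefers the LMArena leaderboard dataset and keeps the original link in the page's source section.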