Code · Web Development · Overall Leaderboard

Ranking for Web Development / Overall, based on public preference data.

Models: 81
Published: 2026/05/25
Source: Arena public preference evaluation. Original leaderboard: WebDev / Overall. Leaderboard dataset: LMArena latest parquet.

| Rank | Model | Provider | Score | Samples | Context | Price (input / output) |
|---|---|---|---|---|---|---|
| 1 | claude-opus-4-7-thinking | Anthropic | 100.0 | 5.3K | 1M | ¥36 / ¥180 |
| 2 | claude-opus-4-7 | Anthropic | 98.8 | 4.9K | 1M | ¥36 / ¥180 |
| 3 | claude-opus-4-6-thinking | Anthropic | 97.5 | 7.9K | 1M | ¥36 / ¥180 |
| 4 | qwen3.7-max-20260517 | Alibaba | 96.3 | 1.5K | 1M | ¥18 / ¥54 |
| 5 | claude-opus-4-6 | Anthropic | 95.0 | 8.9K | 1M | ¥36 / ¥180 |
| 6 | glm-5.1 | Z.ai | 93.8 | 3.6K | 200K | ¥0 / ¥0 |
| 7 | claude-sonnet-4-6 | Anthropic | 92.5 | 11.1K | 1M | ¥21.6 / ¥108 |
| 8 | kimi-k2.6 | Moonshot | 91.3 | 4K | 262K | ¥6.84 / ¥28.8 |
| 9 | muse-spark | Meta | 90.0 | 1.6K | - | - |
| 10 | gemini-3.5-flash | Google | 88.8 | 2.2K | 1.05M | ¥10.8 / ¥64.8 |
| 11 | gpt-5.5-xhigh (codex-harness) | OpenAI | 87.5 | 4.1K | 400K | ¥9 / ¥72 |
| 12 | claude-opus-4-5-20251101-thinking-32k | Anthropic | 86.3 | 13.1K | 200K | ¥108 / ¥540 |
| 13 | qwen3.6-max-preview | Alibaba | 85.0 | 2.5K | 246K | ¥9.5 / ¥56.9 |
| 14 | gpt-5.5-high (codex-harness) | OpenAI | 83.8 | 4.3K | 400K | ¥9 / ¥72 |
| 15 | mimo-v2.5-pro | Xiaomi | 82.5 | 4.7K | 1.05M | ¥7.2 / ¥21.6 |
| 16 | claude-opus-4-5-20251101 | Anthropic | 81.3 | 15.3K | 200K | ¥36 / ¥180 |
| 17 | deepseek-v4-pro-thinking | DeepSeek | 80.0 | 4K | 1M | ¥3.13 / ¥6.26 |
| 18 | qwen3.6-plus | Alibaba | 78.8 | 6.1K | 1M | ¥3.6 / ¥21.6 |
| 19 | gpt-5.4-high (codex-harness) | OpenAI | 77.5 | 1.5K | 400K | ¥9 / ¥72 |
| 20 | gemini-3.1-pro-preview | Google | 76.3 | 10.3K | 1.05M | ¥14.4 / ¥86.4 |
| 21 | gpt-5.5 (codex-harness) | OpenAI | 75.0 | 4.1K | 400K | ¥9 / ¥72 |
| 22 | glm-4.7 | Z.ai | 73.8 | 4.9K | 205K | ¥0 / ¥0 |
| 23 | mimo-v2.5 | Xiaomi | 72.5 | 3.7K | 1.05M | ¥2.88 / ¥14.4 |
| 24 | gemini-3-pro | Google | 71.3 | 17.2K | 1.05M | ¥14.4 / ¥86.4 |
| 25 | gpt-5.4-medium (codex-harness) | OpenAI | 70.0 | 1.4K | 400K | ¥9 / ¥72 |
| 26 | gemini-3-flash | Google | 68.8 | 13.3K | 1.05M | ¥3.6 / ¥21.6 |
| 27 | glm-5 | Z.ai | 67.5 | 6.6K | 205K | ¥7.2 / ¥23 |
| 28 | mimo-v2-pro | Xiaomi | 66.3 | 6.8K | 1.05M | ¥7.2 / ¥21.6 |
| 29 | kimi-k2.5-thinking | Moonshot | 65.0 | 10.7K | 262K | ¥4.32 / ¥21.6 |
| 30 | kimi-k2.5-instant | Moonshot | 63.8 | 3.6K | 262K | ¥4.32 / ¥21.6 |
| 31 | gpt-5.3-codex (codex-harness) | OpenAI | 62.5 | 3K | 400K | ¥9 / ¥72 |
| 32 | gpt-5.2 | OpenAI | 61.3 | 1.5K | 400K | ¥12.6 / ¥101 |
| 33 | gpt-5.4-mini-high | OpenAI | 60.0 | 5.5K | 400K | ¥5.4 / ¥32.4 |
| 34 | minimax-m2.7 | MiniMax | 58.8 | 6.3K | 205K | ¥0 / ¥0 |
| 35 | grok-4.20-beta-0309-reasoning | xAI | 57.5 | 7.2K | 2M | ¥14.4 / ¥43.2 |
| 36 | gpt-5-medium | OpenAI | 56.3 | 3.8K | 400K | ¥9 / ¥72 |
| 37 | qwen3.5-397b-a17b | Alibaba | 55.0 | 9.7K | 262K | ¥3.1 / ¥18.6 |
| 38 | minimax-m2.1-preview | MiniMax | 53.8 | 9.3K | 205K | ¥0 / ¥0 |
| 39 | gpt-5.1-medium | OpenAI | 52.5 | 6.1K | 400K | ¥9 / ¥72 |
| 40 | gpt-5.4 | OpenAI | 51.3 | 239 | 1.05M | ¥18 / ¥108 |
| 41 | claude-sonnet-4-5-20250929-thinking-32k | Anthropic | 50.0 | 15.7K | 200K | ¥21.6 / ¥108 |
| 42 | gemini-3-flash (thinking-minimal) | Google | 48.8 | 16.4K | 1.05M | ¥3.6 / ¥21.6 |
| 43 | claude-sonnet-4-5-20250929 | Anthropic | 47.5 | 18.4K | 200K | ¥21.6 / ¥108 |
| 44 | claude-opus-4-1-20250805 | Anthropic | 46.3 | 8.6K | 200K | ¥108 / ¥540 |
| 45 | minimax-m2.5 | MiniMax | 45.0 | 7.8K | 205K | ¥0 / ¥0 |
| 46 | gemma-4-31b | Google | 43.8 | 3.4K | 262K | ¥3.24 / ¥7.2 |
| 47 | grok-4.3 | xAI | 42.5 | 3.5K | 1M | ¥9 / ¥18 |
| 48 | gpt-5.3-codex (codex-harness) | OpenAI | 41.3 | 3.5K | 400K | ¥9 / ¥72 |
| 49 | deepseek-v3.2-thinking | DeepSeek | 40.0 | 7.9K | 128K | ¥2.09 / ¥3.1 |
| 50 | hunyuan-hy3-preview | Tencent | 38.7 | 1.3K | 256K | ¥0 / ¥0 |
| 51 | qwen3.5-122b-a10b | Alibaba | 37.5 | 8.1K | 262K | ¥2.88 / ¥23 |
| 52 | gemma-4-26b-a4b | Google | 36.3 | 1.5K | 262K | ¥0.94 / ¥2.88 |
| 53 | qwen3.5-27b | Alibaba | 35.0 | 7.7K | 262K | ¥2.16 / ¥17.3 |
| 54 | glm-4.6 | Z.ai | 33.8 | 8.3K | 205K | ¥4.32 / ¥15.8 |
| 55 | gpt-5.1 | OpenAI | 32.5 | 12.9K | 400K | ¥9 / ¥72 |
| 56 | mimo-v2-flash (non-thinking) | Xiaomi | 31.3 | 6.7K | 262K | ¥0.72 / ¥2.16 |
| 57 | gpt-5.2-codex | OpenAI | 30.0 | 7.8K | 400K | ¥12.6 / ¥101 |
| 58 | deepseek-v3.2 | DeepSeek | 28.8 | 10.5K | 128K | ¥2.09 / ¥3.1 |
| 59 | kimi-k2-thinking-turbo | Moonshot | 27.5 | 15.3K | 262K | ¥17.3 / ¥72 |
| 60 | gpt-5.1-codex | OpenAI | 26.3 | 6.2K | 400K | ¥9 / ¥72 |
| 61 | claude-haiku-4-5-20251001 | Anthropic | 25.0 | 20.6K | 200K | ¥7.2 / ¥36 |
| 62 | minimax-m2 | MiniMax | 23.8 | 8.4K | 197K | ¥0 / ¥0 |
| 63 | mimo-v2-flash (thinking) | Xiaomi | 22.5 | 2.1K | 262K | ¥0.72 / ¥2.16 |
| 64 | deepseek-v3.2-exp | DeepSeek | 21.3 | 4.9K | 128K | ¥0 / ¥0 |
| 65 | qwen3-coder-480b-a35b-instruct | Alibaba | 20.0 | 15.2K | 262K | ¥6.2 / ¥24.8 |
| 66 | KAT-Coder-Pro-V1 | - | 18.8 | 1.9K | 256K | ¥0.22 / ¥8.64 |
| 67 | qwen3.5-35b-a3b | Alibaba | 17.5 | 1.8K | 262K | ¥1.8 / ¥14.4 |
| 68 | gemini-3.1-flash-lite-preview | Google | 16.3 | 9.3K | 1.05M | ¥1.8 / ¥10.8 |
| 69 | trinity-large-thinking | - | 15.0 | 1.3K | 262K | ¥1.8 / ¥6.48 |
| 70 | gpt-5.1-codex-mini | OpenAI | 13.8 | 1.4K | 400K | ¥1.8 / ¥14.4 |
| 71 | qwen3.5-flash | Alibaba | 12.5 | 1.6K | 1M | ¥1.24 / ¥12.4 |
| 72 | grok-4-1-fast-reasoning | xAI | 11.3 | 6.9K | 2M | ¥1.44 / ¥3.6 |
| 73 | mistral-large-3 | Mistral | 10.0 | 1K | 262K | ¥3.6 / ¥10.8 |
| 74 | grok-4.1-thinking | xAI | 8.8 | 1.2K | 200K | ¥14.4 / ¥72 |
| 75 | gemini-2.5-pro | Google | 7.5 | 3.3K | 1.05M | ¥9 / ¥72 |
| 76 | granite-4.1-8b | IBM | 6.3 | 1.7K | 131K | ¥0.36 / ¥0.72 |
| 77 | devstral-2 | Mistral | 5.0 | 1.6K | 262K | ¥2.88 / ¥14.4 |
| 78 | mercury-2 | Inception AI | 3.8 | 946 | 128K | ¥1.8 / ¥5.4 |
| 79 | grok-4-fast-reasoning | xAI | 2.5 | 933 | 2M | ¥1.44 / ¥3.6 |
| 80 | grok-code-fast-1 | xAI | 1.3 | 982 | 256K | ¥1.44 / ¥10.8 |
| 81 | devstral-medium-2507 | Mistral | 0.0 | 992 | 262K | ¥2.88 / ¥14.4 |

Top model analysis

Why claude-opus-4-7-thinking ranks first

claude-opus-4-7-thinking ranks first with a percent score of 100.0 from 5.3K samples. Treat it as the default starting point for this leaderboard, then weigh its price (¥36 / ¥180 input/output), 1M context, and availability against the models ranked below it.

How to choose

Do not look only at rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
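
The selection steps above can be sketched as a small filter. This is only an illustration under stated assumptions, not part of the leaderboard itself: the `Model` record, the `shortlist` function, and the thresholds in the usage note are hypothetical. The five rows are copied from the leaderboard, with abbreviated values expanded approximately (5.3K as 5300, 1M as 1,000,000) and prices taken as the listed ¥ input/output pairs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    rank: int
    name: str
    score: float                # percent score within this leaderboard
    samples: int                # approximate sample count, expanded from e.g. "5.3K"
    context: Optional[int]      # context length in tokens, None if not listed
    price_in: Optional[float]   # listed input price in yuan, None if not listed
    price_out: Optional[float]  # listed output price in yuan, None if not listed


# A few rows copied from the leaderboard (illustrative subset only).
ROWS = [
    Model(1, "claude-opus-4-7-thinking", 100.0, 5300, 1_000_000, 36.0, 180.0),
    Model(4, "qwen3.7-max-20260517", 96.3, 1500, 1_000_000, 18.0, 54.0),
    Model(7, "claude-sonnet-4-6", 92.5, 11100, 1_000_000, 21.6, 108.0),
    Model(8, "kimi-k2.6", 91.3, 4000, 262_000, 6.84, 28.8),
    Model(9, "muse-spark", 90.0, 1600, None, None, None),
]


def shortlist(
    models: List[Model],
    min_samples: int,
    max_price_out: float,
    min_context: int,
) -> List[Model]:
    """Keep models that meet every constraint, best score first.

    Rows with a missing price or context are dropped rather than guessed.
    """
    kept = [
        m
        for m in models
        if m.samples >= min_samples
        and m.price_out is not None
        and m.price_out <= max_price_out
        and m.context is not None
        and m.context >= min_context
    ]
    return sorted(kept, key=lambda m: m.score, reverse=True)
```

For example, `shortlist(ROWS, min_samples=2000, max_price_out=120.0, min_context=200_000)` drops the rank #1 model on price and keeps claude-sonnet-4-6 and kimi-k2.6, which shows why rank alone is not a sufficient criterion.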

FAQ

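Which metrics matter on the overall leaderboard?

Look mainly at rank, percent score, sample size, and source. The score gives a quick comparison of models within the same leaderboard; the sample size indicates how stable the result is.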
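
Because sample sizes are shown in abbreviated form (5.3K, 239), comparing them programmatically needs a small parsing step. The sketch below is an assumption-laden illustration: `parse_count`, `low_sample_models`, and the 1,000-sample threshold are hypothetical and are not defined by the leaderboard; the four sample values are copied from the table.

```python
from decimal import Decimal
from typing import Dict, List

# Multipliers for the abbreviated suffixes used in the table.
_SUFFIXES = {"K": 1_000, "M": 1_000_000}

# Hypothetical cutoff below which a result is treated as less stable.
LOW_SAMPLE_THRESHOLD = 1_000


def parse_count(text: str) -> int:
    """Convert an abbreviated count such as '5.3K' or '239' to an integer."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        # Decimal avoids binary floating-point error on values like 5.3.
        return int(Decimal(text[:-1]) * _SUFFIXES[text[-1]])
    return int(text)


def low_sample_models(
    samples: Dict[str, str], threshold: int = LOW_SAMPLE_THRESHOLD
) -> List[str]:
    """Return model names whose sample count is below the threshold."""
    return [name for name, raw in samples.items() if parse_count(raw) < threshold]


# Sample sizes copied from the leaderboard (illustrative subset only).
SAMPLES = {
    "claude-opus-4-7-thinking": "5.3K",
    "gpt-5.4": "239",
    "mistral-large-3": "1K",
    "mercury-2": "946",
}
```

With these rows, `low_sample_models(SAMPLES)` flags gpt-5.4 and mercury-2, whose scores rest on fewer samples than the others.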

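Why can't different leaderboards be merged into a single total score?

Different leaderboards use different tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, to avoid forcing writing, code, image, and other capabilities into one number.

How should I choose a model from the overall leaderboard?

Start with the leaderboard closest to your task, then factor in price, context length, open or closed source, and provider availability. A high rank does not mean a model fits every budget or deployment method.

How often is the leaderboard updated?

The page shows the most recent successfully collected public leaderboard data. The LMArena leaderboard dataset is currently the preferred source, and the original link is kept in the page's source section.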