Chat · Search · Overall Leaderboard

Ranking for Search / Overall, based on public preference data.

Selection guide

Overall model ranking guide

Top 5 models: claude-opus-4-6-search, gpt-5.5-search, claude-opus-4-7, ernie-5.1, claude-sonnet-4-6-search
Models: 29
Published: 2026/05/12
Source: Arena public preference evaluation
Original leaderboard: Search / Overall
Leaderboard dataset: LMArena latest parquet

Rank  Model                            Provider    Score  Samples  Context  Price (Input / Output)
1     claude-opus-4-6-search           Anthropic   100.0  48.7K    200K     ¥108 / ¥540
2     gpt-5.5-search                   OpenAI      96.4   9.5K     400K     ¥9 / ¥72
3     claude-opus-4-7                  Anthropic   92.9   9.7K     1M       ¥36 / ¥180
4     ernie-5.1                        Baidu       89.3   2.3K     119K     ¥5.4 / ¥21.6
5     claude-sonnet-4-6-search         Anthropic   85.7   48.1K    200K     ¥21.6 / ¥108
6     gemini-3.1-pro-grounding         Google      82.1   28K      1.05M    ¥14.4 / ¥86.4
7     gpt-5.2-search                   OpenAI      78.6   47.1K    400K     ¥12.6 / ¥101
8     grok-4.20-multi-agent-beta-0309  xAI         75.0   27.5K    2M       ¥14.4 / ¥43.2
9     gemini-3-pro-grounding           Google      71.4   37.3K    1.05M    ¥14.4 / ¥86.4
10    gemini-3-flash-grounding         Google      67.9   62.9K    1.05M    ¥3.6 / ¥21.6
11    gpt-5.1-search                   OpenAI      64.3   53.7K    400K     ¥9 / ¥72
12    gpt-5.4-search                   OpenAI      60.7   27.9K    400K     ¥9 / ¥72
13    grok-4.20-beta1                  xAI         57.1   49K      2M       ¥14.4 / ¥43.2
14    grok-4.3                         xAI         53.6   6.9K     1M       ¥9 / ¥18
15    claude-opus-4-5-search           Anthropic   50.0   53.4K    200K     ¥108 / ¥540
16    gpt-5.2-search-non-reasoning     OpenAI      46.4   65.5K    400K     ¥12.6 / ¥101
17    grok-4-1-fast-search             xAI         42.9   71.6K    1M       ¥9 / ¥18
18    grok-4-fast-search               xAI         39.3   43K      1M       ¥9 / ¥18
19    claude-sonnet-4-5-search         Anthropic   35.7   45.7K    200K     ¥21.6 / ¥108
20    claude-opus-4-1-search           Anthropic   32.1   71.3K    200K     ¥108 / ¥540
21    o3-search                        OpenAI      28.6   20.7K    200K     ¥14.4 / ¥57.6
22    gemini-2.5-pro-grounding         Google      25.0   76.8K    1.05M    ¥9 / ¥72
23    grok-4-search                    xAI         21.4   19.3K    1M       ¥9 / ¥18
24    ppl-sonar-reasoning-pro-high     Perplexity  17.9   29.1K    128K     ¥7.2 / ¥7.2
25    gpt-5-search                     OpenAI      14.3   20.8K    400K     ¥9 / ¥72
26    ppl-sonar-pro-high               Perplexity  10.7   28.6K    128K     ¥7.2 / ¥7.2
27    claude-opus-4-search             Anthropic   7.1    31.1K    200K     ¥108 / ¥540
28    diffbot-small-xl                 Diffbot     3.6    6.4K     -        -
29    api-gpt-4o-search                OpenAI      0.0    3.4K     128K     ¥18 / ¥72

Top model analysis

Why claude-opus-4-6-search ranks first

claude-opus-4-6-search ranks first with a percent score of 100.0, based on 48.7K samples. Treat it as the default starting point for this leaderboard, then compare price, context length, and availability.
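
The percent scores published on this page are consistent with a simple rank-based scale, where rank 1 maps to 100.0 and rank 29 maps to 0.0 in equal steps. The page does not state its formula, so the function below is an assumption inferred from the listed values, and `rank_percent` is a hypothetical name. If the assumption holds, a score of 100.0 reflects position in the ranking rather than a perfect underlying rating.

```python
def rank_percent(rank: int, total: int) -> float:
    """Assumed formula: spread ranks 1..total evenly over 100.0..0.0."""
    return round(100 * (total - rank) / (total - 1), 1)


TOTAL_MODELS = 29

# A few (rank, score) pairs copied from the leaderboard on this page.
published = {1: 100.0, 2: 96.4, 3: 92.9, 15: 50.0, 28: 3.6, 29: 0.0}

for rank, score in published.items():
    print(f"rank {rank:>2}: published {score:>5}, computed {rank_percent(rank, TOTAL_MODELS):>5}")

# True when every sampled score matches the assumed formula.
matches = all(rank_percent(r, TOTAL_MODELS) == s for r, s in published.items())
print("formula matches sampled rows:", matches)
```

Under this reading, the gap between adjacent ranks is always about 3.6 points, so the score alone says nothing about how close two neighboring models are in the underlying preference data.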

How to choose

Look beyond rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
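
The selection steps above can be sketched as a small filter over the leaderboard rows. This is a minimal illustration, not the site's own tooling: the `Entry` type, `parse_count`, `shortlist`, and the thresholds are all hypothetical, and it assumes `K` means 1,000 and `M` means 1,000,000. Prices are the ¥ figures listed on this page; the page does not state the pricing unit. Only the top eight rows are included for brevity.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    score: float
    samples: str        # as shown on the page, e.g. "48.7K"
    context: str        # as shown on the page, e.g. "1.05M"
    input_price: float  # ¥, unit not stated on the page
    output_price: float


def parse_count(text: str) -> int:
    """Convert '48.7K' or '1.05M' to an integer (assumes K = 1,000, M = 1,000,000)."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return round(float(text))


ENTRIES = [
    Entry("claude-opus-4-6-search", 100.0, "48.7K", "200K", 108, 540),
    Entry("gpt-5.5-search", 96.4, "9.5K", "400K", 9, 72),
    Entry("claude-opus-4-7", 92.9, "9.7K", "1M", 36, 180),
    Entry("ernie-5.1", 89.3, "2.3K", "119K", 5.4, 21.6),
    Entry("claude-sonnet-4-6-search", 85.7, "48.1K", "200K", 21.6, 108),
    Entry("gemini-3.1-pro-grounding", 82.1, "28K", "1.05M", 14.4, 86.4),
    Entry("gpt-5.2-search", 78.6, "47.1K", "400K", 12.6, 101),
    Entry("grok-4.20-multi-agent-beta-0309", 75.0, "27.5K", "2M", 14.4, 43.2),
]


def shortlist(entries, min_samples, min_context, max_output_price):
    """Keep entries that meet every threshold, highest score first."""
    keep = [
        e for e in entries
        if parse_count(e.samples) >= min_samples
        and parse_count(e.context) >= min_context
        and e.output_price <= max_output_price
    ]
    return sorted(keep, key=lambda e: e.score, reverse=True)


# Hypothetical requirements: well-sampled, long context, capped output price.
picks = shortlist(ENTRIES, min_samples=20_000, min_context=400_000, max_output_price=90.0)
print([e.model for e in picks])
# → ['gemini-3.1-pro-grounding', 'grok-4.20-multi-agent-beta-0309']
```

With these example thresholds, none of the top five models survive: they are excluded either by a small sample count or by a 200K context window, which is why rank alone is a poor selection criterion.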

FAQ

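Which metrics matter on the Overall leaderboard?

Focus on rank, percent score, sample size, and source. The score is for quickly comparing models within the same leaderboard; the sample size indicates how stable the result is.

Why can't different leaderboards be combined directly into a single total score?

Different leaderboards use different tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, to avoid forcibly merging capabilities such as writing, coding, and image tasks.

How should I choose a model from the Overall leaderboard?

Start with the leaderboard closest to your task, then factor in price, context length, open-source or closed-source status, and provider availability. A high rank does not mean a model suits every budget or deployment setup.

How often is the leaderboard updated?

The page shows the most recent successfully collected public leaderboard data. It currently prefers the LMArena leaderboard dataset and keeps the original links in the page's source section.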