Chat · Vision · Entity Recognition Leaderboard

Ranking for Vision / Entity Recognition, based on public preference data.

Selection guide

Entity Recognition model ranking guide


Models: 32
Published: 2026/05/18
Source: Arena public preference evaluation
Original leaderboard: Vision / Entity Recognition
Leaderboard dataset: LMArena latest parquet
| Rank | Model | Provider | Score | Samples | Context | Price (Input / Output) |
|---|---|---|---|---|---|---|
| 1 | gemini-3-pro | Google | 100.0 | 243 | 1.05M | ¥14.4 / ¥86.4 |
| 2 | gemini-3-flash | Google | 96.8 | 206 | 1.05M | ¥3.6 / ¥21.6 |
| 3 | gemini-3.1-pro-preview | Google | 93.5 | 171 | 1.05M | ¥14.4 / ¥86.4 |
| 4 | gemini-3-flash (thinking-minimal) | Google | 90.3 | 182 | 1.05M | ¥3.6 / ¥21.6 |
| 5 | qwen3.5-397b-a17b | Alibaba | 87.1 | 95 | 262K | ¥3.1 / ¥18.6 |
| 6 | grok-4-0709 | xAI | 83.9 | 341 | 256K | ¥21.6 / ¥108 |
| 7 | gemini-2.5-pro | Google | 80.6 | 865 | 1.05M | ¥9 / ¥72 |
| 8 | gemini-3.1-flash-lite-preview | Google | 77.4 | 105 | 1.05M | ¥1.8 / ¥10.8 |
| 9 | kimi-k2.5-thinking | Moonshot | 74.2 | 136 | 262K | ¥4.32 / ¥21.6 |
| 10 | gpt-5-high | OpenAI | 71.0 | 434 | 400K | ¥9 / ¥72 |
| 11 | qwen3-vl-235b-a22b-instruct | Alibaba | 67.7 | 148 | 128K | ¥2.16 / ¥8.64 |
| 12 | chatgpt-4o-latest-20250326 | OpenAI | 64.5 | 295 | 128K | ¥18 / ¥72 |
| 13 | gemini-2.5-flash | Google | 61.3 | 579 | 1.05M | ¥2.16 / ¥18 |
| 14 | gpt-5.1-high | OpenAI | 58.1 | 106 | 400K | ¥9 / ¥72 |
| 15 | o3-2025-04-16 | OpenAI | 54.8 | 506 | 200K | ¥14.4 / ¥57.6 |
| 16 | gpt-5-mini-high | OpenAI | 51.6 | 283 | 400K | ¥1.8 / ¥14.4 |
| 17 | gpt-5-chat | OpenAI | 48.4 | 383 | 400K | ¥9 / ¥72 |
| 18 | o4-mini-2025-04-16 | OpenAI | 45.2 | 398 | 200K | ¥7.92 / ¥31.7 |
| 19 | gpt-5.2-high | OpenAI | 41.9 | 119 | 400K | ¥12.6 / ¥101 |
| 20 | grok-4-1-fast-reasoning | xAI | 38.7 | 63 | 2M | ¥1.44 / ¥3.6 |
| 21 | gpt-5.2-chat-latest-20260210 | OpenAI | 35.5 | 84 | 400K | ¥12.6 / ¥101 |
| 22 | gpt-4.1-2025-04-14 | OpenAI | 32.3 | 404 | 1.05M | ¥14.4 / ¥57.6 |
| 23 | gemma-3-27b-it | Google | 29.0 | 326 | 128K | ¥2.15 / ¥2.15 |
| 24 | gpt-5.2 | OpenAI | 25.8 | 129 | 400K | ¥12.6 / ¥101 |
| 25 | gpt-4.1-mini-2025-04-14 | OpenAI | 22.6 | 363 | 1.05M | ¥2.88 / ¥11.5 |
| 26 | gemini-2.5-flash-lite-preview-06-17-thinking | Google | 19.4 | 422 | 65.5K | ¥0.72 / ¥2.88 |
| 27 | gpt-5.1 | OpenAI | 16.1 | 141 | 400K | ¥9 / ¥72 |
| 28 | gemma-4-31b | Google | 12.9 | 93 | 262K | ¥3.24 / ¥7.2 |
| 29 | mistral-small-3.1-24b-instruct-2503 | Mistral | 9.7 | 249 | 262K | ¥2.88 / ¥14.4 |
| 30 | mistral-small-2506 | Mistral | 6.5 | 212 | 262K | ¥2.88 / ¥14.4 |
| 31 | mistral-medium-2508 | Mistral | 3.2 | 444 | 262K | ¥2.88 / ¥14.4 |
| 32 | mistral-medium-2505 | Mistral | 0.0 | 201 | 262K | ¥2.88 / ¥14.4 |

Top model analysis

Why gemini-3-pro ranks first

gemini-3-pro ranks first with a score of 100.0 on the 100-point scale, based on 243 samples. Treat it as the first candidate for this leaderboard, then compare price, context length, and availability before deciding.
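A note on reading the score: the published values look like rank percentiles rather than raw preference ratings. With 32 models, every score on this leaderboard matches 100 × (32 − rank) / 31 rounded to one decimal. This is an observation from the listed numbers, not a formula documented on the page, and the function name below is hypothetical. A minimal sketch that checks the pattern against a few rows:

```python
def rank_to_percent(rank: int, n_models: int) -> float:
    """Map rank 1..n_models linearly onto 100.0..0.0 (assumed formula)."""
    return round(100 * (n_models - rank) / (n_models - 1), 1)


# (rank, score) pairs copied from the leaderboard above.
published = {1: 100.0, 2: 96.8, 5: 87.1, 10: 71.0, 17: 48.4, 32: 0.0}

for rank, score in published.items():
    assert rank_to_percent(rank, 32) == score
```

If this reading is right, adjacent ranks always differ by about 3.2 points, so the gap between two scores says nothing about how close the underlying preference results were.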

How to choose

Do not look only at rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
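The selection steps above can be sketched as a simple filter. The rows below are copied from this leaderboard, with context lengths expanded to approximate token counts (for example, 1.05M as 1,050,000). The thresholds, the `Entry` class, and the `shortlist` function are hypothetical and only illustrate the idea. Prices are in ¥ as listed; the page does not state the unit they are quoted per.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    rank: int
    model: str
    samples: int
    context_tokens: int   # approximate, expanded from the listed value
    input_price: float    # ¥, as listed on the leaderboard
    output_price: float   # ¥, as listed on the leaderboard


ENTRIES = [
    Entry(1, "gemini-3-pro", 243, 1_050_000, 14.4, 86.4),
    Entry(2, "gemini-3-flash", 206, 1_050_000, 3.6, 21.6),
    Entry(5, "qwen3.5-397b-a17b", 95, 262_000, 3.1, 18.6),
    Entry(7, "gemini-2.5-pro", 865, 1_050_000, 9.0, 72.0),
    Entry(8, "gemini-3.1-flash-lite-preview", 105, 1_050_000, 1.8, 10.8),
    Entry(13, "gemini-2.5-flash", 579, 1_050_000, 2.16, 18.0),
]


def shortlist(entries, max_output_price, min_samples, min_context):
    """Keep entries that meet every constraint, best rank first."""
    keep = [
        e for e in entries
        if e.output_price <= max_output_price
        and e.samples >= min_samples
        and e.context_tokens >= min_context
    ]
    return sorted(keep, key=lambda e: e.rank)


# Hypothetical constraints: modest budget, at least 200 samples, long context.
picks = shortlist(ENTRIES, max_output_price=25.0, min_samples=200, min_context=500_000)
print([e.model for e in picks])  # ['gemini-3-flash', 'gemini-2.5-flash']
```

Under these example constraints the top-ranked model drops out on price, which is the point of not stopping at rank #1.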

FAQ


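Which metrics matter on the Entity Recognition leaderboard?

Look mainly at rank, the 100-point score, sample size, and source. The score gives a quick comparison of models within the same leaderboard; the sample size indicates how stable the result is.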
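To give a rough sense of how sample size relates to stability, the sketch below treats each sample as an independent yes/no preference vote and computes the worst-case standard error of a proportion. That model is an assumption made for illustration; the page does not describe how samples feed into the score. The function name is hypothetical.

```python
import math


def worst_case_standard_error(n_samples: int) -> float:
    """Standard error of a proportion at p = 0.5, where it is largest."""
    return math.sqrt(0.25 / n_samples)


# Sample counts copied from the leaderboard.
for model, n in [
    ("gemini-2.5-pro", 865),
    ("gemini-3-pro", 243),
    ("grok-4-1-fast-reasoning", 63),
]:
    print(f"{model}: n={n}, standard error <= {worst_case_standard_error(n):.3f}")
```

Because the error shrinks with the square root of the sample count, a model with 63 samples carries roughly 3.7 times the uncertainty of one with 865 under this simplified model.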

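Why can't different leaderboards be combined into a single overall score?

Different leaderboards use different tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, to avoid forcing writing, coding, image, and other capabilities into one number.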

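How should I choose an Entity Recognition model?

Start with the leaderboard closest to your task, then weigh price, context length, open versus closed source, and provider availability. A high rank does not mean a model fits every budget or deployment method.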

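How often is the leaderboard updated?

The page shows the most recent successfully collected public leaderboard data. It currently prefers the LMArena leaderboard dataset and keeps the original link in the page's source section.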