Chat · Vision · Captioning Leaderboard

Ranking for Vision / Captioning, based on public preference data.

Selection guide

Captioning model ranking guide

Top models: gemini-3-pro, gemini-3.1-pro-preview, gemini-2.5-pro, gemini-3-flash, gpt-5.2-high
Models: 29
Published: 2026/05/18
Source: Arena public preference evaluation
Original leaderboard: Vision / Captioning
Leaderboard dataset: LMArena latest parquet

| Rank | Model | Provider | Score | Samples | Context | Price (Input / Output) |
|---|---|---|---|---|---|---|
| 1 | gemini-3-pro | Google | 100.0 | 224 | 1.05M | ¥14.4 / ¥86.4 |
| 2 | gemini-3.1-pro-preview | Google | 96.4 | 86 | 1.05M | ¥14.4 / ¥86.4 |
| 3 | gemini-2.5-pro | Google | 92.9 | 808 | 1.05M | ¥9 / ¥72 |
| 4 | gemini-3-flash | Google | 89.3 | 157 | 1.05M | ¥3.6 / ¥21.6 |
| 5 | gpt-5.2-high | OpenAI | 85.7 | 79 | 400K | ¥12.6 / ¥101 |
| 6 | qwen3-vl-235b-a22b-instruct | Alibaba | 82.1 | 148 | 128K | ¥2.16 / ¥8.64 |
| 7 | gemini-2.5-flash | Google | 78.6 | 596 | 1.05M | ¥2.16 / ¥18 |
| 8 | kimi-k2.5-thinking | Moonshot | 75.0 | 89 | 262K | ¥4.32 / ¥21.6 |
| 9 | gpt-5.1-high | OpenAI | 71.4 | 100 | 400K | ¥9 / ¥72 |
| 10 | grok-4-0709 | xAI | 67.9 | 377 | 256K | ¥21.6 / ¥108 |
| 11 | gemini-3-flash (thinking-minimal) | Google | 64.3 | 138 | 1.05M | ¥3.6 / ¥21.6 |
| 12 | gemma-4-31b | Google | 60.7 | 50 | 262K | ¥3.24 / ¥7.2 |
| 13 | chatgpt-4o-latest-20250326 | OpenAI | 57.1 | 286 | 128K | ¥18 / ¥72 |
| 14 | gpt-5-chat | OpenAI | 53.6 | 403 | 400K | ¥9 / ¥72 |
| 15 | gpt-5-mini-high | OpenAI | 50.0 | 302 | 400K | ¥1.8 / ¥14.4 |
| 16 | o3-2025-04-16 | OpenAI | 46.4 | 564 | 200K | ¥14.4 / ¥57.6 |
| 17 | gpt-4.1-2025-04-14 | OpenAI | 42.9 | 443 | 1.05M | ¥14.4 / ¥57.6 |
| 18 | gpt-5-high | OpenAI | 39.3 | 382 | 400K | ¥9 / ¥72 |
| 19 | o4-mini-2025-04-16 | OpenAI | 35.7 | 442 | 200K | ¥7.92 / ¥31.7 |
| 20 | gpt-5.1 | OpenAI | 32.1 | 125 | 400K | ¥9 / ¥72 |
| 21 | gemini-2.5-flash-lite-preview-06-17-thinking | Google | 28.6 | 406 | 65.5K | ¥0.72 / ¥2.88 |
| 22 | gpt-4.1-mini-2025-04-14 | OpenAI | 25.0 | 410 | 1.05M | ¥2.88 / ¥11.5 |
| 23 | gpt-5.2 | OpenAI | 21.4 | 87 | 400K | ¥12.6 / ¥101 |
| 24 | mistral-medium-2508 | Mistral | 17.9 | 412 | 262K | ¥2.88 / ¥14.4 |
| 25 | mistral-small-3.1-24b-instruct-2503 | Mistral | 14.3 | 281 | 262K | ¥2.88 / ¥14.4 |
| 26 | gemma-3-27b-it | Google | 10.7 | 273 | 128K | ¥2.15 / ¥2.15 |
| 27 | gemini-2.0-flash-001 | Google | 7.1 | 110 | 1.05M | ¥1.08 / ¥4.32 |
| 28 | mistral-medium-2505 | Mistral | 3.6 | 168 | 262K | ¥2.88 / ¥14.4 |
| 29 | mistral-small-2506 | Mistral | 0.0 | 194 | 262K | ¥2.88 / ¥14.4 |

Top model analysis

Why gemini-3-pro ranks first

gemini-3-pro ranks first with a percent score of 100.0 across 224 samples. Treat it as the first candidate on this leaderboard, then compare price, context length, and availability before committing.
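
The published percent scores are consistent with a linear rescaling of rank across the 29 listed models, which suggests the score encodes position only and not the size of the preference gap between neighbouring models. The source does not state the formula, so the sketch below is an inference from the listed numbers, and `rank_to_percent` is a hypothetical helper name.

```python
def rank_to_percent(rank: int, n_models: int) -> float:
    """Map rank 1..n_models linearly onto 100..0, rounded to one decimal.

    Assumption: inferred from the published scores, not documented by the source.
    """
    if n_models < 2:
        raise ValueError("need at least two models to rescale ranks")
    return round(100 * (n_models - rank) / (n_models - 1), 1)


# Spot-check against (rank, score) pairs copied from the leaderboard (29 models).
published = {1: 100.0, 2: 96.4, 3: 92.9, 15: 50.0, 28: 3.6, 29: 0.0}
for rank, score in published.items():
    assert rank_to_percent(rank, 29) == score
```

If this reading is right, a 3.6-point gap between adjacent models only says that one is ranked directly above the other; it says nothing about how close the underlying preference results were.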

How to choose

Do not look only at rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
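
The selection steps above can be sketched as a small filter. This is a minimal illustration, not something the site provides: the `Model` record, the `shortlist` helper, and the threshold values are all hypothetical, and the rows are a hand-copied subset of the leaderboard with context lengths rounded as listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    score: float      # percent score on this leaderboard
    samples: int      # sample count behind the score
    context: int      # context length in tokens (rounded, as listed)
    price_in: float   # input price in ¥, as listed
    price_out: float  # output price in ¥, as listed


# Subset of rows copied from the leaderboard.
MODELS = [
    Model("gemini-3-pro", 100.0, 224, 1_050_000, 14.4, 86.4),
    Model("gemini-3.1-pro-preview", 96.4, 86, 1_050_000, 14.4, 86.4),
    Model("gemini-2.5-pro", 92.9, 808, 1_050_000, 9.0, 72.0),
    Model("gemini-3-flash", 89.3, 157, 1_050_000, 3.6, 21.6),
    Model("gpt-5.2-high", 85.7, 79, 400_000, 12.6, 101.0),
    Model("qwen3-vl-235b-a22b-instruct", 82.1, 148, 128_000, 2.16, 8.64),
]


def shortlist(models, min_samples, max_price_out, min_context):
    """Keep models that pass all three thresholds, best score first."""
    kept = [
        m for m in models
        if m.samples >= min_samples
        and m.price_out <= max_price_out
        and m.context >= min_context
    ]
    return sorted(kept, key=lambda m: m.score, reverse=True)


# Arbitrary example thresholds: at least 100 samples, output price of
# at most ¥30, and a context of at least 128K tokens.
picks = shortlist(MODELS, min_samples=100, max_price_out=30.0, min_context=128_000)
print([m.name for m in picks])
# prints ['gemini-3-flash', 'qwen3-vl-235b-a22b-instruct']
```

With these thresholds the top three models drop out on output price alone, which is the point of not stopping at rank #1.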

FAQ

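Which metrics matter on the image captioning leaderboard?

Look mainly at rank, percent score, sample size, and source. The score gives a quick comparison of models within the same leaderboard; the sample size indicates how stable the result is.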
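
To make the link between sample size and stability concrete, here is a rough sketch. It assumes each sample is an independent pairwise preference vote and uses the worst-case standard error of a win rate, 0.5 / sqrt(n). The source does not describe how samples are counted, so treat this as a heuristic only; `worst_case_se` is a hypothetical helper.

```python
import math


def worst_case_se(samples: int) -> float:
    """Upper bound on the standard error of a win rate estimated from
    `samples` independent votes (the bound is reached at a 50% win rate)."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    return 0.5 / math.sqrt(samples)


# Sample counts copied from the leaderboard.
for name, n in [("gemini-2.5-pro", 808), ("gemini-3-pro", 224), ("gemma-4-31b", 50)]:
    print(f"{name}: n={n}, worst-case SE ~ {worst_case_se(n):.3f}")
```

Under that assumption, an entry with 50 samples is roughly four times noisier than one with 808, so small score differences between low-sample models deserve less weight.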

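Why can't different leaderboards be merged into a single overall score?

Leaderboards differ in task, samples, and evaluation methodology. By default, 模力榜 ranks models only within a single leaderboard, so that writing, coding, image, and other abilities are not forced into one number.

How should I choose an image captioning model?

Start with the leaderboard closest to your task, then weigh price, context length, open versus closed source, and provider availability. A high rank does not mean a model fits every budget or deployment setup.

How often is the leaderboard updated?

The page shows the most recent successfully collected public leaderboard data. It currently prefers the LMArena leaderboard dataset and keeps the original link in the page's source section.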