Chat · Vision · Creative Writing Leaderboard

Ranking for Vision / Creative Writing, based on public preference data.

Selection guide

Creative Writing model ranking guide

Top models: gemini-3-pro, ernie-5.0-preview-1220, gemini-3-flash, gpt-5.1, gemini-2.5-flash-preview-09-2025
Current directory: Chat · Vision · Creative Writing
Models: 34
Published: 2026/01/09
Source: Arena public preference evaluation
Original leaderboard: Vision / Creative Writing
Leaderboard dataset: LMArena latest parquet

Rank  Model                                              Provider   Score  Samples  Context  Price (input / output)
1     gemini-3-pro                                       Google     100.0  766      1.05M    ¥14.4 / ¥86.4
2     ernie-5.0-preview-1220                             Baidu      97.0   176      128K     ¥7.92 / ¥14.4
3     gemini-3-flash                                     Google     93.9   217      1.05M    ¥3.6 / ¥21.6
4     gpt-5.1                                            OpenAI     90.9   411      400K     ¥9 / ¥72
5     gemini-2.5-flash-preview-09-2025                   Google     87.9   555      1M       ¥2.16 / ¥18
6     gpt-5.1-high                                       OpenAI     84.8   385      400K     ¥9 / ¥72
7     gemini-2.5-pro                                     Google     81.8   2.4K     1.05M    ¥9 / ¥72
8     grok-4-0709                                        xAI        78.8   1.5K     256K     ¥21.6 / ¥108
9     claude-opus-4-20250514                             Anthropic  75.8   192      200K     ¥108 / ¥540
10    gemini-2.5-flash                                   Google     72.7   1.7K     1.05M    ¥2.16 / ¥18
11    chatgpt-4o-latest-20250326                         OpenAI     69.7   1K       128K     ¥18 / ¥72
12    gpt-5-chat                                         OpenAI     66.7   1.5K     400K     ¥9 / ¥72
13    qwen3-vl-235b-a22b-thinking                        Alibaba    63.6   219      131K     ¥2.06 / ¥8.26
14    qwen3-vl-235b-a22b-instruct                        Alibaba    60.6   690      128K     ¥2.16 / ¥8.64
15    mistral-medium-2508                                Mistral    57.6   1.4K     262K     ¥2.88 / ¥14.4
16    gpt-4.1-2025-04-14                                 OpenAI     54.5   1.4K     1.05M    ¥14.4 / ¥57.6
17    gemini-2.5-flash-lite-preview-09-2025-no-thinking  Google     51.5   565      1.05M    ¥0.72 / ¥2.88
18    gemini-2.0-flash-001                               Google     48.5   292      1.05M    ¥1.08 / ¥4.32
19    o3-2025-04-16                                      OpenAI     45.5   1.9K     200K     ¥14.4 / ¥57.6
20    claude-sonnet-4-20250514                           Anthropic  42.4   156      200K     ¥21.6 / ¥108
21    gpt-5-high                                         OpenAI     39.4   1.6K     400K     ¥9 / ¥72
22    gemini-2.5-flash-lite-preview-06-17-thinking       Google     36.4   1.3K     65.5K    ¥0.72 / ¥2.88
23    mistral-small-2506                                 Mistral    33.3   442      262K     ¥2.88 / ¥14.4
24    qwen-vl-max-2025-08-13                             Alibaba    30.3   191      131K     ¥1.66 / ¥4.13
25    gemma-3-27b-it                                     Google     27.3   732      128K     ¥2.15 / ¥2.15
26    gpt-5-mini-high                                    OpenAI     24.2   1.1K     400K     ¥1.8 / ¥14.4
27    hunyuan-vision-1.5-thinking                        Tencent    21.2   275      -        -
28    mistral-medium-2505                                Mistral    18.2   518      262K     ¥2.88 / ¥14.4
29    gpt-4.1-mini-2025-04-14                            OpenAI     15.2   1.3K     1.05M    ¥2.88 / ¥11.5
30    o4-mini-2025-04-16                                 OpenAI     12.1   1.4K     200K     ¥7.92 / ¥31.7
31    mistral-small-3.1-24b-instruct-2503                Mistral    9.1    777      262K     ¥2.88 / ¥14.4
32    llama-4-scout-17b-16e-instruct                     Meta       6.1    265      128K     ¥1.44 / ¥5.62
33    gpt-5-nano-high                                    OpenAI     3.0    206      400K     ¥0.36 / ¥2.88
34    llama-4-maverick-17b-128e-instruct                 Meta       0.0    250      1M       ¥1.8 / ¥6.26

Top model analysis

Why gemini-3-pro ranks first

gemini-3-pro ranks first with a percent score of 100.0 across 766 samples. Treat it as the default first candidate for this leaderboard, then weigh its price (¥14.4 / ¥86.4 input/output), context length (1.05M), and availability against the models ranked below it.
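
The percent scores listed on this leaderboard are consistent with a plain linear rescaling of rank: with N = 34 models, rank r maps to 100 × (N − r) / (N − 1). The page does not state this formula, so the sketch below is an observation that happens to reproduce the listed values, not the site's documented method. The function name `percent_score` is hypothetical.

```python
def percent_score(rank: int, n_models: int) -> float:
    """Map a 1-based rank to a 0-100 score, rounded to one decimal."""
    if n_models < 2:
        raise ValueError("need at least two models")
    return round(100.0 * (n_models - rank) / (n_models - 1), 1)


# A few (rank, listed score) pairs copied from the leaderboard.
LISTED = {1: 100.0, 2: 97.0, 3: 93.9, 17: 51.5, 18: 48.5, 33: 3.0, 34: 0.0}

for rank, listed_score in LISTED.items():
    print(rank, percent_score(rank, 34), listed_score)
```

If this reading is right, a score gap here carries no information beyond rank order: adjacent models are always about 3.03 points apart, however close or far apart their underlying preference results are.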

How to choose

Look beyond rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
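
The selection steps above can be sketched as a filter over leaderboard rows. This is a minimal illustration, not a tool the page provides: the `Row` type, the `shortlist` function, and the threshold values are hypothetical. The five rows are copied from the table, with 2.4K expanded to 2400 and 1.05M read as 1,050,000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    score: float      # percent score within this leaderboard
    samples: int      # number of preference samples
    context: int      # context length in tokens
    price_in: float   # input price in yuan, as listed
    price_out: float  # output price in yuan, as listed


ROWS = [
    Row("gemini-3-pro", 100.0, 766, 1_050_000, 14.4, 86.4),
    Row("ernie-5.0-preview-1220", 97.0, 176, 128_000, 7.92, 14.4),
    Row("gemini-3-flash", 93.9, 217, 1_050_000, 3.6, 21.6),
    Row("gpt-5.1", 90.9, 411, 400_000, 9.0, 72.0),
    Row("gemini-2.5-pro", 81.8, 2400, 1_050_000, 9.0, 72.0),
]


def shortlist(rows, min_samples=0, min_context=0, max_price_out=float("inf")):
    """Keep rows that meet every constraint, best score first."""
    kept = [
        r for r in rows
        if r.samples >= min_samples
        and r.context >= min_context
        and r.price_out <= max_price_out
    ]
    return sorted(kept, key=lambda r: r.score, reverse=True)


# Hypothetical requirements: >= 200 samples, >= 200K context, output price <= 75.
picks = shortlist(ROWS, min_samples=200, min_context=200_000, max_price_out=75.0)
for r in picks:
    print(r.model, r.score)
```

With these thresholds the rank #1 model drops out on output price and the rank #2 model on sample count, leaving gemini-3-flash, gpt-5.1, and gemini-2.5-pro.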

FAQ

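Which metrics matter on the creative writing leaderboard?

Look mainly at rank, percent score, sample size, and source. The score is for quickly comparing models within the same leaderboard; the sample size indicates how stable a result is.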
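
To make the sample-size point concrete, the sketch below computes a rough 95% margin of error for a win rate near 50% at three sample counts taken from this leaderboard. It assumes each sample is an independent pairwise vote and uses the normal approximation. The page does not describe how samples are collected or how Arena computes its own intervals, so this is only an order-of-magnitude guide, and the helper names are hypothetical.

```python
import math


def parse_count(text: str) -> int:
    """Convert a display count such as '766' or '2.4K' to an integer."""
    text = text.strip().upper()
    if text.endswith("K"):
        return round(float(text[:-1]) * 1_000)
    if text.endswith("M"):
        return round(float(text[:-1]) * 1_000_000)
    return int(text)


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


for label in ["156", "766", "2.4K"]:
    n = parse_count(label)
    print(f"{label:>5} samples: +/- {margin_of_error(n) * 100:.1f} points")
```

Under these assumptions the margin is about 7.8 percentage points of win rate at 156 samples, 3.5 at 766, and 2.0 at 2.4K, so rankings backed by only a couple of hundred samples deserve more caution.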

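Why can't different leaderboards be merged directly into one overall score?

Leaderboards differ in tasks, samples, and evaluation methodology. By default, 模力榜 ranks models only within a single leaderboard, so that writing, coding, image, and other capabilities are not forced into one combined number.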

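How should I choose a creative writing model?

Start with the leaderboard closest to your task, then factor in price, context length, open or closed source, and provider availability. A high rank does not mean a model fits every budget or deployment setup.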
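
Budget fit can be checked with a quick cost estimate. The leaderboard lists input and output prices in yuan but does not state the unit. The sketch below assumes prices are quoted per one million tokens, which is a common convention but is not confirmed by this page. The function name and the workload figures are hypothetical.

```python
def estimate_cost(price_in: float, price_out: float,
                  input_tokens: int, output_tokens: int,
                  tokens_per_unit: int = 1_000_000) -> float:
    """Cost in yuan, ASSUMING prices are quoted per `tokens_per_unit` tokens."""
    return (price_in * input_tokens + price_out * output_tokens) / tokens_per_unit


# Hypothetical workload: 2M input tokens and 1M output tokens.
WORKLOAD = {"input_tokens": 2_000_000, "output_tokens": 1_000_000}

# (input price, output price) as listed on the leaderboard.
PRICES = {
    "gemini-3-pro": (14.4, 86.4),
    "gemini-3-flash": (3.6, 21.6),
    "claude-opus-4-20250514": (108.0, 540.0),
}

for model, (p_in, p_out) in PRICES.items():
    print(f"{model}: ¥{estimate_cost(p_in, p_out, **WORKLOAD):.2f}")
```

Under that assumption the same workload costs about ¥115.20 on gemini-3-pro, ¥28.80 on gemini-3-flash, and ¥756.00 on claude-opus-4-20250514, which is why price belongs next to rank when comparing candidates.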

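How often is the leaderboard updated?

The page shows the most recent successfully collected public leaderboard data. It currently prefers the LMArena leaderboard dataset and keeps a link to the original in the page's source section.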