Chat · Vision · Homework Leaderboard

Ranking for Vision / Homework, based on public preference data.

Selection guide

Homework model ranking guide

Category: Chat · Vision · Homework
Models: 78
Published: 2026/05/18
Source: Arena public preference evaluation. Original leaderboard: Vision / Homework. Leaderboard dataset: LMArena latest parquet.
Rank | Model | Provider | Score | Samples | Context | Price (input / output)
1 | claude-opus-4-7 | Anthropic | 100.0 | 1.1K | 1M | ¥36 / ¥180
2 | claude-opus-4-7-thinking | Anthropic | 98.7 | 1.1K | 1M | ¥36 / ¥180
3 | gpt-5.5-high | OpenAI | 97.4 | 743 | 1.05M | ¥36 / ¥216
4 | gpt-5.5 | OpenAI | 96.1 | 788 | 1.05M | ¥36 / ¥216
5 | claude-opus-4-6-thinking | Anthropic | 94.8 | 1.2K | 1M | ¥36 / ¥180
6 | gpt-5.4 | OpenAI | 93.5 | 1K | 1.05M | ¥18 / ¥108
7 | claude-opus-4-6 | Anthropic | 92.2 | 1.4K | 1M | ¥36 / ¥180
8 | gemini-3.1-pro-preview | Google | 90.9 | 2.6K | 1.05M | ¥14.4 / ¥86.4
9 | kimi-k2.6 | Moonshot | 89.6 | 937 | 262K | ¥6.84 / ¥28.8
10 | gemini-3-flash | Google | 88.3 | 3.2K | 1.05M | ¥3.6 / ¥21.6
11 | gemini-3-pro | Google | 87.0 | 1.8K | 1.05M | ¥14.4 / ¥86.4
12 | gpt-5.4-high | OpenAI | 85.7 | 951 | 1.05M | ¥18 / ¥108
13 | claude-sonnet-4-6 | Anthropic | 84.4 | 1.5K | 1M | ¥21.6 / ¥108
14 | gemma-4-26b-a4b | Google | 83.1 | 1.9K | 262K | ¥0.94 / ¥2.88
15 | gemma-4-31b | Google | 81.8 | 3.1K | 262K | ¥3.24 / ¥7.2
16 | gemini-3-flash (thinking-minimal) | Google | 80.5 | 2.9K | 1.05M | ¥3.6 / ¥21.6
17 | muse-spark | Meta | 79.2 | 782 | - | -
18 | kimi-k2.5-thinking | Moonshot | 77.9 | 2.1K | 262K | ¥4.32 / ¥21.6
19 | dola-seed-2.0-pro | ByteDance | 76.6 | 1.6K | - | -
20 | qwen3.7-plus-preview | Alibaba | 75.3 | 587 | 131K | ¥3.6 / ¥21.6
21 | glm-5v-turbo | Z.ai | 74.0 | 1.2K | 200K | ¥0 / ¥0
22 | gpt-5.2-chat-latest-20260210 | OpenAI | 72.7 | 2K | 400K | ¥12.6 / ¥101
23 | qwen3.5-397b-a17b | Alibaba | 71.4 | 1.7K | 262K | ¥3.1 / ¥18.6
24 | gpt-5.4-mini-high | OpenAI | 70.1 | 1.4K | 400K | ¥5.4 / ¥32.4
25 | gpt-5-chat | OpenAI | 68.8 | 1.4K | 400K | ¥9 / ¥72
26 | qwen3-vl-235b-a22b-instruct | Alibaba | 67.5 | 1.6K | 128K | ¥2.16 / ¥8.64
27 | gemini-2.5-pro | Google | 66.2 | 5.1K | 1.05M | ¥9 / ¥72
28 | gemini-2.5-flash-preview-09-2025 | Google | 64.9 | 625 | 1M | ¥2.16 / ¥18
29 | gpt-5.2-high | OpenAI | 63.6 | 2.4K | 400K | ¥12.6 / ¥101
30 | gemini-3.1-flash-lite-preview | Google | 62.3 | 2.2K | 1.05M | ¥1.8 / ¥10.8
31 | gpt-5.1-high | OpenAI | 61.0 | 1.2K | 400K | ¥9 / ¥72
32 | grok-4.20-multi-agent-beta-0309 | xAI | 59.7 | 1.4K | 2M | ¥14.4 / ¥43.2
33 | gpt-5.5-instant | OpenAI | 58.4 | 609 | 400K | ¥9 / ¥72
34 | qwen3.5-122b-a10b | Alibaba | 57.1 | 1.5K | 262K | ¥2.88 / ¥23
35 | gpt-5.2 | OpenAI | 55.8 | 2.5K | 400K | ¥12.6 / ¥101
36 | kimi-k2.5-instant | Moonshot | 54.5 | 564 | 262K | ¥4.32 / ¥21.6
37 | qwen3.5-27b | Alibaba | 53.2 | 1.4K | 262K | ¥2.16 / ¥17.3
38 | grok-4.3 | xAI | 51.9 | 633 | 1M | ¥9 / ¥18
39 | grok-4.20-beta-0309-reasoning | xAI | 50.6 | 1.5K | 2M | ¥14.4 / ¥43.2
40 | mimo-v2.5 | Xiaomi | 49.4 | 1.3K | 1.05M | ¥2.88 / ¥14.4
41 | ernie-5.0-preview-1220 | Baidu | 48.1 | 360 | 128K | ¥7.92 / ¥14.4
42 | chatgpt-4o-latest-20250326 | OpenAI | 46.8 | 2.2K | 128K | ¥18 / ¥72
43 | mimo-v2-omni | Xiaomi | 45.5 | 1.2K | 262K | ¥2.88 / ¥14.4
44 | hunyuan-vision-1.5-thinking | Tencent | 44.2 | 313 | - | -
45 | gpt-4.1-2025-04-14 | OpenAI | 42.9 | 1.5K | 1.05M | ¥14.4 / ¥57.6
46 | gemini-2.5-flash | Google | 41.6 | 4.6K | 1.05M | ¥2.16 / ¥18
47 | gpt-5.1 | OpenAI | 40.3 | 1.4K | 400K | ¥9 / ¥72
48 | qwen3-vl-235b-a22b-thinking | Alibaba | 39.0 | 314 | 131K | ¥2.06 / ¥8.26
49 | o3-2025-04-16 | OpenAI | 37.7 | 2K | 200K | ¥14.4 / ¥57.6
50 | o4-mini-2025-04-16 | OpenAI | 36.4 | 1.7K | 200K | ¥7.92 / ¥31.7
51 | gpt-5-high | OpenAI | 35.1 | 1.5K | 400K | ¥9 / ¥72
52 | claude-opus-4-20250514 | Anthropic | 33.8 | 336 | 200K | ¥108 / ¥540
53 | gpt-5.4-nano-high | OpenAI | 32.5 | 1.4K | 400K | ¥1.44 / ¥9
54 | gpt-4.1-mini-2025-04-14 | OpenAI | 31.2 | 1.4K | 1.05M | ¥2.88 / ¥11.5
55 | gemini-2.5-flash-lite-preview-09-2025-no-thinking | Google | 29.9 | 625 | 1.05M | ¥0.72 / ¥2.88
56 | claude-sonnet-4-20250514 | Anthropic | 28.6 | 245 | 200K | ¥21.6 / ¥108
57 | gpt-5-mini-high | OpenAI | 27.3 | 1K | 400K | ¥1.8 / ¥14.4
58 | claude-opus-4-20250514-thinking-16k | Anthropic | 26.0 | 170 | 200K | ¥108 / ¥540
59 | claude-3-7-sonnet-20250219 | Anthropic | 24.7 | 241 | 200K | ¥21.6 / ¥108
60 | gemini-2.5-flash-lite-preview-06-17-thinking | Google | 23.4 | 1.4K | 65.5K | ¥0.72 / ¥2.88
61 | step-3 | StepFun | 22.1 | 228 | 65.5K | ¥1.8 / ¥4.68
62 | claude-3-7-sonnet-20250219-thinking-32k | Anthropic | 20.8 | 184 | - | -
63 | gemini-2.0-flash-001 | Google | 19.5 | 792 | 1.05M | ¥1.08 / ¥4.32
64 | mistral-small-2506 | Mistral | 18.2 | 632 | 262K | ¥2.88 / ¥14.4
65 | grok-4-0709 | xAI | 16.9 | 1.1K | 256K | ¥21.6 / ¥108
66 | grok-4-1-fast-reasoning | xAI | 15.6 | 1.7K | 2M | ¥1.44 / ¥3.6
67 | glm-4.5v | Z.ai | 14.3 | 223 | 64K | ¥4.32 / ¥13
68 | gemma-3-27b-it | Google | 13.0 | 1.1K | 128K | ¥2.15 / ¥2.15
69 | mistral-medium-2508 | Mistral | 11.7 | 2.2K | 262K | ¥2.88 / ¥14.4
70 | mistral-medium-2505 | Mistral | 10.4 | 846 | 262K | ¥2.88 / ¥14.4
71 | llama-4-maverick-17b-128e-instruct | Meta | 9.1 | 549 | 1M | ¥1.8 / ¥6.26
72 | gpt-5-nano-high | OpenAI | 7.8 | 272 | 400K | ¥0.36 / ¥2.88
73 | glm-4.6v | Z.ai | 6.5 | 282 | 128K | ¥2.16 / ¥6.48
74 | step-1o-turbo-202506 | StepFun | 5.2 | 251 | - | -
75 | mistral-small-3.1-24b-instruct-2503 | Mistral | 3.9 | 843 | 262K | ¥2.88 / ¥14.4
76 | llama-4-scout-17b-16e-instruct | Meta | 2.6 | 500 | 128K | ¥1.44 / ¥5.62
77 | claude-3-5-sonnet-20241022 | Anthropic | 1.3 | 226 | 200K | ¥21.6 / ¥108
78 | claude-3-5-haiku-20241022 | Anthropic | 0.0 | 232 | 200K | ¥5.76 / ¥28.8

Top model analysis

Why claude-opus-4-7 ranks first

claude-opus-4-7 ranks first with a percent score of 100.0 across 1.1K samples. Treat it as the default starting point for this leaderboard, then weigh it against its price (¥36 / ¥180 input/output), context length (1M), and availability.
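
The page does not say how the percent score is computed. The listed values are, however, consistent with a linear rescaling of rank: with 78 models, each rank step is worth 100 / 77, or about 1.3 points. A minimal sketch under that assumption follows; `percent_score` is a hypothetical helper, not something the leaderboard publishes.

```python
def percent_score(rank: int, total: int) -> float:
    """Map a 1-based rank to a 0-100 score that is linear in rank.

    Assumed formula, inferred from the listed values; the page
    does not document how it derives the score.
    """
    if total < 2 or not 1 <= rank <= total:
        raise ValueError("rank must be in 1..total and total must be >= 2")
    return round(100 * (total - rank) / (total - 1), 1)


# Spot-check against rows of this leaderboard (78 models).
for rank in (1, 2, 40, 78):
    print(rank, percent_score(rank, 78))
# prints:
# 1 100.0
# 2 98.7
# 40 49.4
# 78 0.0
```

If that reading is right, the gap between neighbouring scores says nothing beyond rank order, so sample size and price carry the extra information when comparing close entries.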

How to choose

Look beyond rank #1

Start with the leaderboard closest to your task. Compare the top models by score and sample size, then check price, context length, open or closed access, and provider availability.
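
As a rough illustration of that selection process, the sketch below filters a handful of rows from this leaderboard by sample size, context length, and output price, then sorts the survivors by score. The thresholds, the `Entry` type, and the `parse_count` and `shortlist` helpers are all hypothetical choices made for the example; prices are used exactly as listed, with no unit assumed.

```python
from dataclasses import dataclass


def parse_count(text: str) -> int:
    """Convert strings such as '743', '1.1K' or '1.05M' to an integer."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return int(text)


@dataclass
class Entry:
    model: str
    score: float
    samples: int
    context: int
    price_in: float   # input price in ¥, as listed
    price_out: float  # output price in ¥, as listed


# A few rows copied from the leaderboard on this page.
ENTRIES = [
    Entry("claude-opus-4-7", 100.0, parse_count("1.1K"), parse_count("1M"), 36, 180),
    Entry("gpt-5.5-high", 97.4, parse_count("743"), parse_count("1.05M"), 36, 216),
    Entry("gemini-3.1-pro-preview", 90.9, parse_count("2.6K"), parse_count("1.05M"), 14.4, 86.4),
    Entry("kimi-k2.6", 89.6, parse_count("937"), parse_count("262K"), 6.84, 28.8),
    Entry("gemini-3-flash", 88.3, parse_count("3.2K"), parse_count("1.05M"), 3.6, 21.6),
]


def shortlist(entries, min_samples, min_context, max_price_out):
    """Keep entries that meet every threshold, best score first."""
    kept = [
        e for e in entries
        if e.samples >= min_samples
        and e.context >= min_context
        and e.price_out <= max_price_out
    ]
    return sorted(kept, key=lambda e: e.score, reverse=True)


# Example thresholds: at least 1,000 samples, 500K context, output price at most ¥100.
picks = shortlist(ENTRIES, min_samples=1_000, min_context=500_000, max_price_out=100)
print([e.model for e in picks])
# prints: ['gemini-3.1-pro-preview', 'gemini-3-flash']
```

With these example thresholds the top-ranked model drops out on price and two others on sample size, which is the point of looking past rank #1.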

FAQ

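Which metrics matter on the homework-help leaderboard?

Focus on rank, percent score, sample size, and source. The score gives a quick comparison of models within the same leaderboard; the sample size indicates how stable the result is.

Why can't different leaderboards be merged into a single overall score?

Different leaderboards use different tasks, samples, and evaluation criteria. By default, 模力榜 ranks models only within a single leaderboard, to avoid forcing abilities such as writing, coding, and image work into one number.

How should I choose a homework-help model?

Start with the leaderboard closest to your task, then factor in price, context length, open versus closed source, and provider availability. A high rank does not mean a model fits every budget or deployment method.

How often is the leaderboard updated?

The page shows the most recent successfully collected public leaderboard data. It currently prefers the LMArena leaderboard dataset and keeps the original link in the page's source section.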