Operator benchmark hub
Put runtime shape, throughput, public pricing, batch economics, and SiliconFlow deltas on one page before you move traffic into a route.
Public benchmarked routes: 33 (only live, priced routes are included)
Median interactive TTFT: N/A (only routes with observed traffic in the current 5-minute window are counted)
Max savings vs SiliconFlow: 75% (shown only where both list prices exist)
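The savings figure is plain arithmetic over the two public list prices, and it is suppressed when either side is missing. A minimal sketch of that rule; the $0.080 baseline used in the example is an assumed SiliconFlow list price for illustration, not a published number:

```python
def savings_vs_baseline(route_price, baseline_price):
    """Percent saved relative to a baseline list price.

    Returns None when either list price is missing or unusable,
    mirroring the "shown only where both list prices exist" rule.
    """
    if route_price is None or baseline_price is None or baseline_price == 0:
        return None
    return round((1 - route_price / baseline_price) * 100)

# A $0.020 route against an assumed $0.080 baseline reproduces
# the 75% maximum shown above.
print(savings_vs_baseline(0.020, 0.080))  # 75
```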
Start from the workload, not the vendor logo
The shortlists below are organized around the questions real buyers ask first: coding, reasoning, batch economics, multimodal throughput, and enterprise fit.
Coding & agents
Long-horizon engineering loops, tool use, and agent flows.
Reasoning
Planning, math, research, and harder multi-step analysis.
Multimodal & media
Vision, OCR, image, video, and audio routes.
Batch & retrieval
Batch jobs, RAG, OCR, embedding, and document pipelines.
Fastest interactive routes
Best for support, copilots, agent conversations, and any workload where first-token feel is visible to users.
No routed samples were observed in the current 5-minute window.
Lowest batch economics
Built for high-volume offline work, document pipelines, RAG preparation, and lower-cost nightly throughput.
| Route | Lane | Batch price |
|---|---|---|
| Qwen3.5-4B | Text inference | $0.000 |
| PaddleOCR-VL-1.5 | Vision multimodal | $0.000 |
| GPT-OSS-20B | Text inference | $0.005 |
| Step-3.5-Flash | Text inference | $0.005 |
| MiMo-V2-Flash | Text inference | $0.005 |
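At list prices this low, sizing a nightly job is one multiplication per side. A hedged sketch, assuming the batch figures are per-million-token rates and that the single listed figure applies to both input and output tokens (the cards show only one batch number, so the split is an assumption):

```python
def batch_job_cost(input_tokens, output_tokens,
                   in_price_per_m, out_price_per_m):
    """Offline batch job cost at per-million-token list prices."""
    return (input_tokens / 1e6) * in_price_per_m \
         + (output_tokens / 1e6) * out_price_per_m

# Example: a 2B-token-in / 500M-token-out nightly run at the
# $0.005 batch figure shown for GPT-OSS-20B above.
cost = batch_job_cost(2_000_000_000, 500_000_000, 0.005, 0.005)
print(f"${cost:.2f}")  # $12.50
```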
Longest context routes
Better fit for long dossiers, policy packets, audit corpora, and structured multi-document reasoning.
| Route | Lane | Context |
|---|---|---|
| Nemotron 3 Super | Text inference | 1M |
| Llama 4 Maverick | Text inference | 1M |
| Qwen3-32B | Text inference | 256K |
| Qwen3.5-397B-A17B | Text inference | 256K |
| Qwen3.5-122B-A10B | Text inference | 256K |
Largest SiliconFlow delta
The fastest way to screen routes when your team already knows the comparison baseline.
| Route | Lane | Savings |
|---|---|---|
| GPT-OSS-20B | Text inference | 75% lower |
| DeepSeek V3 | Text inference | 70% lower |
| MiniMax M2.5 | Vision multimodal | 67% lower |
| CosyVoice2-0.5B | Speech & audio | 67% lower |
| Fish-Speech-1.5 | Speech & audio | 67% lower |
Public route benchmark matrix
Use this core matrix to build the shortlist first, then continue into the model hub for the full live catalog when product, procurement, and platform teams need the next cut.
| Route | Lane | TTFT | 5m RPM | Context | Standard (in / out) | Batch (in / out) | Cached | Vs SF |
|---|---|---|---|---|---|---|---|---|
| DeepSeek R1 | Text inference | N/A | N/A | 160K | $0.180 / $0.600 | $0.090 / $0.300 | $0.063 | 64% lower |
| DeepSeek V3 | Text inference | N/A | N/A | 160K | $0.080 / $0.280 | $0.040 / $0.140 | $0.028 | 70% lower |
| DeepSeek V3.1 Terminus | Text inference | N/A | N/A | 160K | $0.100 / $0.350 | $0.050 / $0.175 | $0.035 | 63% lower |
| DeepSeek V3.2 | Text inference | N/A | N/A | 160K | $0.100 / $0.150 | $0.050 / $0.075 | $0.035 | 63% lower |
| Mistral Devstral 2 | Text inference | N/A | N/A | 128K | $0.060 / $0.250 | $0.030 / $0.125 | $0.021 | N/A |
| Baidu ERNIE 4.5-300B | Text inference | N/A | N/A | 131K | $0.100 / $0.380 | $0.050 / $0.190 | $0.035 | N/A |
| Z.ai GLM-4.7 | Text inference | N/A | N/A | 198K | $0.150 / $0.800 | $0.075 / $0.400 | $0.052 | 64% lower |
| Z.ai GLM-5 | Text inference | N/A | N/A | 198K | $0.350 / $0.900 | $0.175 / $0.450 | $0.122 | 63% lower |
| Z.ai GLM-5.1 | Text inference | N/A | N/A | 198K | $0.500 / $1.500 | $0.250 / $0.750 | $0.175 | 64% lower |
| OpenAI OSS GPT-OSS-120B | Text inference | N/A | N/A | 131K | $0.020 / $0.150 | $0.010 / $0.075 | $0.007 | 60% lower |
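Once the matrix is on one page, building the shortlist is mechanical filtering. A minimal sketch: the tuples below are transcribed from the table above as (route, context in K tokens, savings percent), with `None` where the page shows N/A, and the 160K floor and savings-first sort are example criteria, not a recommendation:

```python
# (route, context_k, savings_pct) transcribed from the matrix above.
ROUTES = [
    ("DeepSeek R1", 160, 64),
    ("DeepSeek V3", 160, 70),
    ("DeepSeek V3.1 Terminus", 160, 63),
    ("DeepSeek V3.2", 160, 63),
    ("Mistral Devstral 2", 128, None),
    ("Baidu ERNIE 4.5-300B", 131, None),
    ("Z.ai GLM-4.7", 198, 64),
    ("Z.ai GLM-5", 198, 63),
    ("Z.ai GLM-5.1", 198, 64),
    ("OpenAI OSS GPT-OSS-120B", 131, 60),
]

# Keep routes with at least 160K context and a stated savings figure,
# largest delta first.
shortlist = sorted(
    (r for r in ROUTES if r[1] >= 160 and r[2] is not None),
    key=lambda r: r[2],
    reverse=True,
)
for name, ctx, pct in shortlist:
    print(f"{name}: {ctx}K context, {pct}% lower vs SF")
```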
Methodology
BatchIn can map routes against your traffic shape, procurement boundary, batch volume, and dedicated-capacity plans so the shortlist is deployment-ready.
Proof chain
Public benchmarks are good for narrowing the field, not for replacing rollout review. Bring one inspectable request before the sales, support, or procurement thread starts.
Keep one X-Request-Id that already reflects the target workload. Use request lookup to reopen the route reason, settled cost, and cache boundaries. Only then move the winning routes into batch, white-label, or dedicated-capacity review.
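The lookup step above can be sketched in a few lines. Everything concrete here is an assumption: the host, path, auth scheme, and field names are placeholders, not a documented BatchIn API; the point is only the shape of the proof chain, where one X-Request-Id is enough to reopen the routing decision:

```python
from urllib import request

def build_lookup(request_id, api_key):
    # Placeholder host and path; substitute the real lookup endpoint.
    return request.Request(
        f"https://api.example.com/v1/requests/{request_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = build_lookup("req_abc123", "sk-test")
print(req.full_url)  # https://api.example.com/v1/requests/req_abc123

# Executing it (urllib.request.urlopen(req)) would return the stored
# record; hypothetical fields of interest would be the route reason,
# settled cost, and cache boundaries for that one request.
```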
Best paired with this motion
1. Validate publicly first with trial and load checks.
2. Then move into batch, white-label, or dedicated capacity.
3. Only then finalize SLA, procurement, and rollout boundaries.