Operator benchmark hub
Put runtime shape, throughput, public pricing, batch economics, and SiliconFlow deltas on one page before you move traffic into a route.
Public benchmarked routes: 33 (only live, priced routes are included)
Median interactive TTFT: N/A (only routes with observed traffic in the current 5-minute window are counted)
Max savings vs SiliconFlow: 75% (shown only where both list prices exist)
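The savings figure is plain arithmetic over the two public list prices, and it is suppressed when either side is missing. A minimal sketch of that rule; the $0.080 baseline used in the example is an assumed SiliconFlow list price for illustration, not a published number:

```python
def savings_vs_baseline(route_price, baseline_price):
    """Percent saved relative to a baseline list price.

    Returns None when either list price is missing or unusable,
    mirroring the "shown only where both list prices exist" rule.
    """
    if route_price is None or baseline_price is None or baseline_price == 0:
        return None
    return round((1 - route_price / baseline_price) * 100)

# A $0.020 route against an assumed $0.080 baseline reproduces
# the 75% maximum shown above.
print(savings_vs_baseline(0.020, 0.080))  # 75
```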
Start from the workload, not the vendor logo
The shortlists below are organized around the questions real buyers ask first: coding, reasoning, batch economics, multimodal throughput, and enterprise fit.
Coding & agents
Long-horizon engineering loops, tool use, and agent flows.
Reasoning
Planning, math, research, and harder multi-step analysis.
Multimodal & media
Vision, OCR, image, video, and audio routes.
Batch & retrieval
Batch jobs, RAG, OCR, embedding, and document pipelines.
Fastest interactive routes
Best for support, copilots, agent conversations, and any workload where first-token feel is visible to users.
No routed samples were observed in the current 5-minute window.
Lowest batch economics
Built for high-volume offline work, document pipelines, RAG preparation, and lower-cost nightly throughput.
| Route | Lane | Batch price |
|---|---|---|
| Qwen3.5-4B | Text inference | $0.000 |
| PaddleOCR-VL-1.5 | Vision multimodal | $0.000 |
| GPT-OSS-20B | Text inference | $0.005 |
| Step-3.5-Flash | Text inference | $0.005 |
| MiMo-V2-Flash | Text inference | $0.005 |
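At list prices this low, sizing a nightly job is one multiplication per side. A hedged sketch, assuming the batch figures are per-million-token rates and that the single listed figure applies to both input and output tokens (the cards show only one batch number, so the split is an assumption):

```python
def batch_job_cost(input_tokens, output_tokens,
                   in_price_per_m, out_price_per_m):
    """Offline batch job cost at per-million-token list prices."""
    return (input_tokens / 1e6) * in_price_per_m \
         + (output_tokens / 1e6) * out_price_per_m

# Example: a 2B-token-in / 500M-token-out nightly run at the
# $0.005 batch figure shown for GPT-OSS-20B above.
cost = batch_job_cost(2_000_000_000, 500_000_000, 0.005, 0.005)
print(f"${cost:.2f}")  # $12.50
```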
Longest context routes
Better fit for long dossiers, policy packets, audit corpora, and structured multi-document reasoning.
| Route | Lane | Context |
|---|---|---|
| Nemotron 3 Super | Text inference | 1M |
| Llama 4 Maverick | Text inference | 1M |
| Qwen3-32B | Text inference | 256K |
| Qwen3.5-397B-A17B | Text inference | 256K |
| Qwen3.5-122B-A10B | Text inference | 256K |
Largest SiliconFlow delta
The fastest way to screen routes when your team already knows the comparison baseline.
| Route | Lane | Savings |
|---|---|---|
| GPT-OSS-20B | Text inference | 75% lower |
| DeepSeek V3 | Text inference | 70% lower |
| MiniMax M2.5 | Vision multimodal | 67% lower |
| CosyVoice2-0.5B | Speech & audio | 67% lower |
| Fish-Speech-1.5 | Speech & audio | 67% lower |
Public route benchmark matrix
Use this core matrix to build the shortlist first, then continue into the model hub for the full live catalog when product, procurement, and platform teams need the next cut.
| Route | Lane | TTFT | 5m RPM | Context | Standard (in / out) | Batch (in / out) | Cached | Vs SF |
|---|---|---|---|---|---|---|---|---|
| DeepSeek R1 | Text inference | N/A | N/A | 160K | $0.180 / $0.600 | $0.090 / $0.300 | $0.063 | 64% lower |
| DeepSeek V3 | Text inference | N/A | N/A | 160K | $0.080 / $0.280 | $0.040 / $0.140 | $0.028 | 70% lower |
| DeepSeek V3.1 Terminus | Text inference | N/A | N/A | 160K | $0.100 / $0.350 | $0.050 / $0.175 | $0.035 | 63% lower |
| DeepSeek V3.2 | Text inference | N/A | N/A | 160K | $0.100 / $0.150 | $0.050 / $0.075 | $0.035 | 63% lower |
| Mistral Devstral 2 | Text inference | N/A | N/A | 128K | $0.060 / $0.250 | $0.030 / $0.125 | $0.021 | N/A |
| Baidu ERNIE 4.5-300B | Text inference | N/A | N/A | 131K | $0.100 / $0.380 | $0.050 / $0.190 | $0.035 | N/A |
| Z.ai GLM-4.7 | Text inference | N/A | N/A | 198K | $0.150 / $0.800 | $0.075 / $0.400 | $0.052 | 64% lower |
| Z.ai GLM-5 | Text inference | N/A | N/A | 198K | $0.350 / $0.900 | $0.175 / $0.450 | $0.122 | 63% lower |
| Z.ai GLM-5.1 | Text inference | N/A | N/A | 198K | $0.500 / $1.500 | $0.250 / $0.750 | $0.175 | 64% lower |
| OpenAI OSS GPT-OSS-120B | Text inference | N/A | N/A | 131K | $0.020 / $0.150 | $0.010 / $0.075 | $0.007 | 60% lower |
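Once the matrix is on one page, building the shortlist is mechanical filtering. A minimal sketch: the tuples below are transcribed from the table above as (route, context in K tokens, savings percent), with `None` where the page shows N/A, and the 160K floor and savings-first sort are example criteria, not a recommendation:

```python
# (route, context_k, savings_pct) transcribed from the matrix above.
ROUTES = [
    ("DeepSeek R1", 160, 64),
    ("DeepSeek V3", 160, 70),
    ("DeepSeek V3.1 Terminus", 160, 63),
    ("DeepSeek V3.2", 160, 63),
    ("Mistral Devstral 2", 128, None),
    ("Baidu ERNIE 4.5-300B", 131, None),
    ("Z.ai GLM-4.7", 198, 64),
    ("Z.ai GLM-5", 198, 63),
    ("Z.ai GLM-5.1", 198, 64),
    ("OpenAI OSS GPT-OSS-120B", 131, 60),
]

# Keep routes with at least 160K context and a stated savings figure,
# largest delta first.
shortlist = sorted(
    (r for r in ROUTES if r[1] >= 160 and r[2] is not None),
    key=lambda r: r[2],
    reverse=True,
)
for name, ctx, pct in shortlist:
    print(f"{name}: {ctx}K context, {pct}% lower vs SF")
```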
Methodology
BatchIn can map routes against your traffic shape, procurement boundary, batch volume, and dedicated-capacity plans so the shortlist is deployment-ready.
Proof chain
Public benchmarks are good for narrowing the field, not for replacing rollout review. Bring one inspectable request before the sales, support, or procurement thread starts.
Keep one X-Request-Id that already reflects the target workload. Use request lookup to reopen the route reason, settled cost, and cache boundaries. Only then move the winning routes into batch, white-label, or dedicated-capacity review.
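The lookup step above can be sketched in a few lines. Everything concrete here is an assumption: the host, path, auth scheme, and field names are placeholders, not a documented BatchIn API; the point is only the shape of the proof chain, where one X-Request-Id is enough to reopen the routing decision:

```python
from urllib import request

def build_lookup(request_id, api_key):
    # Placeholder host and path; substitute the real lookup endpoint.
    return request.Request(
        f"https://api.example.com/v1/requests/{request_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = build_lookup("req_abc123", "sk-test")
print(req.full_url)  # https://api.example.com/v1/requests/req_abc123

# Executing it (urllib.request.urlopen(req)) would return the stored
# record; hypothetical fields of interest would be the route reason,
# settled cost, and cache boundaries for that one request.
```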
Best paired with this motion
1. Validate publicly first with trial and load checks.
2. Then move into batch, white-label, or dedicated capacity.
3. Only then finalize SLA, procurement, and rollout boundaries.