Operator benchmark hub

Benchmarks technical buyers can actually use

Put runtime shape, throughput, public pricing, batch economics, and SiliconFlow deltas on one page before you move traffic into a route.

Public benchmarked routes

33

Only live, priced routes are included

Median interactive TTFT

N/A

Only routes with observed traffic in the current 5-minute window are counted

Max savings vs SiliconFlow

75%

Shown only where both list prices exist

Start from the workload, not the vendor logo

Shortlists organized around actual deployment questions

The shortlists below are organized around the questions real buyers ask first: coding, reasoning, batch economics, multimodal throughput, and enterprise fit.

Pair this with the pricing source of truth

Fastest interactive routes

Best for support, copilots, agent conversations, and any workload where first-token feel is visible to users.

No routed samples were observed in the current 5-minute window.

Lowest batch prices

Built for high-volume offline work, document pipelines, RAG preparation, and lower-cost nightly throughput.

  1. Alibaba Qwen3.5-4B (Text inference): batch price $0.000
  2. Baidu PaddleOCR-VL-1.5 (Vision multimodal): batch price $0.000
  3. OpenAI OSS GPT-OSS-20B (Text inference): batch price $0.005
  4. StepFun Step-3.5-Flash (Text inference): batch price $0.005
  5. Xiaomi MiMo-V2-Flash (Text inference): batch price $0.005

Longest context routes

Better fit for long dossiers, policy packets, audit corpora, and structured multi-document reasoning.

  1. NVIDIA Nemotron 3 Super (Text inference): 1M context
  2. Meta Llama 4 Maverick (Text inference): 1M context
  3. Alibaba Qwen3-32B (Text inference): 256K context
  4. Alibaba Qwen3.5-397B-A17B (Text inference): 256K context
  5. Alibaba Qwen3.5-122B-A10B (Text inference): 256K context

Largest SiliconFlow delta

The fastest way to screen routes when your team already knows the comparison baseline.

  1. OpenAI OSS GPT-OSS-20B (Text inference): 75% lower
  2. DeepSeek DeepSeek V3 (Text inference): 70% lower
  3. MiniMax MiniMax M2.5 (Vision multimodal): 67% lower
  4. CosyVoice CosyVoice2-0.5B (Speech & audio): 67% lower
  5. Fish Audio Fish-Speech-1.5 (Speech & audio): 67% lower
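The savings figures above read as a relative difference between two list prices. As a minimal sketch of that arithmetic, assuming the delta is computed as one minus the ratio of the route price to the baseline price (the page does not state its exact formula), the helper below returns no value when either price is missing. The function names and the "% higher" wording are hypothetical, and the sample prices are made up for illustration.

```python
from typing import Optional


def savings_pct(route_price: Optional[float],
                baseline_price: Optional[float]) -> Optional[int]:
    """Return how many percent route_price sits below baseline_price.

    Returns None when either list price is missing, mirroring the rule
    that a delta is shown only where both list prices exist.
    """
    if route_price is None or baseline_price is None or baseline_price <= 0:
        return None
    return round((1 - route_price / baseline_price) * 100)


def format_delta(pct: Optional[int]) -> str:
    """Render a delta as text, with N/A when there is no comparison."""
    if pct is None:
        return "N/A"
    if pct >= 0:
        return f"{pct}% lower"
    # Assumption: the page never shows this case; wording is hypothetical.
    return f"{-pct}% higher"


# Illustrative prices only; these are not real list prices from any catalog.
print(format_delta(savings_pct(0.25, 1.00)))
print(format_delta(savings_pct(0.25, None)))
```

Returning None rather than 0 keeps "no comparable price" distinct from "same price", which is the same distinction the N/A entries in the matrix make.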

Public route benchmark matrix

Put every public route into one working matrix

Use this core matrix to build the shortlist first, then continue into the model hub for the full live catalog when product, procurement, and platform teams need the next cut.

| Route                             | Lane           | TTFT | 5m RPM | Context | Standard        | Batch           | Cached | Vs SF     |
|-----------------------------------|----------------|------|--------|---------|-----------------|-----------------|--------|-----------|
| DeepSeek R1 (DeepSeek)            | Text inference | N/A  | N/A    | 160K    | $0.180 / $0.600 | $0.090 / $0.300 | $0.063 | 64% lower |
| DeepSeek V3 (DeepSeek)            | Text inference | N/A  | N/A    | 160K    | $0.080 / $0.280 | $0.040 / $0.140 | $0.028 | 70% lower |
| DeepSeek V3.1 Terminus (DeepSeek) | Text inference | N/A  | N/A    | 160K    | $0.100 / $0.350 | $0.050 / $0.175 | $0.035 | 63% lower |
| DeepSeek V3.2 (DeepSeek)          | Text inference | N/A  | N/A    | 160K    | $0.100 / $0.150 | $0.050 / $0.075 | $0.035 | 63% lower |
| Devstral 2 (Mistral)              | Text inference | N/A  | N/A    | 128K    | $0.060 / $0.250 | $0.030 / $0.125 | $0.021 | N/A       |
| ERNIE 4.5-300B (Baidu)            | Text inference | N/A  | N/A    | 131K    | $0.100 / $0.380 | $0.050 / $0.190 | $0.035 | N/A       |
| GLM-4.7 (Z.ai)                    | Text inference | N/A  | N/A    | 198K    | $0.150 / $0.800 | $0.075 / $0.400 | $0.052 | 64% lower |
| GLM-5 (Z.ai)                      | Text inference | N/A  | N/A    | 198K    | $0.350 / $0.900 | $0.175 / $0.450 | $0.122 | 63% lower |
| GLM-5.1 (Z.ai)                    | Text inference | N/A  | N/A    | 198K    | $0.500 / $1.500 | $0.250 / $0.750 | $0.175 | 64% lower |
| GPT-OSS-120B (OpenAI OSS)         | Text inference | N/A  | N/A    | 131K    | $0.020 / $0.150 | $0.010 / $0.075 | $0.007 | 60% lower |
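To turn a matrix row into a rough workload cost, the sketch below parses a price cell and multiplies it by token volume. It assumes the two figures in a cell are input and output prices in USD per million tokens; the page does not state the unit, so treat that as a guess to confirm against the pricing page. The names `PricePair`, `parse_price_pair`, and `estimate_cost` are hypothetical, and the token volumes are an invented workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricePair:
    input_per_m: float   # assumed: USD per 1M input tokens
    output_per_m: float  # assumed: USD per 1M output tokens


def parse_price_pair(cell: str) -> PricePair:
    """Parse a matrix cell such as '$0.080 / $0.280'."""
    left, right = (part.strip().lstrip("$") for part in cell.split("/"))
    return PricePair(float(left), float(right))


def estimate_cost(pair: PricePair, input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD under the per-million-token assumption."""
    return (input_tokens / 1_000_000) * pair.input_per_m \
        + (output_tokens / 1_000_000) * pair.output_per_m


# DeepSeek V3 cells from the matrix; the workload size is hypothetical.
standard = parse_price_pair("$0.080 / $0.280")
batch = parse_price_pair("$0.040 / $0.140")
workload = dict(input_tokens=10_000_000, output_tokens=2_000_000)

print(f"standard: ${estimate_cost(standard, **workload):.2f}")
print(f"batch:    ${estimate_cost(batch, **workload):.2f}")
```

In the rows listed, each batch figure is half of the matching standard figure, so under this assumption the batch estimate comes out at half the standard one for any token mix.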

Methodology

Use this page to narrow the field before heavier evaluation

  • TTFT and throughput come from the authenticated `/v1/status/benchmarks` 5-minute routed window. When no recent traffic exists, the UI shows N/A instead of falling back to estimates.
  • Standard, batch, and cached prices come from the current public catalog; multimodal routes keep their native billing units.
  • SiliconFlow deltas appear only when a route has a directly comparable public price on both sides.
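The first rule above, N/A instead of an estimate, can be sketched as a median taken over only the routes that reported traffic. This is a hedged illustration, not the service's implementation: the response shape of `/v1/status/benchmarks` is not documented on this page, so the input here is simply a list of per-route TTFT values where None stands for a route with no samples in the window. The function names and the millisecond unit are assumptions.

```python
from statistics import median
from typing import Iterable, Optional


def median_ttft(samples: Iterable[Optional[float]]) -> Optional[float]:
    """Median TTFT across routes that had traffic in the window.

    Routes with no observed traffic (None) are excluded rather than
    filled with an estimate. Returns None when no route had traffic.
    """
    observed = [s for s in samples if s is not None]
    return median(observed) if observed else None


def render_ttft(value: Optional[float]) -> str:
    """Show N/A for an empty window; the ms unit is an assumption."""
    return "N/A" if value is None else f"{value:.0f} ms"


# An empty window, as on the current page, and a hypothetical busy one.
print(render_ttft(median_ttft([None, None, None])))
print(render_ttft(median_ttft([120.0, None, 80.0, 200.0])))
```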

BatchIn

Need a benchmark pack tied to your real workload?

BatchIn can map routes against your traffic shape, procurement boundary, batch volume, and dedicated-capacity plans so the shortlist is deployment-ready.

Proof chain

Turn the shortlist into a rollout thread, not just a score table

Public benchmarks are good for narrowing the field, not for replacing rollout review. Bring one inspectable request before the sales, support, or procurement thread starts.

Keep one X-Request-Id from a request that already reflects the target workload. Use request lookup to reopen its route reason, settled cost, and cache boundaries. Only then move the winning routes into batch, white-label, or dedicated-capacity review.
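One way to keep an inspectable request is to tag it with an ID that names the workload. The sketch below only builds the headers. It assumes the client may supply its own X-Request-Id and that authentication uses a bearer token; neither is confirmed on this page, so if the service issues the ID itself, record the value from the response instead. The function name and the workload tag are hypothetical.

```python
import uuid
from typing import Dict


def new_request_headers(api_key: str, workload_tag: str) -> Dict[str, str]:
    """Build headers carrying a workload-tagged request ID.

    Assumptions: a client-supplied X-Request-Id is accepted, and the
    Authorization scheme is a bearer token.
    """
    request_id = f"{workload_tag}-{uuid.uuid4().hex}"
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Request-Id": request_id,
    }


headers = new_request_headers("YOUR_API_KEY", "support-chat")
# Store this value alongside the test notes so the request can be looked up later.
print(headers["X-Request-Id"])
```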

Best paired with this motion

  1. Validate publicly first with trial and load checks.
  2. Then move into batch, white-label, or dedicated capacity.
  3. Only then finalize SLA, procurement, and rollout boundaries.
