These pages are built for real buying decisions: pricing proof, batch control, verifiable inference, GPU paths, and the handoff from prototype to production.
This round covers
Pricing and procurement proof
Clear route pricing, buying boundaries, and rollout posture.
Trust and verification
See whether the platform can actually prove what happened.
Capacity handoff
Know whether you are buying an API, a GPU platform, or a real operator stack.
Reviewed against public product surfaces on April 12, 2026.
Procurement proof chain
Every price, reliability, or rollout claim on these compare pages should map back to the same evidence chain: response headers, request lookup, settled cost truth, and the cache boundary.
Request proof
Streaming output can finish before the final cost and routing metadata are flushed. Start with `X-Request-Id`: keep it, then reopen the settled record through request lookup.
Route reason
Every settled request should map back to exactly one route reason: local direct, queue spill, upstream fallback, or durable response-cache replay.
Cost truth
`X-BatchIn-Effective-Cost-Cents` is the settled billed truth. `X-BatchIn-Uncached-Cost-Cents` is the counterfactual without cache discounts or replay.
Cache boundary
Prompt-cache discounts still represent a real model invocation. Durable response-cache replay is a separate path and should stay explicit.
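The proof chain above can be reduced to a small settlement check. A minimal sketch, with loudly labeled assumptions: the two `X-BatchIn-*` cost headers and `X-Request-Id` come from the text, but the `X-Route-Reason` header name and its values are hypothetical, and the settled record is passed in as a plain dict rather than fetched from request lookup.

```python
def settle_proof(headers: dict) -> dict:
    """Reduce a settled request-lookup record to the procurement proof chain."""
    # Settled billed truth vs. the counterfactual without cache discounts.
    effective = int(headers["X-BatchIn-Effective-Cost-Cents"])
    uncached = int(headers["X-BatchIn-Uncached-Cost-Cents"])
    return {
        "request_id": headers["X-Request-Id"],
        "effective_cost_cents": effective,
        "uncached_cost_cents": uncached,
        # Prompt-cache discount: still a real model invocation.
        "cache_discount_cents": uncached - effective,
        # Durable response-cache replay is a separate, explicit path.
        # NOTE: "X-Route-Reason" and its values are hypothetical names.
        "replayed": headers.get("X-Route-Reason") == "response-cache-replay",
    }

settled = settle_proof({
    "X-Request-Id": "req_abc123",
    "X-BatchIn-Effective-Cost-Cents": "7",
    "X-BatchIn-Uncached-Cost-Cents": "12",
    "X-Route-Reason": "local-direct",
})
assert settled["cache_discount_cents"] == 5
assert settled["replayed"] is False
```

The point of keeping both cost headers is that the discount is auditable arithmetic, not a marketing number: `uncached - effective` is the cache saving on that one request.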
Marketplace breadth versus a tighter production boundary for teams graduating from experimentation.
Discovery posture
Curated, priced routes
Broad provider marketplace
Comparison UX
Workload calculator + procurement proof
Rankings, filters, and model compare
Go-live boundary
Audit, batch, and leased GPU in one stack
API aggregation first
A broader AI-native cloud versus a narrower platform tuned for production inference economics and control.
Platform scope
Inference API, batch, audit, leased GPU
Full-stack AI cloud
Batch posture
High / Low / Fill tradeoffs
Serverless plus batch inference lanes
Audit surface
Verifiable inference records
No equivalent public audit product
Selected-route pricing plus auditability, batch control, and a GPU operator boundary.
Selected route pricing
Lower on featured GLM/DeepSeek lanes
Higher public reference prices on the same routes
Verification
Ed25519 audit + browser verification
No equivalent public verification layer
Capacity model
Dedicated GPU path with operator control
Reserved GPU without the same operator boundary
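"Ed25519 audit + browser verification" comes down to checking a signature over the exact bytes of a settled record. A minimal sketch using the third-party `cryptography` package: the record fields are illustrative, not the platform's schema, and in practice the buyer holds only the published public key and signature, never the private key generated here.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical audit record; field names are illustrative assumptions.
record = b'{"request_id":"req_123","effective_cost_cents":"4","route":"local-direct"}'

# The platform would hold this key; we generate one here only to demo verification.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()
signature = private_key.sign(record)

def verify_audit_record(public_key, record: bytes, signature: bytes) -> bool:
    """Return True only if the signature matches the record bytes exactly."""
    try:
        public_key.verify(signature, record)
        return True
    except InvalidSignature:
        return False

assert verify_audit_record(public_key, record, signature)
# Any tampering with the settled record invalidates the signature.
assert not verify_audit_record(public_key, record.replace(b'"4"', b'"0"'), signature)
```

Because verification needs only the public key, the record bytes, and the signature, it can run anywhere, including in a browser, which is what makes the audit claim checkable by the buyer rather than taken on trust.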
Serverless GPU infrastructure versus a productized inference stack with leased capacity when you need it.
Primary abstraction
Inference product + leased GPU
Serverless GPU endpoints
Runtime handoff
Managed API surface, optional SSH-root capacity
You assemble the serving layer
Buyer motion
Prototype to rollout inside one product story
Infrastructure-first with more platform assembly