OpenAI OSS

gpt-oss-120b

GPT-OSS-120B

OpenAI open-weight MoE with pragmatic pricing for general chat, agents, and product workflows.

Public model detail

MoE Transformer

Params

120B MoE

Context

131K

Max Output

32K

License

Apache 2.0

TTFT

310ms

Throughput

72 tok/s

Why pick it

  • Open-weight model from the OpenAI family, released under Apache 2.0
  • Priced low enough for broad, high-volume deployment

Pricing

Tier       Standard (in / out)   Cached    SiliconFlow (in / out)   Savings
Realtime   $0.02 / $0.15         $0.007    $0.05 / $0.45            60%
Batch      $0.01 / $0.07         $0.007    $0.05 / $0.45            60%
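
As a quick sanity check on the table above, here is a minimal cost estimator. It assumes the listed rates are quoted per 1M tokens (input / output), which is conventional for pages like this but not stated explicitly; the tier names and rates come straight from the table.

```python
# Rough per-request cost estimate from the pricing table above.
# Assumption: rates are USD per 1M tokens (input / output) -- the page
# does not state the unit, so treat these numbers as illustrative.

RATES = {
    # tier: (input $/1M tokens, output $/1M tokens)
    "realtime": (0.02, 0.15),
    "batch": (0.01, 0.07),
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request under the given tier."""
    in_rate, out_rate = RATES[tier]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# e.g. a 10K-token prompt with a 2K-token reply on the realtime tier:
print(f"${estimate_cost('realtime', 10_000, 2_000):.4f}")  # → $0.0005
```

Swap in your own token counts to compare the realtime and batch tiers before committing a workload to one of them.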

Quick start

OpenAI-compatible surface. Swap the base URL and ship.

Python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.luminapath.tech/v1",
    api_key=os.environ["BATCHIN_API_KEY"],  # read the key from the environment
)

resp = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Summarize why this model is a fit for my workload."}],
)

print(resp.choices[0].message.content)
JavaScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.luminapath.tech/v1",
  apiKey: process.env.BATCHIN_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "gpt-oss-120b",
  messages: [{ role: "user", content: "Summarize why this model is a fit for my workload." }],
});

console.log(resp.choices[0]?.message?.content);
cURL
curl https://api.luminapath.tech/v1/chat/completions \
  -H "Authorization: Bearer $BATCHIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [{"role":"user","content":"Summarize why this model is a fit for my workload."}]
  }'

Specs

Architecture

MoE Transformer

Vendor group

OpenAI

Context window

131K

Max output

32K

Best for

open-source
general
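
Because the context window and max output listed above share one budget on most chat-completion APIs, a small helper can tell you how large a prompt still leaves room for the reply you want. This is a sketch under two assumptions: "131K" means 131,072 tokens and "32K" means 32,768, and requested output tokens count against the same window.

```python
# Sketch: budget input tokens against the advertised context window.
# Assumptions: "131K" = 131,072 tokens, "32K" = 32,768 tokens, and the
# requested completion shares the context window with the prompt
# (typical for chat-completion APIs, but not stated on this page).

CONTEXT_WINDOW = 131_072  # "131K" context
MAX_OUTPUT = 32_768       # "32K" max output

def max_input_tokens(requested_output: int) -> int:
    """Largest prompt that still leaves room for the requested completion."""
    if requested_output > MAX_OUTPUT:
        raise ValueError(f"model caps output at {MAX_OUTPUT} tokens")
    return CONTEXT_WINDOW - requested_output

print(max_input_tokens(4_096))  # room left for a 4K-token reply → 126976
```

Run this check before sending long documents so a request is trimmed client-side instead of being rejected by the API.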

Related models

OpenAI OSS

gpt-oss-20b

GPT-OSS-20B

Compact OpenAI open-weight option for fast chat, routing, and lower-cost product features.

View detail
Alibaba

qwen3.5-27b

Qwen3.5-27B

Lean Qwen route aimed at lower-cost chat, agent routing, and product copilot features.

View detail
StepFun

step-3.5-flash

Step-3.5-Flash

High-traffic StepFun flash model tuned for cheap fast inference and agent loops.

View detail
Z.ai

glm-5.1

GLM-5.1

Open-source coding flagship built for long-horizon autonomous engineering and deep reasoning.

View detail