Qwen3.5-4B
Alibaba · Qwen family · Model ID: qwen3.5-4b

Ultra-light Qwen tier for prototypes, evals, guardrails, and developer onboarding flows.

Free tier · Public model detail · Dense Transformer

Params: 4B
Context: 256K
Max Output: 16K
License: Apache 2.0
TTFT (time to first token): 160ms
Throughput: 120 tok/s

Why pick it

  • Fits free-tier products and model-routing experiments
  • Lower capability than larger Qwen tiers, but very low cost
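
A routing experiment can be sketched as a tiny dispatcher that sends cheap traffic to this model and escalates the rest. A minimal sketch: the length threshold and escalation keywords are illustrative assumptions, and qwen3.5-9b is simply the related model listed further down this page, not a prescribed pairing.

```python
# Illustrative routing sketch. Threshold and keywords are assumptions,
# not part of this model's documentation.
LIGHT_MODEL = "qwen3.5-4b"   # this page's free tier
HEAVY_MODEL = "qwen3.5-9b"   # related model; any larger tier works

def pick_model(prompt: str, max_light_chars: int = 500) -> str:
    """Send short, simple prompts to the free 4B tier; escalate the rest."""
    needs_reasoning = any(
        kw in prompt.lower() for kw in ("prove", "debug", "refactor")
    )
    if len(prompt) <= max_light_chars and not needs_reasoning:
        return LIGHT_MODEL
    return HEAVY_MODEL

print(pick_model("Classify this ticket: login page 500s"))  # qwen3.5-4b
print(pick_model("Please refactor this 2k-line module"))    # qwen3.5-9b
```

The router is deliberately model-agnostic: because every tier sits behind the same OpenAI-compatible endpoint, the returned name can be dropped straight into the `model` field of the quick-start calls below.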

Pricing

Tier      Standard        Cached  SiliconFlow  Savings
Realtime  $0.00 / $0.00   N/A     N/A          N/A
Batch     $0.00 / $0.00   N/A     N/A          N/A
Free tier model.

Quick start

The API surface is OpenAI-compatible: point an existing OpenAI SDK client at the base URL below and ship.

Python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.luminapath.tech/v1",
    api_key=os.environ["BATCHIN_API_KEY"],  # read the key from the environment
)

resp = client.chat.completions.create(
    model="qwen3.5-4b",
    messages=[{"role": "user", "content": "Summarize why this model is a fit for my workload."}]
)

print(resp.choices[0].message.content)
JavaScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.luminapath.tech/v1",
  apiKey: process.env.BATCHIN_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "qwen3.5-4b",
  messages: [{ role: "user", content: "Summarize why this model is a fit for my workload." }],
});

console.log(resp.choices[0]?.message?.content);
cURL
curl https://api.luminapath.tech/v1/chat/completions \
  -H "Authorization: Bearer $BATCHIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-4b",
    "messages": [{"role":"user","content":"Summarize why this model is a fit for my workload."}]
  }'

Specs

Architecture: Dense Transformer
Vendor group: Qwen
Context window: 256K
Max output: 16K

Best for: qwen, free
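
The context and max-output limits above can drive a cheap pre-flight check before you send a request. A minimal sketch: the 256K / 16K figures come from the spec table, but the 4-characters-per-token estimate is a rough assumption; use a real tokenizer when you need exact counts.

```python
CONTEXT_WINDOW = 256_000  # tokens, from the spec table above
MAX_OUTPUT = 16_000       # tokens, from the spec table above

def fits_context(prompt: str, requested_output: int = MAX_OUTPUT,
                 chars_per_token: float = 4.0) -> bool:
    """Rough pre-flight check: estimated prompt tokens plus the requested
    output budget must fit inside the 256K window. chars_per_token is a
    crude heuristic, not a tokenizer."""
    est_prompt_tokens = len(prompt) / chars_per_token
    return est_prompt_tokens + min(requested_output, MAX_OUTPUT) <= CONTEXT_WINDOW

print(fits_context("Summarize this README."))  # True
print(fits_context("x" * 2_000_000))           # ~500K tokens: False
```

Rejecting oversized prompts client-side avoids burning a round trip on a request the server would refuse anyway.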

Related models

  • Qwen3.5-9B (Alibaba, qwen3.5-9b): Compact long-context Qwen option for cost-sensitive API traffic and routing layers.
  • GPT-OSS-20B (OpenAI OSS, gpt-oss-20b): Compact OpenAI open-weight option for fast chat, routing, and lower-cost product features.
  • Step-3.5-Flash (StepFun, step-3.5-flash): High-traffic StepFun flash model tuned for cheap fast inference and agent loops.
  • GLM-5.1 (Z.ai, glm-5.1): Open-source coding flagship built for long-horizon autonomous engineering and deep reasoning.