
Chat completions

Generates a model response from a conversation. Compatible with OpenAI’s POST /v1/chat/completions.

POST https://api.norlen.io/v1/chat/completions
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | Model id, e.g. qwen3.6-35b, qwen3-coder, gemma-4-12b |
| messages | array | yes | List of { role, content } messages. role is one of system, user, assistant |
| temperature | number | no | Randomness (0–2). Default 1 |
| top_p | number | no | Nucleus sampling (0–1) |
| max_tokens | integer | no | Maximum tokens in the response |
| stream | boolean | no | If true, sends the response token by token via SSE |
| stop | string or array | no | Sequence(s) that halt generation |
| frequency_penalty | number | no | Penalizes repetition (-2 to 2) |
| presence_penalty | number | no | Encourages new topics (-2 to 2) |
| seed | integer | no | Makes the output more reproducible |
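
As a rough illustration of how the fields above fit together, the following sketch builds a request body and wraps it in a POST request using only the Python standard library. The helper names `build_payload` and `build_request` are hypothetical, and the client-side range checks simply mirror the ranges listed in the table; whether the server rejects out-of-range values in the same way is an assumption.

```python
import json
import urllib.request

API_URL = "https://api.norlen.io/v1/chat/completions"

# Ranges copied from the parameter table. Enforcing them client-side is
# an assumption of this sketch, not documented server behavior.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
ROLES = ("system", "user", "assistant")


def build_payload(model, messages, **options):
    """Return a request body dict, rejecting out-of-range options."""
    for message in messages:
        if message.get("role") not in ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    for name, (low, high) in RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # Leave out unset options so the server applies its own defaults.
    body = {"model": model, "messages": messages}
    body.update({k: v for k, v in options.items() if v is not None})
    return body


def build_request(api_key, payload):
    """Wrap the payload in a POST request with the documented headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result of `build_request` to `urllib.request.urlopen` would send the call; the sketch stops short of that so it can be run without a key.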
curl https://api.norlen.io/v1/chat/completions \
  -H "Authorization: Bearer $NORLEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "What is RAG in one sentence?"}
    ],
    "temperature": 0.7
  }'

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1750000000,
  "model": "qwen3.6-35b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "RAG combines document retrieval with generation..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 32, "completion_tokens": 28, "total_tokens": 60 }
}
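
To show how the response body above might be consumed, here is a minimal sketch that pulls out the assistant text, the finish reason, and the token count. The `summarize` helper is hypothetical, and treating a `finish_reason` of `"length"` as truncation follows the OpenAI convention; the page itself only shows `"stop"`, so that part is an assumption.

```python
import json

# Sample body copied from the example response (elided values kept as-is).
RAW = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1750000000,
  "model": "qwen3.6-35b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant",
                  "content": "RAG combines document retrieval with generation..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 32, "completion_tokens": 28, "total_tokens": 60}
}
"""


def summarize(response):
    """Extract the fields most callers need from a chat completion."""
    choice = response["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # Assumption (OpenAI convention): "length" means max_tokens was hit.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": response["usage"]["total_tokens"],
    }


summary = summarize(json.loads(RAW))
print(summary["text"])
```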

With stream: true, the response arrives in chunks as Server-Sent Events. Each chunk carries a delta; the stream ends with data: [DONE].

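import os
from openai import OpenAI

# Assumes the OpenAI Python SDK, pointed at the Norlen base URL.
client = OpenAI(base_url="https://api.norlen.io/v1", api_key=os.environ["NORLEN_API_KEY"])

stream = client.chat.completions.create(
    model="qwen3.6-35b",
    messages=[{"role": "user", "content": "Write a haiku about infrastructure."}],
    stream=True,
)
for chunk in stream:
    # A delta may carry no content (e.g. a role-only first chunk), hence `or ""`.
    print(chunk.choices[0].delta.content or "", end="", flush=True)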
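
For clients that do not use an SDK, the SSE framing described above can be parsed by hand. The sketch below is one possible approach: it reads `data:` lines, stops at `data: [DONE]`, and joins the content fragments. The function name `iter_deltas` is hypothetical, and the chunk shape (`choices[0].delta.content`) is assumed from the OpenAI-compatible format rather than spelled out on this page; the sample lines are invented for illustration.

```python
import json


def iter_deltas(lines):
    """Yield content fragments from an iterable of SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Illustrative lines only; a real stream would come from the HTTP response.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_deltas(sample))
print(text)
```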