Chat completions
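Generates a model response to a conversation. The endpoint is compatible with OpenAI's `POST /v1/chat/completions`, so the OpenAI SDKs can be used by pointing their base URL at `https://api.norlen.io/v1`.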
```http
POST https://api.norlen.io/v1/chat/completions
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
```

Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | yes | Model ID, for example `qwen3.6-35b`, `qwen3-coder`, or `gemma-4-12b` |
| `messages` | array | yes | List of `{ role, content }` messages, where `role` is one of `system`, `user`, or `assistant` |
| `temperature` | number | no | Sampling randomness, from 0 to 2. Default: 1 |
| `top_p` | number | no | Nucleus sampling threshold, from 0 to 1 |
| `max_tokens` | integer | no | Maximum number of tokens to generate in the response |
| `stream` | boolean | no | If `true`, the response is sent incrementally as Server-Sent Events (SSE) |
| `stop` | string or array | no | One or more sequences that stop generation |
| `frequency_penalty` | number | no | From -2 to 2. Positive values penalize tokens in proportion to how often they have already appeared, reducing repetition |
| `presence_penalty` | number | no | From -2 to 2. Positive values penalize tokens that have already appeared at all, encouraging new topics |
| `seed` | integer | no | Makes sampling more reproducible across identical requests |
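Out-of-range values are easier to debug when caught before the request is sent. The sketch below is a hypothetical client-side helper, `build_chat_request` (not part of the Norlen API or the OpenAI SDK), that assembles a request body and checks it only against what the table states: allowed roles, numeric ranges, and the type of `stop`. The server remains the authority on validation.

```python
from typing import Any

# Inclusive ranges taken from the parameter table above.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}
ROLES = {"system", "user", "assistant"}


def build_chat_request(model: str, messages: list[dict[str, str]], **options: Any) -> dict[str, Any]:
    """Hypothetical helper: build a request body and check it against
    the documented parameter ranges before sending."""
    if not model:
        raise ValueError("model is required")
    for msg in messages:
        if msg.get("role") not in ROLES:
            raise ValueError(f"unsupported role: {msg.get('role')!r}")
    for name, (low, high) in RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    stop = options.get("stop")
    if stop is not None and not isinstance(stop, (str, list)):
        raise TypeError("stop must be a string or a list of strings")
    # Omit unset options so the server applies its own defaults.
    body: dict[str, Any] = {"model": model, "messages": messages}
    body.update({k: v for k, v in options.items() if v is not None})
    return body


body = build_chat_request(
    "qwen3.6-35b",
    [{"role": "user", "content": "What is RAG in one sentence?"}],
    temperature=0.7,
    seed=None,
)
print(body)  # temperature is kept; the unset seed is omitted
```

Dropping `None` values instead of sending them as `null` lets the server fall back to its documented defaults, such as `temperature` 1.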
Example
```shell
curl https://api.norlen.io/v1/chat/completions \
  -H "Authorization: Bearer $NORLEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "What is RAG in one sentence?"}
    ],
    "temperature": 0.7
  }'
```

```python
import os

from openai import OpenAI

# Point the OpenAI SDK at the Norlen endpoint.
client = OpenAI(
    base_url="https://api.norlen.io/v1",
    api_key=os.environ["NORLEN_API_KEY"],
)

resp = client.chat.completions.create(
    model="qwen3.6-35b",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is RAG in one sentence?"},
    ],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```

```javascript
// Uses top-level await: run this as an ES module (for example, a .mjs file).
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.norlen.io/v1",
  apiKey: process.env.NORLEN_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "qwen3.6-35b",
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "What is RAG in one sentence?" },
  ],
  temperature: 0.7,
});
console.log(resp.choices[0].message.content);
```

Response
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1750000000,
  "model": "qwen3.6-35b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "RAG combines document retrieval with generation..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 28,
    "total_tokens": 60
  }
}
```
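Without an SDK, the reply text lives in `choices[0].message.content` and the token counts in `usage`. The following sketch parses the sample body above with Python's standard `json` module; `read_completion` is a hypothetical helper name. It assumes, as in OpenAI's API, that a `finish_reason` of `"length"` means the reply was cut off by `max_tokens`; this page only shows `"stop"`.

```python
import json

# The sample response body from above.
SAMPLE = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1750000000,
  "model": "qwen3.6-35b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "RAG combines document retrieval with generation..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 32, "completion_tokens": 28, "total_tokens": 60}
}
"""


def read_completion(body: dict) -> tuple[str, str, int]:
    """Hypothetical helper: return (text, finish_reason, total_tokens)."""
    choice = body["choices"][0]
    # In the sample, total_tokens = prompt_tokens + completion_tokens (32 + 28 = 60).
    return choice["message"]["content"], choice["finish_reason"], body["usage"]["total_tokens"]


text, reason, tokens = read_completion(json.loads(SAMPLE))
if reason == "length":
    # Assumption (OpenAI semantics): generation stopped at max_tokens.
    print("warning: reply was truncated")
print(f"{text} [{reason}, {tokens} tokens]")
```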
Streaming

With `stream: true`, the response arrives incrementally as Server-Sent Events (SSE). Each event's `data:` line holds a JSON chunk whose `choices[0].delta` carries the next fragment of the message, and the stream ends with `data: [DONE]`.
```python
# Reuses the `client` created in the example above.
stream = client.chat.completions.create(
    model="qwen3.6-35b",
    messages=[{"role": "user", "content": "Write a haiku about infrastructure."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries only the next fragment of text in `delta.content`.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```

```javascript
const stream = await client.chat.completions.create({
  model: "qwen3.6-35b",
  messages: [{ role: "user", content: "Write a haiku about infrastructure." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

```shell
# -N disables curl's output buffering so events print as they arrive.
curl -N https://api.norlen.io/v1/chat/completions \
  -H "Authorization: Bearer $NORLEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-35b","messages":[{"role":"user","content":"Hello"}],"stream":true}'
```
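The SDKs parse the event stream for you. If you read the raw stream yourself (for example, the `curl` output, or lines from an HTTP client such as `requests` with `stream=True` and `Response.iter_lines()`), the minimal sketch below shows the parsing described above: keep only `data:` lines, stop at `[DONE]`, and concatenate each `delta.content`. The function name `iter_deltas` and the sample events are hypothetical, and the sketch assumes each event's JSON fits on a single `data:` line, as in OpenAI-style streams. Real chunks typically carry more fields (such as `id` and `model`), which it ignores.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield each delta's text from raw SSE lines until `data: [DONE]`."""
    for line in lines:
        line = line.strip()
        # Skip blank separator lines and non-data fields such as ": keep-alive" comments.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if choices:
            # The first chunk may carry only the role, so content can be missing or null.
            content = choices[0].get("delta", {}).get("content")
            if content:
                yield content


# Hypothetical captured stream, trimmed to the fields the parser reads.
RAW_EVENTS = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Cables hum"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " at night"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_deltas(RAW_EVENTS)))
```

Using a generator lets callers print fragments as they arrive instead of waiting for the whole reply.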