Chat completions

Generates a model response from a conversation. The endpoint is compatible with OpenAI’s `POST /v1/chat/completions`, so the OpenAI SDKs work once the base URL points at Norlen.

POST https://api.norlen.io/v1/chat/completions
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

Parameters

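| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | yes | Model `id`, e.g. `qwen3.6-35b`, `qwen3-coder`, `gemma-4-12b` |
| `messages` | array | yes | List of `{ role, content }` messages. `role` is one of `system`, `user`, `assistant` |
| `temperature` | number | no | Sampling temperature (0–2); higher values give more varied output. Default `1` |
| `top_p` | number | no | Nucleus sampling threshold (0–1): sampling is limited to the smallest set of tokens whose cumulative probability reaches `top_p` |
| `max_tokens` | integer | no | Maximum number of tokens to generate in the response |
| `stream` | boolean | no | If `true`, streams the response incrementally as Server-Sent Events (SSE) |
| `stop` | string \| array | no | One or more sequences at which generation stops |
| `frequency_penalty` | number | no | Penalizes repeated tokens (-2 to 2); positive values reduce repetition |
| `presence_penalty` | number | no | Penalizes tokens that have already appeared (-2 to 2); positive values encourage new topics |
| `seed` | integer | no | Sampling seed; reusing the same seed with the same parameters makes the output more reproducible |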
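
The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Norlen API: `build_payload` and `_RANGES` are hypothetical names introduced here, and the bounds are copied from the table above.

```python
from typing import Any

# Documented ranges from the parameter table; the helper itself is hypothetical.
_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
_ROLES = {"system", "user", "assistant"}


def build_payload(model: str, messages: list[dict[str, str]], **options: Any) -> dict[str, Any]:
    """Return a request body, rejecting values outside the documented ranges."""
    for message in messages:
        if message.get("role") not in _ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    for name, (low, high) in _RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    body: dict[str, Any] = {"model": model, "messages": messages}
    # Leave unset options out of the body so the server defaults apply.
    body.update({key: value for key, value in options.items() if value is not None})
    return body


payload = build_payload(
    "qwen3.6-35b",
    [{"role": "user", "content": "What is RAG in one sentence?"}],
    temperature=0.7,
    max_tokens=64,
    stop=["\n\n"],
    seed=42,
)
```

The resulting `payload` can be serialized with `json.dumps` and sent as the request body, or unpacked into the SDK call as keyword arguments.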

Example

curl https://api.norlen.io/v1/chat/completions \
  -H "Authorization: Bearer $NORLEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "What is RAG in one sentence?"}
    ],
    "temperature": 0.7
  }'

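import os

from openai import OpenAI

# Point the OpenAI SDK at the Norlen endpoint; the key is read from the environment.
client = OpenAI(base_url="https://api.norlen.io/v1", api_key=os.environ["NORLEN_API_KEY"])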

resp = client.chat.completions.create(
    model="qwen3.6-35b",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is RAG in one sentence?"},
    ],
    temperature=0.7,
)
print(resp.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.norlen.io/v1",
  apiKey: process.env.NORLEN_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "qwen3.6-35b",
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "What is RAG in one sentence?" },
  ],
  temperature: 0.7,
});
console.log(resp.choices[0].message.content);

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1750000000,
  "model": "qwen3.6-35b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "RAG combines document retrieval with generation..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 32, "completion_tokens": 28, "total_tokens": 60 }
}
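
When calling the endpoint without an SDK, the fields of interest can be read straight from the decoded JSON. The sketch below uses the sample response above as a plain dictionary. `summarize` is a hypothetical helper, and treating `finish_reason == "length"` as a sign of truncation is an assumption carried over from the OpenAI response format, since this page only shows `"stop"`.

```python
from typing import Any


def summarize(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the reply text, finish reason, and token counts out of a response body."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # Assumption: "length" means the reply was cut off by max_tokens,
        # as in the OpenAI format this endpoint is compatible with.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
    }


# The sample response from this section, as a Python dictionary.
sample_response = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1750000000,
    "model": "qwen3.6-35b",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "RAG combines document retrieval with generation...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 32, "completion_tokens": 28, "total_tokens": 60},
}

summary = summarize(sample_response)
```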

Streaming

With `stream: true`, the response arrives incrementally as Server-Sent Events. Each chunk carries a `delta` with the newly generated text; the stream ends with `data: [DONE]`.

stream = client.chat.completions.create(
    model="qwen3.6-35b",
    messages=[{"role": "user", "content": "Write a haiku about infrastructure."}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices; a delta's content may also be None.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

const stream = await client.chat.completions.create({
  model: "qwen3.6-35b",
  messages: [{ role: "user", content: "Write a haiku about infrastructure." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

curl https://api.norlen.io/v1/chat/completions \
  -H "Authorization: Bearer $NORLEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-35b","messages":[{"role":"user","content":"Hello"}],"stream":true}'
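
Without an SDK, the `data:` lines of the stream have to be decoded by hand. The following is a minimal sketch of that parsing step. `iter_deltas` is a hypothetical helper, the sample lines are hand-written for illustration rather than captured server output, and the chunk shape (`choices[0].delta.content`) is assumed to match what the SDK examples above read.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a sequence of SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and other SSE fields are ignored
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample lines in the assumed chunk format.
sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]

text = "".join(iter_deltas(sample_lines))
```

In a real client, `lines` would be the decoded lines of the HTTP response body, read as they arrive rather than collected into a list first.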