# Chat Completions

Generate text with any of the 14 Zen models using the OpenAI-compatible chat completions endpoint.
## Endpoint

```
POST https://api.hanzo.ai/v1/chat/completions
```

## Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID (e.g., `zen4`, `zen4-coder`, `zen3-omni`) |
| `messages` | array | Yes | Array of message objects with `role` and `content` |
| `temperature` | float | No | Sampling temperature (0 to 2). Default: `1.0` |
| `top_p` | float | No | Nucleus sampling probability mass. Default: `1.0` |
| `max_tokens` | integer | No | Maximum number of tokens to generate |
| `stream` | boolean | No | Enable streaming responses. Default: `false` |
| `stop` | string/array | No | Stop sequences |
| `frequency_penalty` | float | No | Frequency penalty (-2.0 to 2.0). Default: `0` |
| `presence_penalty` | float | No | Presence penalty (-2.0 to 2.0). Default: `0` |
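Assembled into a request body, the parameters above look like the following (values are illustrative; only `model` and `messages` are required):

```python
import json

# Illustrative request body; only "model" and "messages" are required.
request_body = {
    "model": "zen4",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,        # 0-2; lower values are more deterministic
    "top_p": 1.0,              # nucleus sampling mass
    "max_tokens": 256,         # cap on generated tokens
    "stream": False,           # set True for server-sent events
    "stop": ["\n\n"],          # generation halts at any stop sequence
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}

# The JSON string sent as the POST body.
payload = json.dumps(request_body)
```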
## Message Roles

| Role | Description |
|---|---|
| `system` | System prompt / instructions |
| `user` | User message |
| `assistant` | Model response (for multi-turn conversations) |
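For multi-turn conversations, each assistant reply is appended to `messages` before the next user turn. A minimal sketch of that bookkeeping (the reply text here is illustrative, not a real API response):

```python
# Conversation history sent with every request.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# After calling the endpoint, append the assistant's reply so the
# model sees the full history on the next turn.
assistant_reply = "The capital of France is Paris."  # illustrative
messages.append({"role": "assistant", "content": assistant_reply})

# Next user turn.
messages.append({"role": "user", "content": "How large is it?"})

print([m["role"] for m in messages])
# ['system', 'user', 'assistant', 'user']
```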
## Example Request

```bash
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a haiku about programming."}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

## Example Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1708000000,
  "model": "zen4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Lines of logic flow\nBugs hide in silent syntax\nCompile, debug, grow"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 18,
    "total_tokens": 43
  }
}
```

## Streaming
Set `stream: true` to receive server-sent events:
```python
from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-key")

stream = client.chat.completions.create(
    model="zen4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

## Vision (zen3-vl, zen3-omni)
Vision models accept image URLs or base64-encoded images in the `content` array:
```python
response = client.chat.completions.create(
    model="zen3-vl",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

## Reasoning (zen4-thinking, zen4-ultra)
Reasoning models show their chain-of-thought process:
```python
response = client.chat.completions.create(
    model="zen4-thinking",
    messages=[{"role": "user", "content": "Solve: If 2^x = 1024, what is x?"}],
)
print(response.choices[0].message.content)
```

## Code Generation (zen4-coder, zen4-coder-pro, zen4-coder-flash)
Code models support up to a 262K-token context window for full-repository understanding:
```python
response = client.chat.completions.create(
    model="zen4-coder",
    messages=[{"role": "user", "content": "Write a Go HTTP server with graceful shutdown."}],
)
print(response.choices[0].message.content)
```

## Available Models
All 14 Zen models work with this endpoint, except `zen3-embedding`, which uses `/v1/embeddings`:
| Model | Context (tokens) | Best For |
|---|---|---|
| zen4 | 202K | General flagship |
| zen4-ultra | 202K | Maximum reasoning |
| zen4-pro | 131K | High capability |
| zen4-max | 131K | Large documents |
| zen4-mini | 40K | Fast and cheap |
| zen4-thinking | 131K | Chain-of-thought |
| zen4-coder | 262K | Code generation |
| zen4-coder-pro | 262K | Premium code |
| zen4-coder-flash | 262K | Fast code |
| zen3-omni | 202K | Multimodal |
| zen3-vl | 131K | Vision-language |
| zen3-nano | 40K | Edge |
| zen3-guard | 40K | Content safety |
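The context-window column above can drive simple model selection. A sketch (context sizes transcribed from the table, in thousands of tokens; the helper name is illustrative, not part of the SDK):

```python
# Context window per model, in thousands of tokens (from the table above).
CONTEXT_K = {
    "zen4": 202, "zen4-ultra": 202, "zen4-pro": 131, "zen4-max": 131,
    "zen4-mini": 40, "zen4-thinking": 131, "zen4-coder": 262,
    "zen4-coder-pro": 262, "zen4-coder-flash": 262, "zen3-omni": 202,
    "zen3-vl": 131, "zen3-nano": 40, "zen3-guard": 40,
}

def smallest_fitting(candidates, needed_k):
    """Pick the candidate with the smallest context window that still fits."""
    fitting = [m for m in candidates if CONTEXT_K[m] >= needed_k]
    return min(fitting, key=lambda m: CONTEXT_K[m]) if fitting else None

print(smallest_fitting(["zen4-mini", "zen4", "zen4-coder"], 150))  # zen4
```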