zen3-omni
Hypermodal ~200B dense model supporting text, vision, and audio. 202K context.
zen3-omni is a multimodal model that supports text, vision, and audio. It uses a ~200B-parameter dense architecture with a 202K-token context window.
Specifications
| Property | Value |
|---|---|
| Model ID | zen3-omni |
| Parameters | ~200B |
| Architecture | Dense Multimodal |
| Context Window | 202K tokens |
| Modalities | Text, Vision, Audio |
| Tier | pro max |
| Input Price | $1.80 / 1M tokens |
| Output Price | $6.60 / 1M tokens |
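
The per-token prices in the table translate directly into a per-request cost. The following is a minimal sketch of that arithmetic; the function name `estimate_cost_usd` and the example token counts are illustrative assumptions, and actual billing may differ (for instance in how image or audio inputs are counted as tokens).

```python
# Prices taken from the specifications table (USD per 1M tokens).
INPUT_PRICE_PER_MILLION = 1.80
OUTPUT_PRICE_PER_MILLION = 6.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper: it only applies the listed prices and does not
    account for how non-text inputs are metered.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Example: a 150,000-token prompt with a 2,000-token reply.
cost = estimate_cost_usd(150_000, 2_000)
print(f"${cost:.4f}")  # prints "$0.2832"
```

Output tokens cost roughly 3.7 times as much as input tokens, so long generations dominate the bill faster than long prompts do.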
Capabilities
- Text generation and understanding
- Image analysis and visual reasoning
- Audio and speech processing
- Cross-modal reasoning
- Structured output generation
- 202K context window
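
When working close to the context limit, it can help to check a request's token budget before sending it. The sketch below rests on two assumptions not stated in this page: that "202K" means 202,000 tokens, and that the prompt and the generated output share the same window. The helper names are hypothetical, and token counts must come from whatever tokenizer or usage data you have available.

```python
# Assumption: "202K" is read as 202,000 tokens; the exact limit may differ.
CONTEXT_WINDOW_TOKENS = 202_000


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the prompt plus the requested output fits in the window.

    Assumes input and output tokens count against one shared window.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= window


def remaining_output_budget(prompt_tokens: int,
                            window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Return how many tokens are left for output after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return max(0, window - prompt_tokens)
```

For example, under these assumptions a 200,000-token prompt leaves 2,000 tokens for the reply, so a request asking for 4,096 output tokens would not fit.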
API Usage
```shell
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen3-omni",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```

```python
from hanzoai import Hanzo

# Placeholder key; replace with your own API key.
client = Hanzo(api_key="hk-your-api-key")

# Send one user message that combines a text prompt with an image URL.
response = client.chat.completions.create(
    model="zen3-omni",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)

# Print the text of the first completion choice.
print(response.choices[0].message.content)
```
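
Both requests above share the same message structure: a single user message whose `content` is a list of typed parts. A small helper can build that structure programmatically. This is a sketch only: `build_image_request` is a hypothetical name, and accepting more than one image part per message is an assumption that this page does not confirm.

```python
import json


def build_image_request(prompt: str, image_urls: list[str],
                        model: str = "zen3-omni") -> dict:
    """Build a chat completions request body with one text part and image parts.

    Mirrors the shape of the examples above. Whether several images may be
    sent in one message is an assumption.
    """
    if not prompt:
        raise ValueError("prompt must not be empty")
    # The text part comes first, followed by one part per image URL.
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }


body = build_image_request(
    "What is in this image?",
    ["https://example.com/photo.jpg"],
)
print(json.dumps(body, indent=2))
```

The resulting dictionary serializes to the same JSON body as the curl example. Its `model` and `messages` keys also match the keyword arguments used in the Python example, so it could plausibly be passed as `client.chat.completions.create(**body)`.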