zen3-vl
A vision-language model built on a mixture-of-experts (MoE) architecture with 30B total parameters (3B active) and a 131K-token context window.
zen3-vl is designed for image understanding and visual reasoning. Of its 30B total parameters, only 3B are active during inference through MoE routing, which keeps multimodal inference efficient.
Specifications
| Property | Value |
|---|---|
| Model ID | zen3-vl |
| Parameters | 30B (3B active) |
| Architecture | MoE Vision-Language |
| Context Window | 131K tokens |
| Modalities | Text, Vision |
| Tier | pro max |
| Input Price | $0.45 / 1M tokens |
| Output Price | $1.80 / 1M tokens |
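
As a rough illustration of how the two prices in the table combine, the sketch below estimates the cost of a single request. The function name `estimate_cost_usd` and the example token counts are hypothetical, and the sketch assumes you already know the input and output token counts; how images are converted to input tokens is not specified here.

```python
from decimal import Decimal

# Prices copied from the specifications table, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.45")
OUTPUT_PRICE_PER_M = Decimal("1.80")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts.

    Hypothetical helper: it only applies the listed per-token prices and
    does not model any other billing rules.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = INPUT_PRICE_PER_M * input_tokens / TOKENS_PER_M
    output_cost = OUTPUT_PRICE_PER_M * output_tokens / TOKENS_PER_M
    return input_cost + output_cost


# Hypothetical request: 20,000 input tokens and 1,000 output tokens.
print(estimate_cost_usd(20_000, 1_000))
```

Using `Decimal` rather than `float` avoids rounding drift when many small per-request costs are summed.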
Capabilities
- Image analysis and visual understanding
- OCR across 32+ languages
- Spatial reasoning and object detection
- Document understanding (charts, tables, diagrams)
- Function calling for visual agent workflows
- 131K context window
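
To make the function-calling capability more concrete, here is a minimal sketch of a request payload that pairs a screenshot with one tool definition. It assumes the endpoint accepts an OpenAI-style `tools` array, which this page does not confirm. The `click_element` tool and the `build_visual_tool_request` helper are hypothetical names used only for illustration. The code only builds the payload and does not send it.

```python
import json


def build_visual_tool_request(image_url: str, instruction: str) -> dict:
    """Build a chat-completions payload that offers the model one tool."""
    # Hypothetical tool: `click_element` is illustrative, not a Hanzo API.
    click_tool = {
        "type": "function",
        "function": {
            "name": "click_element",
            "description": "Click a UI element at pixel coordinates in the screenshot.",
            "parameters": {
                "type": "object",
                "properties": {
                    "x": {"type": "integer", "description": "Horizontal pixel offset"},
                    "y": {"type": "integer", "description": "Vertical pixel offset"},
                },
                "required": ["x", "y"],
            },
        },
    }
    return {
        "model": "zen3-vl",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        # Assumption: the endpoint accepts an OpenAI-style `tools` array.
        "tools": [click_tool],
    }


payload = build_visual_tool_request(
    "https://example.com/screenshot.png",
    "Click the Submit button.",
)
print(json.dumps(payload, indent=2))
```

If the assumption holds, the model's reply would carry a tool call naming `click_element` with `x` and `y` arguments, which the calling application is responsible for executing.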
API Usage
```bash
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen3-vl",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Read the text in this screenshot."},
        {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
      ]
    }]
  }'
```

```python
from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

response = client.chat.completions.create(
    model="zen3-vl",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this chart show?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```
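
Both examples reference an image by public URL. For a local file, one common approach with OpenAI-style APIs is to embed the image as a base64 data URL. The sketch below assumes this endpoint accepts data URLs in the `image_url` field, which this page does not state. The helpers `image_to_data_url` and `build_image_message` are hypothetical names.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message(path: str, prompt: str) -> dict:
    """Build a user message that pairs a text prompt with a local image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumption: the API accepts data URLs wherever it accepts
            # ordinary image URLs.
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

The returned message can be passed as `messages=[build_image_message("chart.png", "What does this chart show?")]` in the Python call above. Base64 encoding grows the payload by roughly a third, so a hosted URL may be preferable for large images.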