zen3-vl

Vision-Language

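zen3-vl is a vision-language model for image understanding and visual reasoning. It has 30B total parameters, of which only 3B are active for any given token thanks to a mixture-of-experts (MoE) architecture, which keeps multimodal inference efficient.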
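The gap between 30B total and 3B active parameters comes from MoE routing. A small router scores the experts for each token. Only the top few experts run for that token, alongside the shared layers. The following is a generic top-k gating sketch in plain Python, not zen3-vl's actual router. The expert count, `k`, and the scalar "experts" are hypothetical stand-ins.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the routed experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Toy usage (hypothetical): 8 scalar "experts", 2 active per input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.2, -1.0, 3.1, 0.0, 2.4, -0.5, 1.1, 0.3]
print(top_k_route(logits, k=2))
print(moe_forward(1.0, experts, logits, k=2))
```

Because only `k` of the experts execute per input, compute scales with the active parameters rather than the total.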

Specifications

| Property | Value |
|---|---|
| Model ID | `zen3-vl` |
| Parameters | 30B (3B active) |
| Architecture | MoE Vision-Language |
| Context Window | 262K tokens |
| Modalities | Text, Vision |
| Tier | pro max |
| Input Price | $0.45 / 1M tokens |
| Output Price | $1.80 / 1M tokens |
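The per-token prices above turn into a per-request estimate with a little arithmetic. The sketch below assumes that images are billed as input tokens like text. That is an assumption, since this page does not say how image inputs are counted.

```python
# Prices from the Specifications table, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.45
OUTPUT_PRICE_PER_M = 1.80

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated zen3-vl request cost in USD at the listed prices."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 20,000-token prompt that produces a 1,000-token answer.
print(f"${estimate_cost(20_000, 1_000):.4f}")  # $0.0108
```

Output tokens cost four times as much as input tokens. Long answers therefore dominate the bill faster than long prompts.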

Capabilities

- Image analysis and visual understanding
- OCR across 32+ languages
- Spatial reasoning and object detection
- Document understanding (charts, tables, diagrams)
- Function calling for visual agent workflows
- 262K context window
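For visual agent workflows, a request usually pairs a screenshot with a tool the model may call. The sketch below builds such a request body without sending it. It assumes the endpoint accepts an OpenAI-style `tools` array and returns tool-call arguments as a JSON string. This page does not document either behavior. The `click` tool and its `x`/`y` parameters are hypothetical.

```python
import json

# Hypothetical tool for a GUI agent: the model inspects a screenshot and
# decides where to click. Name and parameters are illustrative only.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click at a pixel coordinate in the current screenshot.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer", "description": "Pixels from the left edge."},
                "y": {"type": "integer", "description": "Pixels from the top edge."},
            },
            "required": ["x", "y"],
        },
    },
}

def build_agent_request(instruction: str, screenshot_url: str) -> dict:
    """Chat-completions body pairing an image with a callable tool.

    Assumes OpenAI-style `tools` support on /v1/chat/completions.
    """
    return {
        "model": "zen3-vl",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url", "image_url": {"url": screenshot_url}},
            ],
        }],
        "tools": [CLICK_TOOL],
    }

def parse_click(arguments_json: str) -> tuple[int, int]:
    """Decode the JSON arguments string of a hypothetical `click` tool call."""
    args = json.loads(arguments_json)
    return int(args["x"]), int(args["y"])

body = build_agent_request("Click the Submit button.", "https://example.com/screenshot.png")
print(json.dumps(body, indent=2))
```

The agent loop would execute the click, capture a new screenshot, and send it back in the next turn.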

API Usage

```bash
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen3-vl",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Read the text in this screenshot."},
        {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
      ]
    }]
  }'
```

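```python
import os

from hanzoai import Hanzo

# Read the key from the environment, as in the cURL example, rather than hardcoding it.
client = Hanzo(api_key=os.environ["HANZO_API_KEY"])

response = client.chat.completions.create(
    model="zen3-vl",
    messages=[{
        "role": "user",
        # A single user message can mix text parts and image parts.
        "content": [
            {"type": "text", "text": "What does this chart show?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```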
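The examples above point at publicly reachable image URLs. For a local file, one common pattern with OpenAI-compatible APIs is to embed the image as a base64 `data:` URL in the same `image_url` field. Whether the Hanzo endpoint accepts data URLs is an assumption here, since this page only shows HTTPS URLs. The helper names are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path

def bytes_to_data_url(data: bytes, mime: str) -> str:
    """Wrap raw bytes in a base64 data URL (RFC 2397 form)."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"

def image_to_data_url(path: str) -> str:
    """Read a local image and return it as a data URL, guessing the MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    return bytes_to_data_url(Path(path).read_bytes(), mime)

def image_part(path: str) -> dict:
    """Content part shaped like the `image_url` entries in the examples above."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

With this helper, `image_part("chart.png")` could replace the URL-based image entry in the Python example. Base64 inflates the payload by about a third, so very large images may be better served from a URL.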
