zen3-vl

Vision-language model with 30B (3B active) MoE architecture. 131K context.

Vision-Language

Vision-language model for image understanding and visual reasoning. It has 30B total parameters, of which 3B are active through a mixture-of-experts (MoE) architecture, for efficient multimodal inference.

Specifications

Property        Value
Model ID        zen3-vl
Parameters      30B (3B active)
Architecture    MoE Vision-Language
Context Window  131K tokens
Modalities      Text, Vision
Tier            pro max
Input Price     $0.45 / 1M tokens
Output Price    $1.80 / 1M tokens
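
The listed prices can be turned into a rough per-request cost estimate. The sketch below is only arithmetic on the two prices above; `estimate_cost` is a hypothetical helper, not part of any SDK, and because this page does not say how images are counted toward input tokens, the token counts are taken as given.

```python
from decimal import Decimal

# Prices copied from the specifications above (USD per 1M tokens).
INPUT_PRICE_PER_M = Decimal("0.45")
OUTPUT_PRICE_PER_M = Decimal("1.80")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_M)
    return total / TOKENS_PER_M


# Example: 20,000 input tokens and 1,000 output tokens.
print(estimate_cost(20_000, 1_000))
```

Decimal is used instead of float so that fractions of a cent are not distorted by binary rounding when many requests are summed.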

Capabilities

  • Image analysis and visual understanding
  • OCR across 32+ languages
  • Spatial reasoning and object detection
  • Document understanding (charts, tables, diagrams)
  • Function calling for visual agent workflows
  • 131K context window
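
As an illustration of function calling in a visual agent workflow, the sketch below builds a request body that pairs a screenshot with one callable tool. It assumes the endpoint accepts an OpenAI-style `tools` array, which this page does not document; the `click` tool and the `build_tool_request` helper are hypothetical names chosen for the example. The code only constructs the payload and sends nothing.

```python
import json


def build_tool_request(image_url: str, instruction: str) -> dict:
    """Build a chat-completions payload that offers one tool to the model.

    ASSUMPTION: the API accepts an OpenAI-style `tools` array. The schema
    below is illustrative and is not taken from the zen3-vl documentation.
    """
    click_tool = {  # hypothetical tool for a UI automation agent
        "type": "function",
        "function": {
            "name": "click",
            "description": "Click at a pixel coordinate in the screenshot.",
            "parameters": {
                "type": "object",
                "properties": {
                    "x": {"type": "integer"},
                    "y": {"type": "integer"},
                },
                "required": ["x", "y"],
            },
        },
    }
    return {
        "model": "zen3-vl",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "tools": [click_tool],
    }


payload = build_tool_request(
    "https://example.com/screenshot.png",
    "Click the Submit button.",
)
print(json.dumps(payload, indent=2))
```

If the assumption holds, the model's reply would name the tool and its arguments, and the calling program would be responsible for actually performing the click.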

API Usage

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen3-vl",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Read the text in this screenshot."},
        {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
      ]
    }]
  }'

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

response = client.chat.completions.create(
    model="zen3-vl",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this chart show?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
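
Both examples above reference an image by public URL. For a local file, one common approach is to embed the bytes as a base64 `data:` URL in the same `image_url` field. Whether this API accepts data URLs is an assumption not confirmed by this page, and `image_part_from_bytes` is a hypothetical helper.

```python
import base64
import mimetypes


def image_part_from_bytes(data: bytes, filename: str) -> dict:
    """Wrap raw image bytes as an `image_url` content part (hypothetical helper).

    ASSUMPTION: the API accepts base64 `data:` URLs in `image_url.url`.
    """
    # Infer the MIME type from the file extension, e.g. "image/png".
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }


# Example with placeholder bytes; in practice read them with open(path, "rb").
part = image_part_from_bytes(b"\x89PNG placeholder", "scan.png")
print(part["image_url"]["url"][:30])
```

The returned dictionary can take the place of the URL-based `image_url` entry in the `content` list of either request above.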
