⚡ Zen LM

zen3-omni

Hypermodal ~200B dense model supporting text, vision, and audio. 202K context.

zen3-omni is a multimodal model that supports text, vision, and audio. It uses a ~200B-parameter dense multimodal architecture with a 202K-token context window.

Specifications

Property          Value
Model ID          zen3-omni
Parameters        ~200B
Architecture      Dense Multimodal
Context Window    202K tokens
Modalities        Text, Vision, Audio
Tier              pro max
Input Price       $1.80 / 1M tokens
Output Price      $6.60 / 1M tokens
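
The listed prices can be turned into a rough per-request estimate. The sketch below is only arithmetic on the two figures in the table; `estimate_cost` is a hypothetical helper, not part of any SDK, and this page does not state how image or audio inputs are converted to billed tokens, so the token counts are assumed to be known already.

```python
from decimal import Decimal

# Prices copied from the specifications table (USD per 1M tokens).
INPUT_PRICE_PER_M = Decimal("1.80")
OUTPUT_PRICE_PER_M = Decimal("6.60")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Example: 100,000 input tokens and 2,000 output tokens.
    print(estimate_cost(100_000, 2_000))
```

`Decimal` is used instead of floats so that small per-token amounts do not pick up binary rounding noise when many requests are summed.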

Capabilities

  • Text generation and understanding
  • Image analysis and visual reasoning
  • Audio and speech processing
  • Cross-modal reasoning
  • Structured output generation
  • 202K context window

API Usage

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen3-omni",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'

# Python equivalent using the hanzoai SDK
from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

response = client.chat.completions.create(
    model="zen3-omni",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
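
Both requests above reference the image by a public URL. For a local file, one common approach with OpenAI-style chat endpoints is to inline the image as a base64 data URL in the same `image_url` field. Whether this endpoint accepts data URLs is an assumption that this page does not confirm, and `image_part_from_bytes` and `build_messages` are hypothetical helpers introduced only for illustration. The sketch builds the `messages` payload without making any network call.

```python
import base64
import mimetypes


def image_part_from_bytes(data: bytes, filename: str) -> dict:
    """Build an image_url content part from raw bytes (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    # Assumption: the API accepts data URLs wherever it accepts https URLs.
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }


def build_messages(prompt: str, image_part: dict) -> list:
    """Assemble a messages list in the same shape as the requests above."""
    return [{
        "role": "user",
        "content": [{"type": "text", "text": prompt}, image_part],
    }]


if __name__ == "__main__":
    # Placeholder bytes stand in for real file contents read from disk.
    part = image_part_from_bytes(b"\x89PNG\r\n\x1a\n", "photo.png")
    messages = build_messages("Describe this image.", part)
    print(messages[0]["content"][1]["image_url"]["url"])
```

The resulting list could then be passed as `messages=` to `client.chat.completions.create(...)` in place of the URL-based payload, if the endpoint supports it.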

See Also

  • zen3-vl -- Vision-language only, lower cost
  • zen4 -- Text-only flagship
  • Pricing -- Full pricing table
