Video understanding model for frame analysis, captioning, and temporal reasoning.

zen-video

Video Understanding

A video understanding model that analyzes video content frame by frame. It answers questions about video content, generates descriptions, detects actions, and reasons about temporal sequences.

This model is coming soon. Join the waitlist at hanzo.chat.

Specifications

Property	Value
Model ID	`zen-video`
Architecture	Multimodal Transformer
Input	Video (up to 10 minutes)
Status	Coming Soon
HuggingFace	--
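
The 10-minute input limit can be checked on the client before a request is sent. The following is a minimal sketch, assuming the video duration is already known in seconds (for example, from a media probe tool). `MAX_VIDEO_SECONDS` and `fits_input_limit` are illustrative names, not part of the Hanzo SDK, and treating a video of exactly 10 minutes as acceptable is an assumption.

```python
# Hypothetical client-side guard; not part of the hanzoai SDK.
MAX_VIDEO_SECONDS = 10 * 60  # "up to 10 minutes" from the specifications table


def fits_input_limit(duration_seconds: float) -> bool:
    """Return True if a video of this length fits the documented input limit."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Assumption: a video of exactly 10 minutes is accepted.
    return duration_seconds <= MAX_VIDEO_SECONDS
```

A check like this avoids uploading or referencing a video the model would likely reject; the actual server-side behavior for over-length input is not documented here.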

Capabilities

Video question answering
Scene description and captioning
Action detection and classification
Temporal reasoning across frames
Key moment extraction
Content moderation for video

Usage

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

# zen-video is not yet available (status: Coming Soon); this shows the intended request shape.
response = client.chat.completions.create(
    model="zen-video",
    messages=[{
        "role": "user",
        "content": [
            {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}},
            {"type": "text", "text": "Summarize what happens in this video."},
        ],
    }],
)
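
The same message shape can serve several of the listed capabilities by changing only the text prompt. Below is a small sketch of a helper that builds the `messages` list used above; `build_video_messages` is a hypothetical name, and the `video_url` content type is taken from the example on this page rather than from a released API.

```python
from typing import Any


def build_video_messages(video_url: str, prompt: str) -> list[dict[str, Any]]:
    """Build a single-turn message list pairing one video with one text prompt.

    Hypothetical helper: mirrors the payload shape shown in the usage example.
    """
    if not video_url or not prompt:
        raise ValueError("video_url and prompt must be non-empty")
    return [{
        "role": "user",
        "content": [
            # Video part, in the same form as the usage example
            {"type": "video_url", "video_url": {"url": video_url}},
            # Text part carrying the question or instruction
            {"type": "text", "text": prompt},
        ],
    }]


# Example: a question-answering prompt instead of a summary
messages = build_video_messages(
    "https://example.com/video.mp4",
    "What action does the person perform at the end of the video?",
)
```

The resulting list could then be passed as `messages=messages` to `client.chat.completions.create(model="zen-video", ...)` once the model is released.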
