zen-video
Video understanding model for frame analysis, captioning, and temporal reasoning.
Video Understanding
A video understanding model that analyzes video content frame by frame. It answers questions about video content, generates descriptions, detects actions, and reasons about temporal sequences.
This model is coming soon. Join the waitlist at hanzo.chat.
Specifications
| Property | Value |
|---|---|
| Model ID | zen-video |
| Architecture | Multimodal Transformer |
| Input | Video (up to 10 minutes) |
| Status | Coming Soon |
| HuggingFace | -- |
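The input limit in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Hanzo SDK: the names `MAX_DURATION_SECONDS`, `is_within_limit`, and `split_windows` are hypothetical, and treating the 10-minute limit as inclusive is an assumption. The page does not say whether the service accepts a longer video in separate chunks, so the windowing helper only illustrates one possible approach.

```python
# Hypothetical client-side helpers; none of these names come from the Hanzo SDK.

# "Up to 10 minutes" from the specifications table.
# Assumption: the bound is inclusive.
MAX_DURATION_SECONDS = 10 * 60


def is_within_limit(duration_seconds: float) -> bool:
    """Return True if a clip of this length fits the documented input limit."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return duration_seconds <= MAX_DURATION_SECONDS


def split_windows(duration_seconds: float) -> list[tuple[float, float]]:
    """Split a video into consecutive (start, end) windows no longer than the limit."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    windows = []
    start = 0.0
    while start < duration_seconds:
        end = min(start + MAX_DURATION_SECONDS, duration_seconds)
        windows.append((start, end))
        start = end
    return windows
```

A 25-minute recording (1500 seconds) would fail `is_within_limit` and be split by `split_windows` into three windows, the last one shorter than the others.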
Capabilities
- Video question answering
- Scene description and captioning
- Action detection and classification
- Temporal reasoning across frames
- Key moment extraction
- Content moderation for video
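The capabilities above would presumably each be reached by pairing a video with a different text prompt. The sketch below builds such requests, assuming the message shape shown in the Usage section. The `PROMPTS` wording, the task keys, and the `build_messages` helper are illustrative assumptions, not part of the Hanzo SDK, and the model is not yet available to confirm them against.

```python
# Hypothetical prompts, one per listed capability; the wording is illustrative only.
PROMPTS = {
    "qa": "What is the person in the video holding?",
    "caption": "Describe each scene in one sentence.",
    "actions": "List the actions performed, in order.",
    "temporal": "What happens immediately before the door opens?",
    "key_moments": "Identify the most important moments and when they occur.",
    "moderation": "Does this video contain content that violates a safety policy?",
}


def build_messages(video_url: str, prompt: str) -> list[dict]:
    """Build a chat message list that pairs one video with one text prompt.

    Assumes the content-part format from the Usage section
    ("video_url" and "text" parts inside a single user message).
    """
    if not video_url:
        raise ValueError("video_url must not be empty")
    return [{
        "role": "user",
        "content": [
            {"type": "video_url", "video_url": {"url": video_url}},
            {"type": "text", "text": prompt},
        ],
    }]


# Example: a captioning request payload for a placeholder URL.
messages = build_messages("https://example.com/video.mp4", PROMPTS["caption"])
```

The resulting list could be passed as the `messages` argument of a chat completion call once the model is released.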
Usage
from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

# Coming soon
response = client.chat.completions.create(
    model="zen-video",
    messages=[{
        "role": "user",
        "content": [
            {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}},
            {"type": "text", "text": "Summarize what happens in this video."},
        ],
    }],
)

See Also
- zen-director -- Text-to-video generation
- zen-video-i2v -- Image-to-video animation
- zen-omni -- Hypermodal understanding