Zen Model Family

24+ models spanning language, vision, audio, video, 3D, and specialized tasks

The complete collection spans 0.6B to 1T+ parameters, from efficient edge deployment to large-scale cloud inference. Each model is optimized for specific use cases while maintaining the same standards of performance, transparency, and open-source accessibility.

Core Language Models

Foundational models from nano to next-gen

zen-nano

Available
Parameters: 0.6B
Base: Qwen3-0.6B
Context: 32K tokens
Architecture: 28 layers, GQA

Ultra-efficient model for edge deployment and embedded systems. Perfect for on-device AI applications with minimal resource requirements.

Formats: SafeTensors, GGUF, MLX
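As a rough guide to what "minimal resource requirements" means in practice, weight memory can be approximated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate, not a published figure for any Zen model: the function name `weight_memory_gb` is hypothetical, the byte widths are nominal values (real GGUF quantizations store extra scale data, so files run somewhat larger), and KV cache and activation memory are ignored.

```python
# Nominal storage cost per parameter at common precisions (assumed values).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate weight size in GB (10^9 bytes) for a given precision.

    Counts weights only; runtime overhead, KV cache, and activations
    are not included.
    """
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision}")
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    # Parameter count taken from the zen-nano listing (0.6B).
    for precision in ("fp16", "int8", "int4"):
        size = weight_memory_gb(0.6, precision)
        print(f"zen-nano @ {precision}: ~{size:.2f} GB of weights")
```

The same function applies to the larger models in the family by passing their listed parameter counts.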

zen-eco

Available
Parameters: 4B
Base: Qwen3-3B
Context: 32K tokens
Variants: Instruct, Agent, Coder, Thinking

Balanced performance and efficiency for general-purpose applications. Multiple specialized variants for different use cases.

Formats: SafeTensors, GGUF, MLX

zen-omni

Available
Parameters: 7B
Base: Qwen3-Omni
Modalities: Text + Vision + Audio
Type: Multimodal

Multimodal model built on Qwen3-Omni (not Qwen2.5) that handles text, vision, and audio understanding in a single model.

Formats: SafeTensors

zen-coder

Available
Parameters: 14B
Base: Qwen3-Coder-14B
Context: 128K tokens
Focus: Code Generation

Specialized for code generation, debugging, and software engineering tasks. Supports 100+ programming languages with extended context.

Formats: SafeTensors, GGUF, MLX

zen-next

Available
Parameters: 32B
Base: Qwen3-32B
Context: 32K tokens
Focus: Frontier

Our flagship model pushing the boundaries of performance and capability. For the most demanding applications requiring maximum intelligence.

Formats: SafeTensors, GGUF
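The five listings above can be treated as a small catalog when choosing a model for a deployment target. The following is an illustrative sketch only: `ZenModel`, `CORE_MODELS`, and `pick` are hypothetical names, not part of any Zen tooling, and the values are copied from the spec cards in this section (zen-omni lists no context length, so it is recorded as `None`).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ZenModel:
    name: str
    params_b: float               # parameter count in billions
    context_k: Optional[int]      # context window in K tokens, None if unlisted
    formats: Tuple[str, ...]      # published weight formats


# Values transcribed from the Core Language Models listings.
CORE_MODELS = [
    ZenModel("zen-nano", 0.6, 32, ("SafeTensors", "GGUF", "MLX")),
    ZenModel("zen-eco", 4, 32, ("SafeTensors", "GGUF", "MLX")),
    ZenModel("zen-omni", 7, None, ("SafeTensors",)),
    ZenModel("zen-coder", 14, 128, ("SafeTensors", "GGUF", "MLX")),
    ZenModel("zen-next", 32, 32, ("SafeTensors", "GGUF")),
]


def pick(max_params_b: float, fmt: str, min_context_k: int = 0) -> Optional[ZenModel]:
    """Return the largest model within the parameter budget that ships in
    the requested format and meets the context requirement, or None."""
    candidates = [
        m for m in CORE_MODELS
        if m.params_b <= max_params_b
        and fmt in m.formats
        and (m.context_k or 0) >= min_context_k
    ]
    return max(candidates, key=lambda m: m.params_b, default=None)


if __name__ == "__main__":
    choice = pick(max_params_b=8, fmt="MLX")
    print(choice.name if choice else "no match")
```

Picking the largest model that fits is one reasonable policy; a latency-sensitive deployment might prefer the smallest match instead.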

Multimodal Models

Vision, Audio, Video, and 3D Generation

zen-vl

Available
Type: Vision-Language
Base: Qwen3-VL
Sizes: 4B, 8B, 30B
Variants: Instruct, Agent
Focus: Function Calling

Next-generation vision-language model with advanced function calling capabilities. Trained on Agent Data Protocol (ADP) and xLAM datasets for superior agent performance and tool use.

Formats: SafeTensors, GGUF
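Function calling generally means the model emits a structured request naming a tool and its arguments, and the host application validates and executes it. The sketch below shows that application-side step under stated assumptions: the JSON shape (`name` plus `arguments`) is a common convention rather than zen-vl's documented output format, and `get_weather`, `TOOLS`, and `dispatch` are hypothetical names for illustration.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# Registry of tools the application is willing to run.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call and run the matching registered tool.

    Assumes the model's output is a JSON object with a "name" field and
    an optional "arguments" object.
    """
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        # Refuse anything the application did not explicitly register.
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# A tool call of the assumed shape, as a model might emit it.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
print(result)
```

Keeping an explicit registry means the model can only trigger functions the application has opted into, regardless of what name it produces.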

zen-designer

Available
Type: Vision-Language
Base: Qwen-VL
Variants: Instruct, Thinking
Focus: Visual Understanding

Advanced vision-language model for image understanding, analysis, and reasoning. Supports visual question answering, OCR, and detailed scene description.

Formats: SafeTensors

zen-artist

Available
Type: Text-to-Image
Base: Qwen-Image
Variants: Base, Edit
Focus: Image Generation

High-quality image generation from text descriptions. The zen-artist-edit variant adds image editing driven by natural language instructions.

Formats: SafeTensors, Diffusers

zen-video

Available
Type: Text-to-Video
Base: HunyuanVideo
Variants: T2V, I2V
Focus: Video Generation

State-of-the-art video generation from text descriptions. The zen-video-i2v variant generates video from a source image, with fine control over motion and dynamics.

Formats: SafeTensors

zen-3d

Available
Type: 3D Generation
Input: Text, Image, Point Cloud
Output: 3D Meshes
Focus: 3D Assets

Generate high-quality 3D models from various input modalities. Perfect for game development, AR/VR, and 3D content creation.

Formats: SafeTensors

zen-musician

Available
Type: Music Generation
Input: Text, Audio
Output: Music, Audio
Focus: Music Creation

Generate high-quality music from text descriptions or audio samples. Supports multiple genres, instruments, and musical styles.

Formats: SafeTensors