Zen LM Models - Complete Model Family

Zen Model Family

24+ models spanning language, vision, audio, video, 3D, and specialized tasks

The complete collection spans 0.6B to 1T+ parameters, from efficient edge deployment to powerful cloud inference. Each model is optimized for specific use cases while maintaining the same standards of performance, transparency, and open-source accessibility.

Core Language Models

Foundational models from nano to next-gen

zen-nano

Available

Parameters: 0.6B

Base: Qwen3-0.6B

Context: 32K tokens

Architecture: 28 layers, GQA

Ultra-efficient model for edge deployment and embedded systems. Perfect for on-device AI applications with minimal resource requirements.

Formats: SafeTensors, GGUF, MLX

🤗 HF 📦 GitHub 📖 Docs 📄 Paper
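The "28 layers, GQA" entry matters for on-device memory: with grouped-query attention, only the key/value heads are cached during generation, so the cache shrinks in proportion to the ratio of KV heads to query heads. The sketch below estimates that cache at the full 32K context. The layer count and context length come from the zen-nano card; the head counts (16 query heads, 8 KV heads), the head dimension (128), and the 16-bit cache precision are assumptions based on the published Qwen3-0.6B configuration and should be checked against the actual zen-nano config.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV-cache size in bytes for a single sequence.

    Keys and values (2 tensors) are cached per layer, each holding
    kv_heads * head_dim values per token.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


LAYERS = 28            # from the zen-nano card
CONTEXT = 32 * 1024    # "32K tokens" from the card, read as 32,768
HEAD_DIM = 128         # assumption: taken from the Qwen3-0.6B config
QUERY_HEADS = 16       # assumption: taken from the Qwen3-0.6B config
KV_HEADS = 8           # assumption: taken from the Qwen3-0.6B config

GIB = 1024 ** 3

# With GQA, only the KV heads are cached.
gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
# Hypothetical comparison: one KV head per query head (no grouping).
mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, CONTEXT)

print(f"GQA cache at 32K context: {gqa / GIB:.2f} GiB")   # 3.50 GiB
print(f"Without grouping:         {mha / GIB:.2f} GiB")   # 7.00 GiB
```

Under these assumptions the full-context cache is larger than the 16-bit weights themselves, which is why shorter contexts or a quantized cache are common on constrained devices.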

zen-eco

Available

Parameters: 4B

Base: Qwen3-4B

Context: 32K tokens

Variants: Instruct, Agent, Coder, Thinking

Balanced performance and efficiency for general-purpose applications. Multiple specialized variants for different use cases.

Formats: SafeTensors, GGUF, MLX

🤗 HF 📦 GitHub

zen-omni

Available

Parameters: 7B

Base: Qwen3-Omni

Modalities: Text + Vision + Audio

Type: Multimodal

Multimodal model based on Qwen3-Omni (not Qwen2.5) that handles text, vision, and audio understanding in a single model.

Formats: SafeTensors

🤗 HF 📦 GitHub 📖 Docs 📄 Paper

zen-coder

Available

Parameters: 14B

Base: Qwen3-Coder-14B

Context: 128K tokens

Focus: Code Generation

Specialized for code generation, debugging, and software engineering tasks. Supports 100+ programming languages with extended context.

Formats: SafeTensors, GGUF, MLX

🤗 HF 📦 GitHub

zen-next

Available

Parameters: 32B

Base: Qwen3-32B

Context: 32K tokens

Focus: Frontier

Our flagship model pushing the boundaries of performance and capability. For the most demanding applications requiring maximum intelligence.

Formats: SafeTensors, GGUF

📦 GitHub
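To make the edge-to-cloud range of the core models concrete, weight memory can be approximated as parameter count times bytes per parameter. The sketch below applies that rule to the parameter counts listed in this section, at 16-bit precision and at 4 bits per weight. It is a rough lower bound under stated assumptions: it ignores activations, the KV cache, and the per-block metadata that real quantized formats such as GGUF add, and the 4-bit figure is a generic illustration rather than a specific released quantization.

```python
# Parameter counts as listed on the cards in this section.
CORE_MODELS = {
    "zen-nano": 0.6e9,
    "zen-eco": 4e9,
    "zen-omni": 7e9,
    "zen-coder": 14e9,
    "zen-next": 32e9,
}


def weight_gb(params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return params * bits_per_weight / 8 / 1e9


for name, params in CORE_MODELS.items():
    fp16 = weight_gb(params, 16)   # 2 bytes per parameter
    q4 = weight_gb(params, 4)      # assumption: idealized 4-bit quantization
    print(f"{name:<10} 16-bit ~{fp16:5.1f} GB   4-bit ~{q4:5.1f} GB")
```

By this estimate zen-nano needs roughly 1.2 GB at 16-bit precision, while zen-next needs roughly 64 GB at 16-bit and about 16 GB at 4 bits per weight.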

Multimodal Models

Vision, Audio, Video, and 3D Generation

zen-vl

Available

Type: Vision-Language

Base: Qwen3-VL

Sizes: 4B, 8B, 30B

Variants: Instruct, Agent

Focus: Function Calling

Next-generation vision-language model with advanced function calling capabilities. Trained on Agent Data Protocol (ADP) and xLAM datasets for superior agent performance and tool use.

Formats: SafeTensors, GGUF

🤗 HF 📦 GitHub

zen-designer

Available

Type: Vision-Language

Base: Qwen-VL

Variants: Instruct, Thinking

Focus: Visual Understanding

Advanced vision-language model for image understanding, analysis, and reasoning. Supports visual question answering, OCR, and detailed scene description.

Formats: SafeTensors

📦 GitHub

zen-artist

Available

Type: Text-to-Image

Base: Qwen-Image

Variants: Base, Edit

Focus: Image Generation

High-quality image generation from text descriptions. zen-artist-edit provides advanced image editing capabilities with natural language instructions.

Formats: SafeTensors, Diffusers

📦 GitHub

zen-video

Available

Type: Text-to-Video

Base: HunyuanVideo

Variants: T2V, I2V

Focus: Video Generation

State-of-the-art video generation from text descriptions. zen-video-i2v provides image-to-video generation with fine control over motion and dynamics.

Formats: SafeTensors

📦 GitHub

zen-3d

Available

Type: 3D Generation

Input: Text, Image, Point Cloud

Output: 3D Meshes

Focus: 3D Assets

Generate high-quality 3D models from various input modalities. Perfect for game development, AR/VR, and 3D content creation.

Formats: SafeTensors

📦 GitHub

zen-musician

Available

Type: Music Generation

Input: Text, Audio

Output: Music, Audio

Focus: Music Creation

Generate high-quality music from text descriptions or audio samples. Supports multiple genres, instruments, and musical styles.

Formats: SafeTensors

📦 GitHub