Zen MoDE — Mixture of Distilled Experts

Zen LM

Frontier AI models for code, reasoning, vision, video, audio, 3D, and agentic workflows

55 models across 10 modalities. Production API models from 4B to 1T+ parameters. Open weights on HuggingFace. OpenAI-compatible API. Built by Hanzo AI (Techstars '17).

55 Models · 1T+ Max Parameters · 2M Max Context · 10 Modalities · From $0.15/MTok

Flagship Models

Three tiers — from efficient edge to trillion-parameter frontier scale

API Pricing

Pay-as-you-go. $5 free credit on signup. No minimum commitment.
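At per-token rates, cost is simple arithmetic. A quick sketch in Python, assuming the listed $0.15/MTok entry price (per-model rates vary; see the pricing table):

```python
def estimate_cost(tokens: int, price_per_mtok: float = 0.15) -> float:
    """Estimate pay-as-you-go cost in dollars for a given token count."""
    return tokens / 1_000_000 * price_per_mtok

# 2M tokens at the $0.15/MTok entry rate
print(f"${estimate_cost(2_000_000):.2f}")  # → $0.30
```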


Quick Start

Install the Hanzo SDK — supports OpenAI and Claude-style endpoints, plus 100+ providers

Python — Hanzo SDK
pip install hanzoai

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

response = client.chat.completions.create(
    model="zen4",
    messages=[
        {"role": "user", "content": "Hello, Zen."}
    ],
)
print(response.choices[0].message.content)
TypeScript — Hanzo SDK
npm install hanzoai

import Hanzo from "hanzoai";

const client = new Hanzo({
  apiKey: "hk-your-api-key",
});

const response = await client.chat.completions.create({
  model: "zen4-coder",
  messages: [
    { role: "user", content: "Write a React hook" }
  ],
});
console.log(response.choices[0].message.content);
curl
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer hk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen4",
    "messages": [
      {"role": "user", "content": "Hello, Zen."}
    ]
  }'
Streaming (Python)
from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

stream = client.chat.completions.create(
    model="zen4-max",
    messages=[
        {"role": "user", "content": "Explain MoE"}
    ],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

SDK Features

Multi-SDK — Python, TypeScript, Go, Rust
OpenAI Compatible — drop-in replacement
100+ Providers — Zen + Claude + GPT + more
Streaming — SSE, async, batch
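The SSE wire format behind the streaming example above can be parsed with nothing but the standard library. A minimal sketch, assuming the usual OpenAI-style chunk payload (text arrives in `choices[0].delta.content`, and the stream ends with `data: [DONE]`):

```python
import json

def collect_sse_content(lines):
    """Accumulate assistant text from OpenAI-style SSE chunk lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments and keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

# Example frames as they would arrive over HTTP
frames = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello, "}}]}',
    'data: {"choices": [{"delta": {"content": "Zen."}}]}',
    "data: [DONE]",
]
print(collect_sse_content(frames))  # → Hello, Zen.
```

In practice the SDK's `stream=True` mode does this parsing for you, as in the streaming example above.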

Full Model Library

49 models across 10 categories — text, code, vision, video, audio, 3D, safety, embeddings, and agents


Zen 5

Next-generation agentic models with native chain-of-thought.

5 models
zen5 · EARLY ACCESS
MoDE + CoT · 1.0M ctx

Next-generation agentic frontier model trained on 10B+ tokens of real-world tool use, multi-step reasoning, and production workflows. 1M+ token context with native chain-of-thought.

zen5-pro · EARLY ACCESS
MoDE + CoT · 524K ctx

High-throughput agentic model for demanding production workloads. Trained on real-world development patterns with deep chain-of-thought reasoning.

zen5-max · EARLY ACCESS
MoDE + CoT · 2.1M ctx

Maximum context agentic model for document-scale analysis. Trained on 10B+ tokens of real-world workflows with extended chain-of-thought.

zen5-ultra · EARLY ACCESS
MoDE + Deep CoT · 1.0M ctx

Deepest reasoning model in the Zen family. Multi-pass chain-of-thought with self-verification.

zen5-mini · EARLY ACCESS
MoDE + CoT · 262K ctx

Efficient agentic model delivering zen5-class intelligence at a fraction of the cost.

Zen 4

Latest generation production models with MoDE architecture.

7 models
Dense · 1M ctx

Most capable model for complex reasoning, analysis, and agentic tasks. 1M token context window.

Dense · 1M ctx

High-performance 1M context model for long-document analysis, large codebase reasoning, and agentic workflows. Best balance of intelligence and cost at million-token scale.

744B (40B active) MoE · 202K ctx

Flagship MoE model for complex reasoning and multi-domain tasks.

HuggingFace
744B (40B active) MoE + CoT · 262K ctx

Maximum reasoning capability with extended chain-of-thought on MoE architecture.

HuggingFace
80B (3B active) MoE · 131K ctx

Efficient MoE model for demanding workloads with strong reasoning at production-grade cost.

HuggingFace
80B (3B active) MoE + CoT · 131K ctx

Dedicated reasoning model with explicit chain-of-thought capabilities.

Dense · 128K ctx

Ultra-fast lightweight model optimized for speed and cost efficiency. Ideal for free tier.

HuggingFace

Code

Specialized models for code generation, review, and debugging.

6 models
480B (35B active) MoE · 163K ctx

Code-specialized MoE model for generation, review, debugging, and agentic programming.

HuggingFace
30B (3B active) MoE · 262K ctx

Lightweight code model optimized for speed and inline completions.

HuggingFace
480B Dense BF16 · 131K ctx

Full-precision BF16 code model for maximum accuracy on complex codebases.

HuggingFace
32B Dense · 131K ctx

Baseline code model for generation and completions.

HuggingFace
7B Dense · 32K ctx

Fast code model for inline completions and suggestions.

HuggingFace
14B Dense · 32K ctx

Legacy code model (superseded by Zen4 Coder series).

HuggingFace

Zen 3 Multimodal

Vision, safety, and multimodal chat models.

4 models
~200B Dense Multimodal · 202K ctx

Multimodal model supporting text, vision, audio, and structured output.

30B (3B active) MoE Vision-Language · 262K ctx

Vision-language model for image understanding and visual reasoning.

8B Dense · 128K ctx

Ultra-lightweight model for edge deployment and low-latency tasks. Available on free tier.

4B Dense · 65K ctx

Content safety classifier for moderation and guardrails. 9 safety categories, 119 languages.

Embedding & Retrieval

Text embeddings and search reranking via API.

8 models
3072 dimensions Embedding · 8K ctx

High-quality text embeddings for RAG, search, and classification.

4B Embedding · 40K ctx

Balanced embedding model for cost-effective retrieval workloads.

HuggingFace
0.6B Embedding · 32K ctx

Lightweight embedding model for high-throughput, low-cost applications.

HuggingFace
8B Reranker · 40K ctx

High-quality reranker for improving retrieval accuracy in RAG pipelines.

HuggingFace
4B Reranker · 40K ctx

Balanced reranker for cost-effective retrieval quality improvement.

HuggingFace
0.6B Reranker · 40K ctx

Lightweight reranker for high-throughput reranking at minimal cost.

HuggingFace
3072 dimensions Embedding · 8K ctx

Foundation embedding model for search and retrieval.

HuggingFace
568M Reranker · 8K ctx

Cross-encoder reranker for search result quality.

HuggingFace
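A typical retrieval pipeline pairs an embedding model (fast approximate recall by vector similarity) with a reranker (precise cross-encoder rescoring of the shortlist). The first stage is plain cosine similarity; a toy sketch with hand-made 3-d vectors standing in for real embedding-model output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy vectors standing in for embedding-model output
query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [1.0, 0.1, 0.9],   # close to the query
    "doc_b": [0.0, 1.0, 0.0],   # orthogonal to it
}
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # → ['doc_a', 'doc_b']
```

With a real deployment, the embedding model produces the vectors (3072 dimensions for the flagship embedder above) and the reranker then rescores the top candidates before they reach the prompt.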

Image Generation

Text-to-image generation via API.

8 models
Diffusion

Best general-purpose image generation.

Diffusion

Maximum quality image generation for professional creative work.

Diffusion

Development model for experimentation and iteration.

Diffusion

Fastest image model for real-time generation.

Diffusion

High-resolution image generation at 1024px.

Diffusion

Aesthetic model for artistic image generation.

1B Diffusion

Fastest diffusion model for real-time generation.

Diffusion

Japanese-specialized image generation model.

Audio & Speech

Speech-to-text, text-to-speech, and streaming ASR.

7 models
1.5B ASR

Best quality speech-to-text transcription. 100+ languages.

809M ASR

Fastest speech-to-text transcription for high-throughput workloads.

Streaming ASR

Real-time streaming speech recognition for live transcription and voice agents.

Streaming ASR

First-generation streaming ASR for legacy compatibility.

82M TTS

High-quality text-to-speech with natural prosody. 40+ voices, 8 languages.

TTS HD

Maximum fidelity text-to-speech for broadcast-quality audio production.

82M TTS

Low-latency text-to-speech for real-time voice agents and interactive applications.

Foundation

General-purpose open-weight models from 0.6B to 235B parameters.

5 models
0.6B Dense · 32K ctx

Ultra-lightweight LLM for edge and mobile deployment.

HuggingFace
4B Dense · 32K ctx

Efficient 4B model for general-purpose tasks.

HuggingFace
8–32B Dense · 32K ctx

Standard model available in 8B and 32B variants.

HuggingFace
32B Dense · 32K ctx

Professional-grade 32B dense model for demanding workloads.

HuggingFace
235B (22B active) MoE · 131K ctx

High-capability MoE model with 235B parameters.

HuggingFace

Vision (Open Weights)

Vision-language and multimodal open-weight models.

2 models
32B Dense Multimodal · 32K ctx

Multimodal vision-language model for image understanding.

HuggingFace
72B Dense Multimodal · 131K ctx

Hypermodal model combining text, vision, audio, and code.

HuggingFace

Safety

Content moderation and safety guardrail models.

2 models
4B Dense · 65K ctx

Content safety classifier for moderation and guardrails. 9 safety categories, 119 languages.

8B Dense · 32K ctx

Content safety and moderation classifier.

HuggingFace

10 Modalities

One model family covering every AI capability

Text · 14 models · Chat, reasoning, analysis
Code · 9 models · Generation, review, debugging
Vision · 5 models · Understanding, generation, editing
Video · 4 models · Generation, understanding, I2V
Audio · 7 models · Speech, music, translation
3D · 2 models · Generation, world simulation
Safety · 3 models · Moderation, guardrails
Embedding · 2 models · Search, retrieval, RAG
Agents · 1 model · Tool use, planning
Math · 6 models · Reasoning, proof, computation

Architecture

Zen MoDE — curating the best open-source foundations and fusing them into a unified, high-performance family

Consumer Line

Dense and MoE models from 4B to 80B. Edge-deployable dense models and efficient MoE flagships with only 3B active parameters.

Coder Line

Code-specialized models trained on 8.47B tokens of real agentic programming data. Fast completions to full-precision code intelligence.

Ultra Line

Trillion-parameter MoE models for cloud deployment. 1.04T parameters with 32B active for frontier-scale reasoning.

Efficient MoE

Mixture-of-Experts delivers frontier performance with only 3B active parameters — runs on consumer hardware.

Long Context

Up to 262K context on code models, 256K on frontier models. Dense models support 32–40K for efficient local inference.

Zen MoDE

Mixture of Distilled Experts — curating the best open-source foundations and fusing them into a unified model family.
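The "3B active parameters" figures above come from sparse Mixture-of-Experts routing: a gate scores every expert for each token, but only the top-k actually execute. A minimal top-2 gating sketch in plain Python (illustrative only, not the actual Zen router):

```python
import math

def top_k_gate(logits, k=2):
    """Select the k highest-scoring experts; softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# 8 experts; only the 2 highest-gated run for this token
gate_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.2]
active = top_k_gate(gate_logits)
print(active)  # experts 1 and 4, with weights summing to 1
```

Because only the selected experts' weights touch compute, an 80B-parameter model with 3B active runs with roughly the per-token cost of a 3B dense model.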

Zen Agentic Dataset

8.47 billion tokens of real-world agentic programming — not synthetic data

8.47B Training Tokens · 3.35M Training Samples · 1,452 Repositories · 15yr History (2010–2025)

48% · Git History · 4.03B tokens
29% · Agentic Debug Sessions · 2.42B tokens
13% · Architecture Discussions · 1.14B tokens
10% · Code Review Sessions · 0.86B tokens
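The category percentages follow directly from the token counts (the four listed categories sum to 8.45B against the 8.47B total; the small gap is presumably rounding). A quick check:

```python
total_btok = 8.47  # dataset total, in billions of tokens
mix = {
    "Git History": 4.03,
    "Agentic Debug Sessions": 2.42,
    "Architecture Discussions": 1.14,
    "Code Review Sessions": 0.86,
}
# Share of the dataset per category, rounded to whole percent
shares = {name: round(btok / total_btok * 100) for name, btok in mix.items()}
print(shares)  # Git History 48%, Debug 29%, Architecture 13%, Review 10%
```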

Open Weights

Download, self-host, and fine-tune — multiple formats for every platform

SafeTensors

Full precision for HuggingFace Transformers

GGUF

Quantized for llama.cpp / Ollama

MLX

Apple Silicon optimized

ONNX

Cross-platform inference

Hanzo Ecosystem

Zen models power the entire Hanzo AI platform

Build with Zen LM

49 models. 10 modalities. Open weights. From $0.15/MTok.