Zen AI
A family of open-source AI models you can chat with for free — or download and run privately on your own machine.
No account needed to chat · Open weights on HuggingFace · Built by Hanzo AI
Chat with Zen — no signup required
Try any Zen model instantly on Hanzo Chat. Free tier includes access to zen4, zen4-mini, zen3-nano and more.
Run locally — 100% private
Download open-weight Zen models and run them on your own hardware. Works with Ollama, LM Studio, llama.cpp, and Transformers.
1. Install Ollama
Download Ollama from ollama.com.
2. Run Zen Nano
ollama run hf.co/zenlm/zen-nano-0.6b
Ollama will download the model from HuggingFace automatically on first run. Requires ~1 GB RAM.
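Once the model has been pulled, it can also be called from a script through the HTTP API that Ollama serves locally. The following is a minimal sketch, assuming Ollama's default port (11434) and the model identifier from the command above; the helper names `build_generate_request` and `generate` are illustrative and not part of any Zen or Ollama SDK.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Same model identifier as in the `ollama run` command.
MODEL = "hf.co/zenlm/zen-nano-0.6b"


def build_generate_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to the local model and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

With Ollama running, calling `generate("Say hello in one sentence.")` sends the prompt to the local model; the request goes to localhost only.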
Free to chat
Try every model instantly on Hanzo Chat. No credit card, no signup. Upgrade for higher limits and API access.
Run it yourself
Apache 2.0 licensed weights on HuggingFace. Download GGUF for Ollama and LM Studio, or full precision for training.
Build with the API
OpenAI-compatible API at api.hanzo.ai, priced from $0.15 per million tokens (MTok). Drop-in replacement for GPT-4 or Claude.
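Because the API follows the OpenAI chat-completions format, a request can be assembled without any SDK. The sketch below uses only the Python standard library and the endpoint and model name that appear in this page's curl example; the helper names `build_chat_request` and `extract_reply` are hypothetical, and the response shape is assumed to match the OpenAI format.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.hanzo.ai/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str, model: str = "zen4") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (not sent yet)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response body."""
    # Assumption: the response mirrors the OpenAI chat-completions schema.
    return response_body["choices"][0]["message"]["content"]
```

To send the request, pass the result of `build_chat_request` to `urllib.request.urlopen`, decode the body with `json.loads`, and hand it to `extract_reply`.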
Open-weight models
Download and run — Apache 2.0
Developer API
OpenAI-compatible. Drop-in replacement.
pip install hanzoai

from hanzoai import Hanzo

client = Hanzo(api_key="hk-...")
r = client.chat.completions.create(
    model="zen4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(r.choices[0].message.content)

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer hk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen4",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

All 55 models
Text, code, vision, audio, image, 3D, safety, embeddings, and agents
Showing 53 models · 2 legacy/upcoming hidden
Zen 5
Next-generation agentic models with native chain-of-thought.
Next-generation agentic frontier model trained on 10B+ tokens of real-world tool use, multi-step reasoning, and production workflows. 1M+ token context with native chain-of-thought.
High-throughput agentic model for demanding production workloads. Trained on real-world development patterns with deep chain-of-thought reasoning.
Maximum context agentic model for document-scale analysis. Trained on 10B+ tokens of real-world workflows with extended chain-of-thought.
Deepest reasoning model in the Zen family. Multi-pass chain-of-thought with self-verification.
Efficient agentic model delivering zen5-class intelligence at a fraction of the cost.
Zen 4
Latest generation production models with MoDE architecture.
Most capable model for complex reasoning, analysis, and agentic tasks. 1M token context window.
High-performance 1M context model for long-document analysis, large codebase reasoning, and agentic workflows. Best balance of intelligence and cost at million-token scale.
Flagship MoE model for complex reasoning and multi-domain tasks.
Maximum reasoning capability with extended chain-of-thought on MoE architecture.
Efficient MoE model for demanding workloads with strong reasoning at production-grade cost.
Dedicated reasoning model with explicit chain-of-thought capabilities.
Code
Specialized models for code generation, review, and debugging.
Code-specialized MoE model for generation, review, debugging, and agentic programming.
Lightweight code model optimized for speed and inline completions.
Full-precision BF16 code model for maximum accuracy on complex codebases.
Fast code model for inline completions and suggestions.
Zen 3
Previous generation API models — language, vision, multimodal, and safety.
Multimodal model supporting text, vision, audio, and structured output.
Vision-language model for image understanding and visual reasoning.
Ultra-lightweight model for edge deployment and low-latency tasks. Available on free tier.
Content safety classifier for moderation and guardrails. 9 safety categories, 119 languages.
Embedding & Retrieval
Text embeddings and search reranking via API.
High-quality text embeddings for RAG, search, and classification.
Balanced embedding model for cost-effective retrieval workloads.
Lightweight embedding model for high-throughput, low-cost applications.
High-quality reranker for improving retrieval accuracy in RAG pipelines.
Balanced reranker for cost-effective retrieval quality improvement.
Lightweight reranker for high-throughput reranking at minimal cost.
Foundation embedding model for search and retrieval.
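Embedding-based retrieval usually ranks documents by the cosine similarity between a query vector and each document vector, and a reranker then rescores the top hits. The sketch below illustrates only the first step, using hand-made toy vectors rather than the output of any Zen embedding model; the function names are illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy 3-dimensional "embeddings"; real embedding vectors are much larger.
docs = {
    "pricing": [1.0, 0.0, 0.0],
    "install": [0.0, 1.0, 0.0],
    "license": [0.0, 0.6, 0.8],
}
query = [0.0, 0.9, 0.1]

print(top_k(query, docs))  # → ['install', 'license']
```

In a full pipeline, the candidates returned by `top_k` would be passed to a reranker for a second, more precise scoring pass.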
Image Generation
Text-to-image generation via API.
Audio & Speech
Speech-to-text, text-to-speech, and streaming ASR.
Real-time streaming speech recognition for live transcription and voice agents.
Low-latency text-to-speech for real-time voice agents and interactive applications.
Foundation
General-purpose open-weight models from 0.6B to 235B parameters.
Vision (Open Weights)
Vision-language and multimodal open-weight models.
Ready to start?
Chat for free, download weights, or build with the API.
Research & Writing
Papers, technical reports, and updates from the Zen LM team