Today we release Zen 3.0, our third-generation language model family. Zen 3 represents a step change in what open models can do.
Model Family
Zen 3 comes in several sizes:
| Model | Parameters | Context | Training Tokens |
|---|---|---|---|
| Zen-3-8B | 8.1B | 128K | 15T |
| Zen-3-32B | 32.5B | 128K | 12T |
| Zen-3-72B | 72.3B | 128K | 10T |
| Zen-3-MoE | 141B (24B active) | 128K | 14T |
All models use the same architecture with scaled dimensions. All are released under Apache 2.0.
Architecture Highlights
Extended Context
All Zen 3 models support 128K token context natively:
- RoPE extensions: Position interpolation with NTK-aware scaling
- Sliding window attention: Efficient processing of long sequences
- Memory-efficient attention: FlashAttention-2 throughout
Long context isn’t just about the number; it’s about actually using it. Our needle-in-haystack evaluation shows >95% retrieval accuracy at 100K tokens.
Mixture of Experts
Zen-3-MoE uses a sparse architecture:
- 64 experts per layer
- Top-2 routing with load balancing
- 24B active parameters (141B total)
- Achieves 72B-dense quality at 32B-dense cost
Expert parallelism enables efficient inference on consumer hardware.
Improved Tokenizer
The Zen 3 tokenizer improves on previous versions:
- 128K vocabulary (up from 32K)
- Better multilingual coverage
- Improved code tokenization
- Reduced fertility for technical content
Larger vocabulary means fewer tokens per document means longer effective context.
Capability Improvements
Benchmarks
| Benchmark | Zen-2-70B | Zen-3-72B | Improvement |
|---|---|---|---|
| MMLU | 74.2 | 82.1 | +7.9 |
| GSM8K | 68.4 | 84.7 | +16.3 |
| HumanEval | 58.5 | 71.3 | +12.8 |
| HellaSwag | 85.1 | 89.4 | +4.3 |
| MATH | 32.6 | 51.2 | +18.6 |
The improvements are substantial across all categories. Math and coding see the largest gains.
Real-World Tasks
Benchmarks don’t tell the whole story. Zen 3 excels at:
Long-form writing: Coherent documents spanning thousands of words with consistent style and structure.
Multi-step reasoning: Complex problems requiring planning and backtracking.
Code generation: Full functions and classes, not just snippets.
Instruction following: Precise adherence to formatting and constraint requirements.
Multilingual: Strong performance in 30+ languages including low-resource ones.
Agentic Capabilities
Zen 3 is designed for agent use cases:
- Tool use: Reliable function calling with schema adherence
- Planning: Multi-step task decomposition
- Memory integration: Designed for RAG and experience ledgers
- Self-correction: Recognizes and recovers from errors
Early agent benchmarks show 2x improvement over Zen 2 on multi-step tasks.
Training Details
Data
Training data evolved significantly:
- Quality filtering: Improved classifiers for content quality
- Deduplication: Near-duplicate removal at document and paragraph level
- Synthetic data: 20% of training tokens from LLM-generated content
- Code emphasis: 15% code (up from 8% in Zen 2)
- Instruction mixing: 5% instruction data during pretraining
Total: 15T tokens for the 8B model, proportionally less for larger models.
Training Process
Training used the Zoo Compute Network:
- Duration: 4 months
- Peak nodes: 2,048 H100 GPUs
- Total compute: 3.2 million GPU-hours
- Efficiency: 47% MFU average
The training run was the largest yet on the decentralized network. It validated that frontier training is possible without centralized infrastructure.
Alignment
Post-training alignment followed our standard process:
- Supervised fine-tuning: 100K high-quality instruction examples
- GRPO: Group Relative Policy Optimization on preference data
- Constitutional training: Principle-based refinement
- Red teaming: Adversarial testing with remediation
Alignment reduced benchmark scores slightly (2-3%) while significantly improving real-world usefulness.
Safety Evaluation
All Zen 3 models passed our safety evaluation suite:
Refusal Rates
| Category | Zen-2-70B | Zen-3-72B |
|---|---|---|
| Violence instructions | 99.2% | 99.7% |
| CSAM | 100% | 100% |
| Malware | 97.8% | 99.1% |
| PII extraction | 94.6% | 98.3% |
Improved refusal with fewer false positives on legitimate requests.
Bias Metrics
We evaluated on standard bias benchmarks:
- BBQ: 89.2% accuracy (vs. 84.1% for Zen 2)
- WinoBias: 76.4% anti-stereotype (vs. 71.2%)
- Toxicity: 0.023 average score (vs. 0.041)
Improvements through both data curation and RLHF.
Limitations
Zen 3 is not perfect:
- Can still be jailbroken with sufficient effort
- May hallucinate facts, especially for recent events
- Long-context retrieval degrades past 100K tokens
- Some languages underperform (especially non-Latin scripts)
- Resource-intensive for edge deployment
We publish these limitations because transparency enables responsible use.
Usage
Hugging Face
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"zoo-labs/zen-3-72b",
torch_dtype=torch.bfloat16,
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("zoo-labs/zen-3-72b")
output = model.generate(
tokenizer("Hello, Zen!", return_tensors="pt").input_ids,
max_new_tokens=100,
)
vLLM
from vllm import LLM, SamplingParams
llm = LLM(model="zoo-labs/zen-3-72b")
outputs = llm.generate(["Hello, Zen!"], SamplingParams(max_tokens=100))
Quantized Versions
For resource-constrained deployment:
- zen-3-72b-AWQ: 4-bit quantization, minimal quality loss
- zen-3-72b-GPTQ: Alternative 4-bit format
- zen-3-72b-GGUF: llama.cpp compatible
The 8B model runs on consumer GPUs. The 72B quantized fits in 48GB.
What’s Next
Zen 3 is a foundation. Coming soon:
- Zen-3-Vision: Multimodal variant with image understanding
- Zen-3-Code: Specialized coding model
- Zen-3-Long: 1M+ context extension
- Zen-3-Agent: Optimized for agentic workflows
The foundation is strong. Now we build.
Acknowledgments
Zen 3 was trained on the Zoo Compute Network with contributions from 847 node operators across 34 countries. Thank you.
This release was funded through the Zoo Labs Foundation treasury, allocated by community vote (ZIP-72). Thank you to all token holders who participated in governance.
Special thanks to the training, alignment, and evaluation teams who made this possible.
Download at huggingface.co/zoo-labs.
Zach Kelling is a co-founder of Zoo Labs Foundation.