Zen4 Ultra: 480B Parameters, 1M Token Context
Zen4 Ultra is the most capable model in the Zen4 family. It is a Mixture of Distilled Experts model with 480B total parameters and 35B active parameters per forward pass. The native context window is 256K tokens, extending to 1M tokens with YaRN extrapolation.

Architecture

| Property | Value |
| --- | --- |
| Total parameters | 480B |
| Active parameters per token | 35B |
| Experts per layer | 128 |
| Top-k routing | 8 |
| Context window (native) | 256K |
| Context window (YaRN) | 1M |
| Vocabulary size | 151,936 |
| Attention heads | 64 |
| KV heads (GQA) | 8 |
| Layers | 94 |

Benchmark Results

General Reasoning

| Benchmark | Zen4 Ultra | Zen Max 72B |
| --- | --- | --- |
| MMLU | 89....