Blog

GT-QLoRA: Uncensoring Trillion-Parameter MoE Models

ZEN4-ULTRA TRAINER · ZEN4-ULTRA WEIGHTS · ZEN4-ULTRA GGUF

Standard abliteration works on dense models. It fails on Mixture-of-Experts. This post explains why, and how Gate-Targeted QLoRA (GT-QLoRA), the technique we developed for zen4-ultra, addresses the fundamental architectural mismatch. This is a technical post about a hard problem. We are not publishing this because we have solved it cleanly. We are publishing it because the failure mode of naive approaches is subtle and poorly documented, and other researchers building on MoE architectures need to understand it....
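As the name suggests, the central move is to attach low-rank adapters only to the MoE routers (the "gates") while leaving the expert weights frozen. A minimal sketch of that selection logic, using invented module names (the real zen4-ultra parameter names and adapter configuration are not shown in this excerpt):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: two expert matrices plus a router ("gate") matrix.
# Names are illustrative only, not actual zen4-ultra module names.
params = {
    "experts.0.w": rng.standard_normal((8, 8)),
    "experts.1.w": rng.standard_normal((8, 8)),
    "router.gate": rng.standard_normal((8, 2)),  # scores 2 experts per token
}

RANK = 2  # LoRA rank (hypothetical value)

def attach_lora(shape, rank=RANK):
    """Low-rank adapter delta = A @ B. B is zero-initialized so the
    adapted model starts out identical to the base model."""
    d_in, d_out = shape
    A = rng.standard_normal((d_in, rank)) * 0.01
    B = np.zeros((rank, d_out))
    return A, B

# Gate-targeted: adapters go only on router/gate matrices.
# Expert weights receive no adapter and stay frozen.
adapters = {name: attach_lora(w.shape)
            for name, w in params.items() if "gate" in name}

def effective_weight(name):
    """Base weight plus the low-rank update, if this module is adapted."""
    w = params[name]
    if name in adapters:
        A, B = adapters[name]
        w = w + A @ B
    return w

print(sorted(adapters))  # only the router carries trainable parameters
```

Because `B` starts at zero, the adapted router is exactly the base router at step 0; training then moves only the routing behavior, not the experts themselves.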

February 28, 2026 · 8 min · 1599 words · Zen LM Team

Drop-Upcycling and the Birth of Zen MoDE Architecture

DROP-UPCYCLING PAPER · ZEN MODELS · ZEN CODE

Mixture of Experts (MoE) is the architecture that makes trillion-parameter models economically viable. By routing each token through a small subset of expert networks rather than the full parameter set, MoE achieves large-model quality at dense-model inference cost. The problem: training an MoE from scratch is expensive, because you pay for both the scale and the specialization overhead. Drop-Upcycling is a technique that converts a trained dense checkpoint into an MoE at roughly 1/4 the training cost of building the MoE from scratch....
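The mechanism behind Drop-Upcycling, as described in the paper the post links, is to initialize every expert as a copy of the dense FFN and then re-initialize a fraction of each copy, so the experts start with shared pretrained knowledge but enough diversity to specialize. A toy sketch, assuming a 50% re-init ratio (the actual ratio is a hyperparameter of the method, not stated in this excerpt):

```python
import numpy as np

rng = np.random.default_rng(0)

dense_ffn = rng.standard_normal((16, 16))  # stands in for a trained dense FFN weight
N_EXPERTS = 4
DROP_RATIO = 0.5  # fraction of each expert copy that gets re-initialized (assumption)

experts = []
for _ in range(N_EXPERTS):
    w = dense_ffn.copy()                      # start from the dense checkpoint
    mask = rng.random(w.shape) < DROP_RATIO   # pick entries to "drop"
    w[mask] = rng.standard_normal(mask.sum()) * 0.02  # fresh small-scale init
    experts.append(w)

# Each expert retains roughly (1 - DROP_RATIO) of the dense weights,
# so the upcycled MoE starts far closer to the dense model than random init.
for i, w in enumerate(experts):
    kept = np.isclose(w, dense_ffn).mean()
    print(f"expert {i}: kept fraction ≈ {kept:.2f}")
```

The retained fraction is what buys the ~4x training-cost reduction: the MoE does not have to relearn what the dense checkpoint already knows.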

February 28, 2026 · 7 min · 1443 words · Zen LM Team

BitDelta: 1-Bit Behavioral Compression Across the Zen Model Family

BITDELTA PAPER · MONOSOUP PAPER · K-MERGE PAPER · ZEN MODELS

The Zen model family has a deployment problem that is not immediately obvious from the outside. We publish 14+ distinct model variants, from zen-nano at 0.6B parameters to zen4-ultra at 1.04T. Each variant carries fine-tuned behavioral characteristics: different personas, different task specializations, different safety postures. In a naive serving architecture, each variant is a separate set of weights. Loading all of them onto a GPU cluster is economically impossible....
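BitDelta's core observation is that a fine-tune's weight delta from its base model compresses to a single sign bit per parameter plus one scale per tensor. The sketch below uses the mean absolute delta as the scale, a simplification; the paper calibrates the scale by distillation, and the per-tensor layout here is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

w_base = rng.standard_normal((64, 64))                 # shared base weights
w_ft = w_base + 0.05 * rng.standard_normal((64, 64))   # a fine-tuned variant

delta = w_ft - w_base
alpha = np.abs(delta).mean()   # one fp scale per tensor (simplified calibration)
bits = np.sign(delta)          # 1 bit per parameter

# A variant is served as base weights + (scale * sign matrix):
w_approx = w_base + alpha * bits

err = np.abs(w_ft - w_approx).mean()
print(f"mean |delta| = {alpha:.4f}, 1-bit reconstruction error = {err:.4f}")
```

With this scheme, serving 14 variants costs one full-precision base plus 14 one-bit sign matrices, rather than 14 full weight sets.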

February 28, 2026 · 7 min · 1345 words · Zen LM Team

SuRe + OPCM: Production-Grade Continual Learning for Open Models

OPLoRA PAPER · SuRe PAPER · OPCM PAPER · YOUTU-AGENT PAPER

Every production LLM faces the same brutal constraint: the moment you start adapting a model on new data, it begins forgetting what it already knew. This is catastrophic forgetting, and it is not a theoretical concern. It is the reason most "continually updated" models in production are quietly replaced wholesale every few months rather than genuinely updated in place. For the Zen model family, wholesale replacement is not acceptable....

February 28, 2026 · 8 min · 1602 words · Zen LM Team

Zen4 Ultra: 480B Parameters, 1M Token Context

GITHUB · HUGGING FACE · TRY ZEN CHAT

Zen4 Ultra is the most capable model in the Zen4 family. It is a Mixture of Distilled Experts model with 480B total parameters and 35B active parameters per forward pass. The native context window is 256K tokens, extending to 1M tokens with YaRN extrapolation.

Architecture

| Property                    | Value   |
|-----------------------------|---------|
| Total parameters            | 480B    |
| Active parameters per token | 35B     |
| Experts per layer           | 128     |
| Top-k routing               | 8       |
| Context window (native)     | 256K    |
| Context window (YaRN)       | 1M      |
| Vocabulary size             | 151,936 |
| Attention heads             | 64      |
| KV heads (GQA)              | 8       |
| Layers                      | 94      |

Benchmark Results (General Reasoning): Benchmark | Zen4 Ultra | Zen Max 72B | MMLU 89....
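With 128 experts per layer and top-8 routing, each token is scored against every expert but only the eight highest-scoring experts actually run. A generic top-k softmax router illustrating those numbers; this is a textbook sketch, not the actual Zen4 Ultra routing code:

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, TOP_K = 128, 8  # values from the architecture table above
HIDDEN = 32                # toy hidden size for the sketch

def route(hidden, gate_w):
    """Score all experts, keep the top-k, and renormalize their weights."""
    logits = hidden @ gate_w                       # one score per expert
    top = np.argsort(logits)[-TOP_K:]              # indices of the 8 best experts
    probs = np.exp(logits[top] - logits[top].max())  # stable softmax over top-k
    probs /= probs.sum()
    return top, probs

hidden = rng.standard_normal(HIDDEN)
gate_w = rng.standard_normal((HIDDEN, N_EXPERTS))
chosen, weights = route(hidden, gate_w)
print(len(chosen), float(weights.sum()))  # 8 experts, weights summing to 1
```

This sparsity is why only 35B of the 480B parameters are active per forward pass: each token touches 8 of 128 experts per MoE layer.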

January 20, 2026 · 3 min · 506 words · Zen LM Team