MoE

GT-QLoRA: Uncensoring Trillion-Parameter MoE Models

Standard abliteration works on dense models. It fails on Mixture-of-Experts. This post explains why, and how Gate-Targeted QLoRA (GT-QLoRA), the technique we developed for zen4-ultra, addresses the fundamental architectural mismatch. This is a technical post about a hard problem. We are not publishing it because we have solved the problem cleanly; we are publishing it because the failure mode of naive approaches is subtle and poorly documented, and other researchers building on MoE architectures need to understand it....

February 28, 2026 · 8 min · 1599 words · Zen LM Team
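The gate-targeted idea in the teaser above can be sketched in a few lines: freeze the expert networks and learn only a low-rank delta on the router (gate) projection, so the adapter output is zero at initialization, as in LoRA. This is a minimal toy sketch; `GateLoRA`, `matmul`, and the shapes are illustrative assumptions, not the actual GT-QLoRA implementation.

```python
# Toy sketch of gate-targeted low-rank adaptation (illustrative, not GT-QLoRA itself).

def matmul(A, B):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

class GateLoRA:
    """Frozen router weights plus a trainable low-rank delta A @ B on the gate only."""
    def __init__(self, W_gate, rank):
        self.W_gate = W_gate                         # frozen router projection (d x E)
        d, E = len(W_gate), len(W_gate[0])
        self.A = [[0.0] * rank for _ in range(d)]    # trainable low-rank factor
        self.B = [[0.0] * E for _ in range(rank)]    # trainable low-rank factor

    def logits(self, x):
        """Router logits for a batch of token vectors x (n x d)."""
        base = matmul(x, self.W_gate)                # frozen path
        delta = matmul(matmul(x, self.A), self.B)    # learned low-rank correction
        return [[b + c for b, c in zip(br, dr)] for br, dr in zip(base, delta)]

# With A and B initialized to zero, routing is unchanged at the start of training:
router = GateLoRA([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]], rank=2)
out = router.logits([[1.0, 2.0]])   # equals x @ W_gate exactly at init
```

The design point mirrors standard LoRA: because the delta starts at zero, the adapted router is exactly the base router until training moves `A` and `B`, so only routing behavior, never the frozen expert weights, is modified.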

Drop-Upcycling and the Birth of Zen MoDE Architecture

Mixture of Experts (MoE) is the architecture that makes trillion-parameter models economically viable. By routing each token through a small subset of expert networks rather than the full parameter set, MoE achieves large-model quality at dense-model inference cost. The problem: training an MoE from scratch is expensive, because you pay for both the scale and the specialization overhead. Drop-Upcycling is a technique that converts a trained dense checkpoint into an MoE at roughly 1/4 the training cost of building the MoE from scratch....

February 28, 2026 · 7 min · 1443 words · Zen LM Team
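The economics claimed above ("large-model quality at dense-model inference cost") come from top-k routing touching only a sliver of the total parameters per token. A back-of-envelope sketch, with illustrative numbers rather than zen4-ultra's actual configuration:

```python
# Back-of-envelope: fraction of an MoE layer's parameters active per token
# under top-k routing. All numbers below are illustrative assumptions.

def moe_active_fraction(num_experts: int, top_k: int,
                        expert_params: int, shared_params: int) -> float:
    """Active-per-token parameters divided by total parameters for one MoE layer."""
    total = num_experts * expert_params + shared_params
    active = top_k * expert_params + shared_params
    return active / total

# 64 experts, each token routed to 2 of them: only ~3% of the layer runs per token.
frac = moe_active_fraction(num_experts=64, top_k=2,
                           expert_params=10_000_000, shared_params=1_000_000)
```

This is the lever Drop-Upcycling exploits: the expensive part, total parameter count, is decoupled from the per-token compute, which scales with `top_k` rather than `num_experts`.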

Zen MoDE: Mixture of Distilled Experts

All Zen models are built on Zen MoDE: Mixture of Distilled Experts. This post explains the architecture, why we chose it, and how distillation and expert routing interact to deliver frontier capability at practical inference cost. The core problem: there is a fundamental tension in large model design. More parameters mean better capability, but more parameters also mean higher inference cost. Dense scaling laws are well established: adding parameters lowers perplexity predictably (with sufficient data), but doubling parameters doubles inference FLOPs....

January 16, 2026 · 4 min · 814 words · Zen LM Team
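The expert-routing half of the tension described above reduces to a small mechanism: softmax the gate logits, keep the top-k experts, and renormalize their weights so the combined expert outputs still sum over a probability distribution. A minimal sketch, assuming plain softmax top-k routing (the specifics of Zen MoDE's router are not stated here):

```python
import math

def top_k_route(gate_logits, k):
    """Select the top-k experts and return (expert_index, renormalized_weight) pairs."""
    # Rank experts by gate logit, keep the k largest.
    idx = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over only the selected experts (renormalized weights sum to 1).
    exps = [math.exp(gate_logits[i]) for i in idx]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(idx, exps)]

# Four experts, route to the best two: expert 1 dominates, expert 3 assists.
route = top_k_route([0.1, 2.0, -1.0, 1.0], k=2)
```

Only the selected experts' FFNs are evaluated for that token; their outputs are summed with these weights, which is what lets total capacity grow without growing per-token cost.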