GT-QLoRA: Uncensoring Trillion-Parameter MoE Models
Standard abliteration works on dense models. It fails on Mixture-of-Experts (MoE) models. This post explains why, and how Gate-Targeted QLoRA (GT-QLoRA), the technique we developed for zen4-ultra, addresses the fundamental architectural mismatch.

This is a technical post about a hard problem. We are not publishing this because we have solved it cleanly. We are publishing it because the failure mode of naive approaches is subtle and poorly documented, and other researchers building on MoE architectures need to understand it....