Introducing Zen 5 — Our Most Capable Model Yet
Zen 5 brings 1T+ parameter scale, 2M context windows, and state-of-the-art performance across reasoning, code, and multimodal tasks.
Research, releases, and perspectives from the Zen LM team.
How BitDelta (arXiv:2402.10193) compresses fine-tuned behavioral deltas to 1-bit precision, enabling the full Zen model family — nano through ultra — to share a single GPU cluster.
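For readers who want the gist before the full post: a minimal sketch of 1-bit delta compression in the spirit of BitDelta, where each fine-tuned weight delta keeps only its sign plus one per-tensor scale. The paper further calibrates the scales by distillation; this sketch just uses the mean absolute delta, which matches its initialization.

```python
import torch

def bitdelta_compress(base_weight: torch.Tensor, finetuned_weight: torch.Tensor):
    """Compress a fine-tuned weight delta to 1 bit per parameter plus one scale."""
    delta = finetuned_weight - base_weight
    sign = torch.sign(delta)       # 1 bit per parameter (the sign of each delta)
    scale = delta.abs().mean()     # one scalar per tensor (BitDelta's init; the
                                   # paper then refines scales via distillation)
    return sign, scale

def bitdelta_decompress(base_weight: torch.Tensor, sign: torch.Tensor,
                        scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct an approximate fine-tuned weight from base + 1-bit delta."""
    return base_weight + scale * sign
```

Because every family member stores only signs and scales on top of one shared base checkpoint, many fine-tuned variants can be served from the same cluster.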
Deep dive on Surprise-Driven Prioritized Replay (SuRe) and Orthogonal Projection Continual Merging (OPCM) — the two SOTA techniques we use for catastrophic-forgetting-free LLM adaptation in the Zen model family.
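As a rough illustration of the replay half, here is a minimal surprise-prioritized buffer. This is our assumed simplification, not the exact SuRe implementation; `alpha` is a hypothetical knob controlling how sharply sampling favors high-loss examples.

```python
import random

class SurpriseReplayBuffer:
    """Sketch: examples with higher recent loss ("surprise") are replayed
    more often during continual adaptation, protecting old capabilities."""

    def __init__(self, alpha: float = 0.6):
        self.items, self.priorities = [], []
        self.alpha = alpha  # sharpness of the preference for surprising examples

    def add(self, example, loss: float) -> None:
        self.items.append(example)
        self.priorities.append((loss + 1e-6) ** self.alpha)

    def sample(self, k: int):
        # draw proportionally to priority; mixed into each new-task batch
        return random.choices(self.items, weights=self.priorities, k=k)

    def update(self, idx: int, loss: float) -> None:
        # refresh an example's priority after it has been replayed
        self.priorities[idx] = (loss + 1e-6) ** self.alpha
```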
How Drop-Upcycling (arXiv:2502.19261) transforms dense checkpoints into MoE models at 1/4 training cost, and how it shapes Zen MoDE — our Mixture of Distilled Experts architecture.
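A minimal sketch of the idea, with a single `nn.Linear` standing in for the dense FFN: each expert starts as a copy of the dense weights, then a random fraction is re-initialized so the experts diversify during MoE training. The paper's exact re-initialization scheme differs in detail; treat this as an assumed simplification.

```python
import copy
import torch
import torch.nn as nn

def drop_upcycle(dense_ffn: nn.Linear, num_experts: int,
                 drop_ratio: float = 0.5) -> nn.ModuleList:
    """Sketch of Drop-Upcycling: copy the dense FFN into each expert,
    then re-initialize a random fraction of each copy's weights."""
    experts = []
    for _ in range(num_experts):
        expert = copy.deepcopy(dense_ffn)
        mask = torch.rand_like(expert.weight) < drop_ratio   # slots to re-init
        fresh = torch.empty_like(expert.weight)
        nn.init.normal_(fresh, std=0.02)                     # fresh random init
        with torch.no_grad():
            expert.weight[mask] = fresh[mask]
        experts.append(expert)
    return nn.ModuleList(experts)
```

Keeping most dense weights preserves the checkpoint's knowledge, while the dropped fraction gives the router something to specialize, which is where the training-cost savings come from.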
Why standard abliteration techniques fail on Mixture-of-Experts models, and how Gate-Targeted QLoRA solves the expert routing problem at 1 trillion parameters.
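A hedged sketch of the gate-targeted part: LoRA adapters are attached only to router projections while the experts stay frozen. The assumption that routers are `nn.Linear` modules whose attribute names contain "gate" holds for several open MoE implementations but is not guaranteed in general.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-rank adapter wrapped around a frozen (e.g. 4-bit) base layer."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # frozen base, QLoRA-style
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

def wrap_router_gates(model: nn.Module) -> nn.Module:
    """Attach adapters only to expert-router ("gate") projections."""
    targets = [
        (parent, name)
        for parent in model.modules()
        for name, child in parent.named_children()
        if "gate" in name and isinstance(child, nn.Linear)
    ]
    for parent, name in targets:
        setattr(parent, name, LoRALinear(getattr(parent, name)))
    return model
```

Only the tiny adapter matrices train, so steering the routing of a trillion-parameter model fits in a modest fine-tuning budget.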
An inside look at the Mixture of Distilled Experts architecture that powers the Zen model family.
The case for radical openness in AI — why releasing weights is the right thing, strategically and ethically.
Zen4 Ultra is our most capable model: 480B total parameters, 35B active per token, 1M token context window. Benchmark results and use cases.
Announcing the Zen model family: 94+ open models built on Zen MoDE architecture, co-developed by Hanzo AI and Zoo Labs Foundation.
We release Zen-VL, our new flagship vision-language model with enhanced visual understanding, OCR, agentic capabilities, and long-video comprehension.
Announcing Zen 3.0, our most capable open model family yet.
Reflections on where open AI development is heading and what it will take to get there.
How we're building a decentralized compute network for training large AI models.
We’ve created an agent that uses Zen models with an 8k context window to understand documents of 1M tokens, surpassing both RAG and native long-context models. The agent was also used to generate data for training new long-context Zen models.
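A minimal sketch of such a chunked-reading loop, with `llm(prompt)` as a hypothetical completion function; the actual agent described in the post is more sophisticated.

```python
def read_long_document(llm, document: str, question: str,
                       chunk_chars: int = 24_000) -> str:
    """Sketch: a short-context model walks a very long document, carrying
    forward only question-relevant notes, so no single call exceeds the
    native context window."""
    notes = ""
    for start in range(0, len(document), chunk_chars):
        chunk = document[start:start + chunk_chars]
        notes = llm(
            f"Question: {question}\n"
            f"Notes so far: {notes}\n"
            f"Next chunk: {chunk}\n"
            "Update the notes with anything relevant to the question."
        )
    return llm(f"Question: {question}\nNotes: {notes}\nAnswer the question.")
```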
How Zoo Improvement Proposals enable community-driven governance of open AI development.
Today we formally launch Zoo Labs Foundation, an open research network dedicated to decentralized AI and decentralized science.
Along with the rapid development of our large language model Qwen, we leveraged Qwen's capabilities and unified multimodal pretraining to address the limitations of multimodal models in generalization, and we open-sourced the multimodal model Qwen-VL in September 2023. Recently, the Qwen-VL series underwent a significant upgrade with the launch of two enhanced versions, Qwen-VL-Plus and Qwen-VL-Max. This post covers the key technical advancements in these versions.
Four months after the first release of Qwen-7B, the starting point of our open-source journey with large language models (LLMs), we now provide an introduction to the Qwen series to give you a complete picture of our work and our objectives. Below are important links to our open-source projects and community.
Introducing Agent NFTs, a framework for giving AI agents persistent identity and enabling ownership of their capabilities.
Announcing Training Gym, our open platform for collaborative large model training.
How we're bringing cryptographic verification to AI inference, enabling trustless machine learning.
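As a toy illustration only (our assumption, not the protocol described in the post), the simplest form of verifiability is a reproducible commitment over (model, input, output) that any party holding the same weights can re-run and check deterministically.

```python
import hashlib
import json

def inference_receipt(model_hash: str, prompt: str, output: str) -> dict:
    """Toy sketch: a provider commits to (model, prompt, output); a verifier
    with the same weights re-runs the prompt and recomputes the commitment."""
    commitment = hashlib.sha256(
        json.dumps({"model": model_hash, "prompt": prompt, "output": output},
                   sort_keys=True).encode()
    ).hexdigest()
    return {"model": model_hash, "prompt": prompt,
            "output": output, "commitment": commitment}
```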
Introducing the Zen Reranker, a cross-encoder model that dramatically improves retrieval quality in two-stage pipelines.
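A minimal two-stage pipeline looks like the sketch below; the `embed_retrieve` and `rerank_score` callables are placeholders for a fast bi-encoder retriever and the cross-encoder reranker, which scores each (query, document) pair jointly.

```python
from typing import Callable, List

def two_stage_search(query: str, corpus: List[str],
                     embed_retrieve: Callable[[str, List[str], int], List[str]],
                     rerank_score: Callable[[str, str], float],
                     k_retrieve: int = 100, k_final: int = 10) -> List[str]:
    """Stage 1 trades precision for recall over the whole corpus; stage 2
    spends the expensive cross-encoder only on the surviving candidates."""
    candidates = embed_retrieve(query, corpus, k_retrieve)   # stage 1: recall
    scored = [(rerank_score(query, doc), doc) for doc in candidates]
    scored.sort(reverse=True)                                # stage 2: precision
    return [doc for _, doc in scored[:k_final]]
```

The design point is cost asymmetry: the bi-encoder embeds documents once offline, while the cross-encoder reads query and document together and is far more accurate but too slow to run over the full corpus.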
Generalist models are hot! We all see an opportunity to reach a true generalist model through multimodal multitask learning, and we previously released OFA, an open-sourced unified multimodal pretrained model, toward this goal. In practice, however, we met many difficulties: it is hard to set up multiple tasks spanning multiple modalities, and it is hard to organize multitask learning, e.g., how to batch your data and how to keep training stable.
CLIP is a phenomenal playmaker in vision and multimodal representation learning. It serves not only as a foundation model but also as a bridge between vision and language, and it has triggered a wave of research in different fields, especially text-to-image generation. However, applications such as cross-modal retrieval need a language-specific CLIP, and no open-sourced Chinese CLIP with good performance existed. We therefore launched this project to promote Chinese multimodal representation learning.
Why we trained embedding models with 7680 dimensions and what we learned about the relationship between dimensionality and retrieval quality.
Exploring high-dimensional embedding spaces for semantic search and retrieval.
2022 is a year of generalist models! With the bloom of multimodal pretraining, especially unified models, we see the opportunity to build a generalist model capable of handling tasks across different modalities. Thus we propose OFA, namely One-For-All, a unified multimodal pretrained model that brings understanding and generation tasks across modalities into a single framework, pretrained with instruction-based multitask pretraining that endows it with multiple capabilities.
Introducing GRPO, a new approach to reinforcement learning from human feedback that improves sample efficiency and alignment stability.
A companion post to our GRPO paper, explaining group relative policy optimization for language model alignment.
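The core of group relative policy optimization is easy to state: sample a group of completions for the same prompt, then normalize each completion's reward against the group's own statistics, removing the need for a learned value network. A minimal sketch of the advantage computation:

```python
import statistics

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Group-relative advantages: each completion in a group sampled from
    the same prompt is scored against the group's mean and spread, so no
    separate critic model is needed."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero spread
    return [(r - mean) / std for r in group_rewards]
```

These advantages then weight a clipped policy-gradient update, as in PPO, but the baseline comes for free from the group rather than from a value head.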
Privacy-preserving machine learning that maintains model quality through novel aggregation protocols.
How federated learning enables collaborative model training while preserving data privacy.
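As a baseline sketch (standard FedAvg, not necessarily the novel aggregation protocol from the post above): clients train locally on private data and share only weight updates, which the server averages weighted by dataset size.

```python
import torch

def federated_average(client_states: list[dict],
                      client_sizes: list[int]) -> dict:
    """FedAvg: the global model is the size-weighted average of client
    weights; raw training data never leaves the clients."""
    total = sum(client_sizes)
    return {
        key: sum(state[key] * (n / total)
                 for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }
```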
Introducing experience ledgers, a framework for giving AI agents persistent, verifiable memory.
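One plausible shape for such a ledger, offered as an assumption rather than the design from the post, is an append-only hash chain: entries can be audited end to end, and tampering with any record breaks every later link.

```python
import hashlib
import json
import time

class ExperienceLedger:
    """Sketch of an append-only, hash-chained memory log. Field names are
    illustrative, not taken from the post."""

    def __init__(self):
        self.entries = []

    def append(self, observation: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        record = {"time": time.time(), "observation": observation,
                  "prev": prev_hash}
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        # recompute the chain; any edited entry invalidates all later hashes
        prev = "genesis"
        for record in self.entries:
            body = {k: v for k, v in record.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if record["prev"] != prev or record["hash"] != expected:
                return False
            prev = record["hash"]
        return True
```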
A manifesto for decentralized science (DeSci) and its application to AI research.
Why scientific research needs decentralization, and how blockchain can help.
How we're approaching training data curation to capture humanity's collective intelligence.
We're launching Zen, an open research initiative to build AI that serves everyone.