
zen-Coder Series: Powerful, Diverse, Practical.

GITHUB HUGGING FACE MODELSCOPE KAGGLE DEMO DISCORD

By Zen LM Team


Introduction

Today, we are excited to open-source the "Powerful", "Diverse", and "Practical" zen-Coder series, dedicated to continuously advancing the development of open code LLMs.

Powerful: Code capabilities reach SOTA for open-source models

The multi-language code repair capabilities of zen-Coder-32B-Instruct are also impressive, helping users understand and modify code written in languages they are less familiar with and significantly reducing the learning cost of unfamiliar languages. On MdEval, a multi-language code repair benchmark similar to McEval, zen-Coder-32B-Instruct scored 75.2, ranking first among all open-source models.

Diverse: Rich Model Sizes

This time, zen-Coder has open-sourced a rich variety of model sizes, including 0.5B/1.5B/3B/7B/14B/32B, which not only meets the needs of developers in different resource scenarios but also provides a good experimental platform for the research community. The following table provides detailed model information:

| Models | Params | Non-Emb Params | Layers | Heads (KV) | Tie Embedding | Context Length | License |
|---|---|---|---|---|---|---|---|
| zen-Coder-0.5B | 0.49B | 0.36B | 24 | 14 / 2 | Yes | 32K | Apache 2.0 |
| zen-Coder-1.5B | 1.54B | 1.31B | 28 | 12 / 2 | Yes | 32K | Apache 2.0 |
| zen-Coder-3B | 3.09B | 2.77B | 36 | 16 / 2 | Yes | 32K | Qwen Research |
| zen-Coder-7B | 7.61B | 6.53B | 28 | 28 / 4 | No | 128K | Apache 2.0 |
| zen-Coder-14B | 14.7B | 13.1B | 48 | 40 / 8 | No | 128K | Apache 2.0 |
| zen-Coder-32B | 32.5B | 31.0B | 64 | 40 / 8 | No | 128K | Apache 2.0 |

We have always believed in the philosophy of the Scaling Law. We evaluated the performance of the different sizes of zen-Coder across all datasets to verify the effectiveness of scaling in code LLMs. For each size, we open-sourced both Base and Instruct models: the Instruct model is an officially aligned model that can chat directly, while the Base model serves as a foundation for developers to fine-tune their own models.
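To make "can chat directly" concrete, the sketch below hand-builds a ChatML-style chat prompt for an Instruct model. The `<|im_start|>`/`<|im_end|>` special tokens are an assumption borrowed from the Qwen family's chat convention, not something stated in this post; in real usage you should let the model's own tokenizer render the prompt via `tokenizer.apply_chat_template()`.

```python
# Minimal sketch: hand-building a ChatML-style chat prompt for an
# Instruct model. The <|im_start|>/<|im_end|> tokens are an assumption
# based on the Qwen family's chat format; production code should use
# tokenizer.apply_chat_template() from the model's own tokenizer.

def build_chat_prompt(messages):
    """Render a list of {role, content} dicts into a ChatML string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chat_prompt([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a function that reverses a string."},
])
print(prompt)
```

The open assistant turn at the end is the standard trick that makes an aligned model generate its reply rather than continue the user's text.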

Here are the performances of the Base models of different sizes:

Here are the performances of the Instruct models of different sizes:

We present a comparison of different sizes of zen-Coder with other open-source models on core datasets.

There is a positive correlation between model size and model performance, and zen-Coder has achieved SOTA performance across all sizes, encouraging us to continue exploring larger sizes of Coder.

Practical: Meeting Cursor and Artifacts

A practical Coder has always been our vision, and for this reason, we explored the actual performance of zen-Coder in code assistants and Artifacts scenarios.

zen-Coder 🤝 Cursor

Code assistants have become widely used, but most currently rely on closed-source models. We hope that zen-Coder can provide developers with a friendly and powerful open alternative. Here is an example of zen-Coder in Cursor.

Example: zen-Coder 🤝 Cursor

Additionally, zen-Coder-32B demonstrates strong code completion capabilities as a pre-trained model, achieving SOTA performance on five benchmarks: HumanEval-Infilling, CrossCodeEval, CrossCodeLongEval, RepoEval, and SAFIM. To keep the comparison fair, we capped the maximum sequence length at 8k and tested in Fill-in-the-Middle mode. On the four evaluation sets CrossCodeEval, CrossCodeLongEval, RepoEval, and HumanEval-Infilling, we measured whether the generated content exactly equals the ground-truth label (Exact Match); on SAFIM, we used the single-attempt execution success rate (Pass@1).
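As a rough sketch of the two pieces described above, the snippet below builds a Fill-in-the-Middle prompt and computes an Exact Match score. The `<|fim_prefix|>`/`<|fim_suffix|>`/`<|fim_middle|>` special tokens follow the Qwen-Coder family's FIM convention, but treat them as an assumption and verify them against the actual tokenizer config before relying on them.

```python
# Sketch of FIM prompt construction and Exact Match scoring.
# The special tokens below follow the Qwen-Coder FIM convention
# (an assumption -- check the model's tokenizer config).

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before/after the hole so the model fills the middle."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

def exact_match(predictions, references) -> float:
    """Fraction of generations exactly equal to the gold middle span."""
    assert len(predictions) == len(references)
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))\n",
)
score = exact_match(["a + b", "a - b"], ["a + b", "a + b"])  # → 0.5
```

Exact Match is deliberately strict: a semantically correct but differently formatted completion counts as a miss, which is why SAFIM instead uses execution-based Pass@1.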

zen-Coder 🤝 Artifacts

Artifacts are an important application of code generation, helping users create visual works. We chose Open WebUI to explore the potential of zen-Coder in the Artifacts scenario, and here are some specific examples.

Example: Three-body Problem Simulation

Example: Lissajous Curve

Example: Drafting a resume

Example: Emoji dancing
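To make the Lissajous Curve example above concrete, here is the kind of self-contained snippet such a prompt might yield. This is a hypothetical sample written for illustration, not the model's actual output: it samples points on the curve x = A·sin(a·t + δ), y = B·sin(b·t).

```python
import math

def lissajous_points(a=3, b=2, delta=math.pi / 2, n=1000, amp=1.0):
    """Sample n points on the Lissajous curve
    x = amp*sin(a*t + delta), y = amp*sin(b*t) for t in [0, 2*pi]."""
    pts = []
    for i in range(n):
        t = 2 * math.pi * i / (n - 1)
        pts.append((amp * math.sin(a * t + delta), amp * math.sin(b * t)))
    return pts

pts = lissajous_points()
# With an integer frequency ratio a:b the curve is closed, so the
# last sampled point (t = 2*pi) coincides with the first (t = 0).
```

Feeding the point list to any plotting or canvas library is then enough to render the figure, which is exactly the "visual work" workflow Artifacts is built around.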

We will soon launch code mode on the Tongyi official website https://tongyi.aliyun.com, supporting one-click generation of websites, mini-games, data charts, and other visual applications. We welcome everyone to try it!

Model License

zen-Coder 0.5B / 1.5B / 7B / 14B / 32B are licensed under Apache 2.0, while the 3B model is under the Qwen Research license.

What's Next for zen-Coder?

We believe this release can truly help developers and, together with the community, uncover more interesting application scenarios. We are also working on powerful code-centric reasoning models, and we believe we will meet everyone soon!

Citation


    @article{hui2024qwen2,
      title={zen-Coder Technical Report},
      author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
      journal={arXiv preprint arXiv:2409.12186},
      year={2024}
    }
    @article{yang2024qwen2,
      title={zen technical report},
      author={Yang, An and Yang, Baosong and Hui, Binyuan and Zheng, Bo and Yu, Bowen and Zhou, Chang and Li, Chengpeng and Li, Chengyuan and Liu, Dayiheng and Huang, Fei and others},
      journal={arXiv preprint arXiv:2407.10671},
      year={2024}
    }