⚡ Zen LM

Quickstart

Get up and running with Zen models in minutes.

Hanzo Cloud API

The fastest way to use Zen models, with no GPU required. Install the client:

pip install hanzoai

Then call the API:

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

response = client.chat.completions.create(
    model="zen4",
    messages=[{"role": "user", "content": "Hello, Zen!"}],
)
print(response.choices[0].message.content)

Get your API key at console.hanzo.ai; every new account starts with $5 in free credit.
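The client mirrors the OpenAI SDK, so streaming should work the same way. A minimal sketch, assuming the hanzoai client accepts stream=True:

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

# Assumption: the client supports OpenAI-style streaming via stream=True.
stream = client.chat.completions.create(
    model="zen4",
    messages=[{"role": "user", "content": "Explain binary search in one paragraph."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; print tokens as they arrive.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()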

Transformers

To run Zen models locally, Hugging Face Transformers is the simplest starting point:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zenlm/zen-coder-flash"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a Python function for binary search"}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

vLLM (Production)

For high-throughput production serving:

vllm serve zenlm/zen-coder-flash \
    --tensor-parallel-size 4 \
    --enable-auto-tool-choice
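This launches an OpenAI-compatible server on port 8000 by default. (Recent vLLM releases may also require a matching --tool-call-parser when --enable-auto-tool-choice is set; check the vLLM docs for your version.) Once it is running, any OpenAI client can query it:

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="zenlm/zen-coder-flash",
    messages=[{"role": "user", "content": "Write a Python function for binary search"}],
)
print(response.choices[0].message.content)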

SGLang

With EAGLE speculative decoding:

python -m sglang.launch_server \
    --model-path zenlm/zen-coder-flash \
    --tp-size 4 \
    --speculative-algorithm EAGLE
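Note that EAGLE typically needs a compatible draft model as well (passed with --speculative-draft-model-path); check the model card for one. The launched server speaks the same OpenAI-compatible protocol, on port 30000 by default:

from openai import OpenAI

# SGLang's default port is 30000; the API is OpenAI-compatible.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="zenlm/zen-coder-flash",
    messages=[{"role": "user", "content": "Hello, Zen!"}],
)
print(response.choices[0].message.content)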

MLX (Apple Silicon)

Optimized for M1/M2/M3 Macs:

from mlx_lm import load, generate

model, tokenizer = load("zenlm/zen-coder-flash")
response = generate(model, tokenizer, prompt="Write a Rust function for quicksort", max_tokens=256)
print(response)
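The loader above comes from the mlx-lm package, which also ships a CLI for one-off prompts:

pip install mlx-lm

python -m mlx_lm.generate \
    --model zenlm/zen-coder-flash \
    --prompt "Write a Rust function for quicksort" \
    --max-tokens 256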
