Quickstart

Get up and running with Zen models quickly.
Hanzo Cloud API
The fastest way to use Zen models — no GPU required:
pip install hanzoai

from hanzoai import Hanzo
client = Hanzo(api_key="hk-your-api-key")
response = client.chat.completions.create(
model="zen4",
messages=[{"role": "user", "content": "Hello, Zen!"}],
)
print(response.choices[0].message.content)

Get your API key at console.hanzo.ai — every new account gets $5 free credit.
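If the hanzoai SDK follows the OpenAI streaming convention (an assumption here, not confirmed by this page; check the Hanzo docs), you can stream tokens as they arrive by passing stream=True:

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

# Assumption: hanzoai mirrors OpenAI's streaming interface (stream=True, delta chunks).
stream = client.chat.completions.create(
    model="zen4",
    messages=[{"role": "user", "content": "Hello, Zen!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)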
Transformers
Run Zen models locally with Hugging Face Transformers:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "zenlm/zen-coder-flash"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
)
messages = [{"role": "user", "content": "Write a Python function for binary search"}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs.to(model.device), max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

vLLM (Production)
For high-throughput production serving:
vllm serve zenlm/zen-coder-flash \
--tensor-parallel-size 4 \
--enable-auto-tool-choice
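vllm serve exposes an OpenAI-compatible API (http://localhost:8000/v1 by default), so any OpenAI client can talk to it. A minimal sketch using the openai Python package; the served model name matches the path you passed to vllm serve:

from openai import OpenAI

# The api_key is a placeholder unless the server was started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="zenlm/zen-coder-flash",
    messages=[{"role": "user", "content": "Write a Python function for binary search"}],
)
print(response.choices[0].message.content)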
SGLang

With EAGLE speculative decoding:
python -m sglang.launch_server \
--model-path zenlm/zen-coder-flash \
--tp-size 4 \
--speculative-algorithm EAGLE
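Depending on your SGLang version, EAGLE may also require a draft model passed via --speculative-draft-model-path; check the SGLang docs. Once the server is up, it exposes an OpenAI-compatible endpoint (default port 30000), so the vLLM client sketch above works after swapping the base URL:

from openai import OpenAI

# Same OpenAI-compatible pattern as the vLLM example; only the port differs.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="zenlm/zen-coder-flash",
    messages=[{"role": "user", "content": "Hello, Zen!"}],
)
print(response.choices[0].message.content)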
MLX (Apple Silicon)

Optimized for M1/M2/M3 Macs. Install the package first with pip install mlx-lm, then:
from mlx_lm import load, generate
model, tokenizer = load("zenlm/zen-coder-flash")
response = generate(model, tokenizer, prompt="Write a Rust function for quicksort", max_tokens=256)
print(response)
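For multi-turn prompts, mlx_lm's tokenizer wraps the underlying Hugging Face tokenizer, so you can format the conversation with the model's chat template before generating. A minimal sketch:

from mlx_lm import load, generate

model, tokenizer = load("zenlm/zen-coder-flash")

# Render the conversation through the model's chat template, then generate.
messages = [{"role": "user", "content": "Write a Rust function for quicksort"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))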