zen-nano

Ultra-compact 0.6B dense model for edge inference at 44K tokens/sec.

The smallest model in the Zen family: a 0.6B dense transformer designed for on-device inference, IoT deployments, and latency-critical pipelines where every millisecond counts. It sustains 44K tokens/sec in as little as 0.4 GB of RAM.

Specifications

Property          Value
Model ID          zen-nano
Parameters        0.6B
Architecture      Dense
Context Window    32K tokens
Throughput        44K tokens/sec
Memory            0.4–1.2 GB RAM
HuggingFace       zenlm/zen-nano-0.6b
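
The memory range above tracks weight precision. As a rough back-of-envelope check (raw weights only; activations and KV cache add overhead, which is roughly why the table's floor is 0.4 GB rather than 0.3 GB):

```python
# Back-of-envelope weight memory for a 0.6B-parameter model at
# common precisions. Raw weights only: activations and the KV
# cache push real usage somewhat above these figures.
PARAMS = 0.6e9

def weight_gb(bits_per_param: float) -> float:
    """Raw weight size in GB at the given bits per parameter."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("int4", 4), ("int8", 8), ("fp16", 16)]:
    print(f"{name}: ~{weight_gb(bits):.2f} GB")
# int4  ≈ 0.30 GB, int8 ≈ 0.60 GB, fp16 ≈ 1.20 GB
```

The fp16 figure lines up with the 1.2 GB ceiling in the table; 4-bit quantization plus runtime overhead explains the 0.4 GB floor.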

Capabilities

  • Ultra-low latency inference (44K tokens/sec)
  • On-device and edge deployment
  • Mobile and embedded applications
  • Text classification and extraction
  • Lightweight chat and instruction following
  • Runs on CPU-only hardware

Usage

HuggingFace

pip install transformers torch

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-nano-0.6b")
model = AutoModelForCausalLM.from_pretrained("zenlm/zen-nano-0.6b")

inputs = tokenizer("Hello, Zen.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

API

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

response = client.chat.completions.create(
    model="zen-nano",
    messages=[{"role": "user", "content": "Hello, Zen."}],
)
print(response.choices[0].message.content)
