zen-nano

Ultra-compact 0.6B dense model for edge inference at 44K tokens/sec.

The smallest model in the Zen family: a 0.6B dense transformer designed for on-device inference, IoT deployments, and latency-critical pipelines where every millisecond counts. It sustains 44K tokens/sec in as little as 0.4 GB of RAM.

Specifications

Property          Value
Model ID          zen-nano
Parameters        0.6B
Architecture      Dense
Context Window    32K tokens
Throughput        44K tokens/sec
Memory            0.4–1.2 GB RAM
HuggingFace       zenlm/zen-nano-0.6b
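
The memory range above tracks weight precision. As a rough back-of-envelope check (raw weights only; activations and KV cache add overhead, which is roughly why the table's floor is 0.4 GB rather than 0.3 GB):

```python
# Back-of-envelope weight memory for a 0.6B-parameter model at
# common precisions. Raw weights only: activations and the KV
# cache push real usage somewhat above these figures.
PARAMS = 0.6e9

def weight_gb(bits_per_param: float) -> float:
    """Raw weight size in GB at the given bits per parameter."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("int4", 4), ("int8", 8), ("fp16", 16)]:
    print(f"{name}: ~{weight_gb(bits):.2f} GB")
# int4  ≈ 0.30 GB, int8 ≈ 0.60 GB, fp16 ≈ 1.20 GB
```

The fp16 figure lines up with the 1.2 GB ceiling in the table; 4-bit quantization plus runtime overhead explains the 0.4 GB floor.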

Capabilities

  • Ultra-low latency inference (44K tokens/sec)
  • On-device and edge deployment
  • Mobile and embedded applications
  • Text classification and extraction
  • Lightweight chat and instruction following
  • Runs on CPU-only hardware

Usage

HuggingFace

pip install transformers torch

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-nano-0.6b")
model = AutoModelForCausalLM.from_pretrained("zenlm/zen-nano-0.6b")

inputs = tokenizer("Hello, Zen.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

API

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

response = client.chat.completions.create(
    model="zen-nano",
    messages=[{"role": "user", "content": "Hello, Zen."}],
)
print(response.choices[0].message.content)
