zen3-tts

High-quality text-to-speech with natural prosody. 40+ voices, 8 languages.

High-Quality Text-to-Speech

An 82M parameter text-to-speech model delivering natural prosody and expressive speech across 40+ voices and 8 languages. Ideal for voice assistants, audiobook generation, accessibility tools, and interactive voice applications.

Specifications

Property      Value
Model ID      zen3-tts
Parameters    82M
Architecture  TTS
Voices        40+
Languages     8
Output        Audio (MP3, WAV, FLAC, OPUS)
Tier          pro max
Status        Available
Deployment    API only

Capabilities

  • Natural prosody with human-like intonation
  • 40+ built-in voice presets across styles and genders
  • Support for 8 languages with native-quality output
  • Adjustable speaking rate and pitch
  • Streaming audio output for real-time playback
  • Architecture compatible with voice cloning
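
The streaming capability listed above suggests consuming audio incrementally rather than buffering a whole clip in memory. The following is a minimal sketch, assuming the response body is available as a file-like object (as `urllib.request.urlopen` provides). The helper name `copy_audio_stream` and the 4096-byte chunk size are illustrative choices, not part of the Zen API, and whether the endpoint actually delivers audio in chunks is an assumption.

```python
import io
from typing import BinaryIO


def copy_audio_stream(source: BinaryIO, sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy audio bytes from source to sink chunk by chunk.

    Hypothetical helper: 'source' stands in for an HTTP response body and
    'sink' for a file or audio player buffer. Returns total bytes written.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Simulate a response body with an in-memory buffer.
    fake_response = io.BytesIO(b"\x00" * 10_000)
    output = io.BytesIO()
    print(copy_audio_stream(fake_response, output))
```

Writing each chunk as soon as it arrives lets a player begin playback before the full clip has been generated, which is the point of streaming output.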

API Usage

curl https://api.hanzo.ai/v1/audio/speech \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen3-tts",
    "input": "Welcome to Zen AI. How can I help you today?",
    "voice": "nova",
    "response_format": "mp3"
  }' \
  --output speech.mp3

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

response = client.audio.speech.create(
    model="zen3-tts",
    input="Welcome to Zen AI. How can I help you today?",
    voice="nova",
    response_format="mp3",
)

response.stream_to_file("speech.mp3")
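
Where the SDK is not installed, the same request can be sketched with only the Python standard library. This mirrors the curl call above: the endpoint, headers, and JSON fields are taken from it. The helper names `build_speech_request` and `synthesize` are hypothetical, and the lowercase format strings other than `mp3` are assumed from the output formats listed under Specifications.

```python
import json
import os
import urllib.request

API_URL = "https://api.hanzo.ai/v1/audio/speech"  # endpoint from the curl example
# Assumption: request values are the lowercase forms of the listed output formats.
FORMATS = {"mp3", "wav", "flac", "opus"}


def build_speech_request(text, voice="nova", response_format="mp3", api_key=None):
    """Build (but do not send) a POST request equivalent to the curl example."""
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    key = api_key or os.environ.get("HANZO_API_KEY")
    if not key:
        raise RuntimeError("set HANZO_API_KEY or pass api_key")
    payload = {
        "model": "zen3-tts",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, path, **kwargs):
    """Send the request and write the returned audio bytes to path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

A call such as `synthesize("Welcome to Zen AI.", "speech.mp3")` would then write the audio to disk. Keeping request construction separate from sending makes the payload easy to inspect without network access.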

See Also

  • zen3-tts-hd -- Maximum fidelity for broadcast-quality audio
  • zen3-tts-fast -- Low-latency TTS for real-time agents
  • zen3-asr -- Real-time streaming speech recognition
