zen3-tts
High-quality text-to-speech with natural prosody. 40+ voices, 8 languages.
High-Quality Text-to-Speech
An 82M-parameter text-to-speech model that delivers natural prosody and expressive speech across 40+ voices and 8 languages. Ideal for voice assistants, audiobook generation, accessibility tools, and interactive voice applications.
Specifications
| Property | Value |
|---|---|
| Model ID | zen3-tts |
| Parameters | 82M |
| Architecture | TTS |
| Voices | 40+ |
| Languages | 8 |
| Output | Audio (MP3, WAV, FLAC, OPUS) |
| Tier | pro max |
| Status | Available |
| Deployment | API only |
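The Output row lists four audio formats. As a minimal sketch, a client can check the requested format locally before calling the API and derive a matching file name. The helper name `output_path` and the constant `SUPPORTED_FORMATS` are hypothetical, and the lowercase spellings `wav`, `flac`, and `opus` are assumptions: only `mp3` appears in this page's request examples.

```python
from pathlib import Path

# Formats taken from the Output row of the specifications table.
# Assumption: the API accepts these lowercase spellings as `response_format`
# values, in the same way the examples on this page use "mp3".
SUPPORTED_FORMATS = ("mp3", "wav", "flac", "opus")


def output_path(stem: str, response_format: str) -> Path:
    """Return `<stem>.<format>` after checking the format against the table."""
    fmt = response_format.strip().lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {response_format!r}; "
            f"expected one of {', '.join(SUPPORTED_FORMATS)}"
        )
    return Path(f"{stem}.{fmt}")
```

Rejecting an unknown format before the request is sent avoids a round trip to the API for a typo.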
Capabilities
- Natural prosody with human-like intonation
- 40+ built-in voice presets across styles and genders
- Support for 8 languages with native-quality output
- Adjustable speaking rate and pitch
- Streaming audio output for real-time playback
- Architecture compatible with voice cloning
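Streaming output means a client can start handling audio before synthesis finishes. The sketch below shows only the consuming side: it writes byte chunks to a sink as they arrive. The function name `write_audio_chunks` is hypothetical, and this page does not specify how the API frames streamed audio, so the chunk source is left to the caller.

```python
import io
from typing import BinaryIO, Iterable


def write_audio_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write each non-empty chunk to `sink` as it arrives; return bytes written."""
    total = 0
    for chunk in chunks:
        if not chunk:
            # Skip empty reads rather than treating them as end of stream.
            continue
        sink.write(chunk)
        total += len(chunk)
    return total


# Stand-in for a network stream: any iterable of bytes objects works.
buffer = io.BytesIO()
written = write_audio_chunks([b"RIFF", b"", b"data"], buffer)
```

With an HTTP response object from the standard library, the chunks could come from `iter(lambda: resp.read(4096), b"")`, assuming the server sends the audio incrementally.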
API Usage
curl https://api.hanzo.ai/v1/audio/speech \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen3-tts",
    "input": "Welcome to Zen AI. How can I help you today?",
    "voice": "nova",
    "response_format": "mp3"
  }' \
  --output speech.mp3

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")
response = client.audio.speech.create(
    model="zen3-tts",
    input="Welcome to Zen AI. How can I help you today?",
    voice="nova",
    response_format="mp3",
)
response.stream_to_file("speech.mp3")
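For environments where the `hanzoai` package is not installed, the same request can be sketched with only the Python standard library. The endpoint, headers, and JSON fields mirror the curl example. The helper names `build_speech_request` and `synthesize` are hypothetical, and the sketch assumes the response body is the raw audio, as the curl `--output` flag suggests.

```python
import json
import os
import urllib.request

API_URL = "https://api.hanzo.ai/v1/audio/speech"  # endpoint from the curl example


def build_speech_request(text, voice="nova", response_format="mp3", api_key=None):
    """Build (but do not send) a POST request equivalent to the curl example."""
    key = api_key or os.environ.get("HANZO_API_KEY", "")
    if not key:
        raise ValueError("set HANZO_API_KEY or pass api_key")
    body = json.dumps(
        {
            "model": "zen3-tts",
            "input": text,
            "voice": voice,
            "response_format": response_format,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, path, **kwargs):
    """Send the request and save the response body to `path`.

    Assumption: the body is the encoded audio file itself.
    """
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Calling `synthesize("Welcome to Zen AI.", "speech.mp3")` would perform the network call. Keeping request construction separate makes the payload easy to inspect without contacting the API.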
Resources
- Audio API -- Endpoint documentation
- Technical Report
See Also
- zen3-tts-hd -- Maximum fidelity for broadcast-quality audio
- zen3-tts-fast -- Low-latency TTS for real-time agents
- zen3-asr -- Real-time streaming speech recognition