Low-latency text-to-speech for real-time voice agents and interactive applications.

zen3-tts-fast

Low-Latency Voice Synthesis

An 82M-parameter text-to-speech model built for real-time voice agents and interactive applications. Audio is streamed as it is generated, so first-byte latency stays low and playback can begin before the full utterance has been synthesized, keeping conversational AI exchanges fluid.

Specifications

Property	Value
Model ID	`zen3-tts-fast`
Parameters	82M
Architecture	TTS
Output	Audio (MP3, WAV, OPUS)
Tier	pro
Status	Available
Deployment	API only

Capabilities

Low first-byte latency for conversational responsiveness
Streaming audio output for real-time playback
Voice agent and chatbot integration
Interactive voice response (IVR) systems
Real-time narration and announcements
High-throughput TTS for cost-efficient scale
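For the high-throughput case, independent requests (for example a queue of announcements) can be issued concurrently. The following is a minimal sketch using Python's `concurrent.futures`. `synthesize_batch` is a hypothetical helper. `synthesize` stands for any function that performs one speech request and returns the audio bytes, such as a wrapper around the SDK call shown under API Usage. The default of 8 workers is an arbitrary starting point to tune against your rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def synthesize_batch(
    texts: Sequence[str],
    synthesize: Callable[[str], bytes],
    max_workers: int = 8,
) -> List[bytes]:
    """Run `synthesize` over many texts concurrently (hypothetical helper).

    `synthesize` is assumed to perform a single TTS request and return the
    audio bytes. Results come back in the same order as `texts`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even if calls finish out of order.
        return list(pool.map(synthesize, texts))
```

Threads fit here because each call spends most of its time waiting on the network. `pool.map` re-raises a worker's exception when that result is collected, so one failed request surfaces as an exception from `synthesize_batch`.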

API Usage

curl https://api.hanzo.ai/v1/audio/speech \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen3-tts-fast",
    "input": "Your order has been confirmed. It will arrive in 3 to 5 business days.",
    "voice": "alloy",
    "response_format": "opus"
  }' \
  --output response.opus
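For environments without the SDK, the curl request above can be reproduced with Python's standard library. This is a minimal sketch. The endpoint, headers, and JSON fields are taken from the curl example, while `build_speech_request`, `synthesize_to_file`, and the 4 KiB read size are hypothetical choices made for illustration. The format check mirrors the Output row of the Specifications table.

```python
import json
import os
import urllib.request

# Endpoint and request fields copied from the curl example above.
SPEECH_URL = "https://api.hanzo.ai/v1/audio/speech"
# Output formats listed in the Specifications table.
SUPPORTED_FORMATS = {"mp3", "wav", "opus"}


def build_speech_request(text, api_key, voice="alloy", response_format="opus"):
    """Build (but do not send) a POST request equivalent to the curl call."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    body = json.dumps({
        "model": "zen3-tts-fast",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, path, **kwargs):
    """Send the request and write the audio bytes to `path` as they arrive."""
    req = build_speech_request(text, os.environ["HANZO_API_KEY"], **kwargs)
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        # Copy in 4 KiB chunks instead of buffering the whole response.
        while chunk := resp.read(4096):
            out.write(chunk)
```

Keeping request construction separate from sending lets the builder be checked without network access; `synthesize_to_file` reads `HANZO_API_KEY` from the environment, as the curl example does.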

import os

from hanzoai import Hanzo

# Read the API key from the environment, as in the curl example
client = Hanzo(api_key=os.environ["HANZO_API_KEY"])

# Stream the audio to a file as it arrives instead of buffering the whole response
with client.audio.speech.with_streaming_response.create(
    model="zen3-tts-fast",
    input="Your order has been confirmed.",
    voice="alloy",
    response_format="opus",
) as response:
    response.stream_to_file("response.opus")
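Because first-byte latency is the headline property of this model, it can be worth measuring in your own deployment. The sketch below times the arrival of the first non-empty chunk from any iterable of byte chunks. `measure_first_byte_latency` is a hypothetical helper, and the commented usage assumes the streaming response exposes an `iter_bytes()` method as OpenAI-style SDKs do; check the `hanzoai` SDK reference before relying on that.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_first_byte_latency(
    chunks: Iterable[bytes],
    start: Optional[float] = None,
    sink: Optional[Callable[[bytes], object]] = None,
) -> Tuple[float, int]:
    """Return (seconds until the first non-empty chunk, total bytes received).

    `start` should be a time.perf_counter() value taken just before the request
    is sent; if omitted, timing starts when iteration begins. Each chunk is
    passed to `sink` (e.g. a file's write method or an audio player) if given.
    """
    if start is None:
        start = time.perf_counter()
    first_byte: Optional[float] = None
    total = 0
    for chunk in chunks:
        if chunk and first_byte is None:
            first_byte = time.perf_counter() - start
        total += len(chunk)
        if sink is not None:
            sink(chunk)
    if first_byte is None:
        raise ValueError("stream produced no audio bytes")
    return first_byte, total


# Hypothetical usage (assumes `client` from the example above and an
# OpenAI-style `iter_bytes()` on the streaming response):
#
# t0 = time.perf_counter()
# with client.audio.speech.with_streaming_response.create(
#     model="zen3-tts-fast", input="Your order has been confirmed.",
#     voice="alloy", response_format="opus",
# ) as response, open("response.opus", "wb") as f:
#     ttfb, size = measure_first_byte_latency(response.iter_bytes(), start=t0, sink=f.write)
```

Taking `t0` before the request is issued matters: an SDK may only hand back the response object once headers have arrived, so starting the clock inside the `with` block would understate the latency a user hears.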


Resources

Audio API: endpoint documentation
Technical Report
