zen3-audio

Best Quality Transcription

A 1.5B-parameter speech-to-text model with the highest transcription accuracy in the Zen family. It supports 100+ languages and performs well across accents, noisy environments, and domain-specific terminology.

Specifications

Property	Value
Model ID	`zen3-audio`
Parameters	1.5B
Architecture	ASR
Languages	100+
Input	Audio (WAV, MP3, FLAC, M4A, OGG)
Tier	pro max
Status	Available
Deployment	API only
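
The input formats listed above can be checked on the client before an upload is attempted. The following is a minimal sketch, assuming the file extension reflects the actual audio container; `SUPPORTED_EXTENSIONS` and `is_supported_audio` are hypothetical names for illustration and are not part of the Hanzo SDK.

```python
from pathlib import Path

# Extensions taken from the "Input" row of the table above.
# The names below are illustrative, not part of any Hanzo library.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension matches a listed input format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only inspects the file name, so the API may still reject a file whose contents do not match its extension.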

Capabilities

Highest-accuracy transcription across 100+ languages
Speaker diarization (who spoke when)
Word-level and segment-level timestamps
Punctuation and capitalization restoration
Noise-robust transcription in challenging environments
Domain-specific vocabulary support (medical, legal, technical)
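
Diarization and word-level timestamps can be combined to label each word with a speaker. This page does not describe the shape of the diarization output, so the sketch below works on plain tuples: both the `(text, start, end)` word format and the `(speaker, start, end)` turn format are assumptions, and `label_words` is a hypothetical helper rather than an SDK function.

```python
from typing import List, Tuple

# Assumed shapes for illustration only; the real API response may differ.
Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)
Turn = Tuple[str, float, float]  # (speaker_label, start_seconds, end_seconds)


def label_words(words: List[Word], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Assign each word to the speaker turn it overlaps the most."""
    labeled = []
    for text, w_start, w_end in words:
        best_speaker, best_overlap = "unknown", 0.0
        for speaker, t_start, t_end in turns:
            # Length of the time interval shared by the word and the turn.
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled
```

Words that overlap no turn are labeled `"unknown"` rather than being forced onto the nearest speaker.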

API Usage

curl https://api.hanzo.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -F model=zen3-audio \
  -F file=@meeting.mp3 \
  -F language=en \
  -F response_format=verbose_json

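import os

from hanzoai import Hanzo

# Read the key from the environment, matching the curl example above.
client = Hanzo(api_key=os.environ["HANZO_API_KEY"])

# Open the audio file in binary mode for upload.
with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="zen3-audio",
        file=audio_file,
        language="en",
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"],
    )

# Print each segment with its start and end time in seconds.
for segment in response.segments:
    print(f"[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}")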
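
Segment timestamps like those printed above are often turned into subtitles. The following is a minimal sketch that renders segments as SubRip (SRT) cues, assuming each segment exposes `start`, `end`, and `text` attributes as in the loop above; `to_srt_timestamp` and `segments_to_srt` are hypothetical helpers, not part of the Hanzo SDK.

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (objects with .start, .end, .text) as SRT cues."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = to_srt_timestamp(segment.start)
        end = to_srt_timestamp(segment.end)
        # Each cue is: sequence number, time range, then the caption text.
        cues.append(f"{index}\n{start} --> {end}\n{segment.text.strip()}\n")
    return "\n".join(cues)
```

With the response from the example above, `segments_to_srt(response.segments)` would return text that can be written to a `.srt` file.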

Try It

Open in Hanzo Chat

Resources

Audio API -- Endpoint documentation
Technical Report
