zen3-audio

Best quality speech-to-text transcription. 100+ languages.

Best Quality Transcription

A 1.5B parameter speech-to-text model delivering the highest transcription accuracy in the Zen family. Supports 100+ languages with strong performance across accents, noisy environments, and domain-specific terminology.

Specifications

| Property | Value |
| --- | --- |
| Model ID | zen3-audio |
| Parameters | 1.5B |
| Architecture | ASR |
| Languages | 100+ |
| Input | Audio (WAV, MP3, FLAC, M4A, OGG) |
| Tier | pro max |
| Status | Available |
| Deployment | API only |
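
Since the model accepts only the audio formats listed above, it can help to check a file's extension before uploading. The following is a minimal sketch, not part of any Hanzo SDK: the helper name `is_supported_audio` and the constant `SUPPORTED_EXTENSIONS` are hypothetical, and the check looks only at the file name, not at the audio data itself.

```python
from pathlib import Path

# Extensions derived from the Input row of the specifications
# (WAV, MP3, FLAC, M4A, OGG). The constant name is hypothetical.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension matches a listed input format.

    Only the file name is inspected; the file is not opened or decoded.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["meeting.mp3", "call.WAV", "notes.txt"]:
        print(name, is_supported_audio(name))
```

A check like this catches obvious mistakes early, but a file with a valid extension could presumably still be rejected by the API if its contents are malformed.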

Capabilities

  • Highest accuracy transcription across 100+ languages
  • Speaker diarization (who spoke when)
  • Word and segment-level timestamps
  • Punctuation and capitalization restoration
  • Noise-robust transcription in challenging environments
  • Domain-specific vocabulary support (medical, legal, technical)
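
To illustrate how diarization output and segment timestamps might be combined, here is a minimal sketch that merges consecutive segments from the same speaker into a single turn. The input shape is an assumption: this page does not document the diarization response format, so the `speaker` key and the dict layout are hypothetical, as is the function name `merge_speaker_turns`.

```python
from typing import Dict, List


def merge_speaker_turns(segments: List[Dict]) -> List[Dict]:
    """Merge consecutive segments from the same speaker into one turn.

    Each segment is assumed (hypothetically) to look like:
    {"speaker": "A", "start": 0.0, "end": 2.5, "text": "Hello"}
    """
    turns: List[Dict] = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"].strip()
        else:
            # Speaker changed (or first segment): start a new turn.
            turns.append({
                "speaker": seg["speaker"],
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"].strip(),
            })
    return turns


if __name__ == "__main__":
    sample = [
        {"speaker": "A", "start": 0.0, "end": 2.0, "text": "Hi everyone."},
        {"speaker": "A", "start": 2.0, "end": 4.5, "text": "Let's begin."},
        {"speaker": "B", "start": 4.5, "end": 6.0, "text": "Sounds good."},
    ]
    for turn in merge_speaker_turns(sample):
        print(f"{turn['speaker']} [{turn['start']:.1f}s - {turn['end']:.1f}s] {turn['text']}")
```

The function builds new dicts rather than modifying the input, so the original segment list stays unchanged.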

API Usage

curl https://api.hanzo.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -F model=zen3-audio \
  -F file=@meeting.mp3 \
  -F language=en \
  -F response_format=verbose_json

from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

with open("meeting.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="zen3-audio",
        file=audio_file,
        language="en",
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"],
    )

for segment in response.segments:
    print(f"[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}")
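
Segment timestamps like those printed above are commonly turned into subtitles. The sketch below converts objects exposing `.start`, `.end`, and `.text` (the attributes used in the loop above) into SubRip (SRT) text. The helper names `format_srt_time` and `segments_to_srt` are hypothetical and not part of the Hanzo SDK, and the demo uses stand-in objects rather than a real API response.

```python
from types import SimpleNamespace


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from objects exposing .start, .end, and .text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(seg.start)} --> {format_srt_time(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Stand-in segments; a real run would pass response.segments instead.
    demo = [
        SimpleNamespace(start=0.0, end=1.5, text=" Hello everyone. "),
        SimpleNamespace(start=1.5, end=3.0, text="Let's get started."),
    ]
    print(segments_to_srt(demo))
```

Assuming the response object behaves as in the example above, `segments_to_srt(response.segments)` would produce text that can be written to a `.srt` file.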
