zen3-asr

Real-time streaming speech recognition for live transcription and voice agents.

Real-Time Streaming ASR

zen3-asr is an automatic speech recognition (ASR) model that transcribes audio while it is still being spoken. It delivers word-by-word output over a WebSocket connection with low latency, which keeps voice-driven applications responsive.

Specifications

Property      Value
Model ID      zen3-asr
Architecture  Streaming ASR
Tier          pro max
Status        Available
Deployment    API only (WebSocket)

Capabilities

  • Real-time word-by-word streaming transcription
  • Sub-300ms latency for voice agent responsiveness
  • Live captioning and transcription
  • Voice-driven UI interaction
  • Speaker change detection
  • Interim and final result streaming
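
To illustrate how interim and final results might be combined on the client, here is a minimal sketch. It assumes each message is a JSON object with the `transcript` and `is_final` fields used in the API Usage example, that every interim result replaces the previous interim text for the current segment, and that a final result commits that segment. These semantics are an assumption about zen3-asr, not confirmed behavior, and `TranscriptAssembler` is a hypothetical helper name.

```python
import json


class TranscriptAssembler:
    """Hypothetical helper: merges interim and final results into one text."""

    def __init__(self):
        self.final_segments = []  # committed segments, never revised
        self.interim = ""         # latest provisional text, may be replaced

    def feed(self, message: str) -> str:
        """Consume one JSON message and return the current display text."""
        result = json.loads(message)
        text = result.get("transcript", "")
        if result.get("is_final"):
            # Assumed semantics: a final result closes the current segment.
            self.final_segments.append(text)
            self.interim = ""
        else:
            # Assumed semantics: a new interim result supersedes the old one.
            self.interim = text
        return self.text()

    def text(self) -> str:
        """Committed segments followed by the pending interim text, if any."""
        parts = self.final_segments + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

A live-caption view could call `feed()` on every incoming message and redraw the returned string, so provisional words appear immediately and are corrected once the final result arrives.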

API Usage

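import asyncio
import json
import os

import websockets

# Read the API key from the environment instead of hard-coding it.
HANZO_API_KEY = os.environ["HANZO_API_KEY"]

async def send_audio(ws, audio_source):
    # Stream audio chunks as they become available, then signal end of input.
    async for audio_chunk in audio_source:
        await ws.send(audio_chunk)
    await ws.send(json.dumps({"type": "end"}))

async def receive_results(ws):
    # Overwrite interim results in place; print each final result on its own line.
    async for message in ws:
        result = json.loads(message)
        if result.get("is_final"):
            print(f"[FINAL] {result['transcript']}")
        else:
            print(f"[INTERIM] {result['transcript']}", end="\r")

async def stream_transcription(audio_source):
    uri = "wss://api.hanzo.ai/v1/audio/stream"
    headers = {"Authorization": f"Bearer {HANZO_API_KEY}"}

    # Note: websockets 14 and later name this keyword additional_headers.
    async with websockets.connect(uri, extra_headers=headers) as ws:
        # The first message configures the session.
        await ws.send(json.dumps({
            "model": "zen3-asr",
            "language": "en",
            "interim_results": True,
        }))

        # Send and receive concurrently so results arrive while audio is still streaming.
        await asyncio.gather(send_audio(ws, audio_source), receive_results(ws))

# audio_source must be an async iterable that yields audio chunks as bytes.
asyncio.run(stream_transcription(audio_source))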
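
The example above leaves `audio_source` undefined. One way to supply it, sketched here under stated assumptions, is an async generator that reads a binary stream in fixed-size chunks and pauses between them to approximate live capture. The helper name `paced_chunks` is hypothetical, and the defaults assume 16 kHz, 16-bit mono PCM (3200 bytes per 100 ms). The audio format zen3-asr actually expects is not stated on this page, so check it before relying on these values.

```python
import asyncio
from typing import AsyncIterator, BinaryIO


async def paced_chunks(
    stream: BinaryIO,
    chunk_bytes: int = 3200,      # assumed: 100 ms of 16 kHz 16-bit mono PCM
    chunk_seconds: float = 0.1,   # pause between chunks to mimic real time
) -> AsyncIterator[bytes]:
    """Yield fixed-size chunks from a binary stream, sleeping between them."""
    while True:
        # Blocking read; acceptable for a sketch that reads a local file.
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk
        await asyncio.sleep(chunk_seconds)
```

With a raw audio file opened in binary mode as `f`, the call would then be `asyncio.run(stream_transcription(paced_chunks(f)))`. For microphone input, the same interface could wrap a capture library instead of a file.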
