
zen3-reranker

High-quality reranker for improving retrieval accuracy in RAG pipelines.


Maximum Reranking Quality

An 8B cross-encoder reranker that dramatically improves retrieval accuracy in RAG pipelines and search systems. Takes query-document pairs and produces calibrated relevance scores for precise ranking.

Specifications

| Property | Value |
| --- | --- |
| Model ID | zen3-reranker |
| Parameters | 8B |
| Architecture | Reranker |
| Context Window | 40K tokens |
| Tier | pro, max |
| Status | Available |
| HuggingFace | zenlm/zen3-reranker |

Capabilities

  • Cross-encoder query-document relevance scoring
  • Long-document reranking up to 40K tokens
  • RAG pipeline retrieval quality improvement
  • Multi-stage search refinement
  • Calibrated relevance scores for threshold filtering
  • Multi-lingual reranking support
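
Calibrated scores make threshold filtering simple: drop candidates below a cutoff before they reach the generator. A minimal sketch in plain Python (the scores and the 0.3 cutoff are illustrative, not actual model output):

```python
# Hypothetical (document, relevance_score) pairs from a reranker call.
scored = [
    ("RAG combines retrieval with language model generation.", 0.94),
    ("Vector databases store embeddings for semantic search.", 0.41),
    ("The weather in Tokyo is sunny today.", 0.02),
]

THRESHOLD = 0.3  # illustrative cutoff; tune on your own retrieval data

# Keep only documents at or above the threshold, best first.
filtered = sorted(
    (pair for pair in scored if pair[1] >= THRESHOLD),
    key=lambda pair: pair[1],
    reverse=True,
)

for doc, score in filtered:
    print(f"[{score:.2f}] {doc}")
```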

API Usage

```bash
curl https://api.hanzo.ai/v1/rerank \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen3-reranker",
    "query": "What is retrieval augmented generation?",
    "documents": [
      "RAG combines retrieval with language model generation.",
      "The weather in Tokyo is sunny today.",
      "Vector databases store embeddings for semantic search."
    ]
  }'
```
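
The endpoint returns the documents ranked by relevance. Judging from the fields the Python client reads (`relevance_score`, `document.text`), the JSON response is shaped roughly like this sketch (illustrative, not official API documentation; the scores are made up):

```json
{
  "model": "zen3-reranker",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.97,
      "document": { "text": "RAG combines retrieval with language model generation." }
    },
    {
      "index": 2,
      "relevance_score": 0.38,
      "document": { "text": "Vector databases store embeddings for semantic search." }
    },
    {
      "index": 1,
      "relevance_score": 0.02,
      "document": { "text": "The weather in Tokyo is sunny today." }
    }
  ]
}
```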
```python
from hanzoai import Hanzo

client = Hanzo(api_key="hk-your-api-key")

response = client.rerank.create(
    model="zen3-reranker",
    query="What is retrieval augmented generation?",
    documents=[
        "RAG combines retrieval with language model generation.",
        "The weather in Tokyo is sunny today.",
        "Vector databases store embeddings for semantic search.",
    ],
)

for result in response.results:
    print(f"[{result.relevance_score:.3f}] {result.document.text}")
```

HuggingFace Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("zenlm/zen3-reranker")
model = AutoModelForSequenceClassification.from_pretrained("zenlm/zen3-reranker")
model.eval()

query = "What is retrieval augmented generation?"
documents = [
    "RAG combines retrieval with language model generation.",
    "The weather in Tokyo is sunny today.",
]

# Score each (query, document) pair jointly with the cross-encoder.
pairs = [[query, doc] for doc in documents]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)

# Sort documents by descending relevance.
ranked = sorted(zip(documents, scores.tolist()), key=lambda x: x[1], reverse=True)
for doc, score in ranked:
    print(f"[{score:.3f}] {doc}")
```
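
The `scores` above are raw logits and unbounded; to map them onto a bounded 0-1 scale comparable to calibrated relevance scores, a common convention for cross-encoders is to pass each logit through a sigmoid (a sketch; whether zen3-reranker's published scores use exactly this transform is an assumption):

```python
import math

def sigmoid(logit: float) -> float:
    """Squash an unbounded cross-encoder logit into a (0, 1) relevance score."""
    return 1.0 / (1.0 + math.exp(-logit))

# Illustrative logits, not actual model output.
logits = [4.2, -1.3]
probs = [sigmoid(x) for x in logits]
print([round(p, 3) for p in probs])
```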
