hyde-retrieval


HyDE (Hypothetical Document Embeddings)

Generate hypothetical answer documents to bridge vocabulary gaps in semantic search.

The Problem


Direct query embedding often fails due to vocabulary mismatch:
Query: "scaling async data pipelines"
Docs use: "event-driven messaging", "Apache Kafka", "message brokers"
→ Low similarity scores despite high relevance
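The gap above can be made concrete with a toy lexical-overlap measure standing in for embedding similarity (a deliberately crude illustration, not the real pipeline): the raw query shares no terms with the document, while a HyDE-style paraphrase shares most of them.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased word sets (toy proxy for similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

doc = "event-driven messaging with apache kafka and message brokers"
query = "scaling async data pipelines"
hypothetical = "use event-driven messaging with apache kafka message brokers"

print(jaccard(query, doc))         # 0.0 -- no shared vocabulary at all
print(jaccard(hypothetical, doc))  # substantial overlap
```

Real embedding models capture more than word overlap, but the failure mode is the same: the query and the relevant document live in different vocabularies.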

The Solution


Instead of embedding the query, generate a hypothetical answer document:
Query: "scaling async data pipelines"
→ LLM generates: "To scale asynchronous data pipelines, use event-driven
   messaging with Apache Kafka. Message brokers provide backpressure..."
→ Embed the hypothetical document
→ Now matches docs using similar terminology

Implementation


python
from collections.abc import Awaitable, Callable

from openai import AsyncOpenAI
from pydantic import BaseModel

class HyDEResult(BaseModel):
    """Result of HyDE generation."""
    original_query: str
    hypothetical_doc: str
    embedding: list[float]

async def generate_hyde(
    query: str,
    llm: AsyncOpenAI,
    embed_fn: Callable[[str], Awaitable[list[float]]],
    max_tokens: int = 150,
) -> HyDEResult:
    """Generate hypothetical document and embed it."""

    # Generate hypothetical answer
    response = await llm.chat.completions.create(
        model="gpt-5.2-mini",  # Fast, cheap model
        messages=[
            {"role": "system", "content":
                "Write a short paragraph that would answer this query. "
                "Use technical terminology that documentation would use."},
            {"role": "user", "content": query}
        ],
        max_tokens=max_tokens,
        temperature=0.3,  # Low temp for consistency
    )

    hypothetical_doc = response.choices[0].message.content

    # Embed the hypothetical document (not the query!)
    embedding = await embed_fn(hypothetical_doc)

    return HyDEResult(
        original_query=query,
        hypothetical_doc=hypothetical_doc,
        embedding=embedding,
    )
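Because `generate_hyde` only touches the client through `llm.chat.completions.create(...)`, it can be smoke-tested offline with a duck-typed stub. The sketch below inlines a trimmed copy of the function (pydantic swapped for a dataclass so it runs with no dependencies); `StubLLM` and `toy_embed` are illustrative stand-ins, not real clients or models.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class HyDEResult:
    original_query: str
    hypothetical_doc: str
    embedding: list[float]

async def generate_hyde(query, llm, embed_fn, max_tokens=150) -> HyDEResult:
    # Trimmed copy of the function above: generate, then embed the doc.
    response = await llm.chat.completions.create(
        model="stub", messages=[{"role": "user", "content": query}],
        max_tokens=max_tokens, temperature=0.3)
    doc = response.choices[0].message.content
    return HyDEResult(query, doc, await embed_fn(doc))

class StubLLM:
    """Offline stand-in exposing the .chat.completions.create(...) surface."""
    def __init__(self, canned: str):
        async def create(**kwargs):
            message = SimpleNamespace(content=canned)
            return SimpleNamespace(choices=[SimpleNamespace(message=message)])
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=create))

async def toy_embed(text: str) -> list[float]:
    # Deterministic toy embedding (length and vowel count), not a real model.
    return [float(len(text)), float(sum(text.count(v) for v in "aeiou"))]

result = asyncio.run(generate_hyde(
    "scaling async data pipelines",
    StubLLM("Use event-driven messaging with Apache Kafka."),
    toy_embed,
))
print(result.hypothetical_doc)  # the canned hypothetical document
```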

With Caching


python
import hashlib

class HyDEService:
    def __init__(self, llm, embed_fn):
        self.llm = llm
        self.embed_fn = embed_fn
        self._cache: dict[str, HyDEResult] = {}

    def _cache_key(self, query: str) -> str:
        return hashlib.md5(query.lower().strip().encode()).hexdigest()

    async def generate(self, query: str) -> HyDEResult:
        key = self._cache_key(query)

        if key in self._cache:
            return self._cache[key]

        result = await generate_hyde(query, self.llm, self.embed_fn)
        self._cache[key] = result
        return result
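Note what this cache key does and does not deduplicate: the lowercase-and-strip normalization collapses trivial variants of the same query, but any rewording produces a different key. A standalone sketch of the same normalization:

```python
import hashlib

def cache_key(query: str) -> str:
    """Same normalization as HyDEService._cache_key: lowercase + strip, then MD5."""
    return hashlib.md5(query.lower().strip().encode()).hexdigest()

# Case and surrounding whitespace collapse into one cache entry...
print(cache_key("  Scaling Async Pipelines ") == cache_key("scaling async pipelines"))
# ...but paraphrases do not, so reworded queries still miss the cache.
print(cache_key("scale async pipelines") == cache_key("scaling async pipelines"))
```

For paraphrase-level reuse, see semantic-caching under Related Skills.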

Per-Concept HyDE (Advanced)


For multi-concept queries, generate HyDE for each concept:
python
async def batch_hyde(
    concepts: list[str],
    hyde_service: HyDEService,
) -> list[HyDEResult]:
    """Generate HyDE embeddings for multiple concepts in parallel."""
    import asyncio

    tasks = [hyde_service.generate(concept) for concept in concepts]
    return await asyncio.gather(*tasks)
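The `asyncio.gather` pattern above preserves input order in the results, even though the calls run concurrently. A runnable sketch using an illustrative stub in place of `HyDEService` (no LLM or embedding calls):

```python
import asyncio

class StubHyDEService:
    """Illustrative stand-in for HyDEService: returns a canned doc per concept."""
    async def generate(self, query: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real awaited call would
        return f"hypothetical doc for {query!r}"

async def batch_hyde(concepts, hyde_service):
    """Generate HyDE results for multiple concepts in parallel."""
    tasks = [hyde_service.generate(c) for c in concepts]
    return await asyncio.gather(*tasks)

docs = asyncio.run(batch_hyde(
    ["kafka backpressure", "consumer group rebalancing"], StubHyDEService()))
print(docs)  # results come back in the same order as the input concepts
```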

When to Use HyDE

| Scenario | Use HyDE? |
|---|---|
| Abstract/conceptual queries | Yes |
| Exact term searches | No (use keyword search) |
| Code snippet searches | No |
| Natural language questions | Yes |
| Vocabulary mismatch suspected | Yes |
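The table can be approximated as a routing heuristic. This is a hypothetical sketch (the rules, `should_use_hyde`, and its regex are assumptions, not from the original): skip HyDE for quoted exact-term and code-like queries, use it for natural-language questions.

```python
import re

def should_use_hyde(query: str) -> bool:
    """Heuristic router following the table above (hypothetical rules)."""
    if '"' in query:                           # quoted exact-term search
        return False
    if re.search(r"[(){};]|::|->|=>", query):  # code-snippet punctuation
        return False
    return len(query.split()) >= 3             # conceptual / NL questions

print(should_use_hyde("how do I scale async data pipelines"))  # True
print(should_use_hyde('"ECONNRESET"'))                         # False
print(should_use_hyde("foo(bar);"))                            # False
```

A production router would more likely use a lightweight classifier, but the decision boundary is the same one the table describes.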

Fallback Strategy


python
from collections.abc import Awaitable, Callable

async def hyde_with_fallback(
    query: str,
    hyde_service: HyDEService,
    embed_fn: Callable[[str], Awaitable[list[float]]],
    timeout: float = 3.0,
) -> list[float]:
    """HyDE with fallback to direct embedding on timeout."""
    import asyncio

    try:
        async with asyncio.timeout(timeout):  # Python 3.11+
            result = await hyde_service.generate(query)
            return result.embedding
    except TimeoutError:
        # Fallback to direct query embedding
        return await embed_fn(query)

Performance Tips


  • Use a fast model (gpt-5.2-mini, claude-haiku-4-5) for generation
  • Cache aggressively (queries often repeat)
  • Set tight timeouts (2-3s) with fallback
  • Keep hypothetical docs concise (100-200 tokens)
  • Combine with query decomposition for best results

Related Skills


  • rag-retrieval
    - Core RAG patterns that HyDE enhances for better retrieval
  • embeddings
    - Embedding models used to embed hypothetical documents
  • query-decomposition
    - Complementary technique for multi-concept queries
  • semantic-caching
    - Cache HyDE results to avoid repeated LLM calls

Key Decisions


| Decision | Choice | Rationale |
|---|---|---|
| Generation model | gpt-5.2-mini / claude-haiku-4-5 | Fast and cheap for hypothetical doc generation |
| Temperature | 0.3 | Low temperature for consistent, factual hypothetical docs |
| Max tokens | 100-200 | Concise docs match embedding sweet spot |
| Timeout with fallback | 2-3 seconds | Graceful degradation to direct query embedding |

References
