hyde-retrieval


HyDE (Hypothetical Document Embeddings)

Generate hypothetical answer documents to bridge vocabulary gaps in semantic search.

The Problem


Direct query embedding often fails due to vocabulary mismatch:
Query: "scaling async data pipelines"
Docs use: "event-driven messaging", "Apache Kafka", "message brokers"
→ Low similarity scores despite high relevance
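The gap above can be made concrete with a toy lexical-overlap measure standing in for embedding similarity (a deliberately crude illustration, not the real pipeline): the raw query shares no terms with the document, while a HyDE-style paraphrase shares most of them.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased word sets (toy proxy for similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

doc = "event-driven messaging with apache kafka and message brokers"
query = "scaling async data pipelines"
hypothetical = "use event-driven messaging with apache kafka message brokers"

print(jaccard(query, doc))         # 0.0 -- no shared vocabulary at all
print(jaccard(hypothetical, doc))  # substantial overlap
```

Real embedding models capture more than word overlap, but the failure mode is the same: the query and the relevant document live in different vocabularies.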

The Solution


Instead of embedding the query, generate a hypothetical answer document:
Query: "scaling async data pipelines"
→ LLM generates: "To scale asynchronous data pipelines, use event-driven
   messaging with Apache Kafka. Message brokers provide backpressure..."
→ Embed the hypothetical document
→ Now matches docs using similar terminology

Implementation


python
from collections.abc import Awaitable, Callable

from openai import AsyncOpenAI
from pydantic import BaseModel

class HyDEResult(BaseModel):
    """Result of HyDE generation."""
    original_query: str
    hypothetical_doc: str
    embedding: list[float]

async def generate_hyde(
    query: str,
    llm: AsyncOpenAI,
    embed_fn: Callable[[str], Awaitable[list[float]]],
    max_tokens: int = 150,
) -> HyDEResult:
    """Generate hypothetical document and embed it."""

    # Generate hypothetical answer
    response = await llm.chat.completions.create(
        model="gpt-5.2-mini",  # Fast, cheap model
        messages=[
            {"role": "system", "content":
                "Write a short paragraph that would answer this query. "
                "Use technical terminology that documentation would use."},
            {"role": "user", "content": query}
        ],
        max_tokens=max_tokens,
        temperature=0.3,  # Low temp for consistency
    )

    hypothetical_doc = response.choices[0].message.content

    # Embed the hypothetical document (not the query!)
    embedding = await embed_fn(hypothetical_doc)

    return HyDEResult(
        original_query=query,
        hypothetical_doc=hypothetical_doc,
        embedding=embedding,
    )
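Because `generate_hyde` only touches the client through `llm.chat.completions.create(...)`, it can be smoke-tested offline with a duck-typed stub. The sketch below inlines a trimmed copy of the function (pydantic swapped for a dataclass so it runs with no dependencies); `StubLLM` and `toy_embed` are illustrative stand-ins, not real clients or models.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class HyDEResult:
    original_query: str
    hypothetical_doc: str
    embedding: list[float]

async def generate_hyde(query, llm, embed_fn, max_tokens=150) -> HyDEResult:
    # Trimmed copy of the function above: generate, then embed the doc.
    response = await llm.chat.completions.create(
        model="stub", messages=[{"role": "user", "content": query}],
        max_tokens=max_tokens, temperature=0.3)
    doc = response.choices[0].message.content
    return HyDEResult(query, doc, await embed_fn(doc))

class StubLLM:
    """Offline stand-in exposing the .chat.completions.create(...) surface."""
    def __init__(self, canned: str):
        async def create(**kwargs):
            message = SimpleNamespace(content=canned)
            return SimpleNamespace(choices=[SimpleNamespace(message=message)])
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=create))

async def toy_embed(text: str) -> list[float]:
    # Deterministic toy embedding (length and vowel count), not a real model.
    return [float(len(text)), float(sum(text.count(v) for v in "aeiou"))]

result = asyncio.run(generate_hyde(
    "scaling async data pipelines",
    StubLLM("Use event-driven messaging with Apache Kafka."),
    toy_embed,
))
print(result.hypothetical_doc)  # the canned hypothetical document
```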

With Caching


python
import hashlib

class HyDEService:
    def __init__(self, llm, embed_fn):
        self.llm = llm
        self.embed_fn = embed_fn
        self._cache: dict[str, HyDEResult] = {}

    def _cache_key(self, query: str) -> str:
        return hashlib.md5(query.lower().strip().encode()).hexdigest()

    async def generate(self, query: str) -> HyDEResult:
        key = self._cache_key(query)

        if key in self._cache:
            return self._cache[key]

        result = await generate_hyde(query, self.llm, self.embed_fn)
        self._cache[key] = result
        return result
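Note what this cache key does and does not deduplicate: the lowercase-and-strip normalization collapses trivial variants of the same query, but any rewording produces a different key. A standalone sketch of the same normalization:

```python
import hashlib

def cache_key(query: str) -> str:
    """Same normalization as HyDEService._cache_key: lowercase + strip, then MD5."""
    return hashlib.md5(query.lower().strip().encode()).hexdigest()

# Case and surrounding whitespace collapse into one cache entry...
print(cache_key("  Scaling Async Pipelines ") == cache_key("scaling async pipelines"))
# ...but paraphrases do not, so reworded queries still miss the cache.
print(cache_key("scale async pipelines") == cache_key("scaling async pipelines"))
```

For paraphrase-level reuse, see semantic-caching under Related Skills.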

Per-Concept HyDE (Advanced)


For multi-concept queries, generate HyDE for each concept:
python
async def batch_hyde(
    concepts: list[str],
    hyde_service: HyDEService,
) -> list[HyDEResult]:
    """Generate HyDE embeddings for multiple concepts in parallel."""
    import asyncio

    tasks = [hyde_service.generate(concept) for concept in concepts]
    return await asyncio.gather(*tasks)
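The `asyncio.gather` pattern above preserves input order in the results, even though the calls run concurrently. A runnable sketch using an illustrative stub in place of `HyDEService` (no LLM or embedding calls):

```python
import asyncio

class StubHyDEService:
    """Illustrative stand-in for HyDEService: returns a canned doc per concept."""
    async def generate(self, query: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real awaited call would
        return f"hypothetical doc for {query!r}"

async def batch_hyde(concepts, hyde_service):
    """Generate HyDE results for multiple concepts in parallel."""
    tasks = [hyde_service.generate(c) for c in concepts]
    return await asyncio.gather(*tasks)

docs = asyncio.run(batch_hyde(
    ["kafka backpressure", "consumer group rebalancing"], StubHyDEService()))
print(docs)  # results come back in the same order as the input concepts
```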

When to Use HyDE

| Scenario | Use HyDE? |
|---|---|
| Abstract/conceptual queries | Yes |
| Exact term searches | No (use keyword search) |
| Code snippet searches | No |
| Natural language questions | Yes |
| Vocabulary mismatch suspected | Yes |
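The table can be approximated as a routing heuristic. This is a hypothetical sketch (the rules, `should_use_hyde`, and its regex are assumptions, not from the original): skip HyDE for quoted exact-term and code-like queries, use it for natural-language questions.

```python
import re

def should_use_hyde(query: str) -> bool:
    """Heuristic router following the table above (hypothetical rules)."""
    if '"' in query:                           # quoted exact-term search
        return False
    if re.search(r"[(){};]|::|->|=>", query):  # code-snippet punctuation
        return False
    return len(query.split()) >= 3             # conceptual / NL questions

print(should_use_hyde("how do I scale async data pipelines"))  # True
print(should_use_hyde('"ECONNRESET"'))                         # False
print(should_use_hyde("foo(bar);"))                            # False
```

A production router would more likely use a lightweight classifier, but the decision boundary is the same one the table describes.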

Fallback Strategy


python
from collections.abc import Awaitable, Callable

async def hyde_with_fallback(
    query: str,
    hyde_service: HyDEService,
    embed_fn: Callable[[str], Awaitable[list[float]]],
    timeout: float = 3.0,
) -> list[float]:
    """HyDE with fallback to direct embedding on timeout."""
    import asyncio

    try:
        async with asyncio.timeout(timeout):  # Python 3.11+
            result = await hyde_service.generate(query)
            return result.embedding
    except TimeoutError:
        # Fallback to direct query embedding
        return await embed_fn(query)

Performance Tips


  • Use a fast model (gpt-5.2-mini, claude-haiku-4-5) for generation
  • Cache aggressively (queries often repeat)
  • Set tight timeouts (2-3s) with fallback
  • Keep hypothetical docs concise (100-200 tokens)
  • Combine with query decomposition for best results

Related Skills


  • rag-retrieval
    - Core RAG patterns that HyDE enhances for better retrieval
  • embeddings
    - Embedding models used to embed hypothetical documents
  • query-decomposition
    - Complementary technique for multi-concept queries
  • semantic-caching
    - Cache HyDE results to avoid repeated LLM calls

Key Decisions


| Decision | Choice | Rationale |
|---|---|---|
| Generation model | gpt-5.2-mini / claude-haiku-4-5 | Fast and cheap for hypothetical doc generation |
| Temperature | 0.3 | Low temperature for consistent, factual hypothetical docs |
| Max tokens | 100-200 | Concise docs match embedding sweet spot |
| Timeout with fallback | 2-3 seconds | Graceful degradation to direct query embedding |

References
