hyde-retrieval
HyDE (Hypothetical Document Embeddings)
Generate hypothetical answer documents to bridge vocabulary gaps in semantic search.
The Problem
Direct query embedding often fails due to vocabulary mismatch:
Query: "scaling async data pipelines"
Docs use: "event-driven messaging", "Apache Kafka", "message brokers"
→ Low similarity scores despite high relevance
The Solution
Instead of embedding the query, generate a hypothetical answer document:
Query: "scaling async data pipelines"
→ LLM generates: "To scale asynchronous data pipelines, use event-driven
messaging with Apache Kafka. Message brokers provide backpressure..."
→ Embed the hypothetical document
→ Now matches docs using similar terminology
Implementation
```python
from collections.abc import Awaitable, Callable

from openai import AsyncOpenAI
from pydantic import BaseModel


class HyDEResult(BaseModel):
    """Result of HyDE generation."""

    original_query: str
    hypothetical_doc: str
    embedding: list[float]


async def generate_hyde(
    query: str,
    llm: AsyncOpenAI,
    embed_fn: Callable[[str], Awaitable[list[float]]],
    max_tokens: int = 150,
) -> HyDEResult:
    """Generate a hypothetical document and embed it."""
    # Generate the hypothetical answer
    response = await llm.chat.completions.create(
        model="gpt-5.2-mini",  # Fast, cheap model
        messages=[
            {
                "role": "system",
                "content": (
                    "Write a short paragraph that would answer this query. "
                    "Use technical terminology that documentation would use."
                ),
            },
            {"role": "user", "content": query},
        ],
        max_tokens=max_tokens,
        temperature=0.3,  # Low temp for consistency
    )
    hypothetical_doc = response.choices[0].message.content

    # Embed the hypothetical document (not the query!)
    embedding = await embed_fn(hypothetical_doc)

    return HyDEResult(
        original_query=query,
        hypothetical_doc=hypothetical_doc,
        embedding=embedding,
    )
```
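The "embed the hypothetical document, not the query" step is the whole trick. A toy bag-of-words overlap count (a stand-in for embedding similarity; the strings are illustrative, not a real corpus) shows why the hypothetical document scores against documentation that the raw query misses:

```python
# Toy stand-in for embedding similarity: count shared tokens.
def overlap(a: str, b: str) -> int:
    return len(set(a.lower().split()) & set(b.lower().split()))


doc = "event-driven messaging with apache kafka message brokers"
query = "scaling async data pipelines"
hypothetical = (
    "to scale asynchronous data pipelines use event-driven messaging "
    "with apache kafka message brokers provide backpressure"
)

print(overlap(query, doc))         # → 0: vocabulary mismatch
print(overlap(hypothetical, doc))  # → 7: shared terminology
```

Real embeddings capture more than shared tokens, but the same mechanism applies: the generated document lands in the corpus's vocabulary, so its vector lands near the relevant documents.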
With Caching
```python
import hashlib


class HyDEService:
    def __init__(self, llm, embed_fn):
        self.llm = llm
        self.embed_fn = embed_fn
        self._cache: dict[str, HyDEResult] = {}

    def _cache_key(self, query: str) -> str:
        # Normalize so trivial variants share one cache entry
        return hashlib.md5(query.lower().strip().encode()).hexdigest()

    async def generate(self, query: str) -> HyDEResult:
        key = self._cache_key(query)
        if key in self._cache:
            return self._cache[key]
        result = await generate_hyde(query, self.llm, self.embed_fn)
        self._cache[key] = result
        return result
```
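A quick way to see the cache working: swap in a stub that uses the same normalized cache key but counts how often the expensive LLM-plus-embedding path would run. The stub class and its fake embedding are assumptions for illustration, not part of the skill:

```python
import asyncio
import hashlib


class CountingHyDE:
    """Stub with the same cache-key scheme as HyDEService,
    counting how often the expensive path would run."""

    def __init__(self):
        self.llm_calls = 0
        self._cache: dict[str, list[float]] = {}

    def _cache_key(self, query: str) -> str:
        return hashlib.md5(query.lower().strip().encode()).hexdigest()

    async def generate(self, query: str) -> list[float]:
        key = self._cache_key(query)
        if key in self._cache:
            return self._cache[key]
        self.llm_calls += 1  # would be the LLM call + embedding
        self._cache[key] = [float(len(query))]  # fake embedding
        return self._cache[key]


async def demo():
    svc = CountingHyDE()
    await svc.generate("Scaling async data pipelines")
    await svc.generate("  scaling async data pipelines ")  # normalized hit
    return svc.llm_calls


print(asyncio.run(demo()))  # → 1: the second call is served from cache
```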
Per-Concept HyDE (Advanced)
For multi-concept queries, generate HyDE for each concept:
```python
import asyncio


async def batch_hyde(
    concepts: list[str],
    hyde_service: HyDEService,
) -> list[HyDEResult]:
    """Generate HyDE embeddings for multiple concepts in parallel."""
    tasks = [hyde_service.generate(concept) for concept in concepts]
    return await asyncio.gather(*tasks)
```
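`batch_hyde` returns one embedding per concept; how to combine them is left open. One simple option (an assumption, not something this skill prescribes) is mean pooling into a single query vector:

```python
def mean_pool(embeddings: list[list[float]]) -> list[float]:
    """Average per-concept HyDE embeddings component-wise."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


print(mean_pool([[1.0, 0.0], [0.0, 1.0]]))  # → [0.5, 0.5]
```

The alternative is to search with each concept vector separately and merge the ranked result lists, which preserves per-concept recall at the cost of extra retrieval calls.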
Overview
| Scenario | Use HyDE? |
|---|---|
| Abstract/conceptual queries | Yes |
| Exact term searches | No (use keyword) |
| Code snippet searches | No |
| Natural language questions | Yes |
| Vocabulary mismatch suspected | Yes |
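The table above can be approximated as a routing heuristic. This is a sketch; the patterns and word-count threshold are assumptions to tune against your own query logs:

```python
import re


def should_use_hyde(query: str) -> bool:
    """Route conceptual, natural-language queries to HyDE;
    exact-term and code-like queries to keyword search."""
    # Quoted phrases signal an exact-term search
    if '"' in query:
        return False
    # Code-ish tokens: snake_case, dotted names, brackets, semicolons
    if re.search(r"[_(){}\[\];]|\w+\.\w+", query):
        return False
    # Very short keyword queries rarely benefit
    if len(query.split()) < 3:
        return False
    return True


print(should_use_hyde("scaling async data pipelines"))  # → True
print(should_use_hyde('"Apache Kafka"'))                # → False
print(should_use_hyde("df.merge usage"))                # → False
```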
Fallback Strategy
```python
import asyncio
from collections.abc import Awaitable, Callable


async def hyde_with_fallback(
    query: str,
    hyde_service: HyDEService,
    embed_fn: Callable[[str], Awaitable[list[float]]],
    timeout: float = 3.0,
) -> list[float]:
    """HyDE with fallback to direct embedding on timeout."""
    try:
        async with asyncio.timeout(timeout):  # Python 3.11+
            result = await hyde_service.generate(query)
            return result.embedding
    except TimeoutError:
        # Fall back to direct query embedding
        return await embed_fn(query)
```
Performance Tips
- Use a fast model (gpt-5.2-mini, claude-haiku-4-5) for generation
- Cache aggressively (queries often repeat)
- Set tight timeouts (2-3 s) with a fallback
- Keep hypothetical docs concise (100-200 tokens)
- Combine with query decomposition for best results
Related Skills
- rag-retrieval - Core RAG patterns that HyDE enhances for better retrieval
- embeddings - Embedding models used to embed hypothetical documents
- query-decomposition - Complementary technique for multi-concept queries
- semantic-caching - Cache HyDE results to avoid repeated LLM calls
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Generation model | gpt-5.2-mini / claude-haiku-4-5 | Fast and cheap for hypothetical doc generation |
| Temperature | 0.3 | Low temperature for consistent, factual hypothetical docs |
| Max tokens | 100-200 | Concise docs match embedding sweet spot |
| Timeout with fallback | 2-3 seconds | Graceful degradation to direct query embedding |
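These decisions can travel as one config object rather than scattered literals; a minimal sketch where the class name and field names are illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyDEConfig:
    """Defaults mirroring the decision table above."""

    model: str = "gpt-5.2-mini"
    temperature: float = 0.3
    max_tokens: int = 150
    timeout_s: float = 3.0


print(HyDEConfig())
```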