DSPy + Haystack Integration

Goal

Use DSPy's optimization capabilities to automatically improve prompts in Haystack pipelines.

When to Use

  • You have existing Haystack pipelines
  • Manual prompt tuning is tedious
  • Need data-driven prompt optimization
  • Want to combine Haystack components with DSPy optimization

Inputs

| Input | Type | Description |
|---|---|---|
| haystack_pipeline | Pipeline | Existing Haystack pipeline |
| trainset | list[dspy.Example] | Training examples |
| metric | callable | Evaluation function |

Outputs

| Output | Type | Description |
|---|---|---|
| optimized_prompt | str | DSPy-optimized prompt |
| optimized_pipeline | Pipeline | Updated Haystack pipeline |

Workflow

Phase 1: Build Initial Haystack Pipeline

```python
from haystack import Pipeline
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
```

Setup document store

```python
# `documents` is your list of Haystack Document objects
doc_store = InMemoryDocumentStore()
doc_store.write_documents(documents)
```

Initial generic prompt

```python
initial_prompt = """
Context: {{context}}
Question: {{question}}
Answer:
"""
```

Build pipeline

```python
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=doc_store))
pipeline.add_component("prompt_builder", PromptBuilder(template=initial_prompt))
pipeline.add_component("generator", OpenAIGenerator(model="gpt-4o-mini"))

pipeline.connect("retriever", "prompt_builder.context")
pipeline.connect("prompt_builder", "generator")
```

Phase 2: Create DSPy RAG Module

```python
import dspy

class HaystackRAG(dspy.Module):
    """DSPy module wrapping a Haystack retriever."""

    def __init__(self, retriever, k=3):
        super().__init__()
        self.retriever = retriever
        self.k = k
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Use the Haystack retriever
        results = self.retriever.run(query=question, top_k=self.k)
        context = [doc.content for doc in results['documents']]

        # Use DSPy for generation
        pred = self.generate(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)
```
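The module above depends only on the retriever exposing `run(query=..., top_k=...)` and returning a dict whose `documents` entries carry a `.content` attribute. A minimal stand-in (hypothetical, for illustration only) makes that contract concrete:

```python
class Doc:
    """Stand-in for a Haystack Document with a .content attribute."""
    def __init__(self, content):
        self.content = content

class StubRetriever:
    """Hypothetical retriever mimicking the interface HaystackRAG expects."""
    def __init__(self, contents):
        self.docs = [Doc(c) for c in contents]

    def run(self, query, top_k):
        # A real retriever would rank by relevance to `query`;
        # this stub just returns the first top_k documents.
        return {"documents": self.docs[:top_k]}

retriever = StubRetriever(["Paris is the capital of France.",
                           "Berlin is the capital of Germany."])
result = retriever.run(query="capital of France", top_k=1)
context = [doc.content for doc in result["documents"]]
print(context)  # ['Paris is the capital of France.']
```

Any object satisfying this contract (BM25, embedding-based, or a stub for tests) can be dropped into `HaystackRAG` unchanged.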

Phase 3: Define Custom Metric

```python
from haystack.components.evaluators import SASEvaluator
```

Haystack semantic evaluator

```python
sas_evaluator = SASEvaluator(model="sentence-transformers/all-MiniLM-L6-v2")
sas_evaluator.warm_up()  # load the sentence-transformers model before use

def mixed_metric(example, pred, trace=None):
    """Combine semantic accuracy with conciseness."""
    # Semantic similarity (Haystack SAS)
    sas_result = sas_evaluator.run(
        ground_truth_answers=[example.answer],
        predicted_answers=[pred.answer]
    )
    semantic_score = sas_result['score']

    # Conciseness penalty: full credit up to 20 words, linear decay to zero at 70
    word_count = len(pred.answer.split())
    conciseness = 1.0 if word_count <= 20 else max(0, 1 - (word_count - 20) / 50)

    return 0.7 * semantic_score + 0.3 * conciseness
```
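The conciseness term is easy to sanity-check in isolation; under this weighting, an answer of 70 words or more earns no conciseness credit at all:

```python
def conciseness(word_count):
    # Same penalty as in mixed_metric: full credit up to 20 words,
    # then linear decay hitting zero at 70 words
    return 1.0 if word_count <= 20 else max(0, 1 - (word_count - 20) / 50)

print(conciseness(15))   # 1.0
print(conciseness(45))   # 0.5
print(conciseness(70))   # 0.0

# With the 0.7/0.3 blend, a semantically perfect but 45-word answer scores:
print(round(0.7 * 1.0 + 0.3 * conciseness(45), 2))  # 0.85
```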

Phase 4: Optimize with DSPy

```python
from dspy.teleprompt import BootstrapFewShot

lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)
```

Create DSPy module with Haystack retriever

```python
rag_module = HaystackRAG(retriever=pipeline.get_component("retriever"))
```

Optimize

```python
optimizer = BootstrapFewShot(
    metric=mixed_metric,
    max_bootstrapped_demos=4,
    max_labeled_demos=4
)
compiled = optimizer.compile(rag_module, trainset=trainset)
```

Phase 5: Extract and Apply Optimized Prompt

After optimization, extract the optimized prompt and apply it to your Haystack pipeline.
See Prompt Extraction Guide for detailed steps on:
  • Extracting prompts from compiled DSPy modules
  • Mapping DSPy demos to Haystack templates
  • Building optimized Haystack pipelines
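The mapping step can be sketched roughly. Assuming each compiled demo exposes question and answer fields (a simplification of the actual demo objects, for illustration only), few-shot examples can be inlined as literal text ahead of the Jinja placeholders of a Haystack template; the helper name and demo shape here are hypothetical:

```python
def demos_to_template(demos,
                      base="Context: {{context}}\nQuestion: {{question}}\nAnswer:"):
    """Prepend few-shot demos as literal text before the Jinja placeholders."""
    shots = "\n\n".join(
        f"Question: {d['question']}\nAnswer: {d['answer']}" for d in demos
    )
    return shots + "\n\n" + base

demos = [{"question": "What is BM25?", "answer": "A lexical ranking function."}]
template = demos_to_template(demos)
print(template.splitlines()[0])  # Question: What is BM25?
```

The resulting string can then be passed to `PromptBuilder(template=...)` when rebuilding the pipeline.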

Production Example

For a complete production-ready implementation, see HaystackDSPyOptimizer.
This class provides:
  • Wrapper for Haystack retrievers in DSPy modules
  • Automatic optimization with BootstrapFewShot
  • Prompt extraction and Haystack pipeline rebuilding
  • Complete usage example with document store setup

Best Practices

  1. Match retrievers - Use the same retriever in the DSPy module as in the Haystack pipeline
  2. Custom metrics - Combine Haystack evaluators with DSPy optimization
  3. Prompt extraction - Carefully map DSPy demos to the Haystack template format
  4. Test both - Validate both the DSPy module and the final Haystack pipeline

Limitations

  • Prompt template conversion can be tricky
  • Some Haystack features don't map directly to DSPy
  • Requires maintaining two codebases initially
  • Complex pipelines may need custom integration

Official Documentation
