DSPy + Haystack Integration

Goal

Use DSPy's optimization capabilities to automatically improve prompts in Haystack pipelines.

When to Use

  • You have existing Haystack pipelines
  • Manual prompt tuning is tedious
  • Need data-driven prompt optimization
  • Want to combine Haystack components with DSPy optimization

Inputs

| Input | Type | Description |
|---|---|---|
| haystack_pipeline | Pipeline | Existing Haystack pipeline |
| trainset | list[dspy.Example] | Training examples |
| metric | callable | Evaluation function |

Outputs

| Output | Type | Description |
|---|---|---|
| optimized_prompt | str | DSPy-optimized prompt |
| optimized_pipeline | Pipeline | Updated Haystack pipeline |

Workflow

Phase 1: Build Initial Haystack Pipeline

```python
from haystack import Pipeline
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
```

Setup document store

```python
# `documents` is your list of Haystack Document objects
doc_store = InMemoryDocumentStore()
doc_store.write_documents(documents)
```

Initial generic prompt

```python
initial_prompt = """
Context: {{context}}
Question: {{question}}
Answer:
"""
```

Build pipeline

```python
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=doc_store))
pipeline.add_component("prompt_builder", PromptBuilder(template=initial_prompt))
pipeline.add_component("generator", OpenAIGenerator(model="gpt-4o-mini"))

pipeline.connect("retriever", "prompt_builder.context")
pipeline.connect("prompt_builder", "generator")
```

Phase 2: Create DSPy RAG Module

```python
import dspy

class HaystackRAG(dspy.Module):
    """DSPy module wrapping a Haystack retriever."""

    def __init__(self, retriever, k=3):
        super().__init__()
        self.retriever = retriever
        self.k = k
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Use the Haystack retriever
        results = self.retriever.run(query=question, top_k=self.k)
        context = [doc.content for doc in results['documents']]

        # Use DSPy for generation
        pred = self.generate(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)
```
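The module above depends only on the retriever exposing `run(query=..., top_k=...)` and returning a dict whose `documents` entries carry a `.content` attribute. A minimal stand-in (hypothetical, for illustration only) makes that contract concrete:

```python
class Doc:
    """Stand-in for a Haystack Document with a .content attribute."""
    def __init__(self, content):
        self.content = content

class StubRetriever:
    """Hypothetical retriever mimicking the interface HaystackRAG expects."""
    def __init__(self, contents):
        self.docs = [Doc(c) for c in contents]

    def run(self, query, top_k):
        # A real retriever would rank by relevance to `query`;
        # this stub just returns the first top_k documents.
        return {"documents": self.docs[:top_k]}

retriever = StubRetriever(["Paris is the capital of France.",
                           "Berlin is the capital of Germany."])
result = retriever.run(query="capital of France", top_k=1)
context = [doc.content for doc in result["documents"]]
print(context)  # ['Paris is the capital of France.']
```

Any object satisfying this contract (BM25, embedding-based, or a stub for tests) can be dropped into `HaystackRAG` unchanged.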

Phase 3: Define Custom Metric

```python
from haystack.components.evaluators import SASEvaluator
```

Haystack semantic evaluator

```python
sas_evaluator = SASEvaluator(model="sentence-transformers/all-MiniLM-L6-v2")
sas_evaluator.warm_up()  # load the sentence-transformers model before use

def mixed_metric(example, pred, trace=None):
    """Combine semantic accuracy with conciseness."""
    # Semantic similarity (Haystack SAS)
    sas_result = sas_evaluator.run(
        ground_truth_answers=[example.answer],
        predicted_answers=[pred.answer]
    )
    semantic_score = sas_result['score']

    # Conciseness penalty: full credit up to 20 words, linear decay to zero at 70
    word_count = len(pred.answer.split())
    conciseness = 1.0 if word_count <= 20 else max(0, 1 - (word_count - 20) / 50)

    return 0.7 * semantic_score + 0.3 * conciseness
```
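The conciseness term is easy to sanity-check in isolation; under this weighting, an answer of 70 words or more earns no conciseness credit at all:

```python
def conciseness(word_count):
    # Same penalty as in mixed_metric: full credit up to 20 words,
    # then linear decay hitting zero at 70 words
    return 1.0 if word_count <= 20 else max(0, 1 - (word_count - 20) / 50)

print(conciseness(15))   # 1.0
print(conciseness(45))   # 0.5
print(conciseness(70))   # 0.0

# With the 0.7/0.3 blend, a semantically perfect but 45-word answer scores:
print(round(0.7 * 1.0 + 0.3 * conciseness(45), 2))  # 0.85
```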

Phase 4: Optimize with DSPy

```python
from dspy.teleprompt import BootstrapFewShot

lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)
```

Create DSPy module with Haystack retriever

```python
rag_module = HaystackRAG(retriever=pipeline.get_component("retriever"))
```

Optimize

```python
optimizer = BootstrapFewShot(
    metric=mixed_metric,
    max_bootstrapped_demos=4,
    max_labeled_demos=4
)
compiled = optimizer.compile(rag_module, trainset=trainset)
```

Phase 5: Extract and Apply Optimized Prompt

After optimization, extract the optimized prompt and apply it to your Haystack pipeline.
See Prompt Extraction Guide for detailed steps on:
  • Extracting prompts from compiled DSPy modules
  • Mapping DSPy demos to Haystack templates
  • Building optimized Haystack pipelines
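The mapping step can be sketched roughly. Assuming each compiled demo exposes question and answer fields (a simplification of the actual demo objects, for illustration only), few-shot examples can be inlined as literal text ahead of the Jinja placeholders of a Haystack template; the helper name and demo shape here are hypothetical:

```python
def demos_to_template(demos,
                      base="Context: {{context}}\nQuestion: {{question}}\nAnswer:"):
    """Prepend few-shot demos as literal text before the Jinja placeholders."""
    shots = "\n\n".join(
        f"Question: {d['question']}\nAnswer: {d['answer']}" for d in demos
    )
    return shots + "\n\n" + base

demos = [{"question": "What is BM25?", "answer": "A lexical ranking function."}]
template = demos_to_template(demos)
print(template.splitlines()[0])  # Question: What is BM25?
```

The resulting string can then be passed to `PromptBuilder(template=...)` when rebuilding the pipeline.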

Production Example

For a complete production-ready implementation, see HaystackDSPyOptimizer.
This class provides:
  • Wrapper for Haystack retrievers in DSPy modules
  • Automatic optimization with BootstrapFewShot
  • Prompt extraction and Haystack pipeline rebuilding
  • Complete usage example with document store setup

Best Practices

  1. Match retrievers - Use the same retriever in the DSPy module as in the Haystack pipeline
  2. Custom metrics - Combine Haystack evaluators with DSPy optimization
  3. Prompt extraction - Carefully map DSPy demos to the Haystack template format
  4. Test both - Validate both the DSPy module and the final Haystack pipeline

Limitations

  • Prompt template conversion can be tricky
  • Some Haystack features don't map directly to DSPy
  • Requires maintaining two codebases initially
  • Complex pipelines may need custom integration

Official Documentation
