ai-stopping-hallucinations
Stop Your AI From Making Things Up
Guide the user through making their AI factually grounded. The core principle: never trust a bare LM output — always verify against sources.
Why AI hallucinates
LMs generate plausible-sounding text, not verified facts. Hallucination happens when:
- The model has no source material to ground its answer
- The prompt doesn't enforce citations or evidence
- There's no verification step after generation
- Temperature is too high for factual tasks
The fix isn't better prompting — it's programmatic constraints that force grounding.
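On the temperature point: pin decoding down when you configure the LM for factual tasks. A minimal configuration sketch, assuming DSPy's `dspy.LM` client (the model name is a placeholder; use whichever provider you have configured):

```python
import dspy

# Deterministic decoding for factual tasks: temperature 0
lm = dspy.LM("openai/gpt-4o-mini", temperature=0.0)
dspy.configure(lm=lm)
```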
Step 1: Understand the grounding situation
Ask the user:
- Do you have source documents? (knowledge base, docs, database) → use retrieval-grounded answers
- Is it general knowledge? (no docs, just the model's knowledge) → use self-consistency checks
- How bad is a hallucination? (annoying vs. dangerous) → determines how strict the checks should be
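The intake questions above can be collapsed into a small routing helper. This is a hypothetical sketch, not part of DSPy; the pattern names refer to the steps that follow:

```python
def choose_patterns(has_sources: bool, high_stakes: bool) -> list[str]:
    """Map the intake answers to anti-hallucination layers, cheapest first."""
    if has_sources:
        # Documents available: ground in retrieval and enforce citations,
        # then add a verification layer sized to the risk
        layers = ["retrieval_grounding", "citation_enforcement"]
        layers.append("faithfulness_verification" if high_stakes else "self_check")
        return layers
    # No documents: fall back to self-consistency between generations
    return ["cross_check"] if high_stakes else ["self_check"]
```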
Step 2: Citation enforcement
Force the AI to cite sources for every claim. Uses `dspy.Assert` to reject answers without citations.

```python
import dspy
import re

class CitedAnswer(dspy.Signature):
    """Answer the question using the provided sources. Cite every claim with [1], [2], etc."""
    context: list[str] = dspy.InputField(desc="Numbered source documents")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="Answer with inline citations like [1], [2]")

class CitationEnforcer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought(CitedAnswer)

    def forward(self, context, question):
        result = self.answer(context=context, question=question)
        # Every 1-2 sentences must have a citation
        sentences = [s.strip() for s in result.answer.split(".") if s.strip()]
        citations_found = [bool(re.search(r"\[\d+\]", s)) for s in sentences]
        # Check that at least half the sentences have citations
        citation_ratio = sum(citations_found) / max(len(sentences), 1)
        dspy.Assert(
            citation_ratio >= 0.5,
            "Answer must cite sources. Use [1], [2], etc. after claims. "
            f"Only {citation_ratio:.0%} of sentences have citations."
        )
        # Check that cited numbers actually exist in the context
        cited_nums = set(int(n) for n in re.findall(r"\[(\d+)\]", result.answer))
        valid_nums = set(range(1, len(context) + 1))
        invalid = cited_nums - valid_nums
        dspy.Assert(
            len(invalid) == 0,
            f"Citations {invalid} don't match any source. Valid sources: [1] to [{len(context)}]."
        )
        return result
```

Step 3: Faithfulness verification
After generating an answer, use a second LM call to check whether it is actually supported by the sources.

```python
class CheckFaithfulness(dspy.Signature):
    """Check if every claim in the answer is supported by the context."""
    context: list[str] = dspy.InputField(desc="Source documents")
    answer: str = dspy.InputField(desc="Generated answer to verify")
    is_faithful: bool = dspy.OutputField(desc="Is every claim supported by the context?")
    unsupported_claims: list[str] = dspy.OutputField(desc="Claims not found in context")

class FaithfulResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.answer = dspy.ChainOfThought(CitedAnswer)
        self.verify = dspy.Predict(CheckFaithfulness)

    def forward(self, question):
        context = self.retrieve(question).passages
        result = self.answer(context=context, question=question)
        check = self.verify(context=context, answer=result.answer)
        dspy.Assert(
            check.is_faithful,
            f"Answer contains unsupported claims: {check.unsupported_claims}. "
            "Rewrite using only information from the provided sources."
        )
        return result
```

When `dspy.Assert` fails, DSPy automatically retries the LM call, feeding back the error message so the model can self-correct. This retry loop (called backtracking) runs up to `max_backtrack_attempts` times (default: 2).

Step 4: Self-check pattern
Generate an answer, then ask the model to verify its own claims against the sources. Lightweight and good for most cases.

```python
class SelfCheckedAnswer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought("context, question -> answer")
        self.check = dspy.ChainOfThought(CheckFaithfulness)

    def forward(self, context, question):
        result = self.answer(context=context, question=question)
        verification = self.check(context=context, answer=result.answer)
        dspy.Suggest(
            verification.is_faithful,
            f"Some claims may not be supported: {verification.unsupported_claims}. "
            "Consider revising to stick closer to the sources."
        )
        return dspy.Prediction(
            answer=result.answer,
            is_verified=verification.is_faithful,
            unsupported=verification.unsupported_claims,
        )
```

Use `dspy.Suggest` (soft) instead of `dspy.Assert` (hard) when you want to flag issues without blocking the response.

Step 5: Cross-check pattern
Generate the answer twice independently, then compare. If two independent generations disagree, something is probably made up.

```python
class CompareAnswers(dspy.Signature):
    """Check if two independently generated answers agree on the facts."""
    answer_a: str = dspy.InputField()
    answer_b: str = dspy.InputField()
    agree: bool = dspy.OutputField(desc="Do they agree on all factual claims?")
    discrepancy: str = dspy.OutputField(desc="What they disagree on, if anything")

class CrossChecked(dspy.Module):
    def __init__(self):
        super().__init__()
        self.gen_a = dspy.ChainOfThought("context, question -> answer")
        self.gen_b = dspy.ChainOfThought("context, question -> answer")
        self.compare = dspy.Predict(CompareAnswers)

    def forward(self, context, question):
        a = self.gen_a(context=context, question=question)
        b = self.gen_b(context=context, question=question)
        check = self.compare(answer_a=a.answer, answer_b=b.answer)
        dspy.Assert(
            check.agree,
            f"Two independent answers disagree: {check.discrepancy}. "
            "This suggests hallucination. Regenerate with closer attention to sources."
        )
        return a
```

Best for high-stakes outputs where the cost of hallucination is high. Doubles your LM calls but catches inconsistencies.
Step 6: Grounding via retrieval
The single most effective anti-hallucination measure: give the AI source material and constrain it to that material. Connect to /ai-searching-docs for the full RAG setup.

```python
class GroundedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.answer = dspy.ChainOfThought(CitedAnswer)
        self.verify = dspy.Predict(CheckFaithfulness)

    def forward(self, question):
        # Ground in retrieved sources
        context = self.retrieve(question).passages
        # Generate with citation requirement
        result = self.answer(context=context, question=question)
        # Verify faithfulness
        check = self.verify(context=context, answer=result.answer)
        dspy.Assert(
            check.is_faithful,
            f"Unsupported claims: {check.unsupported_claims}. "
            "Only use information from the provided sources."
        )
        return result
```

Step 7: Confidence thresholds
Flag low-confidence outputs for human review instead of showing them to users.
```python
class ConfidenceGated(dspy.Signature):
    """Answer the question and rate your confidence."""
    context: list[str] = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()
    confidence: float = dspy.OutputField(desc="0.0 to 1.0, how confident are you?")
    reasoning: str = dspy.OutputField(desc="Why this confidence level?")

class GatedResponder(dspy.Module):
    def __init__(self, threshold=0.7):
        super().__init__()
        self.respond = dspy.ChainOfThought(ConfidenceGated)
        self.threshold = threshold

    def forward(self, context, question):
        result = self.respond(context=context, question=question)
        if result.confidence < self.threshold:
            return dspy.Prediction(
                answer=result.answer,
                needs_review=True,
                confidence=result.confidence,
                reason=result.reasoning,
            )
        return dspy.Prediction(
            answer=result.answer,
            needs_review=False,
            confidence=result.confidence,
        )
```

How backtracking works
When `dspy.Assert` fails:
- DSPy catches the assertion failure
- The error message is fed back to the LM as additional context
- The LM retries generation with the feedback ("your answer had unsupported claims X, Y")
- This repeats up to `max_backtrack_attempts` times
- If all retries fail, the assertion raises an error

This is why good error messages matter: they're literally the feedback the model uses to improve.
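The retry loop can be sketched in plain Python. This is a conceptual model, not DSPy's actual implementation; `generate` and `validate` are hypothetical callables:

```python
def with_backtracking(generate, validate, max_backtrack_attempts=2):
    """Retry generation, feeding each failure message back as a correction hint."""
    feedback = None
    for _ in range(max_backtrack_attempts + 1):
        output = generate(feedback)  # feedback is None on the first attempt
        ok, message = validate(output)
        if ok:
            return output
        feedback = message  # becomes extra context for the next attempt
    raise AssertionError(f"All retries failed: {feedback}")
```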
Choosing the right pattern
| Pattern | Cost | Latency | Best for |
|---|---|---|---|
| Citation enforcement | 1 LM call | Low | When you have numbered sources |
| Faithfulness verification | 2 LM calls | Medium | RAG systems, doc Q&A |
| Self-check | 2 LM calls | Medium | General fact-checking |
| Cross-check | 3 LM calls | High | High-stakes, critical outputs |
| Confidence gating | 1 LM call | Low | Human-in-the-loop systems |
| Retrieval grounding | 1 retrieval + 1-2 LM | Medium | When you have a knowledge base |
Key principles
- Grounding beats prompting. Giving the AI sources to cite is more effective than asking it to "be accurate."
- Assert for critical facts. Use `dspy.Assert` when hallucination is unacceptable (medical, legal, financial).
- Suggest for nice-to-haves. Use `dspy.Suggest` when you want to flag but not block.
- Layer your defenses. Combine retrieval + citation + verification for the strongest protection.
- Good error messages help. The Assert message becomes the model's self-correction prompt.
Additional resources
- Use /ai-searching-docs for retrieval-augmented generation (RAG) setup
- Use /ai-checking-outputs for general output validation (format, safety, quality)
- Use /ai-following-rules for enforcing business rules and content policies
- See examples.md for complete worked examples