Stop Your AI From Making Things Up

Guide the user through making their AI factually grounded. The core principle: never trust a bare LM output — always verify against sources.

Why AI hallucinates

LMs generate plausible-sounding text, not verified facts. Hallucination happens when:
  • The model has no source material to ground its answer
  • The prompt doesn't enforce citations or evidence
  • There's no verification step after generation
  • Temperature is too high for factual tasks
The fix isn't better prompting — it's programmatic constraints that force grounding.
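One of the bullets above, temperature too high for factual tasks, can be fixed directly in configuration. A minimal sketch assuming DSPy 2.5+'s `dspy.LM` interface; the model string is a placeholder, substitute your own provider and model:

```python
import dspy

# Deterministic decoding for factual tasks: temperature 0 trades
# creative variation for repeatability, which grounding benefits from.
lm = dspy.LM("openai/gpt-4o-mini", temperature=0.0)
dspy.configure(lm=lm)
```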

Step 1: Understand the grounding situation

Ask the user:
  1. Do you have source documents? (knowledge base, docs, database) → use retrieval-grounded answers
  2. Is it general knowledge? (no docs, just the model's knowledge) → use self-consistency checks
  3. How bad is a hallucination? (annoying vs. dangerous) → determines how strict the checks should be
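The triage above can be encoded as a small helper. This is an illustrative sketch (the function and its pattern labels are mine, not part of DSPy):

```python
def pick_pattern(has_sources: bool, high_stakes: bool) -> list[str]:
    """Map the grounding triage questions to anti-hallucination patterns."""
    patterns = []
    if has_sources:
        # Source documents available: retrieve, cite, and verify against them
        patterns += ["retrieval grounding", "citation enforcement"]
    else:
        # General knowledge only: compare independent generations
        patterns.append("self-consistency check")
    if high_stakes:
        # Dangerous hallucinations warrant extra verification layers
        patterns += ["faithfulness verification", "cross-check"]
    return patterns
```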

Step 2: Citation enforcement

Force the AI to cite sources for every claim, using dspy.Assert to reject answers without citations.

```python
import dspy
import re

class CitedAnswer(dspy.Signature):
    """Answer the question using the provided sources. Cite every claim with [1], [2], etc."""
    context: list[str] = dspy.InputField(desc="Numbered source documents")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="Answer with inline citations like [1], [2]")

class CitationEnforcer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought(CitedAnswer)

    def forward(self, context, question):
        result = self.answer(context=context, question=question)

        # Split into rough sentences and look for a [n] marker in each
        sentences = [s.strip() for s in result.answer.split(".") if s.strip()]
        citations_found = [bool(re.search(r"\[\d+\]", s)) for s in sentences]

        # Require that at least half the sentences carry citations
        citation_ratio = sum(citations_found) / max(len(sentences), 1)
        dspy.Assert(
            citation_ratio >= 0.5,
            "Answer must cite sources. Use [1], [2], etc. after claims. "
            f"Only {citation_ratio:.0%} of sentences have citations."
        )

        # Check that cited numbers actually exist in the context
        cited_nums = set(int(n) for n in re.findall(r"\[(\d+)\]", result.answer))
        valid_nums = set(range(1, len(context) + 1))
        invalid = cited_nums - valid_nums
        dspy.Assert(
            len(invalid) == 0,
            f"Citations {invalid} don't match any source. Valid sources: [1] to [{len(context)}]."
        )

        return result
```
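The two assertion checks are plain regex logic, so they can be exercised without an LM call. A standalone sketch of the same checks (the function name is mine, not DSPy's):

```python
import re

def citation_stats(answer: str, num_sources: int):
    """Return (ratio of sentences with citations, set of invalid citation numbers)."""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    cited = [bool(re.search(r"\[\d+\]", s)) for s in sentences]
    ratio = sum(cited) / max(len(sentences), 1)
    # Citations must point at an existing source, i.e. [1]..[num_sources]
    nums = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    invalid = nums - set(range(1, num_sources + 1))
    return ratio, invalid

ratio, invalid = citation_stats("Paris is the capital [1]. It has 2M people [3].", 2)
# Both sentences cite something, but [3] points past the 2 available sources.
```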

Step 3: Faithfulness verification

After generating an answer, use a second LM call to check if it's actually supported by the sources.
```python
class CheckFaithfulness(dspy.Signature):
    """Check if every claim in the answer is supported by the context."""
    context: list[str] = dspy.InputField(desc="Source documents")
    answer: str = dspy.InputField(desc="Generated answer to verify")
    is_faithful: bool = dspy.OutputField(desc="Is every claim supported by the context?")
    unsupported_claims: list[str] = dspy.OutputField(desc="Claims not found in context")

class FaithfulResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.answer = dspy.ChainOfThought(CitedAnswer)
        self.verify = dspy.Predict(CheckFaithfulness)

    def forward(self, question):
        context = self.retrieve(question).passages
        result = self.answer(context=context, question=question)

        check = self.verify(context=context, answer=result.answer)
        dspy.Assert(
            check.is_faithful,
            f"Answer contains unsupported claims: {check.unsupported_claims}. "
            "Rewrite using only information from the provided sources."
        )

        return result
```
When dspy.Assert fails, DSPy automatically retries the LM call, feeding the error message back so the model can self-correct. This retry loop (called backtracking) runs up to max_backtrack_attempts times (default: 2).

Step 4: Self-check pattern

Generate an answer, then ask the model to verify its own claims against the sources. Lightweight and good for most cases.
```python
class SelfCheckedAnswer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought("context, question -> answer")
        self.check = dspy.ChainOfThought(CheckFaithfulness)

    def forward(self, context, question):
        result = self.answer(context=context, question=question)

        verification = self.check(context=context, answer=result.answer)
        dspy.Suggest(
            verification.is_faithful,
            f"Some claims may not be supported: {verification.unsupported_claims}. "
            "Consider revising to stick closer to the sources."
        )

        return dspy.Prediction(
            answer=result.answer,
            is_verified=verification.is_faithful,
            unsupported=verification.unsupported_claims,
        )
```
Use dspy.Suggest (soft) instead of dspy.Assert (hard) when you want to flag issues without blocking the response.

Step 5: Cross-check pattern

Generate the answer twice independently, then compare. If two independent generations disagree, something is probably made up.
```python
class CompareAnswers(dspy.Signature):
    """Check if two independently generated answers agree on the facts."""
    answer_a: str = dspy.InputField()
    answer_b: str = dspy.InputField()
    agree: bool = dspy.OutputField(desc="Do they agree on all factual claims?")
    discrepancy: str = dspy.OutputField(desc="What they disagree on, if anything")

class CrossChecked(dspy.Module):
    def __init__(self):
        super().__init__()
        self.gen_a = dspy.ChainOfThought("context, question -> answer")
        self.gen_b = dspy.ChainOfThought("context, question -> answer")
        self.compare = dspy.Predict(CompareAnswers)

    def forward(self, context, question):
        a = self.gen_a(context=context, question=question)
        b = self.gen_b(context=context, question=question)

        check = self.compare(answer_a=a.answer, answer_b=b.answer)
        dspy.Assert(
            check.agree,
            f"Two independent answers disagree: {check.discrepancy}. "
            "This suggests hallucination. Regenerate with closer attention to sources."
        )

        return a
```
Best for high-stakes outputs where the cost of hallucination is high. Triples your LM calls (two generations plus a comparison) but catches inconsistencies.
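As a cost optimization not covered above, a cheap lexical gate can skip the comparison call when the two generations are near-identical, escalating only real disagreements to the LM judge. A sketch using Python's difflib; the 0.9 threshold is an arbitrary assumption to tune:

```python
from difflib import SequenceMatcher

def needs_llm_compare(answer_a: str, answer_b: str, threshold: float = 0.9) -> bool:
    """True if the two answers differ enough to justify an LM comparison call."""
    similarity = SequenceMatcher(None, answer_a.lower(), answer_b.lower()).ratio()
    return similarity < threshold
```

Note this only saves the third call when generations agree almost verbatim; paraphrased agreement will still (safely) fall through to the LM comparison.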

Step 6: Grounding via retrieval

The single most effective anti-hallucination measure: give the AI source material and constrain it to that material. Connect to /ai-searching-docs for the full RAG setup.
```python
class GroundedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.answer = dspy.ChainOfThought(CitedAnswer)
        self.verify = dspy.Predict(CheckFaithfulness)

    def forward(self, question):
        # Ground in retrieved sources
        context = self.retrieve(question).passages

        # Generate with citation requirement
        result = self.answer(context=context, question=question)

        # Verify faithfulness
        check = self.verify(context=context, answer=result.answer)
        dspy.Assert(
            check.is_faithful,
            f"Unsupported claims: {check.unsupported_claims}. "
            "Only use information from the provided sources."
        )

        return result
```

Step 7: Confidence thresholds

Flag low-confidence outputs for human review instead of showing them to users.
```python
class ConfidenceGated(dspy.Signature):
    """Answer the question and rate your confidence."""
    context: list[str] = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()
    confidence: float = dspy.OutputField(desc="0.0 to 1.0, how confident are you?")
    reasoning: str = dspy.OutputField(desc="Why this confidence level?")

class GatedResponder(dspy.Module):
    def __init__(self, threshold=0.7):
        super().__init__()
        self.respond = dspy.ChainOfThought(ConfidenceGated)
        self.threshold = threshold

    def forward(self, context, question):
        result = self.respond(context=context, question=question)

        if result.confidence < self.threshold:
            return dspy.Prediction(
                answer=result.answer,
                needs_review=True,
                confidence=result.confidence,
                reason=result.reasoning,
            )

        return dspy.Prediction(
            answer=result.answer,
            needs_review=False,
            confidence=result.confidence,
        )
```

How backtracking works

When dspy.Assert fails:
  1. DSPy catches the assertion failure
  2. The error message is fed back to the LM as additional context
  3. The LM retries generation with the feedback ("your answer had unsupported claims X, Y")
  4. This repeats up to max_backtrack_attempts times
  5. If all retries fail, the assertion raises an error
This is why good error messages matter — they're literally the feedback the model uses to improve.
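The steps above amount to ordinary retry logic. A simplified pure-Python sketch of the mechanism, not DSPy's actual implementation: `generate` and `check` stand in for the LM call and the assertion.

```python
def with_backtracking(generate, check, max_backtrack_attempts=2):
    """Retry generation, feeding each failure message back as context."""
    feedback = None
    # One initial attempt plus max_backtrack_attempts retries
    for _ in range(1 + max_backtrack_attempts):
        answer = generate(feedback)
        ok, message = check(answer)
        if ok:
            return answer
        feedback = message  # the Assert message becomes the next attempt's hint
    raise RuntimeError(f"Assertion still failing after retries: {feedback}")
```

With a generator that fixes its answer once it sees the error message, the loop returns on the second attempt; a generator that never satisfies the check exhausts its retries and raises.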

Choosing the right pattern

Pattern | Cost | Latency | Best for
Citation enforcement | 1 LM call | Low | When you have numbered sources
Faithfulness verification | 2 LM calls | Medium | RAG systems, doc Q&A
Self-check | 2 LM calls | Medium | General fact-checking
Cross-check | 3 LM calls | High | High-stakes, critical outputs
Confidence gating | 1 LM call | Low | Human-in-the-loop systems
Retrieval grounding | 1 retrieval + 1-2 LM calls | Medium | When you have a knowledge base

Key principles

  • Grounding beats prompting. Giving the AI sources to cite is more effective than asking it to "be accurate."
  • Assert for critical facts. Use dspy.Assert when hallucination is unacceptable (medical, legal, financial).
  • Suggest for nice-to-haves. Use dspy.Suggest when you want to flag but not block.
  • Layer your defenses. Combine retrieval + citation + verification for the strongest protection.
  • Good error messages help. The Assert message becomes the model's self-correction prompt.

Additional resources

  • Use /ai-searching-docs for retrieval-augmented generation (RAG) setup
  • Use /ai-checking-outputs for general output validation (format, safety, quality)
  • Use /ai-following-rules for enforcing business rules and content policies
  • See examples.md for complete worked examples