Check AI Output Before It Ships
Guide the user through adding verification and guardrails so bad AI outputs never reach users. The pattern: generate, check, fix or reject.
Step 1: Understand what to check
Ask the user:
- What could go wrong? (hallucinations, wrong format, offensive content, missing info, factual errors?)
- How strict does it need to be? (reject bad outputs vs. try to fix them?)
- What's the cost of a bad output reaching users? (annoyance vs. legal/safety risk)
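These answers determine how strict the gate should be, but the pattern itself is framework-agnostic. A minimal sketch of generate → check → fix or reject, where `generate` and `fix` are hypothetical stand-ins for model calls (not a real API):

```python
# Sketch of the generate -> check -> fix-or-reject loop.
# `generate` and `fix` are hypothetical placeholders for model calls.

def generate(question: str) -> str:
    return "  Paris is the capital of France.  "  # stand-in for an LM call

def fix(answer: str, feedback: str) -> str:
    return answer.strip()  # stand-in for a repair step guided by the feedback

def check(answer: str) -> list[str]:
    """Return a list of problems; an empty list means the output may ship."""
    problems = []
    if not answer.strip():
        problems.append("empty answer")
    if answer != answer.strip():
        problems.append("leading/trailing whitespace")
    if len(answer.split()) > 200:
        problems.append("answer over 200 words")
    return problems

def respond(question: str) -> str:
    answer = generate(question)
    problems = check(answer)
    if problems:
        answer = fix(answer, "; ".join(problems))  # try to fix...
        if check(answer):                          # ...otherwise reject
            raise ValueError(f"rejected output: {problems}")
    return answer
```

Every approach below is a stricter, framework-backed version of this loop.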
Step 2: Quick wins — DSPy assertions
The simplest way to add checks: `dspy.Assert` is a hard stop (retried if violated), `dspy.Suggest` is a soft nudge:

```python
import dspy

class CheckedResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(GenerateResponse)

    def forward(self, question):
        result = self.respond(question=question)

        # Hard checks — will retry if these fail
        dspy.Assert(
            len(result.answer) > 0,
            "Must produce an answer"
        )
        dspy.Assert(
            len(result.answer.split()) <= 200,
            "Answer must be under 200 words"
        )

        # Soft checks — hints for improvement
        dspy.Suggest(
            "i don't know" not in result.answer.lower(),
            "Try to provide a substantive answer"
        )
        dspy.Suggest(
            not any(word in result.answer.lower() for word in ["definitely", "absolutely", "100%"]),
            "Avoid overconfident language"
        )
        return result
```

DSPy will automatically retry the LM call (with the assertion feedback) when an `Assert` fails, up to a configurable number of times.
Step 3: Format validation
Type-based validation (automatic)
DSPy validates typed outputs automatically:
```python
import dspy
from pydantic import BaseModel, Field

class Response(BaseModel):
    answer: str = Field(min_length=1, max_length=500)
    confidence: float = Field(ge=0.0, le=1.0)
    category: str

class MySignature(dspy.Signature):
    question: str = dspy.InputField()
    response: Response = dspy.OutputField()
```

Pydantic catches malformed JSON, out-of-range values, and wrong types before your code ever sees them.
Custom validation in the module
```python
import re

import dspy

class ValidatedExtractor(dspy.Module):
    def __init__(self):
        super().__init__()
        self.extract = dspy.ChainOfThought(ExtractContact)

    def forward(self, text):
        result = self.extract(text=text)

        # Validate email format
        dspy.Assert(
            bool(re.match(r"[^@]+@[^@]+\.[^@]+", result.email or "")),
            "Email must be a valid email address"
        )
        # Validate phone format
        dspy.Assert(
            len(re.sub(r"\D", "", result.phone or "")) >= 10,
            "Phone must have at least 10 digits"
        )
        return result
```
Step 4: Factual verification
Self-check — ask the AI to verify its own output
```python
class VerifyFacts(dspy.Signature):
    """Check if the answer is supported by the given context."""
    context: list[str] = dspy.InputField(desc="Source documents")
    answer: str = dspy.InputField(desc="Generated answer to verify")
    is_supported: bool = dspy.OutputField(desc="Is the answer fully supported by the context?")
    unsupported_claims: list[str] = dspy.OutputField(desc="Claims not found in context")

class GroundedResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.answer = dspy.ChainOfThought(AnswerFromDocs)
        self.verify = dspy.Predict(VerifyFacts)

    def forward(self, question):
        context = self.retrieve(question).passages
        response = self.answer(context=context, question=question)

        # Verify the answer is grounded in sources
        check = self.verify(context=context, answer=response.answer)
        dspy.Assert(
            check.is_supported,
            f"Answer contains unsupported claims: {check.unsupported_claims}. "
            "Rewrite using only information from the context."
        )
        return response
```
Cross-check — generate two ways, compare
```python
class CrossCheckedAnswer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer_a = dspy.ChainOfThought(AnswerQuestion)
        self.answer_b = dspy.ChainOfThought(AnswerQuestion)
        self.compare = dspy.ChainOfThought(CompareAnswers)

    def forward(self, question):
        a = self.answer_a(question=question)
        b = self.answer_b(question=question)
        comparison = self.compare(
            question=question,
            answer_a=a.answer,
            answer_b=b.answer,
        )
        dspy.Assert(
            comparison.agree,
            "Two independent generations disagree — the answer may be unreliable"
        )
        return a

class CompareAnswers(dspy.Signature):
    """Check if two independently generated answers agree."""
    question: str = dspy.InputField()
    answer_a: str = dspy.InputField()
    answer_b: str = dspy.InputField()
    agree: bool = dspy.OutputField(desc="Do the answers substantially agree?")
    discrepancy: str = dspy.OutputField(desc="What they disagree on, if anything")
```
Step 5: Safety and content filtering
Block harmful outputs
```python
import re

import dspy

BLOCKED_PATTERNS = [
    r"\b(password|secret|api.?key)\b",
    r"\b\d{3}-\d{2}-\d{4}\b",  # SSN pattern
]

class SafeResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(GenerateResponse)

    def forward(self, question):
        result = self.respond(question=question)

        # Check for leaked sensitive data
        for pattern in BLOCKED_PATTERNS:
            dspy.Assert(
                not re.search(pattern, result.answer, re.IGNORECASE),
                f"Response may contain sensitive data (pattern: {pattern})"
            )
        return result
```
AI-as-safety-judge
```python
class SafetyCheck(dspy.Signature):
    """Check if the response is safe and appropriate."""
    question: str = dspy.InputField()
    response: str = dspy.InputField()
    is_safe: bool = dspy.OutputField()
    concern: str = dspy.OutputField(desc="Safety concern if not safe, empty if safe")

class SafetyCheckedResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(GenerateResponse)
        self.check = dspy.Predict(SafetyCheck)

    def forward(self, question):
        result = self.respond(question=question)
        safety = self.check(question=question, response=result.answer)
        dspy.Assert(
            safety.is_safe,
            f"Response flagged as unsafe: {safety.concern}. Regenerate."
        )
        return result
```
Step 6: Generate → Filter → Pick best (ensemble pattern)
For high-stakes outputs, generate multiple candidates and filter:
```python
class FilteredEnsemble(dspy.Module):
    def __init__(self, num_candidates=5):
        super().__init__()
        self.generators = [dspy.ChainOfThought(GenerateAnswer) for _ in range(num_candidates)]
        self.judge = dspy.ChainOfThought(RankAnswers)

    def forward(self, question):
        candidates = []
        for gen in self.generators:
            try:
                result = gen(question=question)
                # Only keep candidates that pass basic checks
                if len(result.answer) > 0 and len(result.answer.split()) < 200:
                    candidates.append(result.answer)
            except Exception:
                continue
        dspy.Assert(len(candidates) > 0, "No valid candidates generated")
        return self.judge(question=question, candidates=candidates)

class RankAnswers(dspy.Signature):
    """Pick the best answer from the candidates."""
    question: str = dspy.InputField()
    candidates: list[str] = dspy.InputField()
    best_answer: str = dspy.OutputField()
```
How backtracking works
When `dspy.Assert` fails, DSPy doesn't just retry blindly:
- The assertion failure is caught
- The error message is fed back to the LM as additional context
- The LM retries with this feedback (e.g., "your answer was 350 words, must be under 280")
- This repeats up to `max_backtrack_attempts` times (default: 2)
- If all retries fail, the assertion raises an error
This is why specific error messages matter — they're the model's self-correction instructions. "Response is 350 words, must be under 280" is much more useful than "too long."
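The retry-with-feedback loop is easy to picture outside DSPy. A stripped-down sketch of the idea (the `lm` callable is a stub standing in for the model, not DSPy's internals):

```python
# Sketch of assertion backtracking: on failure, re-call the model with
# the assertion's error message appended as extra context.
def with_backtracking(lm, prompt, constraint, message, max_backtracks=2):
    feedback = ""
    for _ in range(max_backtracks + 1):
        output = lm(prompt + feedback)
        if constraint(output):
            return output
        feedback = f"\nPrevious attempt failed: {message}"
    raise RuntimeError(f"assertion still failing after retries: {message}")
```

Because `message` travels back into the next attempt, a concrete message gives the model something specific to correct, which is exactly why "must be under 280 words" beats "too long".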
When combined with optimization (/ai-improving-accuracy), the model learns to satisfy constraints on the first try, reducing retries in production.

Key patterns
- Assert for hard requirements — format, length, safety. DSPy retries automatically.
- Suggest for soft preferences — style, tone, detail level. Won't block but nudges.
- Pydantic for structure — catches malformed output automatically.
- Self-verification for facts — ask the AI "is this grounded in the sources?"
- Cross-checking for reliability — generate twice independently, compare.
- Regex for sensitive data — block SSNs, API keys, passwords in output.
- Ensemble for high stakes — generate many, filter, pick the best.
Checklist: what to check
| Check | When to use | How |
|---|---|---|
| Non-empty output | Always | `dspy.Assert` |
| Length limits | User-facing text | `dspy.Assert` |
| Valid format | Structured output | Pydantic model + typed `OutputField` |
| Grounded in sources | RAG / doc search | Verification signature |
| No sensitive data | Any user-facing output | Regex patterns |
| Safe content | Public-facing apps | AI safety judge |
| Consistent | Critical decisions | Cross-check with two generations |
| High quality | High-stakes outputs | Ensemble + ranking |
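Several rows of this checklist can be wired up as a single table-driven gate. A hypothetical sketch in plain Python (the check names, limits, and `run_checks` helper are illustrative, not part of any library):

```python
import re

# Each entry mirrors a checklist row: (name, predicate, error message).
CHECKS = [
    ("non-empty", lambda out: bool(out.strip()), "output is empty"),
    ("length", lambda out: len(out.split()) <= 200, "output over 200 words"),
    ("no-ssn", lambda out: not re.search(r"\b\d{3}-\d{2}-\d{4}\b", out),
     "output contains an SSN-like pattern"),
]

def run_checks(output: str) -> list[str]:
    """Return the error message of every failed check."""
    return [msg for _name, ok, msg in CHECKS if not ok(output)]
```

Keeping checks in a table makes it easy to add a row per checklist item without touching the gate itself.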
Additional resources
- Use /ai-stopping-hallucinations for citation enforcement, faithfulness verification, and grounding AI in facts
- Use /ai-following-rules for defining and enforcing content policies, format rules, and business constraints
- Use /ai-building-pipelines to wire checks into multi-step systems
- Use /ai-making-consistent for output consistency (not correctness)
- Use /ai-testing-safety to stress-test your guardrails with adversarial attacks
- Need to evaluate human work against criteria? Use /ai-scoring
- Next: /ai-improving-accuracy to measure and improve quality