Check AI Output Before It Ships
Guide the user through adding verification and guardrails so bad AI outputs never reach users. The pattern: generate, check, fix or reject.
Step 1: Understand what to check
Ask the user:
- What could go wrong? (hallucinations, wrong format, offensive content, missing info, factual errors?)
- How strict does it need to be? (reject bad outputs vs. try to fix them?)
- What's the cost of a bad output reaching users? (annoyance vs. legal/safety risk)
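These answers determine how strict the gate should be, but the pattern itself is framework-agnostic. A minimal sketch of generate → check → fix or reject, where `generate` and `fix` are hypothetical stand-ins for model calls (not a real API):

```python
# Sketch of the generate -> check -> fix-or-reject loop.
# `generate` and `fix` are hypothetical placeholders for model calls.

def generate(question: str) -> str:
    return "  Paris is the capital of France.  "  # stand-in for an LM call

def fix(answer: str, feedback: str) -> str:
    return answer.strip()  # stand-in for a repair step guided by the feedback

def check(answer: str) -> list[str]:
    """Return a list of problems; an empty list means the output may ship."""
    problems = []
    if not answer.strip():
        problems.append("empty answer")
    if answer != answer.strip():
        problems.append("leading/trailing whitespace")
    if len(answer.split()) > 200:
        problems.append("answer over 200 words")
    return problems

def respond(question: str) -> str:
    answer = generate(question)
    problems = check(answer)
    if problems:
        answer = fix(answer, "; ".join(problems))  # try to fix...
        if check(answer):                          # ...otherwise reject
            raise ValueError(f"rejected output: {problems}")
    return answer
```

Every approach below is a stricter, framework-backed version of this loop.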
Step 2: Quick wins — DSPy assertions
The simplest way to add checks: `dspy.Assert` is a hard stop (retried if violated), `dspy.Suggest` is a soft nudge:

```python
import dspy

class CheckedResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(GenerateResponse)

    def forward(self, question):
        result = self.respond(question=question)

        # Hard checks — will retry if these fail
        dspy.Assert(
            len(result.answer) > 0,
            "Must produce an answer"
        )
        dspy.Assert(
            len(result.answer.split()) <= 200,
            "Answer must be under 200 words"
        )

        # Soft checks — hints for improvement
        dspy.Suggest(
            "i don't know" not in result.answer.lower(),
            "Try to provide a substantive answer"
        )
        dspy.Suggest(
            not any(word in result.answer.lower() for word in ["definitely", "absolutely", "100%"]),
            "Avoid overconfident language"
        )
        return result
```

DSPy will automatically retry the LM call (with the assertion feedback) when an `Assert` fails, up to a configurable number of times.
Step 3: Format validation
Type-based validation (automatic)
DSPy validates typed outputs automatically:
```python
import dspy
from pydantic import BaseModel, Field

class Response(BaseModel):
    answer: str = Field(min_length=1, max_length=500)
    confidence: float = Field(ge=0.0, le=1.0)
    category: str

class MySignature(dspy.Signature):
    question: str = dspy.InputField()
    response: Response = dspy.OutputField()
```

Pydantic catches malformed JSON, out-of-range values, and wrong types before your code ever sees them.
Custom validation in the module
```python
import re

import dspy

class ValidatedExtractor(dspy.Module):
    def __init__(self):
        super().__init__()
        self.extract = dspy.ChainOfThought(ExtractContact)

    def forward(self, text):
        result = self.extract(text=text)

        # Validate email format
        dspy.Assert(
            bool(re.match(r"[^@]+@[^@]+\.[^@]+", result.email or "")),
            "Email must be a valid email address"
        )
        # Validate phone format
        dspy.Assert(
            len(re.sub(r"\D", "", result.phone or "")) >= 10,
            "Phone must have at least 10 digits"
        )
        return result
```
Step 4: Factual verification
Self-check — ask the AI to verify its own output
```python
class VerifyFacts(dspy.Signature):
    """Check if the answer is supported by the given context."""
    context: list[str] = dspy.InputField(desc="Source documents")
    answer: str = dspy.InputField(desc="Generated answer to verify")
    is_supported: bool = dspy.OutputField(desc="Is the answer fully supported by the context?")
    unsupported_claims: list[str] = dspy.OutputField(desc="Claims not found in context")

class GroundedResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.answer = dspy.ChainOfThought(AnswerFromDocs)
        self.verify = dspy.Predict(VerifyFacts)

    def forward(self, question):
        context = self.retrieve(question).passages
        response = self.answer(context=context, question=question)

        # Verify the answer is grounded in sources
        check = self.verify(context=context, answer=response.answer)
        dspy.Assert(
            check.is_supported,
            f"Answer contains unsupported claims: {check.unsupported_claims}. "
            "Rewrite using only information from the context."
        )
        return response
```
Cross-check — generate two ways, compare
```python
class CrossCheckedAnswer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer_a = dspy.ChainOfThought(AnswerQuestion)
        self.answer_b = dspy.ChainOfThought(AnswerQuestion)
        self.compare = dspy.ChainOfThought(CompareAnswers)

    def forward(self, question):
        a = self.answer_a(question=question)
        b = self.answer_b(question=question)
        comparison = self.compare(
            question=question,
            answer_a=a.answer,
            answer_b=b.answer,
        )
        dspy.Assert(
            comparison.agree,
            "Two independent generations disagree — the answer may be unreliable"
        )
        return a

class CompareAnswers(dspy.Signature):
    """Check if two independently generated answers agree."""
    question: str = dspy.InputField()
    answer_a: str = dspy.InputField()
    answer_b: str = dspy.InputField()
    agree: bool = dspy.OutputField(desc="Do the answers substantially agree?")
    discrepancy: str = dspy.OutputField(desc="What they disagree on, if anything")
```
Step 5: Safety and content filtering
Block harmful outputs
```python
import re

import dspy

BLOCKED_PATTERNS = [
    r"\b(password|secret|api.?key)\b",
    r"\b\d{3}-\d{2}-\d{4}\b",  # SSN pattern
]

class SafeResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(GenerateResponse)

    def forward(self, question):
        result = self.respond(question=question)

        # Check for leaked sensitive data
        for pattern in BLOCKED_PATTERNS:
            dspy.Assert(
                not re.search(pattern, result.answer, re.IGNORECASE),
                f"Response may contain sensitive data (pattern: {pattern})"
            )
        return result
```
AI-as-safety-judge
```python
class SafetyCheck(dspy.Signature):
    """Check if the response is safe and appropriate."""
    question: str = dspy.InputField()
    response: str = dspy.InputField()
    is_safe: bool = dspy.OutputField()
    concern: str = dspy.OutputField(desc="Safety concern if not safe, empty if safe")

class SafetyCheckedResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(GenerateResponse)
        self.check = dspy.Predict(SafetyCheck)

    def forward(self, question):
        result = self.respond(question=question)
        safety = self.check(question=question, response=result.answer)
        dspy.Assert(
            safety.is_safe,
            f"Response flagged as unsafe: {safety.concern}. Regenerate."
        )
        return result
```
Step 6: Generate → Filter → Pick best (ensemble pattern)
For high-stakes outputs, generate multiple candidates and filter:
```python
class FilteredEnsemble(dspy.Module):
    def __init__(self, num_candidates=5):
        super().__init__()
        self.generators = [dspy.ChainOfThought(GenerateAnswer) for _ in range(num_candidates)]
        self.judge = dspy.ChainOfThought(RankAnswers)

    def forward(self, question):
        candidates = []
        for gen in self.generators:
            try:
                result = gen(question=question)
                # Only keep candidates that pass basic checks
                if len(result.answer) > 0 and len(result.answer.split()) < 200:
                    candidates.append(result.answer)
            except Exception:
                continue
        dspy.Assert(len(candidates) > 0, "No valid candidates generated")
        return self.judge(question=question, candidates=candidates)

class RankAnswers(dspy.Signature):
    """Pick the best answer from the candidates."""
    question: str = dspy.InputField()
    candidates: list[str] = dspy.InputField()
    best_answer: str = dspy.OutputField()
```
How backtracking works
When `dspy.Assert` fails, DSPy doesn't just retry blindly:
- The assertion failure is caught
- The error message is fed back to the LM as additional context
- The LM retries with this feedback (e.g., "your answer was 350 words, must be under 280")
- This repeats up to `max_backtrack_attempts` times (default: 2)
- If all retries fail, the assertion raises an error
This is why specific error messages matter — they're the model's self-correction instructions. "Response is 350 words, must be under 280" is much more useful than "too long."
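The retry-with-feedback loop is easy to picture outside DSPy. A stripped-down sketch of the idea (the `lm` callable is a stub standing in for the model, not DSPy's internals):

```python
# Sketch of assertion backtracking: on failure, re-call the model with
# the assertion's error message appended as extra context.
def with_backtracking(lm, prompt, constraint, message, max_backtracks=2):
    feedback = ""
    for _ in range(max_backtracks + 1):
        output = lm(prompt + feedback)
        if constraint(output):
            return output
        feedback = f"\nPrevious attempt failed: {message}"
    raise RuntimeError(f"assertion still failing after retries: {message}")
```

Because `message` travels back into the next attempt, a concrete message gives the model something specific to correct, which is exactly why "must be under 280 words" beats "too long".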
When combined with optimization (/ai-improving-accuracy), the model learns to satisfy constraints on the first try, reducing retries in production.

Key patterns
- Assert for hard requirements — format, length, safety. DSPy retries automatically.
- Suggest for soft preferences — style, tone, detail level. Won't block but nudges.
- Pydantic for structure — catches malformed output automatically.
- Self-verification for facts — ask the AI "is this grounded in the sources?"
- Cross-checking for reliability — generate twice independently, compare.
- Regex for sensitive data — block SSNs, API keys, passwords in output.
- Ensemble for high stakes — generate many, filter, pick the best.
Checklist: what to check
| Check | When to use | How |
|---|---|---|
| Non-empty output | Always | `dspy.Assert` |
| Length limits | User-facing text | `dspy.Assert` |
| Valid format | Structured output | Pydantic model + typed `OutputField` |
| Grounded in sources | RAG / doc search | Verification signature |
| No sensitive data | Any user-facing output | Regex patterns |
| Safe content | Public-facing apps | AI safety judge |
| Consistent | Critical decisions | Cross-check with two generations |
| High quality | High-stakes outputs | Ensemble + ranking |
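Several rows of this checklist can be wired up as a single table-driven gate. A hypothetical sketch in plain Python (the check names, limits, and `run_checks` helper are illustrative, not part of any library):

```python
import re

# Each entry mirrors a checklist row: (name, predicate, error message).
CHECKS = [
    ("non-empty", lambda out: bool(out.strip()), "output is empty"),
    ("length", lambda out: len(out.split()) <= 200, "output over 200 words"),
    ("no-ssn", lambda out: not re.search(r"\b\d{3}-\d{2}-\d{4}\b", out),
     "output contains an SSN-like pattern"),
]

def run_checks(output: str) -> list[str]:
    """Return the error message of every failed check."""
    return [msg for _name, ok, msg in CHECKS if not ok(output)]
```

Keeping checks in a table makes it easy to add a row per checklist item without touching the gate itself.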
Additional resources
- Use /ai-stopping-hallucinations for citation enforcement, faithfulness verification, and grounding AI in facts
- Use /ai-following-rules for defining and enforcing content policies, format rules, and business constraints
- Use /ai-building-pipelines to wire checks into multi-step systems
- Use /ai-making-consistent for output consistency (not correctness)
- Use /ai-testing-safety to stress-test your guardrails with adversarial attacks
- Need to evaluate human work against criteria? Use /ai-scoring
- Next: /ai-improving-accuracy to measure and improve quality