guard

MANDATORY PREPARATION

Invoke {{command_prefix}}agent-workflow, which contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding; if no workflow context exists yet, you MUST run {{command_prefix}}teach-maestro first.
Consult the guardrails-safety reference in the agent-workflow skill for the full defense-in-depth framework.

Add safety boundaries to a workflow. Guards protect against malicious inputs, unintended outputs, data leakage, cost explosion, and all the ways an autonomous system can go wrong in the real world.

Threat Assessment

Before adding guards, understand what you're protecting against:

| Threat | Risk Level | Guard Type |
|---|---|---|
| Prompt injection | High | Input sanitization, instruction hierarchy |
| PII leakage | High | Output filtering, data masking |
| Cost explosion | High | Token budgets, rate limits |
| Unauthorized actions | Medium | Permission scoping, confirmation gates |
| Hallucination | Medium | Source attribution, fact checking |
| Service abuse | Medium | Rate limiting, authentication |

Guard Implementation

Input Guards
```text
Before processing any input:
1. Validate against schema (reject malformed)
2. Check size limits (reject oversized)
3. Sanitize for injection patterns
4. Rate limit check (reject if exceeded)
5. Authentication/authorization check
```
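
The five input checks can be sketched as a single function that runs them in the listed order. This is a minimal illustration, not a complete defense: the JSON shape (`{"query": ...}`), the limits, the regex patterns, and the names `check_input` and `GuardRejection` are all assumptions made for the example. It also rejects suspicious input outright rather than sanitizing it, which is the simpler of the two options.

```python
import json
import re
import time
from collections import defaultdict, deque

MAX_INPUT_CHARS = 4000        # assumed size limit
RATE_LIMIT = 5                # assumed: max requests per user per window
RATE_WINDOW_SECONDS = 60.0

# Illustrative patterns only; regexes alone will not stop prompt injection.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]

_request_log = defaultdict(deque)  # user_id -> timestamps of accepted requests


class GuardRejection(Exception):
    """Raised when an input fails one of the guard checks."""


def check_input(raw, user_id, allowed_users, now=None):
    """Run the five input checks in order; return the parsed payload."""
    now = time.monotonic() if now is None else now

    # 1. Schema validation: must be a JSON object with a string "query".
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise GuardRejection("malformed JSON") from exc
    if not isinstance(payload, dict) or not isinstance(payload.get("query"), str):
        raise GuardRejection("schema mismatch: expected {'query': <str>}")

    # 2. Size limit.
    if len(payload["query"]) > MAX_INPUT_CHARS:
        raise GuardRejection("input too large")

    # 3. Injection patterns (rejected here instead of sanitized).
    for pattern in INJECTION_PATTERNS:
        if pattern.search(payload["query"]):
            raise GuardRejection("possible prompt injection")

    # 4. Sliding-window rate limit per user.
    window = _request_log[user_id]
    while window and now - window[0] >= RATE_WINDOW_SECONDS:
        window.popleft()
    if len(window) >= RATE_LIMIT:
        raise GuardRejection("rate limit exceeded")
    window.append(now)

    # 5. Authorization: a plain allow-list stands in for a real auth system.
    if user_id not in allowed_users:
        raise GuardRejection("user not authorized")

    return payload
```

A caller would wrap `check_input` in a `try`/`except GuardRejection` block and return the rejection reason to the client without ever passing the input to the model.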

Output Guards
```text
Before returning any output:
1. Schema validation (format correct?)
2. PII scan (names, emails, SSNs, etc.)
3. Content policy check
4. Confidence threshold check
5. Source attribution present?
```
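
A rough sketch of the output side follows. It covers a trivial format check, regex-based masking of emails and US-style SSNs, a confidence threshold, and a source-attribution check. The `ModelOutput` shape, the `0.7` threshold, and the function names are assumptions for illustration. Detecting personal names and enforcing a content policy generally need a dedicated classifier or service, so both are left out here.

```python
import re
from dataclasses import dataclass, field

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MIN_CONFIDENCE = 0.7  # assumed threshold


@dataclass
class ModelOutput:
    """Hypothetical container for a model response and its metadata."""
    text: str
    confidence: float
    sources: list = field(default_factory=list)


def mask_pii(text):
    """Replace emails and SSN-shaped strings with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)


def check_output(output):
    """Return (masked_text, problems); an empty problems list means pass."""
    problems = []
    if not output.text.strip():
        problems.append("empty output")
    if output.confidence < MIN_CONFIDENCE:
        problems.append("low confidence")
    if not output.sources:
        problems.append("missing source attribution")
    return mask_pii(output.text), problems
```

Returning the list of problems instead of raising lets the caller decide per problem whether to block the response, retry, or route it to human review.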

Cost Guards
```text
Before every model/API call:
1. Check remaining budget
2. Estimate request cost
3. If estimate > remaining budget → reject or use cheaper alternative
4. After call → update spent amount
5. Circuit breaker check (too many failures?)
```
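
One way to combine the budget check and the circuit breaker is a small wrapper around each call, sketched below. The class name `CostGuard`, the use of integer cents, and the convention that the wrapped function returns `(result, actual_cost_cents)` are assumptions for this example. The breaker is deliberately simplified: once it opens it stays open until `reset()` is called, with no timed half-open state.

```python
class BudgetExceeded(Exception):
    """Raised when the estimated cost does not fit the remaining budget."""


class CircuitOpen(Exception):
    """Raised when too many consecutive calls have failed."""


class CostGuard:
    def __init__(self, budget_cents, max_consecutive_failures=3):
        self.remaining_cents = budget_cents
        self.max_failures = max_consecutive_failures
        self.failures = 0

    def reset(self):
        """Manually close the circuit after the service has recovered."""
        self.failures = 0

    def call(self, fn, estimated_cents):
        """Run fn() under the guard; fn must return (result, actual_cents)."""
        # Circuit breaker first, so a failing service is not called again.
        if self.failures >= self.max_failures:
            raise CircuitOpen("too many consecutive failures")

        # Compare the estimate against what is left of the budget.
        if estimated_cents > self.remaining_cents:
            raise BudgetExceeded(
                f"estimate {estimated_cents} > remaining {self.remaining_cents}"
            )

        try:
            result, actual_cents = fn()
        except Exception:
            self.failures += 1
            raise

        # Success: reset the failure streak and record the real spend.
        self.failures = 0
        self.remaining_cents -= actual_cents
        return result
```

On `BudgetExceeded`, the caller can either surface the rejection or retry through the same guard with a cheaper model and a lower estimate.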

Permission Guards
```text
For every tool call:
1. Is this tool allowed for this user/context?
2. Is this a destructive operation? → require confirmation
3. Is this accessing data the user is authorized for?
4. Log the access for audit trail
```
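
The four permission checks might look like the following sketch. The policy table, the role names, the tool names `search_docs` and `delete_record`, and the rule that only admins may touch other users' data are hypothetical; a real system would load policies from configuration and use its own ownership model. Every decision, allowed or denied, is written to the audit logger before the function returns or raises.

```python
import logging
from dataclasses import dataclass

audit_log = logging.getLogger("audit")


class PermissionDenied(Exception):
    """Raised when a tool call fails a permission check."""


@dataclass(frozen=True)
class ToolPolicy:
    allowed_roles: frozenset
    destructive: bool = False


# Hypothetical policy table for two example tools.
POLICIES = {
    "search_docs": ToolPolicy(frozenset({"viewer", "admin"})),
    "delete_record": ToolPolicy(frozenset({"admin"}), destructive=True),
}


def authorize_tool_call(tool, role, user_id, resource_owner, confirmed=False):
    """Raise PermissionDenied unless the call passes every check."""
    policy = POLICIES.get(tool)
    if policy is None or role not in policy.allowed_roles:
        decision = "denied: tool not allowed"          # check 1
    elif policy.destructive and not confirmed:
        decision = "denied: confirmation required"     # check 2
    elif role != "admin" and resource_owner != user_id:
        decision = "denied: not authorized for this data"  # check 3
    else:
        decision = "allowed"

    # Check 4: log every decision, including denials.
    audit_log.info(
        "tool=%s user=%s role=%s decision=%s", tool, user_id, role, decision
    )
    if decision != "allowed":
        raise PermissionDenied(decision)
```

Unknown tools are denied by default, so adding a new tool requires an explicit policy entry before the agent can call it.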

Guard Checklist

  • All inputs validated before processing
  • PII detection on all outputs
  • Cost ceiling set with enforcement
  • Prompt injection defenses active
  • Destructive operations require confirmation
  • All access logged for audit
  • Circuit breakers on external services
  • Rate limits on all endpoints

Recommended Next Step

After adding guards, run {{command_prefix}}evaluate with adversarial test scenarios to verify guards hold under attack.

NEVER:
  • Deploy without input validation
  • Trust model output for high-stakes decisions without verification
  • Run without cost controls
  • Skip logging (you need the audit trail)
  • Assume the model will follow safety instructions 100% of the time