guard
MANDATORY PREPARATION
Invoke {{command_prefix}}agent-workflow — it contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding — if no workflow context exists yet, you MUST run {{command_prefix}}teach-maestro first.
Consult the guardrails-safety reference in the agent-workflow skill for the full defense-in-depth framework.
Add safety boundaries to a workflow. Guards protect against malicious inputs, unintended outputs, data leakage, cost explosion, and the other ways an autonomous system can go wrong in the real world.
Threat Assessment
Before adding guards, understand what you're protecting against:
| Threat | Risk Level | Guard Type |
|---|---|---|
| Prompt injection | High | Input sanitization, instruction hierarchy |
| PII leakage | High | Output filtering, data masking |
| Cost explosion | High | Token budgets, rate limits |
| Unauthorized actions | Medium | Permission scoping, confirmation gates |
| Hallucination | Medium | Source attribution, fact checking |
| Service abuse | Medium | Rate limiting, authentication |
Guard Implementation
Input Guards
```text
Before processing any input:
1. Validate against schema (reject malformed)
2. Check size limits (reject oversized)
3. Sanitize for injection patterns
4. Rate limit check (reject if exceeded)
5. Authentication/authorization check
```
Output Guards
```text
Before returning any output:
1. Schema validation (format correct?)
2. PII scan (names, emails, SSNs, etc.)
3. Content policy check
4. Confidence threshold check
5. Source attribution present?
```
Cost Guards
```text
Before every model/API call:
1. Check remaining budget
2. Estimate request cost
3. If estimate > remaining budget → reject or use cheaper alternative
4. After call → update spent amount
5. Circuit breaker check (too many failures?)
```
Permission Guards
```text
For every tool call:
1. Is this tool allowed for this user/context?
2. Is this a destructive operation? → require confirmation
3. Is this accessing data the user is authorized for?
4. Log the access for audit trail
```
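The Cost Guards steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a prescribed implementation: the `CostGuard` class, its field names, the USD unit, and the idea of passing per-model cost estimates in preference order are all assumptions made for the example. The sketch also checks the circuit breaker before each call rather than after, which is one of several reasonable placements.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when no candidate model fits the remaining budget."""


class CircuitOpen(Exception):
    """Raised when too many consecutive failures have been recorded."""


@dataclass
class CostGuard:
    budget_usd: float              # hard ceiling for the run (unit is an assumption)
    max_failures: int = 3          # consecutive failures before the breaker opens
    spent_usd: float = 0.0
    consecutive_failures: int = 0

    @property
    def remaining_usd(self) -> float:
        return self.budget_usd - self.spent_usd

    def choose_model(self, estimates: dict[str, float]) -> str:
        """Return the first model, in preference order, whose estimate fits the budget."""
        if self.consecutive_failures >= self.max_failures:
            raise CircuitOpen("too many consecutive failures; refusing new calls")
        for model, estimate in estimates.items():
            if estimate <= self.remaining_usd:
                return model
        raise BudgetExceeded(f"remaining budget {self.remaining_usd:.4f} USD is too low")

    def record_success(self, actual_cost_usd: float) -> None:
        """Update the spent amount and reset the failure counter."""
        self.spent_usd += actual_cost_usd
        self.consecutive_failures = 0

    def record_failure(self, actual_cost_usd: float = 0.0) -> None:
        """Failed calls may still cost money, so they count against the budget too."""
        self.spent_usd += actual_cost_usd
        self.consecutive_failures += 1


# Usage: prefer the larger model, fall back to the cheaper one as budget shrinks.
guard = CostGuard(budget_usd=1.00)
first = guard.choose_model({"large": 0.80, "small": 0.10})
guard.record_success(0.75)
second = guard.choose_model({"large": 0.80, "small": 0.10})
```

Keeping the guard as a plain object outside the model call means the budget is enforced even if the prompt or the model misbehaves.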
Guard Checklist
- All inputs validated before processing
- PII detection on all outputs
- Cost ceiling set with enforcement
- Prompt injection defenses active
- Destructive operations require confirmation
- All access logged for audit
- Circuit breakers on external services
- Rate limits on all endpoints
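As an illustration of the "PII detection on all outputs" item, here is a minimal Python sketch of an output scan that masks matches before text is returned. The `mask_pii` function, the `PII_PATTERNS` table, and the placeholder format are hypothetical names chosen for this example. The two regular expressions only cover simple email addresses and US-style SSNs; detecting names or other free-form PII generally needs a dedicated detection service, which is outside this sketch.

```python
import re

# Illustrative patterns only: simple email addresses and US-style SSNs.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(text: str) -> tuple[str, list[str]]:
    """Replace matches with placeholders and report which kinds were found."""
    found = []
    for kind, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{kind} REDACTED]", text)
        if count:
            found.append(kind)
    return text, found


masked, kinds = mask_pii("Contact jane@example.com, SSN 123-45-6789.")
# masked is "Contact [EMAIL REDACTED], SSN [SSN REDACTED]."
# kinds is ["EMAIL", "SSN"]
```

Returning the list of detected kinds alongside the masked text lets the caller log the event for the audit trail or block the response entirely, depending on policy.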
Recommended Next Step
After adding guards, run the workflow against adversarial test scenarios to verify that the guards hold under attack.
{{command_prefix}}evaluate
NEVER:
- Deploy without input validation
- Trust model output for high-stakes decisions without verification
- Run without cost controls
- Skip logging (you need the audit trail)
- Assume the model will follow safety instructions 100% of the time
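One way to act on the last item is to enforce checks in code outside the model rather than relying on prompt instructions alone. The sketch below applies that idea to the Permission Guards steps (tool allow-list, confirmation for destructive operations, audit logging). The `PermissionGuard` class, its fields, and the `confirm` callback are hypothetical names for illustration. Data-level authorization (whether the user may access the specific data) is application-specific and is left out here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PermissionGuard:
    allowed_tools: set[str]                 # tools this user/context may call
    destructive_tools: set[str]             # tools that need explicit confirmation
    confirm: Callable[[str, dict], bool]    # hypothetical callback that asks a human
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, user: str, tool: str, args: dict) -> bool:
        """Decide whether a tool call may run, and log the decision either way."""
        if tool not in self.allowed_tools:
            decision = "denied: tool not allowed"
        elif tool in self.destructive_tools and not self.confirm(tool, args):
            decision = "denied: confirmation refused"
        else:
            decision = "allowed"
        self.audit_log.append(
            {"user": user, "tool": tool, "args": args, "decision": decision}
        )
        return decision == "allowed"


# Usage: the confirmation callback here always refuses, standing in for a human.
guard = PermissionGuard(
    allowed_tools={"read_file", "delete_file"},
    destructive_tools={"delete_file"},
    confirm=lambda tool, args: False,
)
can_read = guard.authorize("alice", "read_file", {"path": "a.txt"})
can_delete = guard.authorize("alice", "delete_file", {"path": "a.txt"})
```

Because denied calls are logged as well as allowed ones, the audit trail also records attempted misuse.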