modeio-guardrail

Run safety checks for instructions and skill repos

Use this skill to gate risky operations behind a real-time safety assessment, or to scan third-party skill repos before installation.

Tool routing

  1. For executable instructions, use the backend-powered scripts/safety.py flow.
  2. For requests like "scan this skill repo" or "is this repo dangerous", run the Skill Safety Assessment contract at prompts/static_repo_scan.md.
  3. Skill Safety Assessment is static analysis only. Never execute code, install dependencies, or run hooks in the target repository.
  4. For Skill Safety Assessment, run deterministic script evaluation first (evaluate), then pass highlights into the prompt contract.

Dependencies

  • requests is required for scripts/safety.py because it makes backend API calls.
  • scripts/skill_safety_assessment.py does not require requests for basic local repository evaluation.
  • For repo-local setup from the repo root:
bash
python scripts/bootstrap_env.py
python scripts/doctor_env.py

Instruction safety execution policy

  1. Always run scripts/safety.py with --json for structured output.
  2. Run the check before executing the instruction, not after.
  3. Each instruction must trigger a fresh backend call. Do not reuse cached or historical results.
  4. For any state-changing instruction (delete, overwrite, permission change, deploy, schema change), always pass both --context and --target.
  5. scripts/safety.py accepts --context and --target as optional flags, so this requirement is enforced by policy, not by automatic CLI blocking.
  6. Use the Context Contract below exactly. Do not send free-form --context values such as a bare "production".
  7. If policy-required context or target is missing, treat the instruction as unverified and ask for the missing fields before execution.
  8. If an instruction contains multiple operations, check the riskiest one.
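
Because the CLI does not block a missing --context or --target, the presence check in rules 4 and 7 has to live in the caller. A minimal sketch of that check follows; the function name missing_policy_fields and the state_changing flag are hypothetical, and how an agent decides that an instruction is state-changing is left out.

```python
from typing import List, Optional


def missing_policy_fields(
    state_changing: bool,
    context: Optional[str],
    target: Optional[str],
) -> List[str]:
    """Return the policy-required flags that are absent for this instruction.

    Presence check only: it does not validate the context contents.
    """
    if not state_changing:
        # Read-only instructions may omit both flags.
        return []
    missing = []
    if not context or not context.strip():
        missing.append("--context")
    if not target or not target.strip():
        missing.append("--target")
    return missing


# A state-changing instruction with no target stays unverified
# until the user supplies the missing field.
context_json = '{"environment": "staging"}'  # placeholder, not a full contract
print(missing_policy_fields(True, context_json, None))
```

A non-empty result means the instruction should be treated as unverified and the listed fields requested from the user before execution.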

Context contract (policy-required for state-changing instructions)

Pass --context as a JSON string with this exact shape:
json
{
  "environment": "local-dev|ci|staging|production|unknown",
  "operation_intent": "read-only|cleanup|maintenance|migration|permission-change|destructive|unknown",
  "scope": "single-resource|bounded-batch|broad|unknown",
  "data_sensitivity": "public|internal|sensitive|regulated|unknown",
  "rollback": "easy|partial|none|unknown",
  "change_control": "ticket:<id>|approved-manual|none|unknown"
}
Rules:
  1. Include all six keys. If a value is unknown, set it to unknown instead of omitting the key.
  2. --target must be a concrete resource identifier (absolute file path, table name, service name, or URL). Avoid generic targets such as "database".
  3. For a file deletion request that should usually be allowed, use: environment=local-dev|ci, operation_intent=cleanup, scope=single-resource, data_sensitivity=public|internal, and rollback=easy.
  4. If those conditions are not met, expect stricter output (approved=false or a higher risk_level) and require explicit user confirmation.
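
One way to keep --context values inside the contract is to build them from a fixed vocabulary rather than by hand. The sketch below is illustrative only: build_context is a hypothetical helper, the allowed values are copied from the contract above, and the ticket check assumes only that a non-empty id follows "ticket:".

```python
import json

# Allowed values copied from the Context Contract above.
ALLOWED = {
    "environment": {"local-dev", "ci", "staging", "production", "unknown"},
    "operation_intent": {"read-only", "cleanup", "maintenance", "migration",
                         "permission-change", "destructive", "unknown"},
    "scope": {"single-resource", "bounded-batch", "broad", "unknown"},
    "data_sensitivity": {"public", "internal", "sensitive", "regulated", "unknown"},
    "rollback": {"easy", "partial", "none", "unknown"},
}
CHANGE_CONTROL_FIXED = {"approved-manual", "none", "unknown"}


def build_context(**fields: str) -> str:
    """Serialize a six-key context, filling omitted keys with "unknown"."""
    context = {}
    for key, allowed in ALLOWED.items():
        value = fields.get(key, "unknown")
        if value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")
        context[key] = value

    change_control = fields.get("change_control", "unknown")
    # Assumption: any non-empty id after "ticket:" is acceptable.
    is_ticket = (change_control.startswith("ticket:")
                 and len(change_control) > len("ticket:"))
    if change_control not in CHANGE_CONTROL_FIXED and not is_ticket:
        raise ValueError(f"change_control={change_control!r} is not valid")
    context["change_control"] = change_control

    unexpected = set(fields) - set(context)
    if unexpected:
        raise ValueError(f"unexpected keys: {sorted(unexpected)}")
    return json.dumps(context, separators=(",", ":"))


print(build_context(environment="local-dev", operation_intent="cleanup",
                    scope="single-resource", data_sensitivity="internal",
                    rollback="easy", change_control="none"))
```

The returned string can be passed directly as the --context argument; a free-form value such as "production" never reaches the CLI because it is not a keyword the helper accepts.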

Action policy

This table applies to scripts/safety.py responses.
Use the result to gate execution. Never silently ignore a safety check result.

| approved | risk_level | Agent action |
| --- | --- | --- |
| true | low | Proceed. No user prompt needed. |
| true | medium | Proceed. Mention the risk and recommendation to the user. |
| false | medium | Warn user with concerns and recommendation. Proceed only with explicit user confirmation. |
| false | high | Block execution. Show concerns and recommendation. Ask user for explicit override. |
| false | critical | Block execution. Show full assessment. Require user to explicitly acknowledge the risk before proceeding. |

Additional signals:
  • is_destructive: true combined with is_reversible: false: always surface the recommendation to the user, regardless of approval status.
  • If the safety check itself fails (network error, API error): warn the user that safety could not be verified. Do not silently proceed with unverified instructions.
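
The action policy can be expressed as a lookup keyed on (approved, risk_level). The following is a sketch, not part of the shipped scripts: the action labels and the decide_action name are hypothetical, and returning "treat-as-unverified" for combinations the table does not list is this sketch's own conservative assumption.

```python
from typing import Optional, Tuple

# One entry per row of the action policy table.
ACTIONS = {
    (True, "low"): "proceed",
    (True, "medium"): "proceed-and-mention-risk",
    (False, "medium"): "warn-and-confirm",
    (False, "high"): "block-and-ask-override",
    (False, "critical"): "block-and-require-acknowledgement",
}


def decide_action(
    approved: Optional[bool],
    risk_level: Optional[str],
    is_destructive: Optional[bool] = None,
    is_reversible: Optional[bool] = None,
) -> Tuple[str, bool]:
    """Return (action, must_surface_recommendation)."""
    # Assumption: pairs not listed in the table are handled as unverified.
    action = ACTIONS.get((approved, risk_level), "treat-as-unverified")
    # Destructive and irreversible: always show the recommendation.
    must_surface = is_destructive is True and is_reversible is False
    return action, must_surface


print(decide_action(False, "critical", is_destructive=True, is_reversible=False))
```

Keeping the table as data makes it easy to compare against the policy row by row when either one changes.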

Scripts

scripts/safety.py

  • -i, --input: required, instruction text to evaluate (whitespace-only input is rejected)
  • -c, --context: policy-required for state-changing instructions (CLI accepts it as optional); JSON string following the Context Contract above
  • -t, --target: policy-required for state-changing instructions (CLI accepts it as optional); concrete operation target (file path, table name, service name, URL)
  • --json: output a unified JSON envelope for machine consumption
  • Endpoint: https://safety-cf.modeio.ai/api/cf/safety (override via SAFETY_API_URL)
  • Retries: automatic retry on HTTP 502/503/504 and connection/timeout errors (up to 2 retries with exponential backoff)
  • Request timeout: 60 seconds per attempt
bash
python scripts/safety.py -i "Delete /tmp/cache/build-123.log" \
  -c '{"environment":"local-dev","operation_intent":"cleanup","scope":"single-resource","data_sensitivity":"internal","rollback":"easy","change_control":"none"}' \
  -t "/tmp/cache/build-123.log" --json

python scripts/safety.py -i "DROP TABLE users" \
  -c '{"environment":"production","operation_intent":"destructive","scope":"broad","data_sensitivity":"regulated","rollback":"none","change_control":"ticket:DB-9021"}' \
  -t "postgres://prod/maindb.users" --json

python scripts/safety.py -i "chmod 777 /etc/passwd" \
  -c '{"environment":"production","operation_intent":"permission-change","scope":"single-resource","data_sensitivity":"regulated","rollback":"partial","change_control":"ticket:SEC-118"}' \
  -t "/etc/passwd" --json

python scripts/safety.py -i "List all running containers and display their resource usage" --json
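
When the check is driven from another Python program, building the argument list explicitly avoids shell-quoting problems with the JSON context. This is a minimal sketch under stated assumptions: build_safety_argv and run_safety_check are hypothetical helpers, the script is assumed to live at scripts/safety.py relative to the working directory, and the JSON envelope is assumed to arrive on stdout even when the exit code is non-zero.

```python
import json
import subprocess
import sys
from typing import List, Optional


def build_safety_argv(
    instruction: str,
    context: Optional[dict] = None,
    target: Optional[str] = None,
) -> List[str]:
    """Build the argv for one safety check, always requesting --json."""
    argv = [sys.executable, "scripts/safety.py", "-i", instruction]
    if context is not None:
        argv += ["-c", json.dumps(context)]
    if target is not None:
        argv += ["-t", target]
    argv.append("--json")
    return argv


def run_safety_check(argv: List[str]) -> dict:
    """Run the CLI once and parse its stdout as the JSON envelope."""
    # check=False: a failed check exits non-zero, and the caller still
    # wants to inspect the envelope (assumed to be on stdout).
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    # Raises json.JSONDecodeError if stdout is not JSON; callers should
    # treat that as an unverified instruction.
    return json.loads(completed.stdout)
```

A call such as run_safety_check(build_safety_argv("Delete /tmp/cache/build-123.log", context, "/tmp/cache/build-123.log")) issues a fresh backend call each time, which matches the no-caching rule in the execution policy.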

scripts/skill_safety_assessment.py

  • evaluate: authoritative v2 layered evaluator with deterministic evidence IDs, integrity fingerprinting, and risk scoring
    • Native first-layer gate: a GitHub metadata/README/issue-search precheck runs by default and hard-rejects on high-risk attack-demo/malware signals before the local file scan.
  • scan: compatibility alias for evaluate, kept for existing automation
  • prompt: renders the prompt payload with script highlights and structured scan JSON
  • validate: validates model output against scan evidence IDs (evidence_refs), required highlights, and score/decision consistency checks
  • adjudicate: context-aware LLM adjudication bridge (prompt generation, then merging decisions back into the deterministic score/decision)
Context profile (optional, no user identity required):
json
{
  "environment": "local-dev|ci|staging|production|unknown",
  "execution_mode": "read-only|build-test|install|deploy|mutating|unknown",
  "risk_tolerance": "strict|balanced|permissive",
  "data_sensitivity": "public|internal|sensitive|regulated|unknown"
}

bash
# 1) Deterministic layered evaluation (v2)
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --context-profile '{"environment":"ci","execution_mode":"build-test","risk_tolerance":"balanced","data_sensitivity":"internal"}' --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --github-osint-timeout 8 --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --context-profile-file ./context_profile.json --output /tmp/skill_scan.json --json

# (compat) legacy alias still supported
python scripts/skill_safety_assessment.py scan --target-repo /path/to/repo --json > /tmp/skill_scan.json

# 2) Build prompt payload with highlights + full findings (recommended for strict evidence_refs linking)
python scripts/skill_safety_assessment.py prompt --target-repo /path/to/repo --scan-file /tmp/skill_scan.json --include-full-findings

# 3) Validate model output for evidence linkage + integrity
python scripts/skill_safety_assessment.py validate --scan-file /tmp/skill_scan.json --assessment-file /tmp/assessment.md --json

# --rescan-on-validate requires --target-repo
python scripts/skill_safety_assessment.py validate --scan-file /tmp/skill_scan.json --assessment-file /tmp/assessment.md --target-repo /path/to/repo --rescan-on-validate --json

# 4) Optional adjudication bridge (LLM interprets context, engine keeps deterministic control)
python scripts/skill_safety_assessment.py adjudicate --scan-file /tmp/skill_scan.json
python scripts/skill_safety_assessment.py adjudicate --scan-file /tmp/skill_scan.json --assessment-file /tmp/adjudication.json --json
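
The evaluate, prompt, and validate steps share the same scan file, so an automation wrapper mostly needs to keep the paths consistent. The sketch below only assembles the three command lines; pipeline_commands is a hypothetical helper, and it assumes --output can be used with evaluate on its own. Producing the assessment file between the prompt and validate steps is the model's job and is not shown.

```python
import sys
from typing import List

SCRIPT = "scripts/skill_safety_assessment.py"


def pipeline_commands(
    target_repo: str,
    scan_file: str,
    assessment_file: str,
) -> List[List[str]]:
    """Return argv lists for evaluate, prompt, and validate, in run order."""
    py = sys.executable
    return [
        # Deterministic evaluation writes the scan JSON first.
        [py, SCRIPT, "evaluate", "--target-repo", target_repo,
         "--output", scan_file, "--json"],
        # Prompt rendering reads the scan and includes full findings so
        # evidence IDs are available for evidence_refs.
        [py, SCRIPT, "prompt", "--target-repo", target_repo,
         "--scan-file", scan_file, "--include-full-findings"],
        # Validation checks the model's assessment against the same scan.
        [py, SCRIPT, "validate", "--scan-file", scan_file,
         "--assessment-file", assessment_file, "--json"],
    ]


for argv in pipeline_commands("/path/to/repo", "/tmp/skill_scan.json",
                              "/tmp/assessment.md"):
    print(" ".join(argv[1:]))
```

Each list can be handed to subprocess.run as is; none of the commands executes code from the target repository, which keeps the assessment static.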

Output contract

Success response (--json)

json
{
  "success": true,
  "tool": "modeio-guardrail",
  "mode": "api",
  "data": {
    "approved": false,
    "risk_level": "critical",
    "risk_types": ["data loss"],
    "concerns": ["Irreversible destructive operation targeting all user data"],
    "recommendation": "Create a backup before deletion. Use staged rollback plan.",
    "is_destructive": true,
    "is_reversible": false
  }
}
Response fields in data:

| Field | Type | Values | Meaning |
| --- | --- | --- | --- |
| approved | boolean | true / false | Whether execution is recommended |
| risk_level | string | low / medium / high / critical | Severity of identified risks |
| risk_types | string[] | open-ended | Risk categories (e.g., "data loss", "injection attacks", "unauthorized access", "denial-of-service") |
| concerns | string[] | open-ended | Specific risk points in natural language |
| recommendation | string | open-ended | Suggested safer alternative or mitigation |
| is_destructive | boolean | true / false | Whether the action involves destruction (deletion, overwrite, system modification) |
| is_reversible | boolean | true / false | Whether the action can be rolled back |

Any field may be null if the backend could not determine it. Treat null in approved as false.
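
Since any field may be null, a consumer benefits from normalizing the envelope once instead of checking for null at every use. A minimal sketch, assuming the envelope shape shown above; the Assessment dataclass and parse_assessment are hypothetical names, and recording null fields in null_fields is this sketch's own addition so that callers can warn the user.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

FIELDS = ("approved", "risk_level", "risk_types", "concerns",
          "recommendation", "is_destructive", "is_reversible")


@dataclass
class Assessment:
    approved: bool
    risk_level: Optional[str]
    risk_types: List[str]
    concerns: List[str]
    recommendation: Optional[str]
    is_destructive: Optional[bool]
    is_reversible: Optional[bool]
    null_fields: List[str] = field(default_factory=list)


def parse_assessment(raw: str) -> Assessment:
    """Parse a success envelope, treating a null approved as False."""
    envelope = json.loads(raw)
    if envelope.get("success") is not True:
        raise ValueError("not a success envelope")
    data = envelope.get("data") or {}
    return Assessment(
        # Only an explicit true counts as approval; null becomes False.
        approved=data.get("approved") is True,
        risk_level=data.get("risk_level"),
        risk_types=data.get("risk_types") or [],
        concerns=data.get("concerns") or [],
        recommendation=data.get("recommendation"),
        is_destructive=data.get("is_destructive"),
        is_reversible=data.get("is_reversible"),
        # Names of fields that were null or missing in the response.
        null_fields=[name for name in FIELDS if data.get(name) is None],
    )
```

A non-empty null_fields list is a signal to warn the user that the backend could not determine part of the assessment.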

Failure envelope (--json)

json
{
  "success": false,
  "tool": "modeio-guardrail",
  "mode": "api",
  "error": {
    "type": "network_error",
    "message": "safety request failed: ConnectionError"
  }
}
Error types: validation_error (empty input), dependency_error (missing local package such as requests), network_error (HTTP/connection failure), api_error (backend returned error payload).
Exit code is non-zero on any failure.

Failure policy

Safety verification failures must never be silently ignored.
  • Network/API error: Tell the user the safety check could not be completed. Present the original instruction and ask whether to proceed without verification.
  • Validation error (empty input): Fix the input and retry before executing anything.
  • Unexpected response (null or missing fields): Treat as unverified. Warn the user.
  • Never assume an instruction is safe because the check failed to run.
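
The failure policy amounts to a small dispatch on the error type in the failure envelope. The sketch below is one possible reading of it: failure_response and its return labels are hypothetical, and because the policy does not name dependency_error, this sketch assumes it falls under "treat as unverified".

```python
def failure_response(envelope: dict) -> str:
    """Map a failure envelope to the agent's next step. Never returns 'proceed'."""
    if envelope.get("success") is not False:
        raise ValueError("not a failure envelope")
    error = envelope.get("error") or {}
    kind = error.get("type")
    if kind in ("network_error", "api_error"):
        # Tell the user the check could not be completed, show the
        # original instruction, and ask whether to proceed unverified.
        return "ask-user-before-proceeding-unverified"
    if kind == "validation_error":
        # Fix the input and retry before executing anything.
        return "fix-input-and-retry"
    # Assumption: dependency_error and unknown types count as unverified.
    return "treat-as-unverified-and-warn"


print(failure_response({
    "success": False,
    "tool": "modeio-guardrail",
    "mode": "api",
    "error": {"type": "network_error",
              "message": "safety request failed: ConnectionError"},
}))
```

No branch resolves to silent execution, which is the property the policy is meant to guarantee.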

Skill Safety Assessment policy (static prompt contract)

  1. Use prompts/static_repo_scan.md as the strict contract.
  2. Run scripts/skill_safety_assessment.py evaluate first (or the scan compatibility alias) and pass its highlights into the prompt input.
  3. When model output must include strict evidence_refs, render prompt input with --include-full-findings so scan evidence IDs and snippets are available in SCRIPT_SCAN_JSON.
  4. Every finding must include path:line evidence, an exact snippet quote, and evidence_refs linked to scan evidence IDs.
  5. Always include all required highlight evidence IDs from scan output in final findings.
  6. Keep decision/score consistent with referenced evidence severity and coverage constraints.
  7. Use adjudicate when context interpretation is required (docs/examples/tests vs runtime/install paths).
  8. Return one of: reject, caution, or approve.
  9. If coverage is partial or evidence is insufficient, return caution with an explicit coverage note.
  10. Include a prioritized remediation plan so users can fix and re-scan quickly.
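
The validate subcommand is the authoritative check for these rules. Purely to illustrate what rules 4, 5, and 8 ask for, here is a sketch of the linkage logic; the finding shape (a dict with an evidence_refs list), the evidence ID format, and the check_assessment name are all hypothetical and are not taken from the real scan schema.

```python
from typing import Dict, List, Set

DECISIONS = {"reject", "caution", "approve"}


def check_assessment(
    decision: str,
    findings: List[Dict],
    scan_evidence_ids: Set[str],
    required_highlight_ids: Set[str],
) -> List[str]:
    """Return a list of problems; an empty list means the linkage holds."""
    problems = []
    if decision not in DECISIONS:
        problems.append(f"decision {decision!r} is not reject/caution/approve")

    cited: Set[str] = set()
    for index, finding in enumerate(findings):
        refs = finding.get("evidence_refs") or []
        if not refs:
            problems.append(f"finding {index} has no evidence_refs")
        # Every reference must point at an evidence ID from the scan.
        unknown = sorted(set(refs) - scan_evidence_ids)
        if unknown:
            problems.append(f"finding {index} cites unknown IDs: {unknown}")
        cited.update(refs)

    # All required highlight IDs must appear in the final findings.
    missing = sorted(required_highlight_ids - cited)
    if missing:
        problems.append(f"required highlights not cited: {missing}")
    return problems


print(check_assessment(
    "caution",
    [{"evidence_refs": ["E-001"]}],      # hypothetical evidence IDs
    scan_evidence_ids={"E-001", "E-002"},
    required_highlight_ids={"E-001"},
))
```

A result with any entries would mean the model output needs to be regenerated or corrected before the decision is reported.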

When not to use

  • For PII redaction or anonymization: use modeio-redact instead.
  • For tasks with no executable instruction or repository target to evaluate (pure discussion, documentation, questions).
  • For operations that are clearly read-only (listing files, reading configs, git status).

Resources

  • scripts/safety.py: CLI entry point for instruction safety checks
  • scripts/skill_safety_assessment.py: CLI entry point for skill repo assessment (evaluate/scan/prompt/validate/adjudicate)
  • prompts/static_repo_scan.md: Skill Safety Assessment prompt contract
  • ARCHITECTURE.md: package boundaries and compatibility notes
  • SAFETY_API_URL env var: optional endpoint override (default: https://safety-cf.modeio.ai/api/cf/safety)