# modeio-guardrail
Runs real-time safety analysis for instructions involving destructive operations, permission changes, irreversible actions, prompt injection, or compliance-sensitive operations. Evaluates risk level, destructiveness, and reversibility via backend API. Use when asked for safety check, risk assessment, security audit, destructive check, instruction audit, or Modeio safety scan. Also use proactively before executing any instruction that deletes data, modifies permissions, drops or truncates tables, deploys to production, or alters system state irreversibly. Also supports pre-install Skill Safety Assessment for third-party skill repositories via a static prompt contract.
Source: mode-io/mode-io-skills
NPX install:

```bash
npx skill4agent add mode-io/mode-io-skills modeio-guardrail
```
Run safety checks for instructions and skill repos.
Use this skill to gate risky operations behind a real-time safety assessment, or to scan third-party skill repos before installation.
## Tool routing
- For executable instructions, use the backend-powered flow (`scripts/safety.py`).
- For requests like "scan this skill repo" or "is this repo dangerous", run the Skill Safety Assessment contract at `prompts/static_repo_scan.md`.
- Skill Safety Assessment is static analysis only. Never execute code, install dependencies, or run hooks in the target repository.
- For Skill Safety Assessment, run deterministic script evaluation first (`evaluate`), then pass highlights into the prompt contract.
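As a minimal illustration of this routing, a dispatcher might look like the sketch below. The function name and keyword lists are assumptions for illustration, not part of the skill:

```python
# Hypothetical request router: picks which modeio-guardrail flow to use.
# The hint phrases below are illustrative assumptions, not the skill's logic.
REPO_SCAN_HINTS = ("skill repo", "repo dangerous", "scan this skill")

def route(request: str) -> str:
    """Return 'repo-scan' for Skill Safety Assessment requests,
    otherwise 'instruction-check' (the backend-powered scripts/safety.py flow)."""
    text = request.lower()
    if any(hint in text for hint in REPO_SCAN_HINTS):
        return "repo-scan"
    return "instruction-check"
```

In practice the agent decides from full context; the sketch only shows that the two flows are mutually exclusive entry points.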
## Dependencies
- `requests` is required for `scripts/safety.py` because it makes backend API calls.
- `scripts/skill_safety_assessment.py` does not require `requests` for basic local repository evaluation.
- For repo-local setup from the repo root:
```bash
python scripts/bootstrap_env.py
python scripts/doctor_env.py
```

## Instruction safety execution policy
- Always run `scripts/safety.py` with `--json` for structured output.
- Run the check before executing the instruction, not after.
- Each instruction must trigger a fresh backend call. Do not reuse cached or historical results.
- For any state-changing instruction (`delete`, `overwrite`, `permission change`, `schema change`, `deploy`), always pass both `--context` and `--target`.
- `scripts/safety.py` accepts `--context` and `--target` as optional flags, so this requirement is enforced by policy, not by automatic CLI blocking.
- Use the Context Contract below exactly. Do not send free-form values like `--context "production"` only.
- If policy-required context or target is missing, treat the instruction as unverified and ask for the missing fields before execution.
- If an instruction contains multiple operations, check the riskiest one.
## Context contract (policy-required for state-changing instructions)
Pass `--context` as a JSON string with this exact shape:

```json
{
"environment": "local-dev|ci|staging|production|unknown",
"operation_intent": "read-only|cleanup|maintenance|migration|permission-change|destructive|unknown",
"scope": "single-resource|bounded-batch|broad|unknown",
"data_sensitivity": "public|internal|sensitive|regulated|unknown",
"rollback": "easy|partial|none|unknown",
"change_control": "ticket:<id>|approved-manual|none|unknown"
}
```

Rules:
- Include all six keys. If a value is unknown, set it to `unknown` instead of omitting the key.
- `--target` must be a concrete resource identifier (absolute file path, table name, service name, or URL). Avoid generic targets such as `"database"`.
- For a file deletion request that should usually be allowed, use: `environment=local-dev|ci`, `operation_intent=cleanup`, `scope=single-resource`, `rollback=easy`, and `data_sensitivity=public|internal`.
- If those conditions are not met, expect stricter output (`approved=false` or higher `risk_level`) and require explicit user confirmation.
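Since the CLI accepts `--context` as optional, a local pre-flight check can enforce the contract before calling the backend. The following is a sketch under that assumption; the function name is hypothetical and `scripts/safety.py` does not itself perform this validation:

```python
import json

# Allowed values per the Context Contract; change_control also accepts "ticket:<id>".
ALLOWED = {
    "environment": {"local-dev", "ci", "staging", "production", "unknown"},
    "operation_intent": {"read-only", "cleanup", "maintenance", "migration",
                         "permission-change", "destructive", "unknown"},
    "scope": {"single-resource", "bounded-batch", "broad", "unknown"},
    "data_sensitivity": {"public", "internal", "sensitive", "regulated", "unknown"},
    "rollback": {"easy", "partial", "none", "unknown"},
    "change_control": {"approved-manual", "none", "unknown"},
}

def validate_context(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the context is contract-valid."""
    ctx = json.loads(raw)
    problems = [f"missing key: {k}" for k in ALLOWED if k not in ctx]
    for key, value in ctx.items():
        if key not in ALLOWED:
            problems.append(f"unexpected key: {key}")
        elif key == "change_control" and str(value).startswith("ticket:"):
            continue  # ticket:<id> is a valid change_control value
        elif value not in ALLOWED[key]:
            problems.append(f"bad value for {key}: {value}")
    return problems
```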
## Action policy
This table applies to `scripts/safety.py` responses. Use the result to gate execution. Never silently ignore a safety check result.

| `approved` | `risk_level` | Agent action |
|---|---|---|
| `true` | `low` | Proceed. No user prompt needed. |
| `true` | `medium` | Proceed. Mention the risk and recommendation to the user. |
| `true` | `high` | Warn user with the `recommendation` before proceeding. |
| `false` | any | Block execution. Show the `recommendation`. |
| `false` | `critical` | Block execution. Show full assessment. Require user to explicitly acknowledge the risk before proceeding. |
Additional signals:
- `is_destructive: true` combined with `is_reversible: false`: always surface the recommendation to the user, regardless of approval status.
- If the safety check itself fails (network error, API error): warn the user that safety could not be verified. Do not silently proceed with unverified instructions.
## Scripts

### `scripts/safety.py`
- `-i, --input`: required; instruction text to evaluate (whitespace-only rejected)
- `-c, --context`: policy-required for state-changing instructions (CLI accepts it as optional); JSON string following the Context Contract above
- `-t, --target`: policy-required for state-changing instructions (CLI accepts it as optional); concrete operation target (file path, table name, service name, URL)
- `--json`: output unified JSON envelope for machine consumption
- Endpoint: `https://safety-cf.modeio.ai/api/cf/safety` (override via `SAFETY_API_URL`)
- Retries: automatic retry on HTTP 502/503/504 and connection/timeout errors (up to 2 retries with exponential backoff)
- Request timeout: 60 seconds per attempt
```bash
python scripts/safety.py -i "Delete /tmp/cache/build-123.log" \
-c '{"environment":"local-dev","operation_intent":"cleanup","scope":"single-resource","data_sensitivity":"internal","rollback":"easy","change_control":"none"}' \
-t "/tmp/cache/build-123.log" --json
python scripts/safety.py -i "DROP TABLE users" \
-c '{"environment":"production","operation_intent":"destructive","scope":"broad","data_sensitivity":"regulated","rollback":"none","change_control":"ticket:DB-9021"}' \
-t "postgres://prod/maindb.users" --json
python scripts/safety.py -i "chmod 777 /etc/passwd" \
-c '{"environment":"production","operation_intent":"permission-change","scope":"single-resource","data_sensitivity":"regulated","rollback":"partial","change_control":"ticket:SEC-118"}' \
-t "/etc/passwd" --json
python scripts/safety.py -i "List all running containers and display their resource usage" --json
```

### `scripts/skill_safety_assessment.py`
- `evaluate`: authoritative v2 layered evaluator with deterministic evidence IDs, integrity fingerprinting, and risk scoring
- Native first-layer gate: a GitHub metadata/README/issue-search precheck runs by default and hard-rejects on high-risk attack-demo/malware signals before the local file scan.
- `scan`: compatibility alias to `evaluate` for existing automation
- `prompt`: renders the prompt payload with script highlights and structured scan JSON
- `validate`: validates model output against scan evidence IDs (`evidence_refs`), required highlights, and score/decision consistency checks
- `adjudicate`: context-aware LLM adjudication bridge (prompt generation plus merging decisions back into the deterministic score/decision)
Context profile (optional, no user identity required):

```json
{
"environment": "local-dev|ci|staging|production|unknown",
"execution_mode": "read-only|build-test|install|deploy|mutating|unknown",
"risk_tolerance": "strict|balanced|permissive",
"data_sensitivity": "public|internal|sensitive|regulated|unknown"
}
```

```bash
# 1) Deterministic layered evaluation (v2)
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --context-profile '{"environment":"ci","execution_mode":"build-test","risk_tolerance":"balanced","data_sensitivity":"internal"}' --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --github-osint-timeout 8 --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --context-profile-file ./context_profile.json --output /tmp/skill_scan.json --json
# (compat) legacy alias still supported
python scripts/skill_safety_assessment.py scan --target-repo /path/to/repo --json > /tmp/skill_scan.json
# 2) Build prompt payload with highlights + full findings (recommended for strict evidence_refs linking)
python scripts/skill_safety_assessment.py prompt --target-repo /path/to/repo --scan-file /tmp/skill_scan.json --include-full-findings
# 3) Validate model output for evidence linkage + integrity
python scripts/skill_safety_assessment.py validate --scan-file /tmp/skill_scan.json --assessment-file /tmp/assessment.md --json
# --rescan-on-validate requires --target-repo
python scripts/skill_safety_assessment.py validate --scan-file /tmp/skill_scan.json --assessment-file /tmp/assessment.md --target-repo /path/to/repo --rescan-on-validate --json
# 4) Optional adjudication bridge (LLM interprets context, engine keeps deterministic control)
python scripts/skill_safety_assessment.py adjudicate --scan-file /tmp/skill_scan.json
python scripts/skill_safety_assessment.py adjudicate --scan-file /tmp/skill_scan.json --assessment-file /tmp/adjudication.json --json
```

## Output contract
Success response (`--json`):

```json
{
"success": true,
"tool": "modeio-guardrail",
"mode": "api",
"data": {
"approved": false,
"risk_level": "critical",
"risk_types": ["data loss"],
"concerns": ["Irreversible destructive operation targeting all user data"],
"recommendation": "Create a backup before deletion. Use staged rollback plan.",
"is_destructive": true,
"is_reversible": false
}
}
```

Response fields in `data`:

| Field | Type | Values | Meaning |
|---|---|---|---|
| `approved` | boolean | `true`, `false`, `null` | Whether execution is recommended |
| `risk_level` | string | `low`, `medium`, `high`, `critical` | Severity of identified risks |
| `risk_types` | array | open-ended | Risk categories (e.g., `data loss`) |
| `concerns` | array | open-ended | Specific risk points in natural language |
| `recommendation` | string | open-ended | Suggested safer alternative or mitigation |
| `is_destructive` | boolean | `true`, `false`, `null` | Whether the action involves destruction (deletion, overwrite, system modification) |
| `is_reversible` | boolean | `true`, `false`, `null` | Whether the action can be rolled back |
Any field may be `null` if the backend could not determine it. Treat `null` in `approved` as `false`.

Failure envelope (`--json`):

```json
{
"success": false,
"tool": "modeio-guardrail",
"mode": "api",
"error": {
"type": "network_error",
"message": "safety request failed: ConnectionError"
}
}
```

Error types: `validation_error` (empty input), `dependency_error` (missing local package such as `requests`), `network_error` (HTTP/connection failure), `api_error` (backend returned an error payload). Exit code is non-zero on any failure.
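A consumer of this envelope might dispatch on `success` and the error type as follows. The shapes match the examples above; the `summarize` helper and its message strings are assumptions for illustration:

```python
def summarize(envelope: dict) -> str:
    """Turn a modeio-guardrail JSON envelope into a one-line status for the user."""
    if envelope.get("success"):
        data = envelope.get("data", {})
        return f"assessed: risk_level={data.get('risk_level')} approved={data.get('approved')}"
    err = envelope.get("error", {})
    etype = err.get("type", "unknown_error")
    if etype == "validation_error":
        return "invalid input: fix the instruction text and retry"
    # network_error / api_error / dependency_error: safety is unverified
    return f"safety check failed ({etype}): do not proceed without user confirmation"
```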
## Failure policy
Safety verification failures must never be silently ignored.
- Network/API error: Tell the user the safety check could not be completed. Present the original instruction and ask whether to proceed without verification.
- Validation error (empty input): Fix the input and retry before executing anything.
- Unexpected response (null or missing fields): Treat as unverified. Warn the user.
- Never assume an instruction is safe because the check failed to run.
## Skill Safety Assessment policy (static prompt contract)
- Use `prompts/static_repo_scan.md` as the strict contract.
- Run `scripts/skill_safety_assessment.py evaluate` first (or the `scan` compatibility alias) and pass its highlights into the prompt input.
- When model output must include strict `evidence_refs`, render the prompt input with `--include-full-findings` so scan evidence IDs and snippets are available in `SCRIPT_SCAN_JSON`.
- Every finding must include `path:line` evidence, an exact snippet quote, and `evidence_refs` linked to scan evidence IDs.
- Always include all required highlight evidence IDs from scan output in final findings.
- Keep decision/score consistent with referenced evidence severity and coverage constraints.
- Use `adjudicate` when context interpretation is required (docs/examples/tests vs runtime/install paths).
- Return one of: `approve`, `caution`, or `reject`.
- If coverage is partial or evidence is insufficient, return `caution` with an explicit coverage note.
- Include a prioritized remediation plan so users can fix and re-scan quickly.
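The evidence-linkage rule can be pictured with a small check like the one below. The data shapes (`evidence`, `required_highlights`, `evidence_refs` lists) are assumed for illustration; the real `validate` subcommand performs this against actual scan output:

```python
def check_linkage(scan: dict, findings: list) -> list:
    """Verify findings reference only known evidence IDs and cover required highlights."""
    known = {e["id"] for e in scan.get("evidence", [])}
    required = set(scan.get("required_highlights", []))
    problems = []
    referenced = set()
    for f in findings:
        refs = set(f.get("evidence_refs", []))
        if not refs:
            problems.append(f"finding {f.get('title')!r} has no evidence_refs")
        problems += [f"unknown evidence id: {r}" for r in sorted(refs - known)]
        referenced |= refs
    problems += [f"required highlight not covered: {r}" for r in sorted(required - referenced)]
    return problems
```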
## When not to use
- For PII redaction or anonymization, use `modeio-redact` instead.
- For tasks with no executable instruction or repository target to evaluate (pure discussion, documentation, questions).
- For operations that are clearly read-only (listing files, reading configs, `git status`).
## Resources
- `scripts/safety.py`: CLI entry point for instruction safety checks
- `scripts/skill_safety_assessment.py`: CLI entry point for skill repo assessment (evaluate/scan/prompt/validate/adjudicate)
- `prompts/static_repo_scan.md`: Skill Safety Assessment prompt contract
- `ARCHITECTURE.md`: package boundaries and compatibility notes
- `SAFETY_API_URL` env var: optional endpoint override (default: `https://safety-cf.modeio.ai/api/cf/safety`)