Run safety checks for instructions and skill repos

Use this skill to gate risky operations behind a real-time safety assessment, or to scan third-party skill repos before installation.

Tool routing

- For executable instructions, use the backend-powered `scripts/safety.py` flow.
- For requests like "scan this skill repo" or "is this repo dangerous", run the Skill Safety Assessment contract at `prompts/static_repo_scan.md`.
- Skill Safety Assessment is static analysis only. Never execute code, install dependencies, or run hooks in the target repository.
- For Skill Safety Assessment, run deterministic script evaluation first (`evaluate`), then pass highlights into the prompt contract.

Dependencies

- `requests` is required for `scripts/safety.py` because it makes backend API calls.
- `scripts/skill_safety_assessment.py` does not require `requests` for basic local repository evaluation.

For repo-local setup from the repo root:

```bash
python scripts/bootstrap_env.py
python scripts/doctor_env.py
```

Instruction safety execution policy

- Always run `scripts/safety.py` with `--json` for structured output.
- Run the check before executing the instruction, not after.
- Each instruction must trigger a fresh backend call. Do not reuse cached or historical results.
- For any state-changing instruction (delete, overwrite, permission change, deploy, schema change), always pass both `--context` and `--target`.
- `scripts/safety.py` accepts `--context` and `--target` as optional flags, so this requirement is enforced by policy, not by automatic CLI blocking.
- Use the Context Contract below exactly. Do not send free-form `--context` values such as a bare `"production"`.
- If policy-required context or target is missing, treat the instruction as unverified and ask for the missing fields before execution.
- If an instruction contains multiple operations, check the riskiest one.
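Because the CLI does not block a missing `--context` or `--target`, the caller has to enforce the policy itself. The following is a minimal sketch of such a pre-flight step. The helper names `missing_policy_fields` and `build_safety_command` are hypothetical and not part of the skill; only the flags (`-i`, `-c`, `-t`, `--json`) come from this document.

```python
import json


def missing_policy_fields(state_changing, context, target):
    """List the policy-required fields that are absent (hypothetical helper)."""
    if not state_changing:
        return []
    missing = []
    if not context:
        missing.append("context")
    if not target:
        missing.append("target")
    return missing


def build_safety_command(instruction, context=None, target=None):
    """Build the argv list for scripts/safety.py without running it."""
    if not instruction.strip():
        # The CLI rejects whitespace-only input, so fail early here too.
        raise ValueError("instruction must not be empty or whitespace-only")
    argv = ["python", "scripts/safety.py", "-i", instruction]
    if context is not None:
        # --context is passed as a compact JSON string.
        argv += ["-c", json.dumps(context, separators=(",", ":"))]
    if target:
        argv += ["-t", target]
    argv.append("--json")  # always request structured output
    return argv
```

If `missing_policy_fields` returns a non-empty list, the instruction should be treated as unverified and the missing fields requested from the user before anything runs.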

Context contract (policy-required for state-changing instructions)

Pass `--context` as a JSON string with this exact shape:

```json
{
  "environment": "local-dev|ci|staging|production|unknown",
  "operation_intent": "read-only|cleanup|maintenance|migration|permission-change|destructive|unknown",
  "scope": "single-resource|bounded-batch|broad|unknown",
  "data_sensitivity": "public|internal|sensitive|regulated|unknown",
  "rollback": "easy|partial|none|unknown",
  "change_control": "ticket:<id>|approved-manual|none|unknown"
}
```

Rules:

- Include all six keys. If a value is unknown, set it to `unknown` instead of omitting the key.
- `--target` must be a concrete resource identifier (absolute file path, table name, service name, or URL). Avoid generic targets such as `"database"`.
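The contract above can be checked locally before the backend call. This is a sketch only: `context_problems` is a hypothetical helper, the allowed values are copied from the shape shown above, and the assumption that a `ticket:` value needs a non-empty id is mine, not the skill's.

```python
import json

# Allowed values per key, copied from the Context Contract shape.
ALLOWED_VALUES = {
    "environment": {"local-dev", "ci", "staging", "production", "unknown"},
    "operation_intent": {"read-only", "cleanup", "maintenance", "migration",
                         "permission-change", "destructive", "unknown"},
    "scope": {"single-resource", "bounded-batch", "broad", "unknown"},
    "data_sensitivity": {"public", "internal", "sensitive", "regulated", "unknown"},
    "rollback": {"easy", "partial", "none", "unknown"},
}
CHANGE_CONTROL_FIXED = {"approved-manual", "none", "unknown"}
EXPECTED_KEYS = set(ALLOWED_VALUES) | {"change_control"}


def context_problems(raw_context):
    """Return a list of contract violations; an empty list means the shape is valid."""
    try:
        context = json.loads(raw_context)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(context, dict):
        # Catches free-form values such as a bare "production" string.
        return ["context must be a JSON object with six keys"]
    problems = [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - set(context))]
    problems += [f"unexpected key: {key}" for key in sorted(set(context) - EXPECTED_KEYS)]
    for key, allowed in ALLOWED_VALUES.items():
        value = context.get(key)
        if key in context and (not isinstance(value, str) or value not in allowed):
            problems.append(f"invalid value for {key}: {value!r}")
    if "change_control" in context:
        cc = context["change_control"]
        # Assumption: "ticket:<id>" requires a non-empty id after the prefix.
        valid = isinstance(cc, str) and (
            cc in CHANGE_CONTROL_FIXED
            or (cc.startswith("ticket:") and len(cc) > len("ticket:"))
        )
        if not valid:
            problems.append(f"invalid value for change_control: {cc!r}")
    return problems
```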

For a file deletion request that should usually be allowed, use `environment=local-dev|ci`, `operation_intent=cleanup`, `scope=single-resource`, `data_sensitivity=public|internal`, and `rollback=easy`.

If those conditions are not met, expect stricter output (`approved=false` or higher `risk_level`) and require explicit user confirmation.
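As a rough local pre-check, the five conditions above can be expressed as a predicate. This sketch does not predict the backend's decision; the function name is hypothetical and it only tells the caller whether stricter output should be expected.

```python
def matches_low_risk_deletion_profile(context):
    """True when a parsed context dict meets all five conditions listed above."""
    return (
        context.get("environment") in {"local-dev", "ci"}
        and context.get("operation_intent") == "cleanup"
        and context.get("scope") == "single-resource"
        and context.get("data_sensitivity") in {"public", "internal"}
        and context.get("rollback") == "easy"
    )
```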

Action policy

This table applies to `scripts/safety.py` responses.

Use the result to gate execution. Never silently ignore a safety check result.

| `approved` | `risk_level` | Agent action |
|---|---|---|
| `true` | `low` | Proceed. No user prompt needed. |
| `true` | `medium` | Proceed. Mention the risk and recommendation to the user. |
| `false` | `medium` | Warn user with `concerns` and `recommendation`. Proceed only with explicit user confirmation. |
| `false` | `high` | Block execution. Show `concerns` and `recommendation`. Ask user for explicit override. |
| `false` | `critical` | Block execution. Show full assessment. Require user to explicitly acknowledge the risk before proceeding. |

Additional signals:

- `is_destructive: true` combined with `is_reversible: false`: always surface the recommendation to the user, regardless of approval status.
- If the safety check itself fails (network error, API error): warn the user that safety could not be verified. Do not silently proceed with unverified instructions.
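The table and the additional signals can be sketched as a lookup. The action labels (`proceed`, `confirm`, and so on) are hypothetical names chosen for illustration, and treating any combination the table does not list as `unverified` is an assumption on my part.

```python
# (approved, risk_level) -> action label; labels are illustrative only.
ACTION_TABLE = {
    (True, "low"): "proceed",
    (True, "medium"): "proceed_with_notice",
    (False, "medium"): "confirm",
    (False, "high"): "block_until_override",
    (False, "critical"): "block_until_acknowledged",
}


def decide_action(data):
    """Return (action, surface_recommendation) for a scripts/safety.py data object."""
    approved = data.get("approved")
    risk_level = data.get("risk_level")
    if isinstance(approved, bool) and isinstance(risk_level, str):
        # Assumption: combinations absent from the table are not trusted.
        action = ACTION_TABLE.get((approved, risk_level), "unverified")
    else:
        action = "unverified"
    # Destructive and irreversible: always surface the recommendation.
    surface = data.get("is_destructive") is True and data.get("is_reversible") is False
    return action, surface
```

Keeping the table as data rather than nested conditionals makes it easy to compare against the policy row by row.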

Scripts

scripts/safety.py

- `-i, --input`: required, instruction text to evaluate (whitespace-only rejected)
- `-c, --context`: policy-required for state-changing instructions (CLI accepts it as optional); JSON string following the Context Contract above
- `-t, --target`: policy-required for state-changing instructions (CLI accepts it as optional); concrete operation target (file path, table name, service name, URL)
- `--json`: output unified JSON envelope for machine consumption
- Endpoint: `https://safety-cf.modeio.ai/api/cf/safety` (override via `SAFETY_API_URL`)
- Retries: automatic retry on HTTP 502/503/504 and connection/timeout errors (up to 2 retries with exponential backoff)
- Request timeout: 60 seconds per attempt

```bash
python scripts/safety.py -i "Delete /tmp/cache/build-123.log" \
  -c '{"environment":"local-dev","operation_intent":"cleanup","scope":"single-resource","data_sensitivity":"internal","rollback":"easy","change_control":"none"}' \
  -t "/tmp/cache/build-123.log" --json

python scripts/safety.py -i "DROP TABLE users" \
  -c '{"environment":"production","operation_intent":"destructive","scope":"broad","data_sensitivity":"regulated","rollback":"none","change_control":"ticket:DB-9021"}' \
  -t "postgres://prod/maindb.users" --json

python scripts/safety.py -i "chmod 777 /etc/passwd" \
  -c '{"environment":"production","operation_intent":"permission-change","scope":"single-resource","data_sensitivity":"regulated","rollback":"partial","change_control":"ticket:SEC-118"}' \
  -t "/etc/passwd" --json

python scripts/safety.py -i "List all running containers and display their resource usage" --json
```
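The retry behavior described for `scripts/safety.py` (retry on 502/503/504 and connection/timeout errors, at most 2 retries, exponential backoff) might look roughly like the sketch below. It is not the script's actual code: the `send` callable, the injectable `sleep`, and the 1-second base delay are all assumptions for illustration.

```python
import time

RETRYABLE_STATUS = {502, 503, 504}
MAX_RETRIES = 2  # up to 2 retries, so at most 3 attempts in total


def call_with_retries(send, sleep=time.sleep, base_delay=1.0):
    """Call send() until it yields a non-retryable status or retries run out.

    send() returns an HTTP status code, or raises ConnectionError/TimeoutError.
    Returns (last_status_or_None, attempts_made).
    """
    status = None
    for attempt in range(MAX_RETRIES + 1):
        try:
            status = send()
        except (ConnectionError, TimeoutError):
            status = None  # connection/timeout failures are retryable
        else:
            if status not in RETRYABLE_STATUS:
                return status, attempt + 1
        if attempt < MAX_RETRIES:
            # Exponential backoff: base_delay, then 2 * base_delay.
            sleep(base_delay * 2 ** attempt)
    return status, MAX_RETRIES + 1
```

Note that with a 60-second timeout per attempt, three attempts plus backoff can take over three minutes in the worst case, which matters when the check gates an interactive action.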

scripts/skill_safety_assessment.py

- `evaluate`: authoritative v2 layered evaluator with deterministic evidence IDs, integrity fingerprinting, and risk scoring
  - Native first-layer gate: GitHub metadata/README/issue-search precheck runs by default and hard-rejects on high-risk attack-demo/malware signals before the local file scan.
- `scan`: compatibility alias for `evaluate`, kept for existing automation
- `prompt`: renders the prompt payload with script highlights and structured scan JSON
- `validate`: validates model output against scan evidence IDs (`evidence_refs`), required highlights, and score/decision consistency checks
- `adjudicate`: context-aware LLM adjudication bridge (prompt generation + merge decisions back into deterministic score/decision)

Context profile (optional, no user identity required):

```json
{
  "environment": "local-dev|ci|staging|production|unknown",
  "execution_mode": "read-only|build-test|install|deploy|mutating|unknown",
  "risk_tolerance": "strict|balanced|permissive",
  "data_sensitivity": "public|internal|sensitive|regulated|unknown"
}
```

```bash
# 1) Deterministic layered evaluation (v2)
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --context-profile '{"environment":"ci","execution_mode":"build-test","risk_tolerance":"balanced","data_sensitivity":"internal"}' --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --github-osint-timeout 8 --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --context-profile-file ./context_profile.json --output /tmp/skill_scan.json --json

# (compat) legacy alias still supported
python scripts/skill_safety_assessment.py scan --target-repo /path/to/repo --json > /tmp/skill_scan.json

# 2) Build prompt payload with highlights + full findings (recommended for strict evidence_refs linking)
python scripts/skill_safety_assessment.py prompt --target-repo /path/to/repo --scan-file /tmp/skill_scan.json --include-full-findings

# 3) Validate model output for evidence linkage + integrity
python scripts/skill_safety_assessment.py validate --scan-file /tmp/skill_scan.json --assessment-file /tmp/assessment.md --json
# --rescan-on-validate requires --target-repo
python scripts/skill_safety_assessment.py validate --scan-file /tmp/skill_scan.json --assessment-file /tmp/assessment.md --target-repo /path/to/repo --rescan-on-validate --json

# 4) Optional adjudication bridge (LLM interprets context, engine keeps deterministic control)
python scripts/skill_safety_assessment.py adjudicate --scan-file /tmp/skill_scan.json
python scripts/skill_safety_assessment.py adjudicate --scan-file /tmp/skill_scan.json --assessment-file /tmp/adjudication.json --json
```
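For automation, the evaluate, prompt, validate sequence above can be assembled as argv lists. This sketch only builds the commands and does not run them. `assessment_commands` is a hypothetical helper, and it assumes `--output` can be used on `evaluate` without `--context-profile-file`, which the examples above do not show explicitly.

```python
SCRIPT = ["python", "scripts/skill_safety_assessment.py"]


def assessment_commands(target_repo, scan_file, assessment_file):
    """Return the three core commands in the order they should run."""
    return [
        # 1) Deterministic evaluation, written to scan_file.
        SCRIPT + ["evaluate", "--target-repo", target_repo,
                  "--output", scan_file, "--json"],
        # 2) Prompt payload with full findings for strict evidence_refs linking.
        SCRIPT + ["prompt", "--target-repo", target_repo,
                  "--scan-file", scan_file, "--include-full-findings"],
        # 3) Validate the model's assessment against the scan evidence.
        SCRIPT + ["validate", "--scan-file", scan_file,
                  "--assessment-file", assessment_file, "--json"],
    ]
```

Between steps 2 and 3 the model produces the assessment file, so the third command cannot run until that file exists.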

Output contract

Success response (`--json`)

```json
{
  "success": true,
  "tool": "modeio-guardrail",
  "mode": "api",
  "data": {
    "approved": false,
    "risk_level": "critical",
    "risk_types": ["data loss"],
    "concerns": ["Irreversible destructive operation targeting all user data"],
    "recommendation": "Create a backup before deletion. Use staged rollback plan.",
    "is_destructive": true,
    "is_reversible": false
  }
}
```

Response fields in `data`

| Field | Type | Values | Meaning |
|---|---|---|---|
| `approved` | `boolean` | `true` / `false` | Whether execution is recommended |
| `risk_level` | `string` | `low` / `medium` / `high` / `critical` | Severity of identified risks |
| `risk_types` | `string[]` | open-ended | Risk categories (e.g., `"data loss"`, `"injection attacks"`, `"unauthorized access"`, `"denial-of-service"`) |
| `concerns` | `string[]` | open-ended | Specific risk points in natural language |
| `recommendation` | `string` | open-ended | Suggested safer alternative or mitigation |
| `is_destructive` | `boolean` | `true` / `false` | Whether the action involves destruction (deletion, overwrite, system modification) |
| `is_reversible` | `boolean` | `true` / `false` | Whether the action can be rolled back |

Any field may be `null` if the backend could not determine it. Treat a `null` `approved` as `false`.
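A small normalization step makes the null rule explicit before any gating logic runs. This is a sketch under the assumption that the caller has already parsed the envelope and extracted `data`; `normalize_assessment` is a hypothetical name.

```python
FIELDS = (
    "approved", "risk_level", "risk_types", "concerns",
    "recommendation", "is_destructive", "is_reversible",
)


def normalize_assessment(data):
    """Return (normalized_data, undetermined_fields) for a data object."""
    normalized = {name: data.get(name) for name in FIELDS}
    # Record which fields were null or absent so the caller can warn the user.
    undetermined = [name for name in FIELDS if normalized[name] is None]
    if normalized["approved"] is None:
        normalized["approved"] = False  # a null approval counts as not approved
    return normalized, undetermined
```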

Failure envelope (`--json`)

```json
{
  "success": false,
  "tool": "modeio-guardrail",
  "mode": "api",
  "error": {
    "type": "network_error",
    "message": "safety request failed: ConnectionError"
  }
}
```

Error types: `validation_error` (empty input), `dependency_error` (missing local package such as `requests`), `network_error` (HTTP/connection failure), `api_error` (backend returned error payload).

Exit code is non-zero on any failure.

Failure policy

Safety verification failures must never be silently ignored.

Network/API error: Tell the user the safety check could not be completed. Present the original instruction and ask whether to proceed without verification.
Validation error (empty input): Fix the input and retry before executing anything.
Unexpected response (null or missing fields): Treat as unverified. Warn the user.
Never assume an instruction is safe because the check failed to run.
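The failure policy can be sketched as a classifier over the raw `--json` output and the process exit code. The outcome labels are hypothetical, and mapping `dependency_error` (which the policy above does not mention) to `unverified` is an assumption.

```python
import json


def classify_result(raw_output, exit_code):
    """Map scripts/safety.py output to verified / ask_user / fix_input_and_retry / unverified."""
    try:
        envelope = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        return "unverified"  # output that cannot be parsed is never trusted
    if not isinstance(envelope, dict):
        return "unverified"
    if envelope.get("success") is True and exit_code == 0:
        data = envelope.get("data")
        complete = (
            isinstance(data, dict)
            and data.get("approved") is not None
            and data.get("risk_level") is not None
        )
        # Null or missing core fields: treat as unverified and warn the user.
        return "verified" if complete else "unverified"
    error = envelope.get("error")
    error_type = error.get("type") if isinstance(error, dict) else None
    if error_type == "validation_error":
        return "fix_input_and_retry"
    if error_type in {"network_error", "api_error"}:
        return "ask_user"  # present the instruction and ask before proceeding
    return "unverified"  # includes dependency_error and unknown error types
```

No branch returns anything resembling "safe" on failure, which mirrors the rule that a check that did not run proves nothing.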

Skill Safety Assessment policy (static prompt contract)

- Use `prompts/static_repo_scan.md` as the strict contract.
- Run `scripts/skill_safety_assessment.py evaluate` first (or the `scan` compatibility alias) and pass its highlights into the prompt input.
- When model output must include strict `evidence_refs`, render the prompt input with `--include-full-findings` so scan evidence IDs and snippets are available in `SCRIPT_SCAN_JSON`.
- Every finding must include `path:line` evidence, an exact snippet quote, and `evidence_refs` linked to scan evidence IDs.
- Always include all required highlight evidence IDs from scan output in final findings.
- Keep decision/score consistent with referenced evidence severity and coverage constraints.
- Use `adjudicate` when context interpretation is required (docs/examples/tests vs runtime/install paths).
- Return one of: `reject`, `caution`, or `approve`.
- If coverage is partial or evidence is insufficient, return `caution` with an explicit coverage note.
- Include a prioritized remediation plan so users can fix and re-scan quickly.
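The `validate` subcommand is the authoritative check for these rules. Purely as an illustration of what the rules require, here is a sketch over an assumed in-memory shape: the finding keys `location` and `snippet`, the function name, and the argument layout are hypothetical and do not describe the script's real schema.

```python
import re

DECISIONS = {"reject", "caution", "approve"}
PATH_LINE = re.compile(r"^.+:\d+$")  # e.g. "scripts/install.sh:12"


def assessment_problems(decision, findings, scan_evidence_ids, required_highlight_ids):
    """Return a list of rule violations for a drafted assessment."""
    problems = []
    if decision not in DECISIONS:
        problems.append(f"decision must be one of {sorted(DECISIONS)}")
    referenced = set()
    for index, finding in enumerate(findings):
        if not PATH_LINE.match(str(finding.get("location", ""))):
            problems.append(f"finding {index}: location is not path:line")
        if not finding.get("snippet"):
            problems.append(f"finding {index}: missing snippet quote")
        refs = finding.get("evidence_refs") or []
        if not refs:
            problems.append(f"finding {index}: missing evidence_refs")
        unknown = sorted(ref for ref in refs if ref not in scan_evidence_ids)
        if unknown:
            problems.append(f"finding {index}: unknown evidence IDs {unknown}")
        referenced.update(refs)
    # Every required highlight ID must appear in at least one finding.
    uncovered = sorted(set(required_highlight_ids) - referenced)
    if uncovered:
        problems.append(f"required highlights not referenced: {uncovered}")
    return problems
```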

When not to use

- For PII redaction or anonymization: use `modeio-redact` instead.
- For tasks with no executable instruction or repository target to evaluate (pure discussion, documentation, questions).
- For operations that are clearly read-only (listing files, reading configs, `git status`).

Resources

- `scripts/safety.py`: CLI entry point for instruction safety checks
- `scripts/skill_safety_assessment.py`: CLI entry point for skill repo assessment (evaluate/scan/prompt/validate/adjudicate)
- `prompts/static_repo_scan.md`: Skill Safety Assessment prompt contract
- `ARCHITECTURE.md`: package boundaries and compatibility notes
- `SAFETY_API_URL` env var: optional endpoint override (default: `https://safety-cf.modeio.ai/api/cf/safety`)
