competition-prompt-injection

Use this skill only as a downstream specialization after `$ctf-sandbox-orchestrator` is already active and has established sandbox assumptions, node ownership, and evidence priorities. If that has not happened yet, return to `$ctf-sandbox-orchestrator` first.

Use this skill when the challenge is primarily about trust boundaries inside an agentic system.

Reply in Simplified Chinese unless the user explicitly requests English.

Identify the first untrusted content that becomes model-visible.
Map the chain from retrieval, memory, or transcript into planner or executor behavior.
Record the exact point where text becomes a tool argument, file path, network target, or secret request.
Prove one minimal exploit chain before exploring variants.
Keep prompt snippets and tool transitions in compact evidence blocks.
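
One way to keep such evidence compact is a small record type that stores the first untrusted source, each hop between layers, and the final sink. The sketch below is illustrative only: the names `Transition`, `EvidenceBlock`, and `render`, and the sink labels, are assumptions made for this example and are not defined by the skill or by `references/prompt-injection.md`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transition:
    """One hop in the chain, e.g. retrieved -> planner (hypothetical type)."""
    source_layer: str
    target_layer: str
    snippet: str  # shortest excerpt that still shows the hop


@dataclass
class EvidenceBlock:
    """Compact record of one chain from untrusted text to a sink."""
    first_untrusted_source: str
    transitions: List[Transition] = field(default_factory=list)
    # Assumed labels: "tool_argument", "file_path", "network_target", "secret_request"
    sink_kind: str = ""
    sink_value: str = ""

    def render(self, max_snippet: int = 80) -> str:
        """Render the chain as a few lines, truncating long snippets."""
        lines = [f"source: {self.first_untrusted_source}"]
        for t in self.transitions:
            snippet = t.snippet
            if len(snippet) > max_snippet:
                snippet = snippet[: max_snippet - 3] + "..."
            lines.append(f"{t.source_layer} -> {t.target_layer}: {snippet}")
        lines.append(f"sink[{self.sink_kind}]: {self.sink_value}")
        return "\n".join(lines)
```

Keeping one record per proven chain makes it easier to confirm the minimal exploit before branching into variants.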

Track system, developer, user, retrieved, memory, planner, and tool-response layers separately.
Distinguish claimed capability from runtime-exposed capability.
Note what the model can actually call, read, or mutate.
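
A minimal sketch of this bookkeeping, assuming messages arrive as `(layer, text)` pairs and tools are identified by name. The layer labels mirror the list above; `group_by_layer` and `capability_gap` are hypothetical helpers, not part of any existing tooling.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Layer labels taken from the list above; the exact spelling is an assumption.
LAYERS = ("system", "developer", "user", "retrieved",
          "memory", "planner", "tool_response")


def group_by_layer(messages: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Bucket (layer, text) pairs so each layer can be inspected on its own."""
    grouped: Dict[str, List[str]] = {layer: [] for layer in LAYERS}
    for layer, text in messages:
        if layer not in grouped:
            raise ValueError(f"unknown layer: {layer}")
        grouped[layer].append(text)
    return grouped


def capability_gap(claimed: Set[str], exposed: Set[str]) -> Dict[str, Set[str]]:
    """Compare tools the prompt claims against tools the runtime exposes."""
    return {
        "claimed_only": claimed - exposed,   # described but not callable
        "exposed_only": exposed - claimed,   # callable but undocumented
        "confirmed": claimed & exposed,      # both claimed and callable
    }
```

In this sketch, the `exposed_only` set tends to be the most interesting one, since it lists what the model can actually call without the prompt saying so.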

Reproduce one chain from untrusted text to changed planner behavior, changed tool args, or secret exposure.
Keep the decisive transcript compact: source chunk, rewritten planner state, final tool invocation.
Prefer the smallest transcript that still demonstrates the bug.
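
To show changed tool arguments without a long transcript, one option is to diff the arguments of a baseline run against the run that included the untrusted text. The helper below is a sketch under that assumption; `changed_args` and the `"<absent>"` marker are invented for illustration.

```python
from typing import Any, Dict, Tuple

_ABSENT = "<absent>"  # marker for a key missing from one side (assumed convention)


def changed_args(baseline: Dict[str, Any],
                 injected: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return only the tool arguments that differ, as (before, after) pairs."""
    diff: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(baseline) | set(injected)):
        before = baseline.get(key, _ABSENT)
        after = injected.get(key, _ABSENT)
        if before != after:
            diff[key] = (before, after)
    return diff
```

An empty result means the final tool invocation was unchanged, so any observed difference lies earlier in the chain, for example in planner state.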

State which layer failed: retrieval, summarizer, planner, executor, tool normalization, or output post-processing.
Separate instruction drift from actual side effect.
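
The distinction can be made explicit in a finding record. The following sketch assumes the analyst has already determined whether planner state changed and which concrete effects were observed; `Outcome`, `Finding`, and `classify` are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

# Layer names taken from the list above.
FAILURE_LAYERS = ("retrieval", "summarizer", "planner", "executor",
                  "tool normalization", "output post-processing")


class Outcome(Enum):
    NO_EFFECT = "no_effect"
    INSTRUCTION_DRIFT = "instruction_drift"  # plan changed, nothing happened yet
    SIDE_EFFECT = "side_effect"              # a tool call, write, or leak occurred


@dataclass
class Finding:
    failed_layer: str
    outcome: Outcome


def classify(failed_layer: str, planner_changed: bool,
             observed_effects: List[str]) -> Finding:
    """Label a finding by failing layer and by drift versus side effect."""
    if failed_layer not in FAILURE_LAYERS:
        raise ValueError(f"unknown layer: {failed_layer}")
    if observed_effects:
        outcome = Outcome.SIDE_EFFECT
    elif planner_changed:
        outcome = Outcome.INSTRUCTION_DRIFT
    else:
        outcome = Outcome.NO_EFFECT
    return Finding(failed_layer, outcome)
```

Under this scheme, a run where the planner adopts injected text but no tool call follows is recorded as drift only, which keeps it from being reported as a completed exploit.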

Load `references/prompt-injection.md` for the checklist, evidence layout, and common prompt-boundary pitfalls.