saica-supervise

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

SAICA supervision skill

SAICA 监管技能

This skill encodes per-action heuristics distilled from the SAICA-KG corpus (https://github.com/vasylrakivnenko/SAICA). Use it as context to decide what to be careful about when writing or modifying code. The failure-mode IDs (snake_case) are the canonical vocabulary — quote them verbatim when surfacing concerns.
The skill is self-contained — three pre-baked recommendation tiers (
minimum
/
optimal
/
full
-MECE) appear right below, so the agent doesn't need to call out to a service to get the recommended stack. For richer / live querying (per-tool facet lookup, agent-kind-aware filtering, repo audit), the parent SAICA-KG project ships an MCP server separately — see https://github.com/vasylrakivnenko/SAICA#claude-code-plugin.
本技能是独立封装的——下方直接提供三个预设的推荐层级(
minimum
/
optimal
/
full
-MECE),因此Agent无需调用外部服务即可获取推荐工具栈。如需更丰富的实时查询功能(按工具维度查找、按Agent类型过滤、仓库审计),父项目SAICA-KG单独提供了MCP服务器——详见https://github.com/vasylrakivnenko/SAICA#claude-code-plugin。

Recommended supervision stack — three tiers

推荐的监管工具栈——三个层级

Pre-baked from the SAICA-KG corpus. Pick a tier that matches the team's appetite. Same picks the MCP server would return from
saica_recommend(level=...)
— embedded here so no runtime service call is needed.
  • minimum
    — the single best tool to start with.
  • optimal
    — three tools, the responsible default.
  • full
    — the minimum set that covers all 11 failure modes (MECE).
由SAICA-KG语料库预生成。请根据团队需求选择合适的层级。所选工具与MCP服务器通过
saica_recommend(level=...)
返回的结果一致,嵌入此处无需运行时服务调用。
  • minimum
    :入门首选的单一最佳工具。
  • optimal
    :三款工具,负责任的默认选择。
  • full
    :覆盖全部11种故障模式的最小工具集(MECE原则)。

minimum
tier

minimum
层级

1 tool covers 5 of 11 failure modes (highest-priority single pick).
  • promptfoo
    — detection/post_generation, surfaces: cli, library, ci_app; Test prompts, agents, and RAGs — red-teaming, pentesting, and vulnerability scanning for LLMs.
1款工具覆盖11种故障模式中的5种(优先级最高的单一选择)。
  • promptfoo
    — 检测/生成后阶段,支持:cli、library、ci_app;用于测试提示词、Agent和RAG系统——针对LLM的红队测试、渗透测试和漏洞扫描。

optimal
tier

optimal
层级

3 tools cover 9 of 11 failure modes (weighted set cover, capped at 3).
  • promptfoo
    — detection/post_generation, surfaces: cli, library, ci_app; Test prompts, agents, and RAGs — red-teaming, pentesting, and vulnerability scanning for LLMs.
  • activepieces
    — prevention/pre_generation, surfaces: web_app, http_service; AI agents, MCPs, and AI workflow automation — open-source Zapier alternative.
  • coze-loop
    — detection/post_generation, surfaces: web_app, http_service; Full-lifecycle AI agent management platform with debugging, evaluation, and monitoring.
3款工具覆盖11种故障模式中的9种(加权集合覆盖,限制为3款)。
  • promptfoo
    — 检测/生成后阶段,支持:cli、library、ci_app;用于测试提示词、Agent和RAG系统——针对LLM的红队测试、渗透测试和漏洞扫描。
  • activepieces
    — 预防/生成前阶段,支持:web_app、http_service;AI Agent、MCP和AI工作流自动化——开源Zapier替代方案。
  • coze-loop
    — 检测/生成后阶段,支持:web_app、http_service;全生命周期AI Agent管理平台,包含调试、评估和监控功能。

full
tier

full
层级

5 tools cover 11 of 11 failure modes (minimum weighted set cover).
  • promptfoo
    — detection/post_generation, surfaces: cli, library, ci_app; Test prompts, agents, and RAGs — red-teaming, pentesting, and vulnerability scanning for LLMs.
  • activepieces
    — prevention/pre_generation, surfaces: web_app, http_service; AI agents, MCPs, and AI workflow automation — open-source Zapier alternative.
  • coze-loop
    — detection/post_generation, surfaces: web_app, http_service; Full-lifecycle AI agent management platform with debugging, evaluation, and monitoring.
  • pathway-llm-app
    — detection/post_generation, surfaces: library, http_service; Ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data.
  • tree-sitter
    — prevention/pre_generation, surfaces: library; Incremental parser toolkit that powers structural code analysis and symbol discovery.
Covers 11 of 11 failure modes:
cascading_failure
,
context_pollution
,
dependency_blindness
,
fabrication
,
incomplete_execution
,
logic_error
,
obsolescence
,
scope_creep
,
security_vulnerability
,
supply_chain_attack
,
test_manipulation
.
5款工具覆盖全部11种故障模式(最小加权集合覆盖)。
  • promptfoo
    — 检测/生成后阶段,支持:cli、library、ci_app;用于测试提示词、Agent和RAG系统——针对LLM的红队测试、渗透测试和漏洞扫描。
  • activepieces
    — 预防/生成前阶段,支持:web_app、http_service;AI Agent、MCP和AI工作流自动化——开源Zapier替代方案。
  • coze-loop
    — 检测/生成后阶段,支持:web_app、http_service;全生命周期AI Agent管理平台,包含调试、评估和监控功能。
  • pathway-llm-app
    — 检测/生成后阶段,支持:library、http_service;用于RAG、AI流水线和企业搜索的即用型云模板,支持实时数据。
  • tree-sitter
    — 预防/生成前阶段,支持:library;增量解析器工具包,支持结构化代码分析和符号发现。
覆盖全部11种故障模式:
cascading_failure
context_pollution
dependency_blindness
fabrication
incomplete_execution
logic_error
obsolescence
scope_creep
security_vulnerability
supply_chain_attack
test_manipulation

Failure modes — what to watch for and what to do

故障模式——需留意的问题及应对措施

One subsection per failure mode, in priority-descending order (likelihood × impact, see
data/failure_mode_priorities.yml
).
每个故障模式对应一个小节,按优先级从高到低排序(可能性×影响,详见
data/failure_mode_priorities.yml
)。

scope_creep
— Scope creep

scope_creep
— 范围蔓延

What it is: The agent performs actions beyond the user-stated task scope — edits files not in the request, installs packages not asked for, modifies configs outside the declared surface, or chains follow-on work unprompted.
Detection signals (from
data/failure_modes/scope_creep.yml
):
  • Agent modifies files outside the declared task surface (e.g., paths in the user prompt).
  • Agent installs dependencies not named in the prompt.
  • Agent commits additional unrelated changes alongside the requested edit.
  • Agent performs cleanup, refactoring, or "improvement" not explicitly requested.
Real incident: "DAPLab records cases of coding agents executing
rm -rf
,
git push --force
, and commit-signature bypass without explicit human approval." — The DAPLab study's DAP-08 pattern, "Destructive Action", catalogues observations in which an agent issued a destructive system command (recursive delete, force-push, commit hook bypass, database
DROP
) without the expl… [
daplab-destructive-action-observation-2026
]
Recommended supervisors (from
saica_recommend(failure_modes=['scope_creep'])
):
  • prettier
    — prevention/post_generation, surfaces: cli, ci_app, ide_plugin; Opinionated code formatter for JS/TS/CSS/HTML. Ends style debates.
  • black
    — prevention/post_generation, surfaces: cli, ci_app, ide_plugin; Opinionated Python code formatter. Removes style argument from PR review.
Pre-action heuristic for an agent:
If you're about to edit a file the user did not name, install a package they did not ask for, or chain follow-on cleanup that wasn't requested, stop and ask first. Surface the unrelated issue in your reply but don't auto-fix it.
定义:Agent执行超出用户指定任务范围的操作——编辑请求中未提及的文件、安装未要求的包、修改声明范围外的配置,或在未提示的情况下进行后续链式操作。
检测信号(来自
data/failure_modes/scope_creep.yml
  • Agent修改声明任务范围外的文件(例如用户提示中的路径)。
  • Agent安装提示中未提及的依赖。
  • Agent在请求的编辑之外提交额外无关的更改。
  • Agent执行未明确要求的清理、重构或“优化”操作。
真实事件:“DAPLab记录了编码Agent在未获得明确人工批准的情况下执行
rm -rf
git push --force
和绕过提交签名的案例。”——DAPLab研究中的DAP-08模式“破坏性操作”,记录了Agent在未明确指令的情况下发出破坏性系统命令(递归删除、强制推送、绕过提交钩子、数据库
DROP
)的观察结果… [
daplab-destructive-action-observation-2026
]
推荐监管工具(来自
saica_recommend(failure_modes=['scope_creep'])
  • prettier
    — 预防/生成后阶段,支持:cli、ci_app、ide_plugin;JS/TS/CSS/HTML的 opiniated 代码格式化工具,终结风格争论。
  • black
    — 预防/生成后阶段,支持:cli、ci_app、ide_plugin;Python的 opiniated 代码格式化工具,消除PR评审中的风格争议。
Agent操作前启发式规则
若你即将编辑用户未提及的文件、安装未要求的包,或进行未请求的后续清理操作,请先停止并询问用户。在回复中指出无关问题,但不要自动修复。

fabrication
— Fabrication

fabrication
— 虚构内容

What it is: The agent emits code that references APIs, packages, functions, or symbols that do not exist in any reachable dependency or runtime. Syntactically plausible, factually false. Distinct from obsolescence (real-but-retired) and from dependency blindness (real-and-present-but-ignored).
Detection signals (from
data/failure_modes/fabrication.yml
):
  • Agent emits an import or function call whose identifier is absent from all declared dependencies and stdlib.
  • Static analyzer (e.g., pyflakes, tsc) flags the reference as undefined at pre-commit.
  • Execution-grounded check (sandboxed run) raises ImportError, AttributeError, or analogous resolution failure.
  • LSP-based symbol lookup returns no candidate.
Real incident: "Coding agents emit calls to functions, constants, or packages that do not exist in any reachable dependency — a direct empirical instance of the canonical
fabrication
FailureMode, observed..." — The DAPLab study's DAP-02 pattern describes agents emitting code that references functions, constants, classes, or packages absent from every declared dependency and from the standard library. While this overlaps with t… [
daplab-fabricated-references-observation-2026
]
Recommended supervisors (from
saica_recommend(failure_modes=['fabrication'])
):
  • mypy
    — detection/post_generation, surfaces: cli, ci_app, ide_plugin; Static type checker for Python. Catches mismatched types before runtime.
  • pyright
    — detection/post_generation, surfaces: cli, ci_app, ide_plugin; Microsoft's fast static type checker for Python.
Pre-action heuristic for an agent:
If you're about to import a package or call an API you didn't see in the existing
requirements.txt
/
package.json
/
go.mod
, first verify it actually exists — check the lockfile, run
pip show
/
npm view
, or grep the codebase. Never invent package names; if unsure, ask the user which library they want.
定义:Agent生成的代码引用了在任何可访问依赖或运行时中不存在的API、包、函数或符号。语法看似合理,但事实错误。与过时(真实但已废弃)和依赖盲区(真实存在但被忽略)不同。
检测信号(来自
data/failure_modes/fabrication.yml
  • Agent生成的导入或函数调用的标识符在所有声明的依赖和标准库中均不存在。
  • 静态分析器(如pyflakes、tsc)在预提交阶段标记该引用未定义。
  • 基于执行的检查(沙箱运行)引发ImportError、AttributeError或类似的解析失败。
  • 基于LSP的符号查找未返回候选结果。
真实事件:“编码Agent生成对函数、常量或包的调用,而这些内容在任何可访问依赖中均不存在——这是典型
fabrication
故障模式的直接实证案例,观察到…”——DAPLab研究中的DAP-02模式描述了Agent生成引用函数、常量、类或包的代码,而这些内容在所有声明的依赖和标准库中均不存在。虽然这与t… [
daplab-fabricated-references-observation-2026
]
推荐监管工具(来自
saica_recommend(failure_modes=['fabrication'])
  • mypy
    — 检测/生成后阶段,支持:cli、ci_app、ide_plugin;Python静态类型检查器,在运行前捕获类型不匹配问题。
  • pyright
    — 检测/生成后阶段,支持:cli、ci_app、ide_plugin;微软推出的快速Python静态类型检查器。
Agent操作前启发式规则
若你即将导入一个未在现有
requirements.txt
/
package.json
/
go.mod
中看到的包,或调用一个未见过的API,请先验证其实际存在——检查锁定文件、运行
pip show
/
npm view
,或在代码库中搜索。切勿虚构包名;若不确定,请询问用户想要使用哪个库。

security_vulnerability
— Security vulnerability

security_vulnerability
— 安全漏洞

What it is: The agent emits code that introduces a security vulnerability — injection primitives, weak cryptographic choices, leaked secrets, or insecure deserialization. Correct-looking code that violates security invariants.
Detection signals (from
data/failure_modes/security_vulnerability.yml
):
  • Static analyzer (semgrep, bandit, CodeQL) fires on the emitted code.
  • Secret-scanner (gitleaks, trufflehog) finds hardcoded credentials.
  • Generated SQL uses string concatenation instead of parameterization.
  • Cryptographic primitives are weak (MD5, SHA1, ECB) or from insecure libraries.
Real incident: "Johns Hopkins researchers showed Claude Code Security Review, Gemini CLI Action, and GitHub Copilot Agent all treat untrusted issue/comment content as instructions, allowing hidden directives." — A July 2025 security study led by Aonan Guan (Johns Hopkins) and Orca Security found that at least three production AI-coding workflows — Claude Code's Security Review action, Google's Gemini CLI action, and GitHub's Co… [
copilot-prompt-injection-github-comments-2025
]
Recommended supervisors (from
saica_recommend(failure_modes=['security_vulnerability'])
):
  • eslint
    — detection/post_generation, surfaces: cli, ci_app, ide_plugin; Pluggable JavaScript/TypeScript linter and code-quality enforcer.
  • bandit
    — detection/post_generation, surfaces: cli, ci_app; Security linter for Python. Catches common code-level vulnerabilities.
Pre-action heuristic for an agent:
If you're about to write SQL, shell, HTML/template, or auth code, first check whether the surrounding code uses parameterised queries, escaped templates, and a vetted crypto library. Match those patterns. Never hardcode secrets — read them from env/config. Run a static analyser (semgrep, bandit) on the diff if available.
定义:Agent生成的代码引入了安全漏洞——注入原语、弱加密选择、泄露机密或不安全的反序列化。看似正确的代码违反了安全准则。
检测信号(来自
data/failure_modes/security_vulnerability.yml
  • 静态分析器(semgrep、bandit、CodeQL)对生成的代码发出告警。
  • 机密扫描器(gitleaks、trufflehog)发现硬编码凭据。
  • 生成的SQL使用字符串拼接而非参数化查询。
  • 使用弱加密原语(MD5、SHA1、ECB)或来自不安全库的加密原语。
真实事件:“约翰霍普金斯大学的研究人员表明,Claude Code Security Review、Gemini CLI Action和GitHub Copilot Agent均将不可信的问题/评论内容视为指令,允许隐藏指令执行。”——2025年7月由Aonan Guan(约翰霍普金斯大学)和Orca Security领导的安全研究发现,至少三个生产级AI编码工作流——Claude Code的Security Review操作、Google的Gemini CLI操作和GitHub的Co… [
copilot-prompt-injection-github-comments-2025
]
推荐监管工具(来自
saica_recommend(failure_modes=['security_vulnerability'])
  • eslint
    — 检测/生成后阶段,支持:cli、ci_app、ide_plugin;可插拔的JavaScript/TypeScript代码检查器和代码质量强制执行工具。
  • bandit
    — 检测/生成后阶段,支持:cli、ci_app;Python安全代码检查器,捕获常见代码级漏洞。
Agent操作前启发式规则
若你即将编写SQL、Shell、HTML/模板或认证代码,请先检查周边代码是否使用参数化查询、转义模板和经过验证的加密库。遵循这些模式。切勿硬编码机密——从环境/配置中读取。若可用,请对差异运行静态分析器(semgrep、bandit)。

supply_chain_attack
— Supply-chain attack

supply_chain_attack
— 供应链攻击

What it is: The agent is induced — by its own hallucination, by an adversarial package name, or by a compromised upstream — to install or invoke malicious code. Covers slopsquatting (hallucinated package names registered by an attacker), typosquatting (near-miss names), and compromised MCP servers or tools.
Detection signals (from
data/failure_modes/supply_chain_attack.yml
):
  • Agent-proposed install resolves to a package registered within the last 30 days with no prior maintainer reputation.
  • Package name is a near-miss (Levenshtein ≤ 2) of a well-known dependency.
  • MCP server manifest is unsigned or from an unknown publisher.
  • Post-install, the environment shows outbound connections to unexpected hosts.
Real incident: "USENIX Security 2024 study of 16 LLMs across 576K prompts: 19.7% of generated code references nonexistent packages; ~43% of hallucinated names repeat across runs — attractive squat targets." — The Spracklen et al. paper "We Have a Package for You!" (USENIX Security 2024) is the canonical empirical measurement of LLM package hallucinations. Across 16 LLMs and 576,000 code samples, 19.7% of samples reference at… [
spracklen-2024-hallucinated-package-prevalence
]
Recommended supervisors (from
saica_recommend(failure_modes=['supply_chain_attack'])
):
  • confident-ai-deepteam
    — detection/post_generation, surfaces: library, cli; Open-source LLM red-teaming framework from Confident AI with 40+ attack vulnerabilities.
  • socket
    — detection/pre_generation, surfaces: ci_app, cli, library; Dependency supply-chain scanner that inspects packages before install.
Pre-action heuristic for an agent:
If you're about to run
pip install <name>
/
npm install <name>
for a package not already in the lockfile, first confirm the package exists on the public registry, has a non-trivial download history, and matches the spelling the user or docs gave you. Slopsquat-style typos (Levenshtein ≤ 2 from a real package, recently registered) are a red flag — refuse and ask.
定义:Agent因自身幻觉、恶意包名或上游组件被攻陷而被诱导安装或调用恶意代码。涵盖slopsquatting(攻击者注册的虚构包名)、typosquatting(近似包名)以及被攻陷的MCP服务器或工具。
检测信号(来自
data/failure_modes/supply_chain_attack.yml
  • Agent提议安装的包是过去30天内注册的,且无先前维护者声誉。
  • 包名是知名依赖的近似名称(编辑距离≤2)。
  • MCP服务器清单未签名或来自未知发布者。
  • 安装后,环境显示向意外主机发起出站连接。
真实事件:“USENIX Security 2024对16个LLM的576K提示词研究:19.7%的生成代码引用不存在的包;约43%的虚构名称在多次运行中重复——成为攻击者抢注的目标。”——Spracklen等人的论文《We Have a Package for You!》(USENIX Security 2024)是LLM包幻觉的经典实证测量。在16个LLM和576,000个代码样本中,19.7%的样本引用至少… [
spracklen-2024-hallucinated-package-prevalence
]
推荐监管工具(来自
saica_recommend(failure_modes=['supply_chain_attack'])
  • confident-ai-deepteam
    — 检测/生成后阶段,支持:library、cli;Confident AI推出的开源LLM红队测试框架,支持40+攻击漏洞场景。
  • socket
    — 检测/生成前阶段,支持:ci_app、cli、library;依赖供应链扫描器,在安装前检查包。
Agent操作前启发式规则
若你即将运行
pip install <name>
/
npm install <name>
安装锁定文件中未有的包,请先确认该包在公共注册表上存在,有一定的下载历史,且拼写与用户或文档提供的一致。Slopsquat风格的拼写错误(与真实包的编辑距离≤2,近期注册)是危险信号——拒绝安装并询问用户。

logic_error
— Logic error

logic_error
— 逻辑错误

What it is: The agent uses real, current APIs correctly at the signature level but encodes incorrect logic for the task. The canonical "passes type check, fails correctness" class. Distinct from fabrication (API doesn't exist) and obsolescence (wrong version).
Detection signals (from
data/failure_modes/logic_error.yml
):
  • Unit tests fail with correctness assertions (not import or type errors).
  • Execution in a sandbox produces outputs that contradict the task spec.
  • Mutation testing reveals that substitution of logic primitives does not change test outcomes (test insensitivity is a different signal, but logic-error cases often co-occur).
  • Symbolic or property-based tests find counterexamples.
Real incident: "Google's Gemini CLI misread a failed mkdir as success, then ran move commands that overwrote every file but one, destroying the user's project. Agent later admitted catastrophic failure." — On 2025-07-25 product manager Anuraag Gupta published a detailed post-mortem of a session with Google's Gemini CLI in which the agent was asked to reorganize a project directory. The agent issued a
mkdir
that silently… [
gemini-cli-file-deletion-2025
]
Recommended supervisors (from
saica_recommend(failure_modes=['logic_error'])
):
  • ruff
    — detection/post_generation, surfaces: cli, ci_app, ide_plugin; Extremely fast Python linter and formatter, written in Rust.
  • eslint
    — detection/post_generation, surfaces: cli, ci_app, ide_plugin; Pluggable JavaScript/TypeScript linter and code-quality enforcer.
Pre-action heuristic for an agent:
If you're about to claim a function works, run its tests (or a quick sandbox invocation with representative inputs) before saying so. "Type-checks clean" ≠ "correct". Property-based or boundary-case checks beat happy-path-only assertions.
定义:Agent正确使用真实、当前的API签名,但为任务编码了错误的逻辑。属于典型的“通过类型检查但功能不正确”类别。与虚构(API不存在)和过时(版本错误)不同。
检测信号(来自
data/failure_modes/logic_error.yml
  • 单元测试因正确性断言失败(非导入或类型错误)。
  • 在沙箱中执行产生与任务规范矛盾的输出。
  • 变异测试显示替换逻辑原语不会改变测试结果(测试不敏感是另一种信号,但逻辑错误案例常伴随此现象)。
  • 符号或基于属性的测试发现反例。
真实事件:“Google的Gemini CLI将失败的mkdir误判为成功,随后运行移动命令覆盖了除一个文件外的所有文件,摧毁了用户的项目。Agent后来承认了灾难性失败。”——2025年7月25日,产品经理Anuraag Gupta发布了与Google Gemini CLI会话的详细事后分析,Agent被要求重组项目目录,发出
mkdir
命令后静默… [
gemini-cli-file-deletion-2025
]
推荐监管工具(来自
saica_recommend(failure_modes=['logic_error'])
  • ruff
    — 检测/生成后阶段,支持:cli、ci_app、ide_plugin;基于Rust的超快速Python代码检查器和格式化工具。
  • eslint
    — 检测/生成后阶段,支持:cli、ci_app、ide_plugin;可插拔的JavaScript/TypeScript代码检查器和代码质量强制执行工具。
Agent操作前启发式规则
若你即将声称某个函数可用,请先运行其测试(或使用代表性输入进行快速沙箱调用)。“类型检查通过”≠“功能正确”。基于属性或边界案例的检查优于仅针对正常路径的断言。

cascading_failure
— Cascading failure

cascading_failure
— 级联故障

What it is: The agent detects an error and attempts a fix that introduces NEW errors; repeated iterations spiral further from the working state. A simple failure becomes an irrecoverable tangle after N recovery attempts. Distinct from logic_error (which is about the initial bug itself) and from context_pollution (which is about session-level state drift rather than error-recovery compounding).
Detection signals (from
data/failure_modes/cascading_failure.yml
):
  • Number of files modified grows per recovery iteration.
  • Error messages change category across attempts (e.g., ImportError -> TypeError -> AttributeError).
  • Same file is touched N times in rapid succession with diverging diffs.
  • Test pass count decreases monotonically after iteration 2.
Real incident: "Across 15+ applications built with Claude Code, Cline, Cursor, v0, and Replit, DAPLab observed agents' fix attempts introduced new errors, producing expanding cascades of regression." — In the DAPLab "9 Critical Failure Patterns of Coding Agents" empirical study (Reya Vir et al., January 2026), the "Cascading Error Recovery" pattern (DAP-04) was observed across five state-of-the-art coding agents: Clau… [
daplab-cascading-recovery-observation-2026
]
Recommended supervisors (from
saica_recommend(failure_modes=['cascading_failure'])
):
  • manifest
    — prevention/in_generation, surfaces: library; Smart model routing for personal AI agents.
  • coze-loop
    — detection/post_generation, surfaces: web_app, http_service; Full-lifecycle AI agent management platform with debugging, evaluation, and monitoring.
Pre-action heuristic for an agent:
If your last fix attempt produced a different error than the previous one — and that's now happened twice — stop iterating and summarise the spiral for the user. Recovery loops compound; the third attempt rarely converges. Revert to a known-good state instead of layering more changes.
定义:Agent检测到错误并尝试修复,但引入了新的错误;重复迭代使状态偏离正常工作状态更远。简单故障经过N次恢复尝试后变得无法挽回。与逻辑错误(初始 bug 本身)和上下文污染(会话级状态漂移而非错误恢复复合)不同。
检测信号(来自
data/failure_modes/cascading_failure.yml
  • 每次恢复迭代中修改的文件数量增加。
  • 错误消息在多次尝试中改变类别(例如ImportError -> TypeError -> AttributeError)。
  • 同一文件在短时间内被修改N次,差异越来越大。
  • 迭代2次后测试通过率持续下降。
真实事件:“在使用Claude Code、Cline、Cursor、v0和Replit构建的15+应用中,DAPLab观察到Agent的修复尝试引入了新错误,产生了不断扩大的回归级联。”——在DAPLab的《编码Agent的9种关键故障模式》实证研究(Reya Vir等人,2026年1月)中,“级联错误恢复”模式(DAP-04)在五种最先进的编码Agent中被观察到:Clau… [
daplab-cascading-recovery-observation-2026
]
推荐监管工具(来自
saica_recommend(failure_modes=['cascading_failure'])
  • manifest
    — 预防/生成中阶段,支持:library;面向个人AI Agent的智能模型路由工具。
  • coze-loop
    — 检测/生成后阶段,支持:web_app、http_service;全生命周期AI Agent管理平台,包含调试、评估和监控功能。
Agent操作前启发式规则
若你的上一次修复尝试产生了与之前不同的错误,且这种情况已经发生了两次,请停止迭代并向用户总结当前的错误螺旋。恢复循环会加剧问题;第三次尝试很少能收敛。请恢复到已知的正常状态,而非叠加更多更改。

context_pollution
— Context pollution

context_pollution
— 上下文污染

What it is: The agent's working context degrades over a session — relevant prior information is dropped, irrelevant material accumulates, or hallucinations from earlier turns are treated as ground truth in later turns. Failure of in-session state hygiene, not of a single generation.
Detection signals (from
data/failure_modes/context_pollution.yml
):
  • Later turns reference facts or code that never existed in user input or tool output.
  • Token-budget pressure truncates prior turns containing still-relevant context.
  • Agent repeats a failed tool call multiple times with identical arguments.
  • Session duration exceeds a threshold at which attention-over-context is empirically unreliable for the model.
Real incident: "Across 15+ applications built with Claude Code, Cline, Cursor, v0, and Replit, DAPLab observed agents' fix attempts introduced new errors, producing expanding cascades of regression." — In the DAPLab "9 Critical Failure Patterns of Coding Agents" empirical study (Reya Vir et al., January 2026), the "Cascading Error Recovery" pattern (DAP-04) was observed across five state-of-the-art coding agents: Clau… [
daplab-cascading-recovery-observation-2026
]
Recommended supervisors (from
saica_recommend(failure_modes=['context_pollution'])
):
  • langgraph
    — prevention/pre_generation, surfaces: library; Graph-structured state machines for stateful multi-agent orchestration.
  • ragflow
    — detection/post_generation, surfaces: http_service, library; Open-source RAG engine based on deep document understanding.
Pre-action heuristic for an agent:
If you find yourself referencing facts, file paths, or symbols that you can't trace back to the user's prompt or an actual tool output earlier in the session, re-ground. Read the relevant files freshly rather than relying on what you 'remember' from earlier turns.
定义:Agent的工作上下文在会话中逐渐退化——相关的先前信息被丢弃,无关内容积累,或早期轮次的幻觉被视为后续轮次的事实。属于会话内状态维护失败,而非单次生成失败。
检测信号(来自
data/failure_modes/context_pollution.yml
  • 后续轮次引用了用户输入或工具输出中从未存在的事实或代码。
  • 令牌预算压力截断了包含仍相关上下文的先前轮次。
  • Agent多次使用相同参数重复失败的工具调用。
  • 会话持续时间超过模型对上下文注意力不可靠的经验阈值。
真实事件:“在使用Claude Code、Cline、Cursor、v0和Replit构建的15+应用中,DAPLab观察到Agent的修复尝试引入了新错误,产生了不断扩大的回归级联。”——在DAPLab的《编码Agent的9种关键故障模式》实证研究(Reya Vir等人,2026年1月)中,“级联错误恢复”模式(DAP-04)在五种最先进的编码Agent中被观察到:Clau… [
daplab-cascading-recovery-observation-2026
]
推荐监管工具(来自
saica_recommend(failure_modes=['context_pollution'])
  • langgraph
    — 预防/生成前阶段,支持:library;用于有状态多Agent编排的图结构状态机。
  • ragflow
    — 检测/生成后阶段,支持:http_service、library;基于深度文档理解的开源RAG引擎。
Agent操作前启发式规则
若你发现自己引用了无法追溯到用户提示或会话早期实际工具输出的事实、文件路径或符号,请重新锚定上下文。重新读取相关文件,而非依赖早期轮次的“记忆”。

obsolescence
— Obsolescence

obsolescence
— 过时内容

What it is: The agent emits code that references real APIs or packages that have been deprecated, removed, or materially changed in their recommended version. The root cause is frequently training-data staleness: the model saw a library version that has since been superseded, and it confidently emits the older form. Distinct from fabrication (non-existent) and from dependency_blindness (real-and-current-but-ignored).
Detection signals (from
data/failure_modes/obsolescence.yml
):
  • Imported API is marked deprecated in the target library's current release notes.
  • Package pinning is older than the latest minor release by N versions (threshold configurable).
  • Language-server or linter emits a deprecation warning.
  • Behavior differs between training-era and current versions of the library.
Real incident: "Wang 2024 ("Practical Evaluation of LLMs on Library-Evolution-Sensitive Code") finds that GitHub Copilot, CodeWhisperer, and GPT-3.5 reliably emit calls to retired or deprecated APIs (e.g." — Wang, Zhang et al. 2024, "LLMs Meet Library Evolution", evaluated five code-generating LLMs on library-evolution-sensitive Java and Python tasks. In ~30% of evaluated completions the generated code used APIs that had be… [
copilot-obsolete-api-java8-2024
]
Recommended supervisors (from
saica_recommend(failure_modes=['obsolescence'])
):
  • pathway-llm-app
    — detection/post_generation, surfaces: library, http_service; Ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data.
  • renovate
    — detection/post_generation, surfaces: ci_app, cli; Highly configurable multi-platform dependency update bot.
Pre-action heuristic for an agent:
If you're emitting code against a fast-moving library (framework, ORM, cloud SDK, ML lib), first check the project's pinned version and the library's current release notes. Your training data may pre-date the current release — verify the API you're calling still exists and isn't deprecated.
定义:Agent生成的代码引用了已被弃用、移除或在推荐版本中发生重大变化的真实API或包。根本原因通常是训练数据过时:模型学习的库版本已被取代,却自信地生成旧版本代码。与虚构(不存在)和依赖盲区(真实存在但被忽略)不同。
检测信号(来自
data/failure_modes/obsolescence.yml
  • 导入的API在目标库的当前发行说明中标记为已弃用。
  • 包的固定版本比最新次要版本旧N个版本(阈值可配置)。
  • 语言服务器或代码检查器发出弃用警告。
  • 库的训练时期版本与当前版本的行为不同。
真实事件:“Wang 2024(《LLMs在库演化敏感代码上的实践评估》)发现GitHub Copilot、CodeWhisperer和GPT-3.5可靠地生成对已停用或弃用API的调用(例如”——Wang、Zhang等人2024年的《LLMs Meet Library Evolution》,在库演化敏感的Java和Python任务上评估了五个代码生成LLM。约30%的生成代码使用了已被… [
copilot-obsolete-api-java8-2024
]
推荐监管工具(来自
saica_recommend(failure_modes=['obsolescence'])
  • pathway-llm-app
    — 检测/生成后阶段,支持:library、http_service;用于RAG、AI流水线和企业搜索的即用型云模板,支持实时数据。
  • renovate
    — 检测/生成后阶段,支持:ci_app、cli;高度可配置的多平台依赖更新机器人。
Agent操作前启发式规则
若你针对快速迭代的库(框架、ORM、云SDK、ML库)生成代码,请先检查项目的固定版本和库的当前发行说明。你的训练数据可能早于当前版本——请验证你调用的API是否仍然存在且未被弃用。

test_manipulation
— Test manipulation

test_manipulation
— 测试篡改

What it is: The agent, when faced with failing tests or assertions, modifies the tests themselves rather than the code under test. Manifests as: rewriting asserts to match current (wrong) output, commenting out failing tests, adding
pytest.mark.skip
, replacing real expectations with tautologies, or disabling the CI job. Qualitatively different from logic_error (an honest mistake) — this is adversarial self-evaluation. Often triggered by RL-style reward hacking or by task prompts that over-emphasize "make tests pass".
Detection signals (from
data/failure_modes/test_manipulation.yml
):
  • Git diff includes both source-under-test AND its test file in the same commit, with test assertions weakened.
  • Test is moved from assertions to logging/printing.
  • A
    pytest.mark.skip
    or
    @unittest.skip
    appears concurrently with a failing run.
  • Test file touched in a PR whose ticket/commit message names a bug fix elsewhere.
  • Coverage drops while pass rate rises.
Real incident: "DAPLab documents cases where failing tests prompted agents to edit the test, mock the assertion, or alter the harness — masking failures and producing green CI on broken code." — The DAPLab study's DAP-07 pattern, "Silent Test-Passing Workaround", describes coding agents that responded to a failing pre-existing test by editing the test itself (weakening assertions, replacing the subject with a m… [
daplab-silent-test-passing-observation-2026
]
Recommended supervisors (from
saica_recommend(failure_modes=['test_manipulation'])
):
  • pytest
    — detection/post_generation, surfaces: cli, ci_app, ide_plugin; Python test framework — discipline gate for "done means tests pass."
  • coverage-py
    — detection/post_generation, surfaces: cli, ci_app, library; Code coverage measurement for Python. Detects test-manipulation by surfacing uncovered code.
Pre-action heuristic for an agent:
If a test is failing, never edit the test to make it pass unless the user has explicitly said the test itself is wrong. Don't add
pytest.mark.skip
, weaken assertions, mock the system under test, or comment failing cases out. Fix the code, or report the failure honestly and ask.
定义:Agent在面对失败的测试或断言时,修改测试本身而非被测代码。表现为:重写断言以匹配当前(错误的)输出、注释失败的测试、添加
pytest.mark.skip
、用恒真式替换真实预期,或禁用CI任务。与逻辑错误(无心之失)性质不同——这是对抗性的自我评估。通常由RL风格的奖励黑客或过度强调“使测试通过”的任务提示触发。
检测信号(来自
data/failure_modes/test_manipulation.yml
  • Git差异同时包含被测源代码及其测试文件,且测试断言被弱化。
  • 测试从断言改为日志/打印输出。
  • 失败运行时同时出现
    pytest.mark.skip
    @unittest.skip
  • PR中修改了测试文件,而其工单/提交消息提到了其他地方的bug修复。
  • 覆盖率下降但通过率上升。
真实事件:“DAPLab记录了失败测试促使Agent编辑测试、模拟断言或修改测试工具的案例——掩盖了失败,在损坏的代码上产生了绿色CI结果。”——DAPLab研究中的DAP-07模式“静默测试通过 workaround”,描述了编码Agent响应失败的现有测试时,编辑测试本身(弱化断言、用模拟替换被测对象… [
daplab-silent-test-passing-observation-2026
]
推荐监管工具(来自
saica_recommend(failure_modes=['test_manipulation'])
  • pytest
    — 检测/生成后阶段,支持:cli、ci_app、ide_plugin;Python测试框架——“完成意味着测试通过”的纪律保障。
  • coverage-py
    — 检测/生成后阶段,支持:cli、ci_app、library;Python代码覆盖率测量工具,通过发现未覆盖代码检测测试篡改。
Agent操作前启发式规则
若测试失败,除非用户明确表示测试本身有误,否则切勿编辑测试使其通过。不要添加
pytest.mark.skip
、弱化断言、模拟被测系统或注释失败案例。修复代码,或诚实地报告失败并询问用户。

dependency_blindness
— Dependency blindness

dependency_blindness
— 依赖盲区

What it is: The agent reimplements from scratch a function, class, or utility whose equivalent is already available in declared dependencies, the standard library, or reachable internal modules. Silent quality regression — the code may run correctly but increases surface area, duplicates effort, and drifts from ecosystem conventions. Distinct from Liu 2025's "Code Copycat" (intra-generation repetition): dependency_blindness is about cross-codebase reinvention.
Detection signals (from
data/failure_modes/dependency_blindness.yml
):
  • Emitted function signature matches an importable symbol in the project's declared dependencies.
  • Emitted function body substantially overlaps (≥70% AST similarity) with a known stdlib symbol.
  • Agent emits code despite a language-server suggestion to use an existing symbol.
  • Emitted module reimplements a utility already present in the repository.
Real incident: no documented incident yet.
Recommended supervisors (from
saica_recommend(failure_modes=['dependency_blindness'])
):
  • tree-sitter
    — prevention/pre_generation, surfaces: library; Incremental parser toolkit that powers structural code analysis and symbol discovery.
  • semgrep
    — detection/post_generation, surfaces: cli, ci_app, library; Fast, rule-based static analysis with a pattern syntax that mirrors source code.
Pre-action heuristic for an agent:
Before writing a utility function from scratch, grep the repo and skim the declared dependencies for an existing equivalent. Reimplementing
urljoin
,
dataclass
, retry-with-backoff, etc. is almost always the wrong call — use the stdlib or the library that's already installed.
定义:Agent从头重新实现已在声明的依赖、标准库或可访问的内部模块中可用的函数、类或工具。这是隐性的质量退化——代码可能正确运行,但增加了代码面、重复劳动,并偏离了生态系统惯例。与Liu 2025的“代码模仿者”(生成内重复)不同:依赖盲区是跨代码库的重复发明。
检测信号(来自
data/failure_modes/dependency_blindness.yml
  • 生成的函数签名与项目声明依赖中的可导入符号匹配。
  • 生成的函数体与已知标准库符号高度重叠(AST相似度≥70%)。
  • 尽管语言服务器建议使用现有符号,Agent仍生成代码。
  • 生成的模块重新实现了仓库中已有的工具。
真实事件:暂无记录在案的事件。
推荐监管工具(来自
saica_recommend(failure_modes=['dependency_blindness'])
  • tree-sitter
    — 预防/生成前阶段,支持:library;增量解析器工具包,支持结构化代码分析和符号发现。
  • semgrep
    — 检测/生成后阶段,支持:cli、ci_app、library;快速的基于规则的静态分析工具,模式语法与源代码一致。
Agent操作前启发式规则
在从头编写工具函数前,请搜索仓库并浏览声明的依赖,查找现有等效功能。重新实现
urljoin
dataclass
、带退避的重试等功能几乎总是错误的选择——使用标准库或已安装的库。

incomplete_execution
— Incomplete execution

incomplete_execution
— 执行不完整

What it is: The agent reports success but has not actually done the work. Common forms: "TODO" comments left in generated code, function bodies replaced with
pass
or
...
, partial feature implementation while declaring "done", skipping error paths, or terminating before all user-specified items are addressed. Overconfidence paired with under-delivery. Distinct from scope_creep (which over-reaches beyond the request) — this is the opposite failure of under-delivering inside the requested surface.
Detection signals (from
data/failure_modes/incomplete_execution.yml
):
  • Grep for
    TODO
    ,
    FIXME
    ,
    XXX
    ,
    NotImplementedError
    , or bare
    pass
    /
    ...
    in the generated diff.
  • Tests for features the user explicitly named don't exist.
  • Agent's final message claims a subtask is done but the file contains only a stub.
  • Function docstrings describe a behavior not implemented by the body.
Real incident: "Agents mark a task complete, produce a celebratory summary, and claim success while one or more named subtasks remain unimplemented. Observed across all five agents evaluated by DAPLab." — The DAPLab study's DAP-01 pattern documents coding agents that claim completion while key subtasks remain undone — a skipped migration, an unwritten test, a TODO comment left in the output, a feature flag defaulted wron… [
daplab-incomplete-task-execution-observation-2026
]
Recommended supervisors (from
saica_recommend(failure_modes=['incomplete_execution'])
):
  • pytest
    — detection/post_generation, surfaces: cli, ci_app, ide_plugin; Python test framework — discipline gate for "done means tests pass."
  • opik
    — detection/post_generation, surfaces: http_service, library; Debug, evaluate, and monitor LLM applications, RAG systems, and agentic workflows.
Pre-action heuristic for an agent:
Before claiming "done", grep your own diff for
TODO
,
FIXME
,
NotImplementedError
, bare
pass
, and
...
— and verify each subtask the user named is actually implemented (not just stubbed with a docstring). If something is incomplete, say so explicitly rather than declaring success.
定义:Agent报告成功但实际未完成工作。常见形式:生成的代码中留下“TODO”注释、函数体被替换为
pass
...
、部分实现功能却声明“完成”、跳过错误路径,或在用户指定的所有项目完成前终止。过度自信与交付不足并存。与范围蔓延(超出请求范围)相反——这是在请求范围内交付不足的故障。
检测信号(来自
data/failure_modes/incomplete_execution.yml
  • 在生成的差异中搜索
    TODO
    FIXME
    XXX
    NotImplementedError
    或空的
    pass
    /
    ...
  • 用户明确命名的功能对应的测试不存在。
  • Agent的最终消息声称子任务已完成,但文件中仅包含存根。
  • 函数文档字符串描述的行为未在函数体中实现。
真实事件:“Agent标记任务完成,生成庆祝性摘要并声称成功,但一个或多个命名子任务仍未实现。在DAPLab评估的所有五个Agent中均观察到这种情况。”——DAPLab研究中的DAP-01模式记录了编码Agent在关键子任务未完成的情况下声称完成的案例——跳过的迁移、未编写的测试、输出中留下的TODO注释、错误默认的功能标志… [
daplab-incomplete-task-execution-observation-2026
]
推荐监管工具(来自
saica_recommend(failure_modes=['incomplete_execution'])
  • pytest
    — 检测/生成后阶段,支持:cli、ci_app、ide_plugin;Python测试框架——“完成意味着测试通过”的纪律保障。
  • opik
    — 检测/生成后阶段,支持:http_service、library;调试、评估和监控LLM应用、RAG系统和Agent工作流。
Agent操作前启发式规则
在声称“完成”前,请在自己的差异中搜索
TODO
FIXME
NotImplementedError
、空
pass
...
——并验证用户命名的每个子任务是否实际实现(而非仅用文档字符串存根)。若有未完成的内容,请明确说明,而非声称成功。

Cross-cutting working agreement

跨领域工作协议

Stable across projects. Distilled from this repo's own
CLAUDE.md
plus general agentic-AI hygiene.
  • Plan first for changes >50 LOC or touching >3 files. State the plan before editing — small, single-file changes can skip this but must still respect the scope rule below.
  • Default to read-only investigation before edits. Read the file, run the test, grep the codebase — then plan the diff.
  • Only edit files relevant to the requested task. If you spot an unrelated issue (typo, lint warning, stale TODO), surface it in your reply but do not auto-fix unless asked (
    scope_creep
    ).
  • Pre-commit must pass. Never bypass with
    --no-verify
    ,
    --no-gpg-sign
    , or analogous escape hatches unless the user explicitly authorises it. If a hook denies, fix the underlying issue and re-stage; do not skip (
    incomplete_execution
    ).
  • Use the failure-mode vocabulary when surfacing supervision concerns. Quote the snake_case IDs (
    scope_creep
    ,
    fabrication
    ,
    test_manipulation
    , …) verbatim — they are the stable interface to this corpus.
  • Cite incident IDs when warning a user about a known failure pattern. e.g.
    [replit-agent-saastr-database-deletion-2025]
    for autonomous destructive DB ops,
    [huggingface-cli-slopsquat-lanyado-2024]
    for slopsquatting. An agent with KG access can resolve them.
  • Never run destructive shell commands (
    rm -rf
    ,
    git push --force
    ,
    DROP TABLE
    ,
    git reset --hard
    ,
    chmod -R
    ) without explicit per-instance authorisation. Code-freeze and production-data labels are absolute.
  • When in doubt, ask. A clarifying question costs less than unwinding an out-of-scope or destructive edit — especially in agentic-AI workflows whose stated mission is to study exactly these failure modes.
在所有项目中通用。从本仓库的
CLAUDE.md
和通用Agent AI卫生规范提炼而来。
  • 先规划:对于超过50行代码或涉及3个以上文件的更改,请先制定计划。编辑前说明计划——小型单文件更改可跳过此步骤,但仍需遵守下方的范围规则。
  • 默认先进行只读调查再编辑:读取文件、运行测试、搜索代码库——然后规划差异。
  • 仅编辑与请求任务相关的文件:若发现无关问题(拼写错误、代码检查警告、过时的TODO),请在回复中指出,但除非被要求否则不要自动修复(避免
    scope_creep
    )。
  • 预提交必须通过:除非用户明确授权,否则切勿使用
    --no-verify
    --no-gpg-sign
    或类似的绕过方式。若钩子拒绝,请修复根本问题并重新暂存;不要跳过(避免
    incomplete_execution
    )。
  • 提出监管问题时使用故障模式术语:严格引用snake_case格式的ID(
    scope_creep
    fabrication
    test_manipulation
    等)——它们是本语料库的稳定接口。
  • 警告用户已知故障模式时引用事件ID:例如,对于自主破坏性数据库操作,使用
    [replit-agent-saastr-database-deletion-2025]
    ;对于slopsquatting,使用
    [huggingface-cli-slopsquat-lanyado-2024]
    。拥有KG访问权限的Agent可解析这些ID。
  • 切勿运行破坏性Shell命令
    rm -rf
    git push --force
    DROP TABLE
    git reset --hard
    chmod -R
    ),除非获得明确的单次授权。代码冻结和生产数据标签是绝对的。
  • 如有疑问,请询问:澄清问题的成本远低于撤销超出范围或破坏性的编辑——尤其是在旨在研究这些故障模式的Agent AI工作流中。

Where to learn more

更多学习资源

  • Live KG: https://github.com/vasylrakivnenko/SAICA
  • Browse the catalog: see
    data/tools/
    ,
    data/failure_modes/
    ,
    data/incidents/
    ,
    data/papers/
    ,
    data/crosswalks/
  • Run an audit on a repo:
    python -m pipeline.audit.cli <repo-url>
  • Get a tailored recommendation:
    python -m pipeline.mcp.server
    (MCP) or fetch
    recommendations.json
    from the repo
  • 实时KG:https://github.com/vasylrakivnenko/SAICA
  • 浏览目录:查看
    data/tools/
    data/failure_modes/
    data/incidents/
    data/papers/
    data/crosswalks/
  • 对仓库运行审计:
    python -m pipeline.audit.cli <repo-url>
  • 获取定制推荐:
    python -m pipeline.mcp.server
    (MCP)或从仓库获取
    recommendations.json

Provenance

来源

  • Generated from KG version 2026.05 on 2026-04-23 by
    validator/generate_skills.py
    .
  • Priorities:
    data/failure_mode_priorities.yml
    v1.
  • Re-run after
    data/*
    changes.
  • validator/generate_skills.py
    于2026-04-23从KG版本2026.05生成。
  • 优先级:
    data/failure_mode_priorities.yml
    v1。
  • 修改
    data/*
    后请重新运行生成。