ci-triage
Shared logic for classifying a failed CI workflow run and recording the
outcome. Used by humans (or Claude in an interactive session) when
triaging a red `main` workflow or a red check on an open PR; the latter
goes through the consumer repo's `<repo>-pr-lifecycle` skill (e.g.
`onsager-pr-lifecycle`, `lean-spec-pr-lifecycle`, `duhem-pr-lifecycle`).

This skill owns the taxonomy, the de-dup rules for the `main-red` issue, and
the issue template. Workflow-specific reproduction steps live in the
consumer repo's `<repo>-pr-lifecycle` skill (for build-tool-specific
detail) and in `web-testing` (for e2e).
Taxonomy
Every failure lands in exactly one bucket. Be explicit: "unclear" is not a
bucket, `needs-human` is.

| Bucket | Signal | Default action |
|---|---|---|
| `regression` | Reproduces deterministically on `main` | File/update the `main-red` issue |
| `flake` | Same workflow passed on the previous main commit without code change | Comment on the existing `main-red` issue, if one is open |
| `infra` | Postgres service didn't come up, rustup 403, pnpm registry down | File/update the `main-red` issue with the `infra` label |
| `needs-human` | Logs truncated, auth failed, classification genuinely ambiguous | Open a `needs-human` issue with the raw log excerpt |
Do not invent a fifth bucket. If the signal you see fits nothing above,
it's `needs-human`.
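The "exactly one bucket, no fifth bucket" rule can be sketched as a closed enumeration with a fallback. This is a minimal illustration, not part of the skill: the names `Bucket` and `coerce_bucket` are hypothetical, and the four values are taken from the `**Bucket**` line of the issue body template.

```python
from enum import Enum


class Bucket(Enum):
    """The four triage buckets; the values double as label names."""

    REGRESSION = "regression"
    FLAKE = "flake"
    INFRA = "infra"
    NEEDS_HUMAN = "needs-human"


def coerce_bucket(name: str) -> Bucket:
    """Map a free-form bucket name onto the taxonomy (hypothetical helper).

    Anything that is not one of the four known names falls back to
    needs-human, so a fifth bucket cannot appear by accident.
    """
    try:
        return Bucket(name.strip().lower())
    except ValueError:
        return Bucket.NEEDS_HUMAN
```

With this shape, a guess such as `coerce_bucket("unclear")` resolves to `Bucket.NEEDS_HUMAN` rather than creating a new category.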
Suspect commit identification
The GitHub `workflow_run` payload includes `head_sha`. That's the commit that
triggered this run; on a `push: main` event, it is the merge commit.

- Fetch the commit via `mcp__github__get_commit`.
- If the commit message matches `Merge pull request #N`, the suspect PR is `#N`; its author is the suspect author.
- Otherwise (direct push to main, squash-merge), use the commit author directly.

Never blame more than one commit per failure. If the previous main commit's
CI was also red, link that issue rather than opening a new one.
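The message-matching step above can be sketched as a small pure function. This is an illustration under assumptions: `Suspect`, `identify_suspect`, and the `use_pr_author` flag are hypothetical names, and the commit message is assumed to have already been fetched (for example via `mcp__github__get_commit`). Looking up the PR author is left to the caller.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the subject line GitHub writes for a merge commit.
MERGE_SUBJECT = re.compile(r"^Merge pull request #(\d+)\b")


@dataclass(frozen=True)
class Suspect:
    sha: str                  # head_sha from the workflow_run payload
    pr_number: Optional[int]  # None for direct pushes and squash-merges
    use_pr_author: bool       # True: blame the PR author; False: the commit author


def identify_suspect(head_sha: str, commit_message: str) -> Suspect:
    """Return exactly one suspect for a failed run (hypothetical helper)."""
    match = MERGE_SUBJECT.match(commit_message)
    if match:
        return Suspect(head_sha, int(match.group(1)), True)
    # Direct push or squash-merge: fall back to the commit author.
    return Suspect(head_sha, None, False)
```

Returning a single `Suspect` value mirrors the rule that a failure never blames more than one commit.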
The rolling `main-red` issue
One open `main-red` issue at a time. If main is broken for three days,
that's one issue accumulating comments, not twelve.

Before filing:

- `mcp__github__list_issues` with `labels: main-red, state: open`.
- If one exists, append a comment: Run #<run-id> also failed. Workflow `<name>`, bucket `<bucket>`, suspect <sha-short> (#<pr-or-none>, @<author>). <one-line-cause>.
- Only open a new issue if none is open. Title: `main is red: <workflow> (<bucket>)`. Labels: `main-red`, plus `infra` or `needs-human` if applicable.

When main goes green again (the next successful run on the same workflow),
close the issue with a comment naming the green run id. This close step
is manual.
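The de-dup decision and the strings involved can be sketched as pure functions. This is a rough illustration only: all function names are hypothetical, the list of open issue numbers is assumed to come from a prior `mcp__github__list_issues` call, and the backticks around the workflow and bucket in the comment are an assumption about the intended formatting.

```python
from typing import List, Optional, Sequence


def plan_action(open_main_red_issues: Sequence[int]) -> str:
    """Comment on the existing issue if one is open, otherwise open a new one."""
    return "comment" if open_main_red_issues else "open"


def rolling_comment(run_id: int, workflow: str, bucket: str, sha_short: str,
                    pr: Optional[int], author: str, cause: str) -> str:
    """Build the one-line comment appended to the open main-red issue."""
    pr_ref = f"#{pr}" if pr is not None else "#none"
    return (
        f"Run #{run_id} also failed. Workflow `{workflow}`, bucket `{bucket}`, "
        f"suspect {sha_short} ({pr_ref}, @{author}). {cause.rstrip('.')}."
    )


def issue_title(workflow: str, bucket: str) -> str:
    return f"main is red: {workflow} ({bucket})"


def issue_labels(bucket: str) -> List[str]:
    """Always main-red; add the bucket label only for infra or needs-human."""
    labels = ["main-red"]
    if bucket in ("infra", "needs-human"):
        labels.append(bucket)
    return labels
```

Keeping these free of network calls makes the "one open issue at a time" rule easy to check before anything is filed.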
Issue body template
markdown
**Workflow**: <workflow-name>
**Run**: <run-url>
**First failed step**: <step-name>
**Bucket**: <regression|flake|infra|needs-human>
**Suspect**: <sha-short> — <commit-subject> (PR #<n>, @<author>)
Failure excerpt
<last 30 lines of the failing step, or the ripgrep-extracted error block>
Reproduction
<one of:>
- Deterministic: `<exact command from the workflow yaml>`
- Flake: passed on <prev-sha-short>; rerun button: <run-url>/attempts/2
- Infra: <service name> — <symptom>
- Needs human: <why the logs are ambiguous>
Next action
<one line — "revert #N and reland", "rerun", "fix <specific thing>", "human eyes">
Keep the excerpt tight. Dumping the full log helps nobody.

Reproducing locally
A human invoking this skill via the consumer repo's `<repo>-pr-lifecycle`
skill should reproduce before filing, using the commands in that skill's
CI-triage section. For a `main` failure caught from outside a PR, check out
`main` at the suspect SHA and run the same commands locally before filing.

For `e2e` failures specifically, delegate classification to
`web-testing`'s triage mode; it handles regression-vs-flake for
browser-driven tests (the ambiguous case).
Log access
`WebFetch`

- `mcp__github__pull_request_read` with `method: get_check_runs`: step names, status, timings (no log body).
- The workflow run's `jobs_url` via `mcp__github__get_commit` + `check_suite` traversal: same metadata.

Log bodies are not reliably accessible from the GitHub MCP. When the log
body is unavailable, classify from step names + exit codes + the workflow
yaml, and bias toward `needs-human` rather than guessing.
Flake-detection heuristic
A failure is `flake` only if both hold:

- The same workflow on the previous main commit passed (check via `mcp__github__list_commits` + check runs on the prior sha).
- The failing step's logs do not contain a symbol name, file path, or assertion message that appears in the suspect commit's diff.

One of those alone is not enough. A deterministic regression can pass on the
prior commit; a real flake can mention a touched file by coincidence.
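The two-condition gate can be sketched as follows. This is a minimal illustration with assumptions: `is_flake` is a hypothetical name, the log text and the diff-derived tokens (symbol names, file paths, assertion messages) are assumed to be already available, and the overlap check is plain substring matching rather than anything smarter. When the log body cannot be read, this check does not apply and the bias toward `needs-human` stands.

```python
from typing import Iterable


def is_flake(previous_main_passed: bool,
             failing_step_log: str,
             diff_tokens: Iterable[str]) -> bool:
    """Return True only when both flake conditions hold (hypothetical helper).

    1. The same workflow passed on the previous main commit.
    2. No token from the suspect commit's diff appears in the failing log.
    """
    if not previous_main_passed:
        return False
    # Empty tokens are skipped so they cannot match every log by accident.
    mentions_diff = any(token and token in failing_step_log
                        for token in diff_tokens)
    return not mentions_diff
```

A `False` result does not by itself mean regression; it only means the failure has not earned the `flake` bucket.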
Constraints
- Never open a PR from this skill. Triage is read-only on the codebase.
- Never @-mention for `infra` or `flake` buckets; alert fatigue kills the signal.
- Never close a `main-red` issue without a green run id to cite.
- Scope: any GitHub-Actions-driven repo. The taxonomy and rolling `main-red` pattern are repo-agnostic; the consumer repo's CLAUDE.md can override triggers and label conventions.
Relationship to other surfaces
| Surface | Role |
|---|---|
| `<repo>-pr-lifecycle` | Interactive caller; humans use this when triaging a red PR check. |
| `web-testing` | Delegated to for `e2e` failures. |