msverl-daily-regression-triage

MSVerl Daily Regression Triage


Use this skill when a fixed daily `verl + MindSpeed` training job has run and Codex needs to decide whether the result is healthy, whether there is a training failure or an accuracy regression, and which recent commit is the most likely cause.


Defaults


- Baseline comparison log: `/home/st_daily_verl/msverl.log`
- Training log pattern: `/home/st_daily_verl/logs/msverl_YYYYMMDD.log`
- verl repo: https://github.com/verl-project/verl.git, branch `main`
- MindSpeed repo: https://gitcode.com/Ascend/MindSpeed.git, branch `master`
- Cache root for temporary clones: `/tmp/msverl-skill-cache`
- Time window: from `00:00:00` local time on the previous day to the task execution time
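
As a rough illustration of how the training log pattern and the default time window could be resolved, here is a minimal sketch. The helper names `training_log_path` and `default_time_window` are hypothetical and not part of the skill's scripts, and which calendar day goes into the `YYYYMMDD` slot is left to the caller because the defaults above do not say.

```python
from datetime import datetime, timedelta

# Directory taken from the training log pattern above.
LOG_DIR = "/home/st_daily_verl/logs"


def training_log_path(day: datetime) -> str:
    """Fill the YYYYMMDD slot of the training log pattern for `day`."""
    return f"{LOG_DIR}/msverl_{day:%Y%m%d}.log"


def default_time_window(now: datetime) -> tuple[datetime, datetime]:
    """Return (start, end): 00:00:00 on the previous local day up to `now`."""
    previous_day = now - timedelta(days=1)
    start = previous_day.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, now


if __name__ == "__main__":
    now = datetime.now()  # naive local time, matching the "local" wording above
    start, end = default_time_window(now)
    print(training_log_path(now))
    print(start.isoformat(), "->", end.isoformat())
```

The sketch uses naive local datetimes. A real implementation might need timezone-aware values if the job host and the repositories disagree on the local zone.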


Hard Stop Rules


1. Read the comparison log first.
2. If it contains `mean abs diff:` and the parsed value is exactly `0`, stop and report success.
3. If it contains `mean abs diff:` and the value is non-zero, classify as `accuracy_regression`.
4. If it contains `error, please check log`, classify as `train_error`.
5. If the comparison log is ambiguous, report `unknown` and explain what evidence is missing before doing expensive work.
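
The rules above can be sketched as a small classifier. This is an illustrative approximation, not the actual `parse_result_log.py`. The function name, the number-parsing regex, the choice to use the first `mean abs diff:` match, and treating `0.0` the same as `0` are all assumptions.

```python
import re

# Assumed shape of the metric line: "mean abs diff: <number>".
_DIFF_RE = re.compile(
    r"mean abs diff:\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)
ERROR_MARKER = "error, please check log"


def classify_comparison_log(text: str) -> str:
    """Map comparison-log text to pass / accuracy_regression / train_error / unknown."""
    match = _DIFF_RE.search(text)
    if match:
        value = float(match.group(1))
        # Zero means the run matches the baseline; anything else is a regression.
        return "pass" if value == 0 else "accuracy_regression"
    if ERROR_MARKER in text:
        return "train_error"
    # Neither marker found: evidence is missing, so do not guess.
    return "unknown"


if __name__ == "__main__":
    print(classify_comparison_log("mean abs diff: 0"))
    print(classify_comparison_log("mean abs diff: 0.0031"))
    print(classify_comparison_log("error, please check log"))
```

Checking the metric before the error marker mirrors the order of the rules. If a log can contain both markers at once, that precedence should be confirmed against the real script.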


Workflow


1. Run parse_result_log.py on the comparison log.
2. Stop immediately on `pass`.
3. For `train_error`, run extract_failure_tail.py against the daily training log and keep only the final high-signal error block.
4. For `accuracy_regression`, use the parsed reward lists and `mean abs diff` as the primary evidence.
5. Sync lightweight local clones with sync_repos.py.
6. Collect recent commits with list_recent_commits.py for both repositories inside the default time window unless the user gives a different one.
7. Rank suspects with rank_candidate_commits.py.
8. Inspect diffs only for the top few commits when titles and touched files are not enough to explain a plausible fix direction.
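
To make the ranking step more concrete, here is a minimal sketch of scoring commits from titles and touched files alone. It is not the logic of `rank_candidate_commits.py`. The `Commit` type, the keyword list, and the weights (2 for a title hit, 1 for a path hit) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    sha: str
    title: str
    files: list[str] = field(default_factory=list)


def score_commit(commit: Commit, keywords: list[str]) -> int:
    """Score a commit by keyword hits: 2 per title hit, 1 per touched-path hit."""
    title = commit.title.lower()
    paths = " ".join(commit.files).lower()
    score = 0
    for keyword in keywords:
        keyword = keyword.lower()
        if keyword in title:
            score += 2
        if keyword in paths:
            score += 1
    return score


def rank_commits(commits: list[Commit], keywords: list[str], top_n: int = 3) -> list[Commit]:
    """Return the top_n highest-scoring commits; ties keep their input order."""
    ranked = sorted(commits, key=lambda c: score_commit(c, keywords), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    commits = [
        Commit("aaa111", "Update docs", ["README.md"]),
        Commit("bbb222", "Fix reward normalization", ["trainer/reward.py"]),
    ]
    for commit in rank_commits(commits, ["reward"]):
        print(commit.sha, commit.title)
```

Keywords would typically come from the evidence gathered earlier, for example terms from the final error block or the word "reward" for an accuracy regression.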


Cost Controls


- Never load the whole training log unless the tail-based extractor fails twice.
- Start with the log tail only; prefer the last traceback or last `ERROR` block.
- Rank commits using title and touched files before reading diffs.
- Limit deep diff reading to the top `3` candidates per repository unless the evidence is still weak.
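
One way to honor the tail-only rule is to seek to the end of the file and read a bounded number of bytes. The sketch below is an assumption about how such an extractor might work, not the contents of `extract_failure_tail.py`. The 64 KiB budget and the two marker strings are illustrative defaults.

```python
import os


def read_tail(path: str, max_bytes: int = 64 * 1024) -> str:
    """Read at most max_bytes from the end of a file without loading the rest."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(0, size - max_bytes))
        # A multi-byte character may be cut at the start; replace it rather than fail.
        return handle.read().decode("utf-8", errors="replace")


def last_error_block(tail: str) -> str:
    """Return text from the last traceback (else the last ERROR line) to the end."""
    lines = tail.splitlines()
    for marker in ("Traceback (most recent call last):", "ERROR"):
        for index in range(len(lines) - 1, -1, -1):
            if marker in lines[index]:
                return "\n".join(lines[index:])
    return ""  # No marker in the tail: the caller may retry with a larger budget.
```

An empty result can count as one extractor failure. Retrying once with a larger `max_bytes` before falling back to the full log would match the "fails twice" rule.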


Expected Output


Return a compact report with:

- `status`: `pass`, `train_error`, `accuracy_regression`, or `unknown`
- `time_window`
- `evidence_summary`
- `candidate_repo`
- `candidate_commits`
- `confidence`: `high`, `medium`, or `low`
- `fix_direction`

When evidence is weak, say so clearly instead of forcing a single-commit claim.
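
The report fields could be captured in a small validated structure, as in the sketch below. The field names come from the list above. The Python types, the defaults, and the JSON serialization are assumptions, since the skill does not prescribe a concrete format.

```python
import json
from dataclasses import asdict, dataclass, field

STATUSES = {"pass", "train_error", "accuracy_regression", "unknown"}
CONFIDENCES = {"high", "medium", "low"}


@dataclass
class TriageReport:
    status: str
    time_window: str
    evidence_summary: str
    candidate_repo: str = ""
    candidate_commits: list[str] = field(default_factory=list)
    confidence: str = "low"  # Default to low so weak evidence is never overstated.
    fix_direction: str = ""

    def __post_init__(self) -> None:
        # Reject values outside the allowed vocabularies listed above.
        if self.status not in STATUSES:
            raise ValueError(f"invalid status: {self.status!r}")
        if self.confidence not in CONFIDENCES:
            raise ValueError(f"invalid confidence: {self.confidence!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    report = TriageReport(
        status="unknown",
        time_window="previous day 00:00:00 to task execution time",
        evidence_summary="comparison log has no mean abs diff line and no error marker",
    )
    print(report.to_json())
```

Leaving `candidate_commits` empty with `confidence` set to `low` is one way to state weak evidence without naming a single commit.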


References


- Run triage_msverl_regression.py for an end-to-end local workflow.
- Use parse_result_log.py and extract_failure_tail.py separately when validating logs by hand.
- Use list_recent_commits.py when you need a raw recent-commit inventory without ranking.
