msverl-daily-regression-triage
MSVerl Daily Regression Triage
Use this skill when a fixed daily `verl + MindSpeed` training job has run and Codex needs to decide whether the result is healthy, whether there is a training failure or an accuracy regression, and which recent commit is the most likely cause.
Defaults
- Baseline comparison log: `/home/st_daily_verl/msverl.log`
- Training log pattern: `/home/st_daily_verl/logs/msverl_YYYYMMDD.log`
- Repo: `verl` at `https://github.com/verl-project/verl.git`, branch `main`
- Repo: `MindSpeed` at `https://gitcode.com/Ascend/MindSpeed.git`, branch `master`
- Cache root for temporary clones: `/tmp/msverl-skill-cache`
- Time window: from `00:00:00` local time on the previous day to the task execution time
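The defaults above can be gathered into one small configuration object. The following is a minimal sketch, not the skill's actual code: the names `TriageDefaults`, `default_time_window`, and `training_log_for` are hypothetical, and it assumes the daily log is named after the day the job ran.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from pathlib import Path


@dataclass(frozen=True)
class TriageDefaults:
    # Paths, URLs, and branches are copied from the Defaults list.
    baseline_log: Path = Path("/home/st_daily_verl/msverl.log")
    # YYYYMMDD in the documented pattern is expressed as a strftime field here.
    training_log_pattern: str = "/home/st_daily_verl/logs/msverl_{date:%Y%m%d}.log"
    cache_root: Path = Path("/tmp/msverl-skill-cache")
    repos: tuple = (
        ("verl", "https://github.com/verl-project/verl.git", "main"),
        ("MindSpeed", "https://gitcode.com/Ascend/MindSpeed.git", "master"),
    )


def default_time_window(now: datetime) -> tuple[datetime, datetime]:
    """Return (previous day at 00:00:00, now), keeping the timezone of `now`."""
    start = datetime.combine(
        now.date() - timedelta(days=1), time.min, tzinfo=now.tzinfo
    )
    return start, now


def training_log_for(day: datetime, defaults: TriageDefaults = TriageDefaults()) -> Path:
    # Assumption: the log file carries the date of the day the job ran.
    return Path(defaults.training_log_pattern.format(date=day))
```

Calling `default_time_window(datetime.now())` at task execution time gives the window that the commit collection step would use unless the user supplies a different one.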
Hard Stop Rules
- Read the comparison log first.
- If it contains `mean abs diff:` and the parsed value is exactly `0`, stop and report success.
- If it contains `mean abs diff:` and the value is non-zero, classify as `accuracy_regression`.
- If it contains `error, please check log`, classify as `train_error`.
- If the comparison log is ambiguous, report `unknown` and explain what evidence is missing before doing expensive work.
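The rules above map onto a small classifier. This is a sketch under stated assumptions, not the contents of parse_result_log.py: the function name `classify_comparison_log` is hypothetical, the value is assumed to follow `mean abs diff:` as a plain or scientific-notation number, the last occurrence is taken as the relevant one, and a log that contains both markers is treated as ambiguous.

```python
import re

# Assumption: the number follows the marker directly, e.g. "mean abs diff: 1.5e-3".
_DIFF_RE = re.compile(
    r"mean abs diff:\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)
_ERROR_MARKER = "error, please check log"


def classify_comparison_log(text: str) -> str:
    """Return pass, accuracy_regression, train_error, or unknown."""
    values = _DIFF_RE.findall(text)
    has_error = _ERROR_MARKER in text
    if values and has_error:
        # Assumption: conflicting markers count as ambiguous evidence.
        return "unknown"
    if values:
        # Assumption: the last reported value is the one that matters.
        return "pass" if float(values[-1]) == 0.0 else "accuracy_regression"
    if has_error:
        return "train_error"
    # No marker, or a marker without a parseable number.
    return "unknown"
```

A result of `pass` ends the run; only the other three statuses justify touching the training log or the repositories.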
Workflow
- Run parse_result_log.py on the comparison log.
- Stop immediately on `pass`.
- For `train_error`, run extract_failure_tail.py against the daily training log and keep only the final high-signal error block.
- For `accuracy_regression`, use the parsed reward lists and `mean abs diff` as the primary evidence.
- Sync lightweight local clones with sync_repos.py.
- Collect recent commits with list_recent_commits.py for both repositories inside the default time window unless the user gives a different one.
- Rank suspects with rank_candidate_commits.py.
- Inspect diffs only for the top few commits when titles and touched files are not enough to explain a plausible fix direction.
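The ranking step can be illustrated with a simple token-overlap heuristic. This is only a sketch of one plausible approach; it does not describe how rank_candidate_commits.py actually scores commits. The `Commit` type, the `rank_commits` function, and the double weight for touched files are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Commit:
    repo: str
    sha: str
    title: str
    files: list = field(default_factory=list)


def _tokens(text: str) -> set:
    # Lowercased identifier-like tokens of three or more characters.
    return {t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]{2,}", text)}


def rank_commits(commits, evidence: str, top_n: int = 3):
    """Order commits by token overlap between the evidence and title/paths."""
    evidence_tokens = _tokens(evidence)
    scored = []
    for commit in commits:
        title_hits = len(_tokens(commit.title) & evidence_tokens)
        file_hits = len(_tokens(" ".join(commit.files)) & evidence_tokens)
        # Hypothetical weighting: a touched path named in the evidence counts double.
        scored.append((2 * file_hits + title_hits, commit))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Commits with no overlap at all are dropped rather than ranked last.
    return [commit for score, commit in scored[:top_n] if score > 0]
```

Because the score uses only titles and file paths, it can run on the raw commit inventory before any diff is read, which is the order the workflow asks for.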
Cost Controls
- Never load the whole training log unless the tail-based extractor fails twice.
- Start with the log tail only; prefer the last traceback or last `ERROR` block.
- Rank commits using title and touched files before reading diffs.
- Limit deep diff reading to the top `3` candidates per repository unless the evidence is still weak.
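The tail-first rule can be sketched as two helpers: one that reads only the end of the file, and one that picks the last traceback or, failing that, the last `ERROR` line onward. The function names, the 64 KiB default, and the preference for a traceback over an `ERROR` line are assumptions for illustration, not the behavior of extract_failure_tail.py.

```python
import os


def read_tail(path: str, max_bytes: int = 64 * 1024) -> str:
    """Read at most the last max_bytes of a file without loading the rest."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - max_bytes))
        # Replace undecodable bytes: the cut may land inside a multi-byte character.
        return fh.read().decode("utf-8", errors="replace")


def last_error_block(tail: str) -> str:
    """Return the last traceback if present, else text from the last ERROR line on."""
    lines = tail.splitlines()
    for marker in ("Traceback (most recent call last):", "ERROR"):
        for i in range(len(lines) - 1, -1, -1):
            if marker in lines[i]:
                return "\n".join(lines[i:])
    return ""
```

If `last_error_block` returns an empty string, one reasonable retry is to call `read_tail` again with a larger `max_bytes` before falling back to the full log, which keeps the "fails twice" rule cheap to follow.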
Expected Output
Return a compact report with:
- `status`: `pass`, `train_error`, `accuracy_regression`, or `unknown`
- `time_window`
- `evidence_summary`
- `candidate_repo`
- `candidate_commits`
- `confidence`: `high`, `medium`, or `low`
- `fix_direction`
When evidence is weak, say so clearly instead of forcing a single-commit claim.
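One way to keep the report compact and consistent is a small typed record that rejects values outside the allowed sets. The field names come from the list above; the class name `TriageReport`, the field types, the defaults, and the JSON serialization are assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass, field

STATUSES = {"pass", "train_error", "accuracy_regression", "unknown"}
CONFIDENCES = {"high", "medium", "low"}


@dataclass
class TriageReport:
    status: str
    time_window: str
    evidence_summary: str
    candidate_repo: str = ""
    candidate_commits: list = field(default_factory=list)
    # Assumption: default to low confidence so weak evidence is never overstated.
    confidence: str = "low"
    fix_direction: str = ""

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unexpected status: {self.status}")
        if self.confidence not in CONFIDENCES:
            raise ValueError(f"unexpected confidence: {self.confidence}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```

Leaving `candidate_commits` empty with `confidence` at `low` is a valid report, which matches the guidance to state weak evidence plainly rather than name a single commit.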
References
- Run triage_msverl_regression.py for an end-to-end local workflow.
- Use parse_result_log.py and extract_failure_tail.py separately when validating logs by hand.
- Use list_recent_commits.py when you need a raw recent-commit inventory without ranking.