Loading...
Loading...
Investigate a failing GitHub Actions run or job and create a GitHub issue for the failure.
npx skill4agent add nvidia/skills mcore-create-issueNVIDIA/Megatron-LMhttps://github.com/<owner>/<repo>/actions/runs/<run_id>/job/<job_id>https://github.com/<owner>/<repo>/actions/runs/<run_id>run_idjob_idjob_idrun_idgh run view <run_id> --repo NVIDIA/Megatron-LM --json jobs \
--jq '[.jobs[] | select(.conclusion == "failure") | {id: .databaseId, name: .name, url: .url}]'# Pull the raw log and keep only error-bearing lines
gh api repos/NVIDIA/Megatron-LM/actions/jobs/<job_id>/logs 2>&1 \
| grep -E "(FAILED|ERROR|\bError\b|assert|Traceback|Exception|##\[error\])" \
| head -200gh run view --job <job_id> --repo NVIDIA/Megatron-LM --json name --jq .nameFAILURESpull-request/<number>gh run view <run_id> --repo NVIDIA/Megatron-LM --json headBranch --jq .headBranch
# → e.g. "pull-request/4332"
# Extract PR number and fetch metadata:
gh pr view <pr_number> --repo NVIDIA/Megatron-LM --json number,title,url \
--jq '{number: .number, title: .title, url: .url}'main# 1. Get the PR's base branch (e.g. "main", "dev", "release/X.Y")
gh pr view <pr_number> --repo NVIDIA/Megatron-LM --json baseRefName --jq .baseRefName
# 2. Search commits on that base branch
gh api "repos/NVIDIA/Megatron-LM/commits?path=<test-file-path>&sha=<base-branch>&per_page=1" \
--jq '.[0] | {login: .author.login, name: .commit.author.name, sha: .sha}'gh api "repos/NVIDIA/Megatron-LM/pulls/<pr_number>/commits" \
--jq '[.[] | select(.files? // [] | any(.filename == "<test-file-path>"))] | .[0].author.login'FAILED tests/...::...tests/unit_tests/transformer/moe/**/*.py - latestgh issue list --repo NVIDIA/Megatron-LM \
--state open \
--search "<failed-test-filename>" \
--json number,title,url \
--limit 10--assignee <test-author-login>gh issue create \
--repo NVIDIA/Megatron-LM \
--title "🐛 CI failure: <failed-test-node-id>" \
--label "bug" \
--assignee "<test-author-login>" \
--body "..."**Describe the bug**
CI test `<failed-test-node-id>` failed in job [`<job-name>`](<job-url>).
Tag @NVIDIA/mcore-oncall to get oncall's attention to this issue.
**Failing run**
| Field | Value |
|-------|-------|
| PR | [#<pr_number>: <pr_title>](<pr_url>) |
| Run | [<run_id>](<run_url>) |
| Job | [<job_name>](<job_url>) |
**Error**
**Steps/Code to reproduce bug**
Re-run the failing CI job linked above, or locally inside the dev container:
```bash
pytest <failed-test-node-id>/triage-issue
If multiple tests failed in the same job, list each one as a separate bullet
under "Describe the bug" and include the combined error snippets. Assign the
issue to the author of whichever test file appears first in the failure list.
### 8. Report back to the user
Print the URL of the newly created issue (or the duplicate, if found) so the
user can review or share it.
## Important guidelines
- Never create an issue if a duplicate already exists — link the existing one instead.
- Always include the triggering PR link in the issue body.
- Always assign the issue to the test file's most recent author. If the author
lookup fails (e.g. the commit was made by a bot or the login is unavailable),
skip `--assignee` and note it in the "Additional context" section.
- Keep the error snippet concise (≤30 lines). Truncate long tracebacks and note that the full log is available via the job URL.
- Do not guess the root cause — quote the actual log output verbatim.
- If the job is still in progress or the logs are unavailable, say so and ask the user to retry once the run completes.