Monitor submitted jobs (PTQ, evaluation, deployment) on SLURM clusters. Use when the user asks "check job status", "is my job done", "monitor my evaluation", "what's the status of the PTQ", "check on job <slurm_job_id>", or after any skill submits a long-running job. Also triggers on "nel status", "squeue", or any request to check progress of a previously submitted job.
npx skill4agent add nvidia/skills monitor

.claude/active_jobs.json
[
{
"type": "nel",
"id": "<invocation_id or slurm_job_id>",
"host": "<cluster_hostname>",
"user": "<ssh_user>",
"submitted": "YYYY-MM-DD HH:MM",
"description": "<what this job does>",
"last_status": "<last known status>"
}
]

type: nel, slurm, or launcher

Job registry: .claude/active_jobs.json (each entry records last_status)

type: nel
  nel status <id>
  nel info <id>
  nel info <id> --logs

type: launcher
  Failure markers: Traceback, Error, FAILED

type: slurm
  ssh <host> "squeue -j <id> -h -o '%T %M %R'"
  ssh <host> "sacct -j <id> --format=State,ExitCode,Elapsed -n"

Jobs not listed in .claude/active_jobs.json:
  nel ls runs --since 1d
  ssh <host> "squeue -u <user>"
  ls -lt tools/launcher/experiments/cicd/ | head -10
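
The registry format and the per-type status commands above can be tied together with a small helper. The following is a minimal sketch, not the skill's actual implementation: the function names (`load_jobs`, `status_command`, `parse_squeue_line`, `has_failure_marker`) are hypothetical, and it assumes the registry is a JSON array of entries shaped like the example above. Only the command strings themselves come from the text.

```python
import json
from pathlib import Path

# Registry path as named in the text; everything else here is illustrative.
REGISTRY = Path(".claude/active_jobs.json")

# Strings the text lists for launcher jobs (assumed to be searched in log output).
FAILURE_MARKERS = ("Traceback", "Error", "FAILED")


def load_jobs(path=REGISTRY):
    """Return the list of registry entries, or [] if the file does not exist."""
    if not path.exists():
        return []
    return json.loads(path.read_text())


def status_command(job):
    """Build the argv list for checking one job, based on its "type" field."""
    kind, job_id = job["type"], job["id"]
    if kind == "nel":
        return ["nel", "status", job_id]
    if kind == "slurm":
        # %T = job state, %M = time used, %R = reason or node list.
        remote = f"squeue -j {job_id} -h -o '%T %M %R'"
        return ["ssh", job["host"], remote]
    raise ValueError(f"no status command sketched for type {kind!r}")


def parse_squeue_line(line):
    """Split one '%T %M %R' output line; return None if squeue printed nothing."""
    line = line.strip()
    if not line:
        return None  # job has left the queue; sacct would be the next check
    state, elapsed, reason = line.split(maxsplit=2)
    return {"state": state, "elapsed": elapsed, "reason": reason}


def has_failure_marker(log_text):
    """Report whether any of the listed failure strings occurs in log_text."""
    return any(marker in log_text for marker in FAILURE_MARKERS)
```

A caller could pass the result of `status_command` to `subprocess.run(..., capture_output=True, text=True)` and feed the captured stdout to `parse_squeue_line`. An empty result from `squeue` would then suggest falling back to the `sacct` command listed above, since finished jobs no longer appear in the queue.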