Loading...
Loading...
Found 6 Skills
Remote SLURM GPU cluster execution over SSH with sbatch/srun, Pyxis/Enroot containers, and Lustre-backed results. Use when running TAO training/eval/inference jobs on an on-prem or DGX SLURM cluster. Trigger phrases include "run on SLURM", "submit sbatch", "DGX SLURM cluster", "Pyxis/Enroot container", "Lustre dataset".
BullMQ queue system reference for Redis-backed job queues, workers, flows, and schedulers. Use when: (1) creating queues and workers with BullMQ, (2) adding jobs (delayed, prioritized, repeatable, deduplicated), (3) setting up FlowProducer parent-child job hierarchies, (4) configuring retry strategies, rate limiting, or concurrency, (5) implementing job schedulers with cron/interval patterns, (6) preparing BullMQ for production (graceful shutdown, Redis config, monitoring), or (7) debugging stalled jobs or connection issues
Background job scheduling with sails-hook-quest for Sails.js applications. Use this skill when creating, scheduling, or managing background jobs, cron tasks, recurring scripts, or any deferred/periodic work in a Sails.js application.
Develop and deploy Lakeflow Jobs on Databricks. Use when creating data engineering jobs with notebooks, Python wheels, or SQL tasks. Invoke BEFORE starting implementation.
Submit or run an ML experiment on a compute environment (local, SLURM HPC, RunAI/Kubernetes). Use when the user wants to launch a training run, submit a job, run ablations, or execute an experiment script on any compute cluster.
Master Active Job framework for asynchronous background job processing. Use when building background workers, handling async operations, managing job queues, scheduling recurring tasks, or processing long-running operations. Covers job definition, Solid Queue backend, error handling, and advanced patterns.