gpu-lease

GPU Lease


Use this skill before running local GPU workloads from Codex or another code agent.
GPU workloads include PyTorch training or inference, SGLang serving, Ray workers or clusters, CUDA benchmarks, and scripts that import GPU frameworks or launch GPU-serving processes.

Workflow


  1. Use the machine daemon through the default socket /var/run/gpu-lease.sock. Do not start a new daemon for routine GPU work. Do not pass --socket or set GPU_LEASE_SOCKET unless the user explicitly provides another socket.
  2. Wrap every GPU command with gpu-lease run. By default, request the number of GPUs you need with --count and include --wait so the command starts when GPUs are ready:
    bash
    gpu-lease run --count 2 --wait -- python train.py --batch-size 8
    Use exact GPU IDs only when the user specifically requires fixed devices:
    bash
    gpu-lease run --ids 0,1 -- python train.py --batch-size 8
  3. Let gpu-lease run own CUDA_VISIBLE_DEVICES. Do not set it separately unless you are intentionally composing with another scheduler.
  4. Keep the GPU process as the direct child of gpu-lease run. The lease is released when that command exits.
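
The workflow steps above can be sketched as a small shell helper. This is only an illustration, not part of gpu-lease: the function names lease_preflight and lease_cmd are hypothetical, and the helper only prints the wrapped command instead of running it. It uses no flags beyond the ones named in the workflow (--count, --wait).

```bash
# Hypothetical helpers (not shipped with gpu-lease).

# Return non-zero if variables the workflow says to leave alone are already set.
lease_preflight() {
  rc=0
  if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "warning: CUDA_VISIBLE_DEVICES is already set" >&2
    rc=1
  fi
  if [ -n "${GPU_LEASE_SOCKET:-}" ]; then
    echo "warning: GPU_LEASE_SOCKET is already set" >&2
    rc=1
  fi
  return "$rc"
}

# Print the command wrapped as in step 2: a GPU count, then the real command.
# The output is for display only; arguments containing spaces are not quoted.
lease_cmd() {
  count="$1"
  shift
  printf 'gpu-lease run --count %s --wait --' "$count"
  printf ' %s' "$@"
  printf '\n'
}

lease_cmd 2 python train.py --batch-size 8
# prints: gpu-lease run --count 2 --wait -- python train.py --batch-size 8
```

Printing rather than executing keeps the sketch safe to try on a machine without the daemon; an agent would run the printed command directly once the preflight check passes.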

Examples


bash
gpu-lease run --count 1 --wait -- python -m torch.distributed.run --nproc_per_node=1 train.py
gpu-lease run --count 2 --wait -- python -m sglang.launch_server --model-path ./model
gpu-lease run --count 4 --wait -- ray start --head --num-gpus=4
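
When a launch needs setup steps in a wrapper script, one way to keep the GPU process as the direct child is to end the script with exec rather than starting the process in the background. The sketch below assumes that gpu-lease run tracks the process it spawns, which the source implies but does not spell out; train_wrapper.sh and MY_SETTING are illustrative names. The runnable part only demonstrates the shell behavior the idea relies on: exec replaces the shell in place, so the process ID does not change.

```bash
# Demonstration only: `exec` replaces the current shell instead of forking,
# so the PID printed before and after the exec is the same.
same_pid_after_exec() {
  pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
  first=$(printf '%s\n' "$pids" | sed -n 1p)
  second=$(printf '%s\n' "$pids" | sed -n 2p)
  [ "$first" = "$second" ]
}

if same_pid_after_exec; then
  echo "exec kept the same PID"
fi

# Hypothetical wrapper script (train_wrapper.sh is an illustrative name):
#   #!/bin/sh
#   export MY_SETTING=1          # setup that must happen before launch
#   exec python train.py "$@"    # python replaces the shell; no background `&`
#
# Invoked with flags taken from the examples above:
#   gpu-lease run --count 1 --wait -- ./train_wrapper.sh --batch-size 8
```

If the wrapper instead launched python with a trailing & and returned, the command that gpu-lease run started would exit while the GPU process kept running, which is the situation step 4 of the workflow warns against.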