gpu-lease

GPU Lease

Use this skill before running local GPU workloads from Codex or another code agent.

GPU workloads include PyTorch training or inference, SGLang serving, Ray workers or clusters, CUDA benchmarks, and scripts that import GPU frameworks or launch GPU-serving processes.

Workflow

Use the machine daemon through the default socket `/var/run/gpu-lease.sock`. Do not start a new daemon for routine GPU work. Do not pass `--socket` or set `GPU_LEASE_SOCKET` unless the user explicitly provides another socket.

Wrap every GPU command with `gpu-lease run`. By default, request the number of GPUs you need with `--count` and include `--wait` so the command starts when GPUs are ready:

```bash
gpu-lease run --count 2 --wait -- python train.py --batch-size 8
```

Use exact GPU IDs only when the user specifically requires fixed devices:

```bash
gpu-lease run --ids 0,1 -- python train.py --batch-size 8
```

Let `gpu-lease run` own `CUDA_VISIBLE_DEVICES`. Do not set it separately unless you are intentionally composing with another scheduler.

Keep the GPU process as the direct child of `gpu-lease run`. The lease is released when that command exits.
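The environment rules above can be checked before anything is launched. The following is a minimal sketch, assuming a hypothetical helper named `lease_preflight` that is not part of gpu-lease. It only inspects the two variables the workflow mentions and treats a variable that is set but empty as set, since an empty `CUDA_VISIBLE_DEVICES` still changes what a process can see.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper; gpu-lease itself does not ship this function.
# It reports variables that the workflow says to leave unset by default.
lease_preflight() {
  local status=0
  # ${VAR+set} expands to "set" when VAR exists, even if its value is empty.
  if [ -n "${GPU_LEASE_SOCKET+set}" ]; then
    echo "GPU_LEASE_SOCKET is set; unset it unless the user provided a socket" >&2
    status=1
  fi
  if [ -n "${CUDA_VISIBLE_DEVICES+set}" ]; then
    echo "CUDA_VISIBLE_DEVICES is set; let gpu-lease run assign it" >&2
    status=1
  fi
  return "$status"
}

# Example: report problems without aborting the current shell.
lease_preflight || echo "fix the environment before wrapping the command"
```

The helper returns non-zero instead of unsetting anything, so a deliberately provided socket or an intentional composition with another scheduler stays under the caller's control.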

Examples

```bash
gpu-lease run --count 1 --wait -- python -m torch.distributed.run --nproc_per_node=1 train.py
gpu-lease run --count 2 --wait -- python -m sglang.launch_server --model-path ./model
# --block keeps `ray start` in the foreground, so the lease is held while Ray runs
gpu-lease run --count 4 --wait -- ray start --head --num-gpus=4 --block
```
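When a command like these needs setup steps first, a wrapper script is a natural choice, but an ordinary wrapper leaves a shell sitting between the launcher and the workload. The sketch below illustrates the usual shell remedy, `exec`, which replaces the shell with the command and keeps the same process ID. The function name `same_pid_after_exec` is hypothetical and exists only for this demonstration; it does not call gpu-lease.

```shell
#!/usr/bin/env bash
# Demonstration only: shows that exec replaces the shell rather than forking.
same_pid_after_exec() {
  # The outer shell prints its PID, then exec replaces it with the inner
  # command, which prints its own PID. The two printed lines should match.
  bash -c 'echo "$$"; exec bash -c "echo \$\$"'
}

same_pid_after_exec
```

In a real wrapper (say a hypothetical `run_train.sh`), the last line would be `exec python train.py "$@"`, and the script would be launched as `gpu-lease run --count 1 --wait -- ./run_train.sh`, so the Python process ends up in the position the direct-child rule asks for.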
