exec-remote
Remote Execution Skill
This skill handles running code on remote GPU or TPU clusters via SkyPilot.
1. Determine Target Device
Identify the target device from the user's request:
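| Target | Cluster name file | Launch script | UV extra | Env prefix |
|---|---|---|---|---|
| GPU | `.cluster_name_gpu` | `launch_gpu.sh` | `--extra gpu` | `export CUDA_VISIBLE_DEVICES=0;` |
| TPU | `.cluster_name_tpu` | `launch_tpu.sh` | `--extra tpu` | (none) |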
Execution Instructions:
Before running the launch script, you must find its absolute path. It is located in the `scripts/` directory alongside this skill definition. Use your file search tools (e.g., `glob` or `find`) to locate `launch_gpu.sh` or `launch_tpu.sh` before executing it.

If the user does not specify a device, ask them which one to use.
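The lookup described above can be sketched in shell. This is a minimal illustration, not part of the skill itself: the function name `find_launch_script` and the `search_root` argument are hypothetical, and it assumes the script lives in a `scripts/` directory somewhere under the search root.

```bash
# Hypothetical helper: print the absolute path of launch_<device>.sh.
# Usage: find_launch_script <gpu|tpu> [search_root]
find_launch_script() {
  local device="$1"
  local search_root="${2:-.}"   # assumed: a directory that contains the skill
  local match

  case "$device" in
    gpu|tpu) ;;
    *) echo "unknown device: $device (expected gpu or tpu)" >&2; return 2 ;;
  esac

  # Keep only the first hit that sits inside a scripts/ directory.
  match="$(find "$search_root" -type f -path "*/scripts/launch_${device}.sh" 2>/dev/null | head -n 1)"
  if [ -z "$match" ]; then
    echo "launch_${device}.sh not found under $search_root" >&2
    return 1
  fi

  # Resolve to an absolute path without depending on realpath.
  echo "$(cd "$(dirname "$match")" && pwd)/$(basename "$match")"
}
```

An unknown device returns a non-zero status rather than guessing, which matches the instruction to ask the user when no device is specified.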
2. Prerequisites
- The cluster must already be provisioned. Check that the corresponding cluster name file (`.cluster_name_gpu` or `.cluster_name_tpu`) exists and is non-empty in the project root.
- If the file does not exist or is empty, ask the user to provision a cluster first using the appropriate launch script.
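The prerequisite check could look roughly like the following. It is a sketch under assumptions: `cluster_name_for` and its `project_root` argument are hypothetical names, and the file is assumed to hold a single cluster name.

```bash
# Hypothetical helper: print the cluster name recorded for a device, or fail.
# Usage: cluster_name_for <gpu|tpu> [project_root]
cluster_name_for() {
  local device="$1"
  local project_root="${2:-.}"
  local file="$project_root/.cluster_name_${device}"
  local name

  # -s is true only if the file exists and is non-empty.
  if [ ! -s "$file" ]; then
    echo "no cluster recorded in $file; provision one with launch_${device}.sh first" >&2
    return 1
  fi

  # Drop the trailing newline and any stray whitespace.
  name="$(tr -d '[:space:]' < "$file")"
  if [ -z "$name" ]; then
    echo "$file contains only whitespace" >&2
    return 1
  fi
  echo "$name"
}
```

A failure here is the cue to ask the user to provision a cluster rather than to call `sky exec` with an empty cluster name.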
3. Cluster Management
Provisioning
```bash
# Note: first locate the scripts as instructed above, then run them.

# GPU (common accelerator types: H100:1, A100:1, L4:1)
bash <absolute_path_to_launch_gpu.sh> <accelerator_type> <experiment_name>

# TPU (common accelerator types: tpu-v4-8, tpu-v4-16, tpu-v6e-1, tpu-v6e-4)
bash <absolute_path_to_launch_tpu.sh> <accelerator_type> <experiment_name>
```

The launch script automatically updates the corresponding `.cluster_name_*` file.

Teardown
```bash
# GPU
sky down $(cat .cluster_name_gpu) -y

# TPU
sky down $(cat .cluster_name_tpu) -y
```

4. Execution Command
GPU
```bash
sky exec $(cat .cluster_name_gpu) --workdir . "export CUDA_VISIBLE_DEVICES=0; uv run --extra gpu python <PATH_TO_SCRIPT> [ARGS]"
```

- `export CUDA_VISIBLE_DEVICES=0;` ensures deterministic single-GPU execution. Adjust for multi-GPU jobs.
- `--extra gpu` activates GPU optional dependencies (e.g. `jax[cuda]`).
TPU
```bash
sky exec $(cat .cluster_name_tpu) --workdir . "uv run --extra tpu python <PATH_TO_SCRIPT> [ARGS]"
```

- `--extra tpu` activates TPU optional dependencies (e.g. `jax[tpu]`).
Common flags
- `--workdir .` syncs the current local directory to the remote instance before running.
- For pytest, use `python -m pytest <test_path>` instead of calling pytest directly.
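The GPU and TPU commands differ only in the env prefix and the UV extra, so the remote command string could be assembled by a small helper. This is an illustrative sketch: `build_remote_cmd` is a hypothetical name, and arguments are joined with plain spaces, so arguments containing spaces would need extra quoting.

```bash
# Hypothetical helper: print the command string to pass to `sky exec`.
# Usage: build_remote_cmd <gpu|tpu> <script-or-module-args...>
build_remote_cmd() {
  local device="$1"
  shift
  case "$device" in
    # GPU: pin the job to device 0 and pull in the gpu extra.
    gpu) printf 'export CUDA_VISIBLE_DEVICES=0; uv run --extra gpu python %s\n' "$*" ;;
    # TPU: no env prefix, only the tpu extra.
    tpu) printf 'uv run --extra tpu python %s\n' "$*" ;;
    *) echo "unknown device: $device (expected gpu or tpu)" >&2; return 2 ;;
  esac
}

# Example use (requires a provisioned cluster, so it is left commented out):
# sky exec "$(cat .cluster_name_tpu)" --workdir . "$(build_remote_cmd tpu -m pytest src/lynx/test/)"
```

Printing the string instead of running it keeps the helper easy to inspect before anything is sent to the cluster.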
5. Usage Examples
Run a benchmark on GPU:

```bash
sky exec $(cat .cluster_name_gpu) --workdir . "export CUDA_VISIBLE_DEVICES=0; uv run --extra gpu python src/lynx/perf/benchmark_train.py"
```

Run tests on TPU:

```bash
sky exec $(cat .cluster_name_tpu) --workdir . "uv run --extra tpu python -m pytest src/lynx/test/"
```

6. Operational Notes
- Logs: SkyPilot streams `stdout` and `stderr` directly to the terminal.
- Interruption: `Ctrl+C` may not kill the remote process; check SkyPilot docs for cleanup if needed.
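One possible cleanup path after an interrupted run is to list the cluster's jobs and cancel the one that is still running. The sketch below wraps the SkyPilot `sky queue` and `sky cancel` subcommands; the function names and the `SKY_BIN` override are hypothetical, and the exact behaviour (for example, confirmation prompts) should be checked against the SkyPilot docs for the installed version.

```bash
# SKY_BIN is a hypothetical override so the helpers can be dry-run with `echo`.
SKY_BIN="${SKY_BIN:-sky}"

# List jobs on a cluster to find the ID of the one left running.
list_remote_jobs() {
  local cluster="$1"
  "$SKY_BIN" queue "$cluster"
}

# Cancel a single job by ID on a cluster.
cancel_remote_job() {
  local cluster="$1"
  local job_id="$2"
  "$SKY_BIN" cancel "$cluster" "$job_id"
}

# Example use (requires a provisioned cluster, so it is left commented out):
# list_remote_jobs "$(cat .cluster_name_gpu)"
# cancel_remote_job "$(cat .cluster_name_gpu)" 3
```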