# H100

## Overview

Use this skill to do SGLang development on the H100 box through `h100_sglang`. The default container is `sglang_bbuf` and the repo lives at `/sgl-workspace/sglang`. Prefer it whenever local validation is insufficient for CUDA, Triton, diffusion pipelines, or other GPU-backed SGLang behavior.

This environment is already prepared:

- `sglang_bbuf` is running on `lmsysorg/sglang:dev`
- the repo is cloned at `/sgl-workspace/sglang`
- editable installs for `python[all]` and `python[diffusion]` are already done
- `/root/.cache` is mounted as the cache path
- InfiniBand paths are mounted into the container for RDMA-aware workflows: `/sys/class/infiniband`, `/dev/infiniband`, and `/usr/sbin/show_gids`

The Hugging Face cache is already mounted, but do not assume `HF_TOKEN` is visible in every `docker exec` context. Interactive shells and non-interactive `docker exec ... bash -lc "<cmd>"` can behave differently. Always verify with `echo ${HF_TOKEN:+set}` before gated-model or Hub-backed runs.
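The same presence check can be scripted from Python before a Hub-backed run. A minimal sketch (like the shell idiom above, it only reports presence, never the token value):

```python
import os


def hf_token_status() -> str:
    """Mirror `echo ${HF_TOKEN:+set}`: report whether Hugging Face
    credentials are visible in this process, without printing them."""
    token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_HUB_TOKEN")
    return "set" if token else "missing"


if __name__ == "__main__":
    print(hf_token_status())
```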

## Quick Start

1. Check the host, container, and GPU state.

   ```bash
   ssh h100_sglang 'hostname && whoami'
   ssh h100_sglang 'docker ps --format "table {{.Names}}\t{{.Status}}" | sed -n "1,20p"'
   ssh h100_sglang 'nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits'
   ```

2. Enter the container and repo.

   ```bash
   ssh h100_sglang 'docker exec -it sglang_bbuf /bin/zsh'
   cd /sgl-workspace/sglang
   echo ${HF_TOKEN:+set}
   ```

   If `HF_TOKEN` is unexpectedly missing in the current shell, export it manually before Hub-backed workflows:

   ```bash
   export HF_TOKEN=<your-hf-token>
   export HUGGINGFACE_HUB_TOKEN="$HF_TOKEN"
   ```

   For non-interactive `docker exec ... bash -lc "<cmd>"` runs, prefer exporting both variables inside the command itself instead of assuming the shell startup path will populate them.

3. Pick a free GPU. Use a GPU with 0% utilization and only a few MiB allocated. Set `CUDA_VISIBLE_DEVICES=<gpu_id>` for every GPU-backed validation command.

4. This host currently does not provide the `kill-idle` helper, so do not assume you can reclaim other users' idle allocations automatically. If the free GPU list is tight, re-check `nvidia-smi`, choose another GPU, or coordinate before proceeding.

5. If the container is not running, start it first.

   ```bash
   ssh h100_sglang 'docker start sglang_bbuf'
   ```
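Picking a free GPU can be automated by parsing the same `nvidia-smi` query used in step 1. A minimal sketch; the 100 MiB "few MiB allocated" threshold is an assumption, not host policy:

```python
from typing import Optional

# Fields match the query in step 1:
# nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
#            --format=csv,noheader,nounits
def pick_free_gpu(csv_text: str, max_mem_mib: int = 100) -> Optional[int]:
    """Return the index of the first GPU with 0% utilization and only a few
    MiB allocated, or None when no GPU qualifies."""
    for line in csv_text.strip().splitlines():
        index, _name, util, mem_used, _mem_total = [f.strip() for f in line.split(",")]
        if int(util) == 0 and int(mem_used) <= max_mem_mib:
            return int(index)
    return None


if __name__ == "__main__":
    # On the H100 box you would feed real output, e.g.:
    #   out = subprocess.check_output(
    #       ["nvidia-smi", "--query-gpu=index,name,utilization.gpu,memory.used,memory.total",
    #        "--format=csv,noheader,nounits"], text=True)
    sample = (
        "0, NVIDIA H100 80GB HBM3, 0, 4, 81559\n"
        "1, NVIDIA H100 80GB HBM3, 97, 65432, 81559"
    )
    print(f"CUDA_VISIBLE_DEVICES={pick_free_gpu(sample)}")
```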

## Safe Remote Workflow

1. Inspect the default repo before editing it.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git branch --show-current && git status --short"'
   ```

2. Fast-forward `<your-repo-path>` to the latest clean `main` before creating any validation worktree.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git fetch origin && git checkout main && git pull --ff-only origin main"'
   ```

3. Avoid writing directly into `<your-repo-path>` when it is dirty or when the local snapshot differs from the remote `HEAD`.

4. Prefer one of these isolation strategies.

   Create a detached worktree for remote-only experiments:

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git worktree add --detach /tmp/sglang_validate_h100 HEAD"'
   ```

   Stream the exact local working tree into the container when validating the current local snapshot:

   ```bash
   COPYFILE_DISABLE=1 tar --exclude=.git -cf - . | \
     ssh h100_sglang 'docker exec -i sglang_bbuf sh -lc "rm -rf /tmp/sglang_local_validate && mkdir -p /tmp/sglang_local_validate && tar -xf - -C /tmp/sglang_local_validate"'
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "find /tmp/sglang_local_validate -name '\''._*'\'' -delete"'
   ```

   Use the streamed copy when the goal is "validate exactly what is in the local repo right now". For patch-oriented remote validation, another good option is:

   - update remote `main`
   - create a detached worktree from that clean commit
   - stream or apply a focused local patch diff into that worktree only

   That keeps `<your-repo-path>` clean while still validating the exact local delta.

## Validation Workflow

1. Start with import or syntax-level checks.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang"'
   ```

   For diffusion-specific edits, prefer a narrower first pass:

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang/jit_kernel/diffusion/triton python/sglang/multimodal_gen/runtime/layers"'
   ```

2. Run targeted tests for the changed area.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf env PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/test.py"'
   ```

   For diffusion changes, start with the fused modulation regression:

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q python/sglang/jit_kernel/tests/test_qwen_image_modulation.py"'
   ```

3. For GPU-backed changes, pin a free GPU explicitly.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/gpu_test.py"'
   ```

4. For kernel-heavy diffusion work, run a targeted smoke script for the changed primitives before attempting a model-level run. Cover at least these when relevant:

   - `rms_norm_fn`
   - `RMSNorm` under `torch.compile`
   - `norm_infer`
   - `apply_rotary_embedding`

   Pipe the script through `docker exec -i ... python` for pure kernel smoke.
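As one possible shape for such a smoke script, here is a dependency-free sketch that checks a pure-Python RMSNorm reference against a hand-computed value. A real run would instead import the repo's `rms_norm_fn` and compare its CUDA output against a reference like this; the tolerance is an assumption:

```python
import math


def rms_norm_ref(x, weight, eps=1e-6):
    """Reference RMSNorm: x[i] * weight[i] / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    weight = [1.0, 1.0, 1.0, 1.0]
    out = rms_norm_ref(x, weight)
    # mean(x^2) = 30/4 = 7.5, so out[i] should be x[i] / sqrt(7.5)
    assert all(
        math.isclose(o, v / math.sqrt(7.5), rel_tol=1e-4)
        for o, v in zip(out, x)
    )
    print("rms_norm smoke OK")
```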
5. Use a real `.py` file with `if __name__ == "__main__":` when calling `DiffGenerator.from_pretrained(..., local_mode=True)` or any flow that relies on `multiprocessing.spawn`. `multiprocessing.spawn` will fail if the script is executed from stdin or from unguarded top-level code.
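A minimal shape for such a script; the worker here is a stand-in for the real model setup (a real one would call `DiffGenerator.from_pretrained(..., local_mode=True)`):

```python
import multiprocessing as mp


def worker(queue):
    # Stand-in for the real GPU/model work. Under "spawn" this module is
    # re-imported in the child, so it must import without side effects.
    queue.put("ready")


if __name__ == "__main__":
    # The guard is mandatory: "spawn" re-imports this file in each child
    # process, and unguarded top-level code would re-launch workers
    # recursively (or fail outright when the script comes from stdin).
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get(timeout=30))
    p.join()
```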
6. Attempt model-level or server-level smoke only after unit, kernel, or targeted regression checks pass. Treat checkpoint, dependency, and environment failures separately from code regressions. If a workflow reads from Hugging Face Hub, verify `HF_TOKEN` first and re-export it explicitly in the current shell or command when needed.

## Torch Compile Attribution

When a benchmark compares eager vs `torch.compile`, do not stop at the speedup number. Capture matching eager and compile traces or perf dumps, then run `scripts/analyze_diffusion_torch_compile.py` from the repo to explain where the gain came from.
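The attribution step amounts to diffing where time goes between the two traces. The repo script is the real tool; as a sketch of the idea, here is a minimal per-op time diff over Chrome-trace JSON dumps (the field names assume the standard `traceEvents` format that PyTorch's profiler can export):

```python
import json
from collections import Counter


def op_time_us(trace_json: str) -> Counter:
    """Sum complete-event ("ph": "X") durations per op name from a Chrome trace."""
    totals = Counter()
    for ev in json.loads(trace_json).get("traceEvents", []):
        if ev.get("ph") == "X":
            totals[ev.get("name", "?")] += ev.get("dur", 0)
    return totals


def attribute(eager_json: str, compiled_json: str, top: int = 10):
    """Per-op time delta in microseconds; negative means time saved under compile."""
    eager, compiled = op_time_us(eager_json), op_time_us(compiled_json)
    delta = {name: compiled[name] - eager[name] for name in eager | compiled}
    return sorted(delta.items(), key=lambda kv: kv[1])[:top]


if __name__ == "__main__":
    # Synthetic traces standing in for real eager/compile dumps.
    eager = json.dumps({"traceEvents": [
        {"ph": "X", "name": "rms_norm", "dur": 120},
        {"ph": "X", "name": "matmul", "dur": 300},
    ]})
    compiled = json.dumps({"traceEvents": [
        {"ph": "X", "name": "fused_rms_norm", "dur": 40},
        {"ph": "X", "name": "matmul", "dur": 290},
    ]})
    for name, d in attribute(eager, compiled):
        print(f"{name}: {d:+d} us")
```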

## Cleanup

Remove temporary validation directories when finished.

```bash
ssh h100_sglang 'docker exec sglang_bbuf rm -rf /tmp/sglang_local_validate /tmp/sglang_validate_h100'
```