# H100
## Overview
Use this skill to do SGLang development on the H100 box through `h100_sglang`. The default container is `sglang_bbuf` and the repo lives at `/sgl-workspace/sglang`. Prefer it whenever local validation is insufficient for CUDA, Triton, diffusion pipelines, or other GPU-backed SGLang behavior.

This environment is already prepared:

- `sglang_bbuf` is running on `lmsysorg/sglang:dev`
- the repo is cloned at `/sgl-workspace/sglang`
- editable installs for `python[all]` and `python[diffusion]` are already done
- `/root/.cache` is mounted as the cache path
- Infiniband paths are mounted into the container for RDMA-aware workflows: `/sys/class/infiniband`, `/dev/infiniband`, and `/usr/sbin/show_gids`

Hugging Face cache is already mounted, but do not assume `HF_TOKEN` is visible in every context. Interactive shells and non-interactive `docker exec ... bash -lc "<cmd>"` can behave differently. Always verify with `echo ${HF_TOKEN:+set}` before gated-model or Hub-backed runs.
## Quick Start
- Check the host, container, and GPU state.

```bash
ssh h100_sglang 'hostname && whoami'
ssh h100_sglang 'docker ps --format "table {{.Names}}\t{{.Status}}" | sed -n "1,20p"'
ssh h100_sglang 'nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits'
```

- Enter the container and repo.
```bash
ssh h100_sglang 'docker exec -it sglang_bbuf /bin/zsh'
cd /sgl-workspace/sglang
echo ${HF_TOKEN:+set}
```

If `HF_TOKEN` is unexpectedly missing in the current shell, export it manually before Hub-backed workflows:

```bash
export HF_TOKEN=<your-hf-token>
export HUGGINGFACE_HUB_TOKEN="$HF_TOKEN"
```

For non-interactive `docker exec ... bash -lc "<cmd>"` runs, prefer exporting both variables inside the command itself instead of assuming the shell startup path will populate them.
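One way to keep that discipline is to build the non-interactive command string programmatically, with the exports baked in. A minimal sketch, assuming the container name from this doc; `hub_cmd` is a hypothetical helper, not part of SGLang:

```python
import shlex

def hub_cmd(inner: str, token: str) -> str:
    """Build a non-interactive docker exec command that carries the HF token
    inside the command itself instead of relying on shell startup files."""
    exports = f"export HF_TOKEN={shlex.quote(token)} HUGGINGFACE_HUB_TOKEN={shlex.quote(token)}"
    body = f"{exports} && {inner}"
    # shlex.quote keeps the inner command intact through bash -lc.
    return f"docker exec -i sglang_bbuf bash -lc {shlex.quote(body)}"

print(hub_cmd("echo ${HF_TOKEN:+set}", "<your-hf-token>"))
```

Prefix the result with `ssh h100_sglang` to run it remotely.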
- Pick a free GPU.
Use a GPU with `0` utilization and only a few MiB allocated.
Set `CUDA_VISIBLE_DEVICES=<gpu_id>` for every GPU-backed validation command.

- This host currently does not provide the `kill-idle` helper.
Do not assume you can reclaim other users' idle allocations automatically.
If the free GPU list is tight, re-check `nvidia-smi`, choose another GPU, or coordinate before proceeding.

- If the container is not running, start it first.
```bash
ssh h100_sglang 'docker start sglang_bbuf'
```
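The free-GPU selection step above can be automated with a small parser over the `nvidia-smi` CSV query output. A sketch only: `pick_free_gpu` is a hypothetical helper, and the memory threshold is an assumption, not project policy.

```python
import csv
import io
from typing import Optional

def pick_free_gpu(nvidia_smi_csv: str, max_mem_mib: int = 100) -> Optional[int]:
    """Pick a GPU with 0% utilization and only a few MiB allocated.

    Expects the output of:
      nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
                 --format=csv,noheader,nounits
    Returns the GPU index, or None if nothing qualifies.
    """
    candidates = []
    for row in csv.reader(io.StringIO(nvidia_smi_csv)):
        index, _name, util, mem_used, _mem_total = (f.strip() for f in row)
        if int(util) == 0 and int(mem_used) <= max_mem_mib:
            candidates.append((int(mem_used), int(index)))
    # Prefer the emptiest qualifying GPU.
    return min(candidates)[1] if candidates else None

sample = """\
0, NVIDIA H100 80GB HBM3, 97, 71834, 81559
1, NVIDIA H100 80GB HBM3, 0, 4, 81559
2, NVIDIA H100 80GB HBM3, 0, 23456, 81559
"""
print(pick_free_gpu(sample))  # GPU 1 is idle and nearly empty
```

Feed it the real query output from the check-state step, then export `CUDA_VISIBLE_DEVICES` with the result.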
## Safe Remote Workflow
- Inspect the default repo before editing it.

```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git branch --show-current && git status --short"'
```

- Fast-forward `<your-repo-path>` to the latest clean `main` before creating any validation worktree.
```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git fetch origin && git checkout main && git pull --ff-only origin main"'
```

- Avoid writing directly into `<your-repo-path>` when it is dirty or when the local snapshot differs from the remote `HEAD`.
- Prefer one of these isolation strategies.
Create a detached worktree for remote-only experiments:

```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git worktree add --detach /tmp/sglang_validate_h100 HEAD"'
```

Stream the exact local working tree into the container when validating the current local snapshot:

```bash
COPYFILE_DISABLE=1 tar --exclude=.git -cf - . | \
  ssh h100_sglang 'docker exec -i sglang_bbuf sh -lc "rm -rf /tmp/sglang_local_validate && mkdir -p /tmp/sglang_local_validate && tar -xf - -C /tmp/sglang_local_validate"'
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "find /tmp/sglang_local_validate -name '\''._*'\'' -delete"'
```

Use the streamed copy when the goal is "validate exactly what is in the local repo right now".
For patch-oriented remote validation, another good option is:

- update remote `main`
- create a detached worktree from that clean commit
- stream or apply a focused local patch diff into the worktree only

That keeps `<your-repo-path>` clean while still validating the exact local delta.
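The worktree-plus-patch pattern above can be exercised entirely locally before touching the H100 host. A minimal sketch with throwaway `mktemp` paths and placeholder file contents, not commands for the remote box:

```shell
# Throwaway local repo standing in for the clean remote checkout.
set -e
repo=$(mktemp -d); wt=$(mktemp -d)/wt
git -C "$repo" init -q
echo hello > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.email=dev@example.com -c user.name=dev commit -qm base
# Detached worktree from the clean commit.
git -C "$repo" worktree add --detach "$wt" HEAD
# A focused local delta, streamed as a diff into the worktree only.
echo patched > "$repo/file.txt"
git -C "$repo" diff | git -C "$wt" apply
# The worktree carries the delta; the base checkout can be restored clean.
git -C "$repo" checkout -- file.txt
```

On the real host the `git diff` output would be piped through `ssh` into the container instead of applied locally.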
## Validation Workflow
- Start with import or syntax-level checks.

```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang"'
```

For diffusion-specific edits, prefer a narrower first pass:

```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang/jit_kernel/diffusion/triton python/sglang/multimodal_gen/runtime/layers"'
```

- Run targeted tests for the changed area.
```bash
ssh h100_sglang 'docker exec sglang_bbuf env PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/test.py"'
```

For diffusion changes, start with the fused modulation regression:

```bash
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q python/sglang/jit_kernel/tests/test_qwen_image_modulation.py"'
```

- For GPU-backed changes, pin a free GPU explicitly.

```bash
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/gpu_test.py"'
```

- For kernel-heavy diffusion work, run a targeted smoke script for the changed primitives before attempting a model-level run.
Cover at least these when relevant:

- `rms_norm_fn`
- `RMSNorm` under `torch.compile`
- `norm_infer`
- `apply_rotary_embedding`

Pipe the script through `docker exec -i ... python` for pure kernel smoke.
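For such smoke checks it helps to have a dependency-free oracle to compare kernel outputs against. A sketch of the RMS-norm math on plain Python lists; the real `rms_norm_fn` operates on tensors and its signature may differ:

```python
import math

def rms_norm_ref(x: list, weight: list, eps: float = 1e-6) -> list:
    """Reference RMS norm over the last dimension:
    y_i = x_i / sqrt(mean(x^2) + eps) * w_i
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]

print(rms_norm_ref([3.0, 4.0], [1.0, 1.0]))
```

A kernel smoke script can evaluate the same inputs through the Triton path and assert closeness to this reference within a dtype-appropriate tolerance.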
- Use a real `.py` file with `if __name__ == "__main__":` when calling `DiffGenerator.from_pretrained(..., local_mode=True)` or any flow that relies on `multiprocessing.spawn`.
`multiprocessing.spawn` fails if the script is executed from stdin or from unguarded top-level code.
- Attempt model-level or server-level smoke only after unit, kernel, or targeted regression checks pass.
Treat checkpoint, dependency, and environment failures separately from code regressions.
If a workflow reads from Hugging Face Hub, verify `HF_TOKEN` first and re-export it explicitly in the current shell or command when needed.
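The `__main__`-guard requirement can be illustrated with a minimal standalone script. This is a generic sketch with no SGLang imports; `worker` and `launch` are hypothetical names:

```python
import multiprocessing as mp

def square(rank: int) -> int:
    # Pure helper so the worker logic is testable in-process.
    return rank * rank

def worker(rank, queue):
    queue.put(square(rank))

def launch(world_size: int):
    # "spawn" re-imports this file in each child process, so any unguarded
    # top-level launch code would recurse, and stdin-fed scripts have no
    # file to re-import at all -- hence the requirement for a real .py file.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(r, queue)) for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return sorted(queue.get() for _ in range(world_size))

if __name__ == "__main__":
    # Spawn-based launch must live here, never at the top level.
    print(launch(2))
```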
## Torch Compile Attribution
When a benchmark compares eager vs `torch.compile`, do not stop at the speedup number.
Capture matching eager and compile traces or perf dumps, then run `scripts/analyze_diffusion_torch_compile.py` from the repo to explain where the gain came from.
## Cleanup
Remove temporary validation directories when finished.

```bash
ssh h100_sglang 'docker exec sglang_bbuf rm -rf /tmp/sglang_local_validate /tmp/sglang_validate_h100'
```