# H100

## Overview

Use this skill to do SGLang development on the H100 box through `h100_sglang`. The default container is `sglang_bbuf` and the repo lives at `/sgl-workspace/sglang`. Prefer it whenever local validation is insufficient for CUDA, Triton, diffusion pipelines, or other GPU-backed SGLang behavior.

This environment is already prepared:

- `sglang_bbuf` is running on `lmsysorg/sglang:dev`
- the repo is cloned at `/sgl-workspace/sglang`
- editable installs for `python[all]` and `python[diffusion]` are already done
- `/root/.cache` is mounted as the cache path
- InfiniBand paths are mounted into the container for RDMA-aware workflows: `/sys/class/infiniband`, `/dev/infiniband`, and `/usr/sbin/show_gids`

The Hugging Face cache is already mounted, but do not assume `HF_TOKEN` is visible in every `docker exec` context. Interactive shells and non-interactive `docker exec ... bash -lc "<cmd>"` can behave differently. Always verify with `echo ${HF_TOKEN:+set}` before gated-model or Hub-backed runs.
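The same presence check can be scripted from Python before a Hub-backed run. A minimal sketch (like the shell idiom above, it only reports presence, never the token value):

```python
import os


def hf_token_status() -> str:
    """Mirror `echo ${HF_TOKEN:+set}`: report whether Hugging Face
    credentials are visible in this process, without printing them."""
    token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_HUB_TOKEN")
    return "set" if token else "missing"


if __name__ == "__main__":
    print(hf_token_status())
```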

## Quick Start

1. Check the host, container, and GPU state.

   ```bash
   ssh h100_sglang 'hostname && whoami'
   ssh h100_sglang 'docker ps --format "table {{.Names}}\t{{.Status}}" | sed -n "1,20p"'
   ssh h100_sglang 'nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits'
   ```

2. Enter the container and repo.

   ```bash
   ssh h100_sglang 'docker exec -it sglang_bbuf /bin/zsh'
   cd /sgl-workspace/sglang
   echo ${HF_TOKEN:+set}
   ```

   If `HF_TOKEN` is unexpectedly missing in the current shell, export it manually before Hub-backed workflows:

   ```bash
   export HF_TOKEN=<your-hf-token>
   export HUGGINGFACE_HUB_TOKEN="$HF_TOKEN"
   ```

   For non-interactive `docker exec ... bash -lc "<cmd>"` runs, prefer exporting both variables inside the command itself instead of assuming the shell startup path will populate them.

3. Pick a free GPU. Use a GPU with 0% utilization and only a few MiB allocated. Set `CUDA_VISIBLE_DEVICES=<gpu_id>` for every GPU-backed validation command.

4. This host currently does not provide the `kill-idle` helper, so do not assume you can reclaim other users' idle allocations automatically. If the free GPU list is tight, re-check `nvidia-smi`, choose another GPU, or coordinate before proceeding.

5. If the container is not running, start it first.

   ```bash
   ssh h100_sglang 'docker start sglang_bbuf'
   ```
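Picking a free GPU can be automated by parsing the same `nvidia-smi` query used in step 1. A minimal sketch; the 100 MiB "few MiB allocated" threshold is an assumption, not host policy:

```python
from typing import Optional

# Fields match the query in step 1:
# nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
#            --format=csv,noheader,nounits
def pick_free_gpu(csv_text: str, max_mem_mib: int = 100) -> Optional[int]:
    """Return the index of the first GPU with 0% utilization and only a few
    MiB allocated, or None when no GPU qualifies."""
    for line in csv_text.strip().splitlines():
        index, _name, util, mem_used, _mem_total = [f.strip() for f in line.split(",")]
        if int(util) == 0 and int(mem_used) <= max_mem_mib:
            return int(index)
    return None


if __name__ == "__main__":
    # On the H100 box you would feed real output, e.g.:
    #   out = subprocess.check_output(
    #       ["nvidia-smi", "--query-gpu=index,name,utilization.gpu,memory.used,memory.total",
    #        "--format=csv,noheader,nounits"], text=True)
    sample = (
        "0, NVIDIA H100 80GB HBM3, 0, 4, 81559\n"
        "1, NVIDIA H100 80GB HBM3, 97, 65432, 81559"
    )
    print(f"CUDA_VISIBLE_DEVICES={pick_free_gpu(sample)}")
```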

## Safe Remote Workflow

1. Inspect the default repo before editing it.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git branch --show-current && git status --short"'
   ```

2. Fast-forward `<your-repo-path>` to the latest clean `main` before creating any validation worktree.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git fetch origin && git checkout main && git pull --ff-only origin main"'
   ```

3. Avoid writing directly into `<your-repo-path>` when it is dirty or when the local snapshot differs from the remote `HEAD`.

4. Prefer one of these isolation strategies.

   Create a detached worktree for remote-only experiments:

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /sgl-workspace/sglang && git worktree add --detach /tmp/sglang_validate_h100 HEAD"'
   ```

   Stream the exact local working tree into the container when validating the current local snapshot:

   ```bash
   COPYFILE_DISABLE=1 tar --exclude=.git -cf - . | \
     ssh h100_sglang 'docker exec -i sglang_bbuf sh -lc "rm -rf /tmp/sglang_local_validate && mkdir -p /tmp/sglang_local_validate && tar -xf - -C /tmp/sglang_local_validate"'
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "find /tmp/sglang_local_validate -name '\''._*'\'' -delete"'
   ```

   Use the streamed copy when the goal is "validate exactly what is in the local repo right now". For patch-oriented remote validation, another good option is:

   - update remote `main`
   - create a detached worktree from that clean commit
   - stream or apply a focused local patch diff into that worktree only

   That keeps `<your-repo-path>` clean while still validating the exact local delta.

## Validation Workflow

1. Start with import or syntax-level checks.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang"'
   ```

   For diffusion-specific edits, prefer a narrower first pass:

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang/jit_kernel/diffusion/triton python/sglang/multimodal_gen/runtime/layers"'
   ```

2. Run targeted tests for the changed area.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf env PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/test.py"'
   ```

   For diffusion changes, start with the fused modulation regression:

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q python/sglang/jit_kernel/tests/test_qwen_image_modulation.py"'
   ```

3. For GPU-backed changes, pin a free GPU explicitly.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/gpu_test.py"'
   ```

4. For kernel-heavy diffusion work, run a targeted smoke script for the changed primitives before attempting a model-level run. Cover at least these when relevant:

   - `rms_norm_fn`
   - `RMSNorm` under `torch.compile`
   - `norm_infer`
   - `apply_rotary_embedding`

   Pipe the script through `docker exec -i ... python` for pure kernel smoke.
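As one possible shape for such a smoke script, here is a dependency-free sketch that checks a pure-Python RMSNorm reference against a hand-computed value. A real run would instead import the repo's `rms_norm_fn` and compare its CUDA output against a reference like this; the tolerance is an assumption:

```python
import math


def rms_norm_ref(x, weight, eps=1e-6):
    """Reference RMSNorm: x[i] * weight[i] / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    weight = [1.0, 1.0, 1.0, 1.0]
    out = rms_norm_ref(x, weight)
    # mean(x^2) = 30/4 = 7.5, so out[i] should be x[i] / sqrt(7.5)
    assert all(
        math.isclose(o, v / math.sqrt(7.5), rel_tol=1e-4)
        for o, v in zip(out, x)
    )
    print("rms_norm smoke OK")
```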
5. Use a real `.py` file with `if __name__ == "__main__":` when calling `DiffGenerator.from_pretrained(..., local_mode=True)` or any flow that relies on `multiprocessing.spawn`. `multiprocessing.spawn` will fail if the script is executed from stdin or from unguarded top-level code.
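A minimal shape for such a script; the worker here is a stand-in for the real model setup (a real one would call `DiffGenerator.from_pretrained(..., local_mode=True)`):

```python
import multiprocessing as mp


def worker(queue):
    # Stand-in for the real GPU/model work. Under "spawn" this module is
    # re-imported in the child, so it must import without side effects.
    queue.put("ready")


if __name__ == "__main__":
    # The guard is mandatory: "spawn" re-imports this file in each child
    # process, and unguarded top-level code would re-launch workers
    # recursively (or fail outright when the script comes from stdin).
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get(timeout=30))
    p.join()
```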
6. Attempt model-level or server-level smoke only after unit, kernel, or targeted regression checks pass. Treat checkpoint, dependency, and environment failures separately from code regressions. If a workflow reads from Hugging Face Hub, verify `HF_TOKEN` first and re-export it explicitly in the current shell or command when needed.

## Torch Compile Attribution

When a benchmark compares eager vs `torch.compile`, do not stop at the speedup number. Capture matching eager and compile traces or perf dumps, then run `scripts/analyze_diffusion_torch_compile.py` from the repo to explain where the gain came from.
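The attribution step amounts to diffing where time goes between the two traces. The repo script is the real tool; as a sketch of the idea, here is a minimal per-op time diff over Chrome-trace JSON dumps (the field names assume the standard `traceEvents` format that PyTorch's profiler can export):

```python
import json
from collections import Counter


def op_time_us(trace_json: str) -> Counter:
    """Sum complete-event ("ph": "X") durations per op name from a Chrome trace."""
    totals = Counter()
    for ev in json.loads(trace_json).get("traceEvents", []):
        if ev.get("ph") == "X":
            totals[ev.get("name", "?")] += ev.get("dur", 0)
    return totals


def attribute(eager_json: str, compiled_json: str, top: int = 10):
    """Per-op time delta in microseconds; negative means time saved under compile."""
    eager, compiled = op_time_us(eager_json), op_time_us(compiled_json)
    delta = {name: compiled[name] - eager[name] for name in eager | compiled}
    return sorted(delta.items(), key=lambda kv: kv[1])[:top]


if __name__ == "__main__":
    # Synthetic traces standing in for real eager/compile dumps.
    eager = json.dumps({"traceEvents": [
        {"ph": "X", "name": "rms_norm", "dur": 120},
        {"ph": "X", "name": "matmul", "dur": 300},
    ]})
    compiled = json.dumps({"traceEvents": [
        {"ph": "X", "name": "fused_rms_norm", "dur": 40},
        {"ph": "X", "name": "matmul", "dur": 290},
    ]})
    for name, d in attribute(eager, compiled):
        print(f"{name}: {d:+d} us")
```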

## Cleanup

Remove temporary validation directories when finished.

```bash
ssh h100_sglang 'docker exec sglang_bbuf rm -rf /tmp/sglang_local_validate /tmp/sglang_validate_h100'
```