exec-local-compile
Compile TensorRT-LLM (Local / Compute Node)
Compile TensorRT-LLM from source on a compute node inside a Docker container.
When to Use
| Scenario | Use This Skill? |
|---|---|
| On a compute node with GPUs visible (`nvidia-smi` succeeds) | Yes |
| On a SLURM login node (no GPUs) | No; use a different skill instead |
Prerequisites
- You are inside a Docker/enroot container on a compute node
- `nvidia-smi` succeeds (GPUs visible)
- `/usr/local/tensorrt` exists (TensorRT installation in the container)
Instructions
Step 1: Verify Environment
Run `nvidia-smi` to confirm you are on a compute node with GPU access.
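The prerequisites can be checked in one pass before starting. The following is a minimal sketch, assuming a POSIX-compatible shell; the function name `check_build_env` is hypothetical, and the default path is the TensorRT location listed under Prerequisites.

```bash
# Hypothetical helper: verify the prerequisites for a local build.
# Usage: check_build_env [trt_root]   (defaults to /usr/local/tensorrt)
check_build_env() {
    trt_root="${1:-/usr/local/tensorrt}"

    # nvidia-smi must be on PATH and must exit successfully (GPUs visible).
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: probably not a GPU compute node" >&2
        return 1
    fi
    if ! nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi failed: GPUs are not visible" >&2
        return 1
    fi

    # The TensorRT installation directory must exist inside the container.
    if [ ! -d "$trt_root" ]; then
        echo "TensorRT root not found: $trt_root" >&2
        return 1
    fi
    echo "environment OK"
}
```

Calling `check_build_env` with no arguments checks the default TensorRT path; a non-zero return status means the build should not be started on this node.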
Step 2: Locate the Codebase
Use `cd` to enter the TensorRT-LLM repository directory. If the user has not provided a path, ask for it.
Step 3: (Optional) Checkout Branch
If the user specifies a branch (e.g., "compile ToT"), check it out and pull the latest changes:
```bash
git checkout main && git pull
```
Step 4: Build
Run the build command. The build is incremental by default; omit `-c`/`--clean` unless a clean build is explicitly requested or the incremental build fails:
```bash
./scripts/build_wheel.py --trt_root /usr/local/tensorrt --benchmarks --use_ccache -a "<arch>" -f --nvtx
```
Replace `<arch>` with the target GPU architecture (see Architecture Reference below). If the user does not specify one, auto-detect it from `nvidia-smi`.
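Auto-detection from `nvidia-smi` could look like the sketch below. It assumes a driver recent enough to support the `compute_cap` query field, and it only derives the bare compute-capability number (for example, `9.0` becomes `90`). The exact value format that `-a` expects, including any suffix or multi-architecture separator, is not derived here and must be checked against the build script. Both function names are hypothetical.

```bash
# Hypothetical helper: turn a compute capability such as "9.0" into "90".
cap_to_arch() {
    printf '%s\n' "$1" | tr -d '.'
}

# Hypothetical helper: print one number per distinct compute capability
# found on this node.
# Assumption: nvidia-smi supports the compute_cap query field.
detect_arch_numbers() {
    nvidia-smi --query-gpu=compute_cap --format=csv,noheader | sort -u |
    while read -r cap; do
        cap_to_arch "$cap"
    done
}
```

On a node where all GPUs are the same model, `detect_arch_numbers` prints a single line, which can then be turned into the value passed to `-a`.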
Step 5: Install
```bash
pip install -e .[devel]
```
Step 6: Verify
```bash
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```
Build Flags
| Flag | Description |
|---|---|
| `--trt_root /usr/local/tensorrt` | TensorRT installation path (standard in NVIDIA containers) |
| `--benchmarks` | Build the C++ benchmarks |
| `-a "<arch>"` | Target GPU architecture(s) |
| `--nvtx` | Enable NVTX markers for profiling |
| `--use_ccache` | Use ccache for faster recompilation |
| `-f` | Skip some kernels for faster dev compilation. Always use for dev builds. |
| `-c` / `--clean` | Clean build directory before building. Only when needed (see below). |
|  | Build in-place without creating a wheel file |
|  | Skip virtual environment creation |
Architecture Reference
| Value | GPU Family |
|---|---|
|  | Blackwell (B200, GB200) |
|  | Hopper (H100, H200) |
|  | Ada Lovelace (L40S) |
|  | Ampere (A100) |
|  | Multiple architectures |
Incremental vs. Clean Builds
Default to incremental builds: CMake only recompiles changed files, saving significant time.
Use a clean build (`-c`) only when:
- The user explicitly requests a clean/fresh build
- An incremental build fails with linker errors, stale object files, or CMake cache issues
- Major branch changes (e.g., rebasing across many commits) may have invalidated the build cache
- Build system files changed (`CMakeLists.txt`, `*.cmake`)
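The last condition can be checked mechanically. The sketch below is one way to do it, assuming the changed file names are supplied on standard input, one per line; the function name `touches_build_system` is hypothetical, and the pattern covers only the two file types named above.

```bash
# Hypothetical helper: succeed if any path read from stdin is a build system
# file (CMakeLists.txt or *.cmake), which suggests a clean build with -c.
touches_build_system() {
    grep -Eq '(^|/)CMakeLists\.txt$|\.cmake$'
}
```

For example, after a pull, `git diff --name-only ORIG_HEAD HEAD | touches_build_system && echo "consider a clean build"` would flag a change to the build system files. This covers only one of the listed conditions; linker errors or stale object files still have to be judged from the build output.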