exec-local-compile

Compile TensorRT-LLM (Local / Compute Node)

Compile TensorRT-LLM from source on a compute node inside a Docker container.

When to Use

| Scenario | Use This Skill? |
|---|---|
| On a compute node with GPUs visible (`nvidia-smi` works) | Yes |
| On a SLURM login node (no GPUs) | No; use `exec-slurm-compile` instead |

Prerequisites

- You are inside a Docker/enroot container on a compute node
- `nvidia-smi` succeeds (GPUs visible)
- `/usr/local/tensorrt` exists (TensorRT installation in the container)
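
The GPU and TensorRT prerequisites can be checked in one pass before starting a build. The following is a minimal sketch, not part of TensorRT-LLM: the function name `preflight` and its messages are hypothetical, and it does not try to detect whether you are inside a container.

```bash
#!/bin/sh
# Preflight sketch (hypothetical helper): verify that GPUs are visible and
# that the TensorRT root exists. Returns non-zero if any check fails.
preflight() {
    local trt_root="${1:-/usr/local/tensorrt}"
    local status=0

    # nvidia-smi must be on PATH and must exit successfully.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "FAIL: nvidia-smi not found (login node, or no driver in the container?)" >&2
        status=1
    elif ! nvidia-smi >/dev/null 2>&1; then
        echo "FAIL: nvidia-smi exited with an error (no GPUs visible)" >&2
        status=1
    fi

    # The TensorRT installation must exist at the expected root.
    if [ ! -d "$trt_root" ]; then
        echo "FAIL: TensorRT root not found: $trt_root" >&2
        status=1
    fi

    return "$status"
}

# Usage: preflight && echo "environment OK"
```

If the check fails because `nvidia-smi` is missing, you are most likely on a login node and should use `exec-slurm-compile` instead.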

Instructions

Step 1: Verify Environment

Run `nvidia-smi` to confirm you are on a compute node with GPU access.

Step 2: Locate the Codebase

`cd` to the TensorRT-LLM repository. If the path is not provided by the user, ask for it.

Step 3: (Optional) Checkout Branch

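If the user specifies a branch (e.g., "compile ToT", meaning top of tree on `main`), check it out and pull. Substitute the requested branch for `main` if it differs:

```bash
git checkout main && git pull
```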

Step 4: Build

Run the build command (incremental by default; omit `-c` / `--clean` unless explicitly requested or the incremental build fails):

```bash
./scripts/build_wheel.py --trt_root /usr/local/tensorrt --benchmarks --use_ccache -a "<arch>" -f --nvtx
```

Replace `<arch>` with the target GPU architecture (see Architecture Reference below). If not specified by the user, auto-detect from `nvidia-smi`.
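
One way to auto-detect the architecture is to query the GPU's compute capability and drop the dot. The sketch below assumes a driver recent enough to support the `compute_cap` query field of `nvidia-smi`; the helper names `cap_to_arch` and `detect_arch` are hypothetical, and only the first GPU is inspected.

```bash
#!/bin/sh
# Convert a compute capability such as "9.0" into an -a value such as "90-real".
cap_to_arch() {
    # Strip the dot and any stray whitespace: "9.0" -> "90", "10.0" -> "100".
    printf '%s-real\n' "$(printf '%s' "$1" | tr -d '.[:space:]')"
}

# Query the first GPU. Assumption: the installed driver supports the
# compute_cap field; older drivers reject it, in which case this returns 1.
detect_arch() {
    local cap
    cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)"
    case "$cap" in
        [0-9]*.[0-9]*) cap_to_arch "$cap" ;;
        *) return 1 ;;   # empty or unexpected output: ask the user instead
    esac
}

# Usage:
#   arch="$(detect_arch)" && ./scripts/build_wheel.py --trt_root /usr/local/tensorrt \
#       --benchmarks --use_ccache -a "$arch" -f --nvtx
```

If detection fails, fall back to asking the user or to the Architecture Reference table.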

Step 5: Install

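Install the package in editable mode with the development extras (the quotes keep the shell from treating the brackets as a glob pattern):

```bash
pip install -e ".[devel]"
```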

Step 6: Verify

```bash
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```

Build Flags

| Flag | Description |
|---|---|
| `--trt_root /usr/local/tensorrt` | TensorRT installation path (standard in NVIDIA containers) |
| `--benchmarks` | Build the C++ benchmarks |
| `-a "<arch>"` | Target GPU architecture(s) |
| `--nvtx` | Enable NVTX markers for profiling |
| `--use_ccache` | Use ccache for faster recompilation |
| `-f` / `--fast_build` | Skip some kernels for faster dev compilation. Always use for dev builds. |
| `-c` / `--clean` | Clean build directory before building. Only when needed (see below). |
| `--skip_building_wheel` | Build in-place without creating a wheel file |
| `--no-venv` | Skip virtual environment creation |

Architecture Reference

| Value | GPU Family |
|---|---|
| `"100-real"` | Blackwell (B200, GB200) |
| `"90-real"` | Hopper (H100, H200) |
| `"89-real"` | Ada Lovelace (L40S) |
| `"80-real"` | Ampere (A100) |
| `"90;100-real"` | Multiple architectures |
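
The single-architecture rows of this table can also be expressed as a lookup on the GPU product name, which may be useful when the compute capability cannot be queried. This is a sketch under assumptions: `name_to_arch` is a hypothetical helper, the patterns cover only the GPUs listed above, and real product names may vary.

```bash
#!/bin/sh
# Map a GPU product name (as printed by nvidia-smi) to an -a value from the table.
name_to_arch() {
    case "$1" in
        *B200*)        echo "100-real" ;;  # Blackwell (also matches GB200)
        *H100*|*H200*) echo "90-real" ;;   # Hopper
        *L40S*)        echo "89-real" ;;   # Ada Lovelace
        *A100*)        echo "80-real" ;;   # Ampere
        *)             return 1 ;;         # not in the table: ask the user
    esac
}

# Usage:
#   name_to_arch "$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
```

For any GPU that is not in the table, the function returns non-zero rather than guessing.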

Incremental vs. Clean Builds

Default to incremental builds: CMake recompiles only the files that changed, which saves significant time.

Use a clean build (`-c`) only when:

- The user explicitly requests a clean/fresh build
- An incremental build fails with linker errors, stale object files, or CMake cache issues
- Major branch changes (e.g., rebasing across many commits) that may invalidate the build cache
- Build system files changed (`CMakeLists.txt`, `*.cmake`)
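
The last criterion, changed build system files, can be checked mechanically after a pull or rebase. The sketch below covers only that one criterion; `touches_build_system` and `clean_flag` are hypothetical helpers, and the usage line assumes `ORIG_HEAD` still points at the tip from before the last `git pull` or rebase.

```bash
#!/bin/sh
# Read changed file paths on stdin; succeed if any is a build system file
# (a CMakeLists.txt at any depth, or any file ending in .cmake).
touches_build_system() {
    grep -E -q '(^|/)CMakeLists\.txt$|\.cmake$'
}

# Print "-c" when build system files differ between two revisions,
# and print nothing otherwise.
clean_flag() {
    local old_rev="$1"
    local new_rev="${2:-HEAD}"
    if git diff --name-only "$old_rev" "$new_rev" | touches_build_system; then
        echo "-c"
    fi
}

# Usage (the unquoted substitution expands to nothing when no flag is needed):
#   ./scripts/build_wheel.py --trt_root /usr/local/tensorrt --benchmarks \
#       --use_ccache -a "<arch>" -f --nvtx $(clean_flag ORIG_HEAD)
```

The other criteria (an explicit user request, a failed incremental build, a large rebase) still need a judgment call.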
