exec-local-compile

Compile TensorRT-LLM (Local / Compute Node)

Compile TensorRT-LLM from source on a compute node inside a Docker container.

When to Use

| Scenario | Use This Skill? |
|---|---|
| On a compute node with GPUs visible (`nvidia-smi` works) | Yes |
| On a SLURM login node (no GPUs) | No, use `exec-slurm-compile` instead |

Prerequisites

  • You are inside a Docker/enroot container on a compute node
  • `nvidia-smi` succeeds (GPUs are visible)
  • `/usr/local/tensorrt` exists (the TensorRT installation in the container)

Instructions

Step 1: Verify Environment

Run `nvidia-smi` to confirm that you are on a compute node with GPU access.
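
The prerequisites and this step can be combined into a fail-fast guard that runs before a long build starts. This is a minimal sketch. `check_env` is a hypothetical helper name, and it tests only what the Prerequisites list states: a working `nvidia-smi` and a TensorRT directory, which defaults to `/usr/local/tensorrt`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-build guard: fail fast if GPUs or TensorRT are missing.
check_env() {
  local trt_root="${1:-/usr/local/tensorrt}"

  # nvidia-smi must exist and succeed; on a login node it is absent or fails.
  if ! command -v nvidia-smi >/dev/null 2>&1 || ! nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi failed: no GPUs visible (on a login node, use exec-slurm-compile)" >&2
    return 1
  fi

  # The container must provide a TensorRT installation at trt_root.
  if [[ ! -d "$trt_root" ]]; then
    echo "TensorRT not found at $trt_root" >&2
    return 1
  fi

  echo "OK: GPUs visible, TensorRT found at $trt_root"
}
```

For example, placing `check_env || exit 1` at the top of a build script stops it before any compilation if the shell turns out to be on a login node.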

Step 2: Locate the Codebase

`cd` into the TensorRT-LLM repository. If the user has not provided the path, ask for it.
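
A guarded `cd` prevents a build from starting in the wrong directory. In this sketch, `cd_trtllm` is a hypothetical helper. Treating `scripts/build_wheel.py` (the script used in Step 4) as the marker of a TensorRT-LLM checkout is an assumption, not an official check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cd into a TensorRT-LLM checkout, or fail with a clear message.
cd_trtllm() {
  local repo="$1"
  if [[ -z "$repo" ]]; then
    echo "usage: cd_trtllm /path/to/TensorRT-LLM (ask the user for the path)" >&2
    return 2
  fi
  # Assumed marker: the build script invoked in Step 4.
  if [[ ! -f "$repo/scripts/build_wheel.py" ]]; then
    echo "$repo does not look like a TensorRT-LLM checkout (no scripts/build_wheel.py)" >&2
    return 1
  fi
  cd "$repo" || return 1
}
```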

Step 3: (Optional) Check Out a Branch

If the user specifies a branch, check it out and pull the latest changes. For example, "compile ToT" (top of tree) means `main`:

```bash
git checkout main && git pull
```

Step 4: Build

Run the build command. Builds are incremental by default, so omit `-c`/`--clean` unless the user explicitly requests a clean build or the incremental build fails:

```bash
./scripts/build_wheel.py --trt_root /usr/local/tensorrt --benchmarks --use_ccache -a "<arch>" -f --nvtx
```

Replace `<arch>` with the target GPU architecture (see Architecture Reference below). If the user does not specify one, auto-detect it from `nvidia-smi`.
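
One way to auto-detect the architecture is to ask `nvidia-smi` for each GPU's compute capability and drop the dot, so `9.0` becomes `90` and `10.0` becomes `100`. This sketch makes several assumptions. It requires a driver recent enough to support the `compute_cap` query field. The helper names `caps_to_arch` and `detect_arch` are hypothetical. It also appends `-real` to every entry, producing for example `90-real;100-real` on a mixed node, which is standard CMake CUDA-architecture list syntax.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for auto-detecting the -a value.

# Read compute capabilities (one per line, e.g. "9.0") on stdin and print a
# deduplicated, ';'-separated list such as "90-real" or "90-real;100-real".
caps_to_arch() {
  local cap arch
  local -a archs=()
  while IFS= read -r cap || [[ -n "$cap" ]]; do
    cap="${cap//[[:space:]]/}"      # strip stray whitespace from CSV output
    [[ -z "$cap" ]] && continue
    arch="${cap/./}-real"           # "9.0" -> "90-real", "10.0" -> "100-real"
    # Nodes usually have several identical GPUs; keep each arch once.
    if [[ " ${archs[*]} " != *" $arch "* ]]; then
      archs+=("$arch")
    fi
  done
  local IFS=';'
  echo "${archs[*]}"
}

# Query every visible GPU (assumes nvidia-smi supports --query-gpu=compute_cap).
detect_arch() {
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader | caps_to_arch
}
```

Usage: `arch="$(detect_arch)"`, then pass `-a "$arch"` to the build command. On an H100 node, this yields `90-real`, which matches the Hopper row of the Architecture Reference.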

Step 5: Install

```bash
# Quote the extras spec so the shell does not treat [devel] as a glob pattern.
pip install -e ".[devel]"
```

Step 6: Verify

```bash
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```

Build Flags

| Flag | Description |
|---|---|
| `--trt_root /usr/local/tensorrt` | TensorRT installation path (standard in NVIDIA containers) |
| `--benchmarks` | Build the C++ benchmarks |
| `-a "<arch>"` | Target GPU architecture(s) |
| `--nvtx` | Enable NVTX markers for profiling |
| `--use_ccache` | Use ccache for faster recompilation |
| `-f` / `--fast_build` | Skip some kernels for faster development compilation. Always use for dev builds. |
| `-c` / `--clean` | Clean the build directory before building. Use only when needed (see Incremental vs. Clean Builds below). |
| `--skip_building_wheel` | Build in place without creating a wheel file |
| `--no-venv` | Skip virtual environment creation |
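
The flags above can be assembled in one place, so that the default incremental dev build and an occasional clean build differ only by `-c`. This is a minimal sketch. `build_trtllm` and the `DRY_RUN` variable are hypothetical, while the flags and their order are exactly those of the Step 4 command.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around scripts/build_wheel.py using the documented flags.
# Usage: build_trtllm <arch> [clean]
build_trtllm() {
  if [[ -z "${1:-}" ]]; then
    echo "usage: build_trtllm <arch> [clean]" >&2
    return 2
  fi
  local arch="$1"
  local -a args=(
    --trt_root /usr/local/tensorrt  # TensorRT in NVIDIA containers
    --benchmarks                    # C++ benchmarks
    --use_ccache                    # faster recompilation
    -a "$arch"                      # target GPU architecture(s)
    -f                              # fast dev build
    --nvtx                          # NVTX markers for profiling
  )
  # Incremental is the default; add -c only when explicitly asked.
  if [[ "${2:-}" == "clean" ]]; then
    args+=(-c)
  fi
  # With DRY_RUN set, print the command instead of running it.
  ${DRY_RUN:+echo} ./scripts/build_wheel.py "${args[@]}"
}
```

Running `DRY_RUN=1 build_trtllm 90-real` from the repository root shows the exact command without starting a build. Dropping `DRY_RUN` runs the build.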

Architecture Reference

| Value | GPU Family |
|---|---|
| `"100-real"` | Blackwell (B200, GB200) |
| `"90-real"` | Hopper (H100, H200) |
| `"89-real"` | Ada Lovelace (L40S) |
| `"80-real"` | Ampere (A100) |
| `"90;100-real"` | Multiple architectures |

Incremental vs. Clean Builds

Default to incremental builds. The CMake-generated build recompiles only the sources affected by your changes, which saves significant time.

Use a clean build (`-c`) only when:
  • The user explicitly requests a clean or fresh build
  • An incremental build fails with linker errors, stale object files, or CMake cache issues
  • A major branch change (e.g., rebasing across many commits) may have invalidated the build cache
  • Build system files changed (`CMakeLists.txt`, `*.cmake`)
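
The last criterion can be checked mechanically. This sketch assumes that the previous build was done at the commit you were on before the last checkout or pull (`HEAD@{1}` in the reflog). `needs_clean` is a hypothetical name. The other criteria (an explicit request, a failed incremental build) still require judgment.

```bash
#!/usr/bin/env bash
# Hypothetical check: read changed paths on stdin; succeed (return 0) if any is a
# CMake build-system file, i.e. a clean build (-c) is warranted.
needs_clean() {
  local path
  while IFS= read -r path || [[ -n "$path" ]]; do
    case "${path##*/}" in          # compare the file name only
      CMakeLists.txt|*.cmake) return 0 ;;
    esac
  done
  return 1
}
```

Usage: `git diff --name-only 'HEAD@{1}' HEAD | needs_clean && echo "build system changed: consider -c"`.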