trtllm-flashinfer-upgrade

FlashInfer Version Upgrade Skill

Automates upgrading the `flashinfer-python` package version across TensorRT-LLM.

When to Use

  • User asks to upgrade / bump / update flashinfer
  • Routine dependency update duty for flashinfer-python

Prerequisites

Step 0a: Determine GitHub Username

Query `gh` for the authenticated user's login:

```bash
GITHUB_USERNAME=$(gh api user --jq .login)
echo "$GITHUB_USERNAME"
```

If this fails, `gh` is not authenticated; resolve Step 0c first, then retry. As a fallback, derive the username from the fork remote:

```bash
# Match both HTTPS (github.com/...) and SSH (github.com:...) remote URLs,
# and skip the upstream NVIDIA remote so it is not mistaken for the fork.
GITHUB_USERNAME=$(git remote -v | grep -E 'github\.com[:/][^/]+/TensorRT-LLM' \
  | grep -v 'github\.com[:/]NVIDIA/' \
  | head -1 | sed -E 's|.*github\.com[:/]([^/]+)/TensorRT-LLM.*|\1|')
```

If neither works, ask the user via `AskUserQuestion`.
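
The fallback parse can also be sketched in Python, which makes the HTTPS and SSH cases easy to check. This is a minimal sketch, not part of the skill itself: `fork_owner` is a hypothetical helper, the sample remotes are made up, and treating `NVIDIA` as the upstream owner to skip is an assumption.

```python
import re

# Matches https://github.com/<owner>/TensorRT-LLM and git@github.com:<owner>/TensorRT-LLM
_REMOTE_RE = re.compile(r"github\.com[:/]([^/\s]+)/TensorRT-LLM")


def fork_owner(remote_output, upstream_owner="NVIDIA"):
    """Return the first non-upstream owner found in `git remote -v` output, or None."""
    for line in remote_output.splitlines():
        match = _REMOTE_RE.search(line)
        if match and match.group(1).lower() != upstream_owner.lower():
            return match.group(1)
    return None


# Made-up sample output of `git remote -v` (fetch lines only).
sample = (
    "origin\thttps://github.com/NVIDIA/TensorRT-LLM.git (fetch)\n"
    "fork\tgit@github.com:example-user/TensorRT-LLM.git (fetch)\n"
)
print(fork_owner(sample))
```

Returning `None` when only the upstream remote is present maps onto the "ask the user" branch above.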

Step 0b: Verify Fork Remote

Check that a git remote pointing to the user's fork of TensorRT-LLM exists:

```bash
# Double quotes so ${GITHUB_USERNAME} expands; [:/] matches HTTPS and SSH URLs.
git remote -v | grep -E "github\.com[:/]${GITHUB_USERNAME}/TensorRT-LLM"
```

If no fork remote is found, stop and notify the user:

No GitHub fork remote detected. A fork of `NVIDIA/TensorRT-LLM` is required to push branches and create PRs.
  1. Fork the repo at https://github.com/NVIDIA/TensorRT-LLM/fork
  2. Add it as a git remote:
    ```bash
    git remote add fork https://github.com/<GITHUB_USERNAME>/TensorRT-LLM.git
    ```
  3. Re-run this skill.

Step 0c: Verify `gh` CLI Is Authenticated

This skill uses the GitHub CLI (`gh`) to push branches and open PRs. Confirm it is installed and authenticated:

```bash
gh auth status
```

Expected: `Logged in to github.com` with at least the `repo` scope. `repo` covers pushing to the user's fork and opening PRs on `NVIDIA/TensorRT-LLM`, so no separate fine-grained PATs are needed.

If `gh` reports "not logged in", instruct the user:

```bash
gh auth login
```

Choose: GitHub.com → HTTPS → authenticate with a web browser (or paste a PAT with `repo` scope).

Note on `GH_CONFIG_DIR`: If the user keeps multiple `gh` accounts (e.g. a personal account and a separate account for `NVIDIA/TensorRT-LLM` work), they may point `gh` at a non-default config directory. Check `CLAUDE.local.md` / `AGENTS.md` or the environment for `GH_CONFIG_DIR`; if unclear, ask the user. When set, prefix every `gh` invocation: `GH_CONFIG_DIR=<path> gh ...`.

Do not proceed with the upgrade workflow until `gh auth status` is clean and the fork remote (Step 0b) is confirmed.

Workflow

Execute these steps in order. Use `AskUserQuestion` for user choices and `WebFetch` / GitHub API for release data.

Step 1: Fetch Available Releases from GitHub

Fetch the release list from https://github.com/flashinfer-ai/flashinfer/releases.

Use `WebFetch` with the URL https://github.com/flashinfer-ai/flashinfer/releases and extract all release tag names and dates. Collect both stable releases (e.g., `v0.6.7`) and pre-release / nightly tags (e.g., `v0.7.0.dev20260401`).

Alternatively, use the GitHub API via curl:

```bash
curl -s "https://api.github.com/repos/flashinfer-ai/flashinfer/releases?per_page=30" \
  | python3 -c "
import json, sys
releases = json.load(sys.stdin)
for r in releases:
    tag = r['tag_name']
    pre = ' (pre-release)' if r['prerelease'] else ' (stable)'
    date = r['published_at'][:10]
    print(f'{tag}  {date}{pre}')
"
```

Step 2: Check Current Version

Read the current pinned version from `requirements.txt`:

```bash
grep flashinfer-python requirements.txt
```

Expected format: `flashinfer-python==X.Y.Z`
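
When the pinned version is needed as a value rather than a printed line, the same lookup can be sketched in Python. This is a minimal sketch under stated assumptions: `pinned_version` is a hypothetical helper, the sample requirements text is made up, and only the exact `==` pin format shown above is recognized.

```python
import re


def pinned_version(requirements_text, package="flashinfer-python"):
    """Return the version from a `package==X.Y.Z` line, or None if no exact pin is found."""
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*==\s*([^\s;#]+)", re.MULTILINE
    )
    match = pattern.search(requirements_text)
    return match.group(1) if match else None


# Made-up file content for illustration.
sample = "torch>=2.0\nflashinfer-python==0.6.6\n"
print(pinned_version(sample))
```

A `None` result signals that the line does not follow the expected format (for example a git pin), which is worth surfacing to the user before continuing.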

Step 3: Ask User Preferences

Ask the user three questions using `AskUserQuestion`:
  1. "Prefer a latest nightly release version?"
    • Options: "Yes, show nightly/dev releases" | "No, stable releases only (Recommended)"
    • This filters the release list shown in the next question.
  2. "Which flashinfer-python version do you want to upgrade to?"
    • Present up to 4 versions newer than the current version (filtered by the nightly preference above), with the latest as the recommended option.
    • If the current version is already the latest, inform the user and stop.
  3. "Also update `security_scanning/poetry.lock`?"
    • Options: "No, skip the lockfile (Recommended)" | "Yes, update version + hashes"
    • Default: No. The lockfile is typically regenerated by maintainers separately; editing it here can produce spurious hash diffs and stale `metadata.content-hash` values.
    • If the user answers Yes, follow the "Updating `security_scanning/poetry.lock` hashes" subsection below; otherwise skip it entirely (do not touch `security_scanning/poetry.lock`).
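
The filtering behind question 2 (newer than current, optionally excluding nightlies, at most 4, latest first) can be sketched as follows. This is a rough sketch, not the skill's actual logic: `version_key` and `upgrade_candidates` are hypothetical helpers, the ordering is a simplified subset of PEP 440 that only understands `dev`, `rc`, and `post` suffixes, and some of the sample tags are made up.

```python
import re

# Simplified version pattern: release digits plus an optional dev/rc/post segment.
_VERSION_RE = re.compile(r"^v?(\d+(?:\.\d+)*)(?:\.?(dev|rc|post)\.?(\d+))?$")
_RANK = {"dev": 0, "rc": 1, None: 2, "post": 3}  # dev < rc < final < post


def version_key(tag):
    """Sort key for tags such as v0.6.7, 0.6.7.post3, or v0.7.0.dev20260401."""
    match = _VERSION_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"unsupported version format: {tag!r}")
    release = [int(part) for part in match.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # treat 0.7 and 0.7.0 as equal
    return (tuple(release), _RANK[match.group(2)], int(match.group(3) or 0))


def is_prerelease(tag):
    return version_key(tag)[1] < _RANK[None]


def upgrade_candidates(tags, current, include_prerelease=False, limit=4):
    """Tags newer than `current`, latest first, capped at `limit`."""
    newer = [t for t in tags if version_key(t) > version_key(current)]
    if not include_prerelease:
        newer = [t for t in newer if not is_prerelease(t)]
    return sorted(newer, key=version_key, reverse=True)[:limit]


tags = ["v0.6.5", "v0.6.6", "v0.6.7", "v0.6.7.post3", "v0.7.0.dev20260401"]
print(upgrade_candidates(tags, "0.6.6"))
print(upgrade_candidates(tags, "0.6.6", include_prerelease=True))
```

An empty result corresponds to the "already the latest" case, where the workflow informs the user and stops. If the `packaging` library is available, its `Version` class is a more complete alternative to the hand-rolled key used here.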

Step 4: Update All Version References

After the user selects a target version, update these files:

| File | What to change | Always |
|---|---|---|
| `requirements.txt` | `flashinfer-python==OLD` → `flashinfer-python==NEW` | Yes |
| `security_scanning/pyproject.toml` | `"flashinfer-python (==OLD)"` → `"flashinfer-python (==NEW)"` | Yes |
| `ATTRIBUTIONS-Python.md` | `## flashinfer-python (OLD)` → `## flashinfer-python (NEW)` | Yes |
| `security_scanning/poetry.lock` | Update `version = "OLD"` → `version = "NEW"` under `[[package]] name = "flashinfer-python"`, and update the `files` list with new hashes | Only if user opted in at Step 3 question 3 |
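
The three always-updated pin spellings in the table can be rewritten mechanically. The following is a minimal sketch, assuming the pins appear exactly in the forms listed; `bump_pins` is a hypothetical helper, and the lockfile is deliberately left out because it also needs new hashes.

```python
import re


def bump_pins(text, old, new, package="flashinfer-python"):
    """Rewrite the exact pin spellings from the table; return (new_text, replacements)."""
    pkg, old_re = re.escape(package), re.escape(old)
    patterns = [
        # requirements.txt (lookahead avoids touching e.g. OLD.post1)
        (rf"^{pkg}=={old_re}(?![\w.])", f"{package}=={new}"),
        # security_scanning/pyproject.toml
        (rf'"{pkg} \(=={old_re}\)"', f'"{package} (=={new})"'),
        # ATTRIBUTIONS-Python.md
        (rf"^## {pkg} \({old_re}\)", f"## {package} ({new})"),
    ]
    total = 0
    for pattern, replacement in patterns:
        # A callable replacement keeps the new text literal (no backslash escapes).
        text, count = re.subn(
            pattern, lambda _m, r=replacement: r, text, flags=re.MULTILINE
        )
        total += count
    return text, total


print(bump_pins("flashinfer-python==0.6.6\n", "0.6.6", "0.6.7"))
```

Checking that the replacement count is exactly 1 per file is a cheap guard against a pin that has drifted from the expected format.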

Updating `security_scanning/poetry.lock` hashes

Only perform this subsection if the user answered Yes to question 3 in Step 3. Otherwise skip it entirely.

The `poetry.lock` file contains SHA256 hashes for the wheel and sdist. Fetch them from PyPI:

```bash
curl -s "https://pypi.org/pypi/flashinfer-python/NEW_VERSION/json" \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
for f in data['urls']:
    print(f'{f[\"filename\"]}  sha256:{f[\"digests\"][\"sha256\"]}')
"
```

Replace the old `files = [...]` block under `[[package]] name = "flashinfer-python"` with the new filenames and hashes. Also update the `[package.dependencies]` section if the new version has different dependencies (check `requires_dist` in the PyPI JSON).

Important: After manually editing both `security_scanning/pyproject.toml` and `security_scanning/poetry.lock`, the lockfile's `metadata.content-hash` becomes stale. Regenerate it by running:

```bash
# Subshell keeps the caller's working directory unchanged even if poetry fails.
(cd security_scanning && poetry lock --no-update)
```

This refreshes the hash without changing any other package versions. On Poetry 2.x, where the `--no-update` flag was removed, plain `poetry lock` behaves the same way. If `poetry` is available, you can alternatively use `poetry add flashinfer-python@NEW_VERSION` in the `security_scanning/` directory to update both `pyproject.toml` and `poetry.lock` automatically (including the content-hash).
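
For the manual route, turning the PyPI `urls` entries into a `files = [...]` block can be sketched like this. It is only an illustration: `poetry_files_block` is a hypothetical helper, the filenames and hashes below are placeholders rather than real release data, and sorting entries by filename is an assumption about how the existing lockfile is laid out.

```python
def poetry_files_block(urls):
    """Format PyPI JSON `urls` entries as a poetry.lock-style `files = [...]` block."""
    lines = ["files = ["]
    for entry in sorted(urls, key=lambda u: u["filename"]):
        digest = entry["digests"]["sha256"]
        lines.append(f'    {{file = "{entry["filename"]}", hash = "sha256:{digest}"}},')
    lines.append("]")
    return "\n".join(lines)


# Placeholder data shaped like the `urls` list in the PyPI JSON response.
example_urls = [
    {"filename": "pkg-1.0.tar.gz", "digests": {"sha256": "bb"}},
    {"filename": "pkg-1.0-py3-none-any.whl", "digests": {"sha256": "aa"}},
]
print(poetry_files_block(example_urls))
```

Compare the generated block against the existing one in the lockfile before pasting it in, since indentation and ordering should match what Poetry wrote originally.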

Nightly / dev version special handling

If the user selects a nightly/dev version (e.g., `0.7.0.dev20260401`):
  • The PyPI package may not exist; check first with `curl -s "https://pypi.org/pypi/flashinfer-python/VERSION/json"`.
  • If not on PyPI, the `security_scanning/poetry.lock` hashes cannot be updated. Warn the user and leave a `# TODO: update hashes when published to PyPI` comment.
  • The `requirements.txt` can pin to a git install instead:
    `flashinfer-python @ git+https://github.com/flashinfer-ai/flashinfer.git@TAG#egg=flashinfer-python`
    Ask the user which approach they prefer (PyPI pin vs git pin).
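
The PyPI existence check in the first bullet can be sketched with the standard library. This is a sketch under assumptions: `pypi_json_url` and `is_on_pypi` are hypothetical helper names, a 404 from the JSON endpoint is taken to mean "not published", and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def pypi_json_url(version, package="flashinfer-python"):
    """URL of the per-version JSON endpoint used in the curl check above."""
    return f"https://pypi.org/pypi/{package}/{version}/json"


def is_on_pypi(version, opener=urllib.request.urlopen):
    """True if the version's JSON endpoint answers, False on HTTP 404."""
    try:
        with opener(pypi_json_url(version), timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors are not evidence either way


print(pypi_json_url("0.7.0.dev20260401"))
```

Re-raising non-404 errors keeps a transient outage from being misread as "not on PyPI", which would otherwise push the user toward a git pin unnecessarily.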

Step 5: Verify Version Compatibility

After updating, check if any code has version-gated logic that needs adjusting:

```bash
grep -rn 'flashinfer.*__version__\|flashinfer.*version' \
  tensorrt_llm/ --include="*.py"
```

Known locations with version checks:
  • `tensorrt_llm/_torch/speculative/interface.py`: `flashinfer.__version__ >= "0.6.4"`

If the new version still satisfies the gate (it is >= the gated version), no changes are needed. Otherwise, flag it to the user.
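
When judging whether the new version satisfies a gate, compare parsed release numbers rather than raw strings. The gate is written above as a `>=` on `__version__`; whether the real code compares plain strings is not confirmed here, but if it does, a two-digit component would order incorrectly. A minimal sketch, with `release_tuple` and `satisfies_gate` as hypothetical helpers and `0.10.0` as a made-up future version:

```python
import re


def release_tuple(version):
    """Leading numeric release segment as a tuple, e.g. '0.6.7.post3' -> (0, 6, 7)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies_gate(new_version, gated_version):
    """Numeric comparison of release segments; dev/rc/post suffixes are ignored."""
    return release_tuple(new_version) >= release_tuple(gated_version)


print("0.10.0" >= "0.6.4")                # lexicographic string comparison
print(satisfies_gate("0.10.0", "0.6.4"))  # numeric comparison
```

Because suffixes are ignored, a dev build of the gated version itself would count as satisfying the gate; treat such borderline cases as something to flag to the user.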

Step 6: Summary

Print a summary of all changes made:
  • Old version → New version
  • Files modified (with line numbers)
  • Any warnings (e.g., poetry.lock hashes couldn't be updated for nightly)
  • Remind user to run `pip install -r requirements.txt` to test locally
  • Remind user to run relevant unit tests:
    ```bash
    pytest tests/unittest/_torch/flashinfer/ -v
    pytest tests/unittest/_torch/attention/test_flashinfer_attention.py -v
    ```

Step 7: Commit, Push, and Create PR

After all files are updated and verified:

If the user opted out of the `poetry.lock` update at Step 3 question 3, drop `security_scanning/poetry.lock` from the `git stash`, `git add`, and commit message in the snippets below.
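
Since the same file list feeds both `git stash` and `git add`, deriving it once from the Step 3 answer avoids the two lists drifting apart. A small sketch, where `files_to_commit` and `git_add_command` are hypothetical helpers and the file order simply mirrors the snippets in this step:

```python
BASE_FILES = [
    "requirements.txt",
    "security_scanning/pyproject.toml",
    "ATTRIBUTIONS-Python.md",
]
LOCKFILE = "security_scanning/poetry.lock"


def files_to_commit(update_lockfile):
    """Files touched by the upgrade; the lockfile is included only on opt-in."""
    files = list(BASE_FILES)
    if update_lockfile:
        files.insert(2, LOCKFILE)  # keep it next to pyproject.toml
    return files


def git_add_command(update_lockfile):
    """Argument list suitable for subprocess.run (not executed here)."""
    return ["git", "add", *files_to_commit(update_lockfile)]


print(" ".join(git_add_command(update_lockfile=False)))
```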

7a. Create a new branch from upstream main

```bash
# Drop security_scanning/poetry.lock from this list if the user opted out.
git stash push -m "flashinfer-upgrade-wip" -- requirements.txt security_scanning/pyproject.toml security_scanning/poetry.lock ATTRIBUTIONS-Python.md
git checkout main
git pull --rebase https://github.com/NVIDIA/TensorRT-LLM.git main
git checkout -b ${GITHUB_USERNAME}/update_flashinfer_${NEW_VERSION}
git stash pop
```

Where `GITHUB_USERNAME` comes from the fork remote (e.g., `yihwang-nv`) and `NEW_VERSION` is the selected version (e.g., `0.6.7.post3`).

7b. Commit with DCO sign-off

```bash
# Drop security_scanning/poetry.lock from the git add list and the commit
# body if the user opted out.
git add requirements.txt security_scanning/pyproject.toml security_scanning/poetry.lock ATTRIBUTIONS-Python.md
git commit -s -m "[None][chore] Update flashinfer-python from OLD to NEW

Bump flashinfer-python dependency to the latest stable release. Updated version pins in requirements.txt, security_scanning/pyproject.toml, security_scanning/poetry.lock (if updated), and ATTRIBUTIONS-Python.md."
```

7c. Push the branch to the user's fork

Identify the fork remote (from Step 0b, commonly named `fork`), then push:

```bash
FORK_REMOTE=fork   # adjust if the user named their fork remote differently
BRANCH="${GITHUB_USERNAME}/update_flashinfer_${NEW_VERSION}"
git push -u "${FORK_REMOTE}" "${BRANCH}"
```

If the push is rejected for auth reasons, confirm `gh auth status` shows `repo` scope. `gh` installs a git credential helper that reuses its token for HTTPS pushes. Users on a non-default config dir must export `GH_CONFIG_DIR` in the same shell.

7d. Open the PR on `NVIDIA/TensorRT-LLM`

```bash
gh pr create \
  --repo NVIDIA/TensorRT-LLM \
  --base main \
  --head "${GITHUB_USERNAME}:${BRANCH}" \
  --title "[None][chore] Update flashinfer-python from ${OLD_VERSION} to ${NEW_VERSION}" \
  --body "$(cat <<EOF
## Summary
- Bump flashinfer-python from ${OLD_VERSION} to ${NEW_VERSION} (latest stable)
- Updated version pins in requirements.txt, security_scanning/pyproject.toml, and ATTRIBUTIONS-Python.md (and security_scanning/poetry.lock if the user opted in)

## Test plan
- pip install -r requirements.txt installs successfully
- pytest tests/unittest/_torch/flashinfer/ -v
- pytest tests/unittest/_torch/attention/test_flashinfer_attention.py -v
- CI pre-merge passes
EOF
)"
```

`gh pr create` prints the new PR URL on success. Report it back to the user.

Files Reference

All files that contain flashinfer-python version pins:

| File | Pattern |
|---|---|
| `requirements.txt` | `flashinfer-python==X.Y.Z` |
| `security_scanning/pyproject.toml` | `"flashinfer-python (==X.Y.Z)"` |
| `security_scanning/poetry.lock` | `name = "flashinfer-python"` block with version + hashes |
| `ATTRIBUTIONS-Python.md` | `## flashinfer-python (X.Y.Z)` |

Notes

  • The `setup.py` has a comment about git+https install URLs; there is no version pin to update there.
  • The `.pre-commit-config.yaml` and `pyproject.toml` reference flashinfer source files, not versions, so no changes are needed.
  • The `flashinfer/` submodule (if present) is separate from the `flashinfer-python` PyPI package.