model-pr-history-knowledge
Model PR History Knowledge
This is a PR-driven knowledge base for model optimization history. It is not a
set of per-model skills. Each model family keeps bilingual docs with inspected
PR diffs, implementation file coverage, timelines, changed files, code excerpts,
and validation/risk notes.
Use it before patching model-specific serving paths, choosing an SGLang SOTA
optimization target, or explaining why a framework already has a faster path.
Query
Run commands from this directory:

```bash
python3 scripts/query.py --list
python3 scripts/query.py --framework sglang --model qwen3-core --paths-only
python3 scripts/query.py --framework sglang --model qwen3-core "fused qk norm rope"
python3 scripts/query.py --framework vllm "DeepSeek-V4 fused norm router" --limit 5
```

Useful options:
- `--framework sglang|vllm`: restrict to one serving framework.
- `--model <slug>`: restrict to one model family directory.
- `--lang en|zh|both`: select English, Chinese, or both docs.
- `--paths-only`: print the exact docs to read without snippets.
- `--limit N`: bound search results.
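
When the query is driven from another script, the options above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_query_args` is a hypothetical helper, and it assumes only the flags documented above, with the search terms placed last as in the example commands.

```python
import shlex
from typing import List, Optional


def build_query_args(
    terms: Optional[str] = None,
    *,
    framework: Optional[str] = None,
    model: Optional[str] = None,
    lang: Optional[str] = None,
    paths_only: bool = False,
    limit: Optional[int] = None,
) -> List[str]:
    """Build an argv list for scripts/query.py (hypothetical helper)."""
    args = ["python3", "scripts/query.py"]
    if framework is not None:
        # Only the two frameworks listed in this README are accepted.
        if framework not in ("sglang", "vllm"):
            raise ValueError(f"unknown framework: {framework}")
        args += ["--framework", framework]
    if model is not None:
        args += ["--model", model]
    if lang is not None:
        if lang not in ("en", "zh", "both"):
            raise ValueError(f"unknown lang: {lang}")
        args += ["--lang", lang]
    if paths_only:
        args.append("--paths-only")
    if limit is not None:
        args += ["--limit", str(limit)]
    if terms:
        # Search terms go last, mirroring the documented examples.
        args.append(terms)
    return args


if __name__ == "__main__":
    argv = build_query_args(
        "fused qk norm rope", framework="sglang", model="qwen3-core"
    )
    # Print a shell-safe command line; pass argv to subprocess.run to execute.
    print(shlex.join(argv))
```

Returning an argv list rather than a shell string means the quoted search terms do not need manual escaping.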
Workflow
- Infer the model-family slug from the user's model id, checkpoint path, or SGLang source path. If unsure, run `scripts/query.py "<model name>"`.
- Read the matching SGLang history first for SGLang patch work. Read the vLLM history too when vLLM is the leading competitor or its trace suggests a missing SGLang fast path.
- Extract only actionable evidence:
  - model implementation files and symbols
  - PRs that changed the hot source path
  - prior fusions, overlap work, quantization, MoE, attention, cache, sampler, or loader changes
  - open/watch PRs that may explain a known gap or pending support issue
  - validation lanes and regression risks implied by the PR cards
- Save a short note in the active run artifacts, for example `history/model-pr-history-notes.md`, with paths read, PR numbers, source files, and the decision each item influenced.
- Do not copy long PR cards into the final answer. Cite paths and summarize the relevant implementation/risk.
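
The note-saving step can be sketched as a small renderer. This is an illustrative sketch only: the `EvidenceItem` class, the `render_notes` function, and the note layout are assumptions, since the workflow prescribes the contents of the note (paths read, PR numbers, source files, decisions) but not its format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvidenceItem:
    """One piece of actionable evidence pulled from a history doc."""
    doc_path: str            # history doc that was read
    pr_numbers: List[str]    # PR identifiers cited in that doc
    source_files: List[str]  # implementation files the PRs touched
    decision: str            # what this evidence changed in the plan


def render_notes(model_slug: str, items: List[EvidenceItem]) -> str:
    """Render evidence items as a short note (hypothetical layout)."""
    lines = [f"Model PR history notes: {model_slug}", ""]
    for item in items:
        lines.append(f"- Path read: {item.doc_path}")
        lines.append(f"  - PRs: {', '.join(item.pr_numbers) or 'none'}")
        lines.append(f"  - Source files: {', '.join(item.source_files) or 'none'}")
        lines.append(f"  - Decision influenced: {item.decision}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; substitute what was actually read.
    example = EvidenceItem(
        doc_path="<history doc path>",
        pr_numbers=["<PR number>"],
        source_files=["<model implementation file>"],
        decision="<decision this evidence influenced>",
    )
    print(render_notes("qwen3-core", [example]))
```

Keeping one entry per history doc, with the decision stated explicitly, keeps the note short and makes it easy to cite paths instead of copying PR cards.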
Model Slugs
Current frameworks:
- `sglang`
- `vllm`

Current model-family slugs include:

```text
deepseek-ocr, deepseek-ocr-2, deepseek-v3-r1, deepseek-v31, deepseek-v32,
deepseek-v4, ernie45, gemma4, glm-vlm-ocr, glm45, glm46-glm47, glm5-glm51,
gpt-oss, intern-s1, internvl35, jina-reranker-m0, kimi, ling25, llada21,
llama31, llama33-70b, llama4, mimo-v2-flash, minimax, mistral-small-4,
mixtral-quark-int4fp8-moe, nemotron-super, qwen-vlm-omni-asr, qwen3-coder,
qwen3-core, qwen3-next, qwen35, ring25, step35
```
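
A guessed slug can be checked against this list before it is passed to `--model`. The sketch below is an assumption-laden illustration: `resolve_slug` is a hypothetical helper, `KNOWN_SLUGS` is a snapshot copied from the list above (which may not be exhaustive), and fuzzy matching with the standard-library `difflib` is a choice made for this example, not something `scripts/query.py` is known to do.

```python
import difflib
from typing import List, Optional, Tuple

# Snapshot of the slugs listed in this README; the real set may grow.
KNOWN_SLUGS = [
    "deepseek-ocr", "deepseek-ocr-2", "deepseek-v3-r1", "deepseek-v31",
    "deepseek-v32", "deepseek-v4", "ernie45", "gemma4", "glm-vlm-ocr",
    "glm45", "glm46-glm47", "glm5-glm51", "gpt-oss", "intern-s1",
    "internvl35", "jina-reranker-m0", "kimi", "ling25", "llada21",
    "llama31", "llama33-70b", "llama4", "mimo-v2-flash", "minimax",
    "mistral-small-4", "mixtral-quark-int4fp8-moe", "nemotron-super",
    "qwen-vlm-omni-asr", "qwen3-coder", "qwen3-core", "qwen3-next",
    "qwen35", "ring25", "step35",
]


def resolve_slug(candidate: str) -> Tuple[Optional[str], List[str]]:
    """Return (exact_slug, suggestions) for a guessed slug.

    exact_slug is None when the guess is not in the snapshot; suggestions
    then holds up to three close matches to try or to search for.
    """
    normalized = candidate.strip().lower()
    if normalized in KNOWN_SLUGS:
        return normalized, []
    suggestions = difflib.get_close_matches(
        normalized, KNOWN_SLUGS, n=3, cutoff=0.6
    )
    return None, suggestions


if __name__ == "__main__":
    print(resolve_slug("Qwen3-Core"))
    print(resolve_slug("qwen3-cor"))
```

When no exact slug is found, running `scripts/query.py "<model name>"` as described in the workflow remains the authoritative fallback.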
SOTA Loop Contract
For `sglang-sota-humanize-loop`, this knowledge base is an early context source:
- Read it after model identification and before patch planning.
- Include the history paths and key PR evidence in `history/model-pr-history-notes.md` or `analysis/root-cause.md`.
- If the profiler points at a known model path, check whether the history has prior changes on that file before writing a new patch.
- If a competitor is faster, search that competitor's model history for the same model family and stage before assuming the gap is kernel-local.
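
The check for prior changes on a profiled file can be approximated with a plain text scan of the history docs. This is a rough sketch under stated assumptions: `docs_mentioning` is a hypothetical helper, the caller supplies the doc paths (for example, those printed by a `--paths-only` query), and a file-name substring match stands in for whatever structure the PR cards actually have.

```python
from pathlib import Path
from typing import Iterable, List


def docs_mentioning(source_file: str, doc_paths: Iterable[Path]) -> List[Path]:
    """Return the history docs whose text mentions the given source file.

    Matching is by file name only, so "a/b/model.py" and "model.py" both
    match any doc containing the string "model.py".
    """
    needle = Path(source_file).name
    hits: List[Path] = []
    for doc in doc_paths:
        # Tolerate odd bytes in long PR excerpts instead of failing.
        text = doc.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            hits.append(doc)
    return hits
```

A non-empty result means the file already has recorded history worth reading before a new patch is planned. An empty result only means that the scanned docs do not mention that file name.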