model-pr-history-knowledge

Model PR History Knowledge

This is a PR-driven knowledge base for model optimization history. It is not a set of per-model skills. Each model family keeps bilingual docs with inspected PR diffs, implementation file coverage, timelines, changed files, code excerpts, and validation/risk notes.
Use it before patching model-specific serving paths, choosing an SGLang SOTA optimization target, or explaining why a framework already has a faster path.

Query

Run commands from this directory:
```bash
python3 scripts/query.py --list
python3 scripts/query.py --framework sglang --model qwen3-core --paths-only
python3 scripts/query.py --framework sglang --model qwen3-core "fused qk norm rope"
python3 scripts/query.py --framework vllm "DeepSeek-V4 fused norm router" --limit 5
```
Useful options:
  • `--framework sglang|vllm`: restrict to one serving framework.
  • `--model <slug>`: restrict to one model family directory.
  • `--lang en|zh|both`: select English, Chinese, or both docs.
  • `--paths-only`: print the exact docs to read without snippets.
  • `--limit N`: bound search results.
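
When the query is driven from another script rather than typed by hand, the options above can be assembled programmatically. The following is a minimal sketch, not part of the knowledge base itself: `build_query_command` and `run_query` are hypothetical helper names, and the sketch assumes `scripts/query.py` accepts the documented flags before the search terms, as in the examples above.

```python
from __future__ import annotations

import subprocess

FRAMEWORKS = {"sglang", "vllm"}
LANGS = {"en", "zh", "both"}


def build_query_command(
    terms: str = "",
    *,
    framework: str | None = None,
    model: str | None = None,
    lang: str | None = None,
    paths_only: bool = False,
    limit: int | None = None,
) -> list[str]:
    """Assemble an argument list using only the flags documented above."""
    cmd = ["python3", "scripts/query.py"]
    if framework is not None:
        if framework not in FRAMEWORKS:
            raise ValueError(f"unknown framework: {framework!r}")
        cmd += ["--framework", framework]
    if model is not None:
        cmd += ["--model", model]
    if lang is not None:
        if lang not in LANGS:
            raise ValueError(f"unknown lang: {lang!r}")
        cmd += ["--lang", lang]
    if paths_only:
        cmd.append("--paths-only")
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if terms:
        # Assumption: free-text search terms are passed as one positional argument.
        cmd.append(terms)
    return cmd


def run_query(cmd: list[str], cwd: str = ".") -> str:
    """Run the command from the knowledge-base directory and return its stdout."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with multi-word search terms such as "fused qk norm rope".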

Workflow

  1. Infer the model-family slug from the user's model id, checkpoint path, or SGLang source path. If unsure, run `scripts/query.py "<model name>"`.
  2. Read the matching SGLang history first for SGLang patch work. Read the vLLM history too when vLLM is the leading competitor or its trace suggests a missing SGLang fast path.
  3. Extract only actionable evidence:
    • model implementation files and symbols
    • PRs that changed the hot source path
    • prior fusions, overlap work, quantization, MoE, attention, cache, sampler, or loader changes
    • open/watch PRs that may explain a known gap or pending support issue
    • validation lanes and regression risks implied by the PR cards
  4. Save a short note in the active run artifacts, for example `history/model-pr-history-notes.md`, with paths read, PR numbers, source files, and the decision each item influenced.
  5. Do not copy long PR cards into the final answer. Cite paths and summarize the relevant implementation/risk.
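
Step 4 can be sketched as a small helper that records, for each history doc read, the PR numbers, source files, and the decision they influenced. This is one possible layout under stated assumptions: the `HistoryNote` fields, the Markdown format, and the function names are hypothetical, and only the output path `history/model-pr-history-notes.md` comes from the workflow above.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class HistoryNote:
    """One piece of evidence taken from a history doc (hypothetical structure)."""

    doc_path: str            # history doc that was read
    pr_numbers: list[int]    # PRs cited from that doc
    source_files: list[str]  # model implementation files involved
    decision: str            # what this evidence changed in the plan


def render_notes(notes: list[HistoryNote]) -> str:
    """Render the notes as a short Markdown list, one entry per doc read."""
    lines = ["# Model PR history notes", ""]
    for note in notes:
        prs = ", ".join(f"#{n}" for n in note.pr_numbers) or "none"
        files = ", ".join(note.source_files) or "none"
        lines += [
            f"- Path read: {note.doc_path}",
            f"  - PRs: {prs}",
            f"  - Source files: {files}",
            f"  - Decision influenced: {note.decision}",
        ]
    return "\n".join(lines) + "\n"


def save_notes(notes: list[HistoryNote], run_dir: str) -> Path:
    """Write the notes under <run_dir>/history/model-pr-history-notes.md."""
    out = Path(run_dir) / "history" / "model-pr-history-notes.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(render_notes(notes), encoding="utf-8")
    return out
```

Keeping each entry to a path, PR numbers, files, and one decision line is consistent with step 5: the note cites and summarizes rather than copying PR cards.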

Model Slugs

Current frameworks:
  • sglang
  • vllm
Current model-family slugs include:
```text
deepseek-ocr, deepseek-ocr-2, deepseek-v3-r1, deepseek-v31, deepseek-v32,
deepseek-v4, ernie45, gemma4, glm-vlm-ocr, glm45, glm46-glm47, glm5-glm51,
gpt-oss, intern-s1, internvl35, jina-reranker-m0, kimi, ling25, llada21,
llama31, llama33-70b, llama4, mimo-v2-flash, minimax, mistral-small-4,
mixtral-quark-int4fp8-moe, nemotron-super, qwen-vlm-omni-asr, qwen3-coder,
qwen3-core, qwen3-next, qwen35, ring25, step35
```
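
A guessed slug can be checked against this list before it is passed to `--model`. The sketch below is illustrative only: `is_known_slug` and `suggest_slugs` are hypothetical helpers, the slug tuple is copied from the list above and may lag behind the repository, and the fuzzy matching uses Python's standard `difflib` rather than anything `scripts/query.py` is known to do.

```python
import difflib

# Copied from the slug list above; "include" suggests it may not be exhaustive.
KNOWN_SLUGS = (
    "deepseek-ocr", "deepseek-ocr-2", "deepseek-v3-r1", "deepseek-v31",
    "deepseek-v32", "deepseek-v4", "ernie45", "gemma4", "glm-vlm-ocr",
    "glm45", "glm46-glm47", "glm5-glm51", "gpt-oss", "intern-s1",
    "internvl35", "jina-reranker-m0", "kimi", "ling25", "llada21",
    "llama31", "llama33-70b", "llama4", "mimo-v2-flash", "minimax",
    "mistral-small-4", "mixtral-quark-int4fp8-moe", "nemotron-super",
    "qwen-vlm-omni-asr", "qwen3-coder", "qwen3-core", "qwen3-next",
    "qwen35", "ring25", "step35",
)


def is_known_slug(slug: str) -> bool:
    """Return True only for an exact match against the documented slugs."""
    return slug in KNOWN_SLUGS


def suggest_slugs(guess: str, n: int = 3, cutoff: float = 0.6) -> list[str]:
    """Return up to n documented slugs that look similar to the guess."""
    normalized = guess.strip().lower().replace("_", "-")
    return difflib.get_close_matches(normalized, KNOWN_SLUGS, n=n, cutoff=cutoff)
```

String similarity is only a hint. When the suggestions are empty or ambiguous, the fallback in the workflow still applies: search by model name with `scripts/query.py`.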

SOTA Loop Contract

For `sglang-sota-humanize-loop`, this knowledge base is an early context source:
  • Read it after model identification and before patch planning.
  • Include the history paths and key PR evidence in `analysis/root-cause.md` or `history/model-pr-history-notes.md`.
  • If the profiler points at a known model path, check whether the history has prior changes on that file before writing a new patch.
  • If a competitor is faster, search that competitor's model history for the same model family and stage before assuming the gap is kernel-local.
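
The check for prior changes on a profiler-flagged file can be approximated with a plain text scan over the docs returned by `--paths-only`. This is a rough sketch under assumptions: `docs_mentioning` is a hypothetical helper, and it assumes the history docs mention changed files by file name in their text, which should be confirmed against the actual docs.

```python
from __future__ import annotations

from pathlib import Path


def docs_mentioning(source_file: str, doc_paths: list[str]) -> list[Path]:
    """Return the history docs whose text contains the file name of source_file."""
    needle = Path(source_file).name  # match on the base name, ignoring directories
    hits = []
    for doc in doc_paths:
        text = Path(doc).read_text(encoding="utf-8", errors="replace")
        if needle in text:
            hits.append(Path(doc))
    return hits
```

An empty result is weak evidence at best: the file may have been renamed, or may be referenced only through a symbol name, so a keyword query on the relevant symbols remains worthwhile before writing a new patch.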