# project-profiler

Generate an LLM-optimized project profile — a judgment-rich document that lets any future LLM answer within 60 seconds:
  1. What are the core abstractions?
  2. Which modules to modify for feature X?
  3. What is the biggest risk/debt?
  4. When should / shouldn't you use this?
This is NOT a codebase map (directory + module navigation) or a diff schematic. This is architectural judgment: design tradeoffs, usage patterns, and when NOT to use.


## Model Strategy

  • Opus: Orchestrator — runs all phases, writes the final profile. Does NOT read source code directly (except in direct mode).
  • Sonnet: Subagents — read source code files, analyze patterns, report structured findings.
  • All subagents launch in a single message (parallel, never sequential).


## Phase 0: Preflight

### 0.1 Target & Project Name

Determine the target directory (use the argument if provided, else `.`).
Extract the project name from the first available source:
  1. `package.json` → `name`
  2. `pyproject.toml` → `[project] name`
  3. `Cargo.toml` → `[package] name`
  4. `go.mod` → module path (last segment)
  5. Directory name as fallback
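The extraction order above can be sketched as a small helper. This is an illustrative function (`project_name` is not part of scan-project.py), and the TOML handling is a minimal line scan rather than a real parser:

```python
import json

def project_name(files: dict[str, str], dir_name: str) -> str:
    """Pick the project name from the first available manifest.

    `files` maps manifest filename -> raw content; `dir_name` is the
    target directory's basename, used as the final fallback.
    """
    if "package.json" in files:
        name = json.loads(files["package.json"]).get("name")
        if name:
            return name
    # pyproject.toml [project] name, then Cargo.toml [package] name.
    # A real implementation would parse with tomllib; a line scan suffices here.
    for manifest in ("pyproject.toml", "Cargo.toml"):
        for line in files.get(manifest, "").splitlines():
            if line.strip().startswith("name"):
                return line.partition("=")[2].strip().strip('"')
    if "go.mod" in files:
        first = files["go.mod"].splitlines()[0]  # e.g. "module github.com/org/repo"
        if first.startswith("module "):
            path = first.removeprefix("module ").strip()
            return path.split("/")[-1]  # last segment of the module path
    return dir_name
```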

### 0.2 Run Scanner

```bash
uv run {SKILL_DIR}/scripts/scan-project.py {TARGET_DIR} --format summary
```
Capture the summary output. This provides:
  • Project metadata (name, version, license, deps count)
  • Tech stack (languages, frameworks, package manager)
  • Language distribution (top 5 by tokens)
  • Entry points (CLI, API, library)
  • Project features (dockerfile, CI, tests, codebase_map)
  • Detected conditional sections (Storage, Embedding, Infrastructure, etc.)
  • Workspaces (monorepo packages, if any)
  • Top 20 largest files
  • Directory structure (depth 3)
For debugging or when full file details are needed, use `--format json` instead.

### 0.3 Git Metadata

Run these commands (use the Bash tool):

**Recent commits**

```bash
git -C {TARGET_DIR} log --oneline -20
```

**Contributors**

```bash
git -C {TARGET_DIR} log --format="%aN" | sort -u | head -20
```

**Version tags**

```bash
git -C {TARGET_DIR} tag --sort=-v:refname | head -5
```

**First commit date**

```bash
git -C {TARGET_DIR} log --format="%aI" --reverse | head -1
```

### 0.4 Check Existing CODEBASE_MAP

If `docs/CODEBASE_MAP.md` exists, note its presence. The profile will reference it rather than duplicating its directory structure.

### 0.5 Token Budget → Execution Mode

Based on `total_tokens` from the scanner, choose an execution mode:

| Total Tokens | Mode | Strategy |
|---|---|---|
| ≤ 80k | Direct | Skip subagents. Opus reads all files directly and performs all analysis in a single context. |
| 80k – 200k | 2 agents | Agent AB (Core + Architecture + Design), Agent C (Usage + Patterns + Deployment) |
| 200k – 400k | 3 agents | Agent A (Core + Design), Agent B (Architecture + Patterns), Agent C (Usage + Deployment) |
| > 400k | 3 agents | Agent A, Agent B, Agent C, each ≤150k tokens, with overflow files assigned to the lightest agent |

Why the 80k threshold: Opus has a 200k context. At ≤80k source tokens, loading all files + scanner output + git metadata + writing the profile all fit comfortably. Subagent overhead (spawn + communication + wait) adds 2-3 minutes for zero benefit.

Direct mode workflow: Skip Phase 2 entirely. After Phases 0+1, proceed to Phase 3 (read the scanner's `detected_sections` directly), then Phase 4, then Phase 5. Read files on demand during synthesis — do NOT pre-read all files; read only what's needed for each section.

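The thresholds above reduce to a small lookup. A minimal sketch, assuming the table's boundaries; the function name and mode labels are illustrative:

```python
def execution_mode(total_tokens: int) -> str:
    """Map the scanner's total_tokens to an execution mode per the table above."""
    if total_tokens <= 80_000:
        return "direct"      # Opus reads everything itself; no subagent overhead
    if total_tokens <= 200_000:
        return "2-agents"    # Agent AB + Agent C
    return "3-agents"        # Agent A/B/C; above 400k each agent is also capped at 150k
```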

## Phase 1: Community & External Data

Run in parallel with Phase 2 subagent launches (or with Phase 3 in direct mode).

### 1.1 GitHub Stats

Parse owner/repo from the `.git/config` remote origin URL:

```bash
git -C {TARGET_DIR} remote get-url origin
```

Extract `owner/repo` from the URL. Then:

```bash
gh api repos/{owner}/{repo} --jq '{stars: .stargazers_count, forks: .forks_count, open_issues: .open_issues_count}'
```

If `gh` is unavailable or this is not a GitHub repo → fill with `N/A`. Do not fail.

### 1.2 Package Downloads

npm (if `package.json` exists):

```
WebFetch https://api.npmjs.org/downloads/point/last-month/{package_name}
```

PyPI (if `pyproject.toml` exists):

```
WebFetch https://pypistats.org/api/packages/{package_name}/recent
```

If the fetch fails → fill with `N/A`.

### 1.3 License

Read from (in order): LICENSE file → package metadata field → `N/A`.

### 1.4 Maturity Assessment

Calculate from:
  • Git history length: first commit date → now
  • Release count: number of version tags
  • Contributor count: unique authors

| Criteria | Score |
|---|---|
| < 3 months, < 3 releases, 1-2 contributors | experimental |
| 3-12 months, 3-10 releases, 2-5 contributors | growing |
| 1-3 years, 10-50 releases, 5-20 contributors | stable |
| > 3 years, > 50 releases, > 20 contributors | mature |

Use the lowest matching tier (conservative estimate).

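A minimal sketch of the conservative scoring, assuming the lower-bound thresholds from the table; the function name and exact boundary handling (≥ vs >) are illustrative:

```python
def maturity(months: float, releases: int, contributors: int) -> str:
    """Return the maturity tier; a project must clear ALL three thresholds
    of a tier, so a single weak dimension pulls it down (lowest matching tier)."""
    tiers = [
        ("mature", 36, 50, 20),   # > 3 years, > 50 releases, > 20 contributors
        ("stable", 12, 10, 5),    # 1-3 years, 10-50 releases, 5-20 contributors
        ("growing", 3, 3, 2),     # 3-12 months, 3-10 releases, 2-5 contributors
    ]
    for name, min_months, min_releases, min_contributors in tiers:
        if (months >= min_months and releases >= min_releases
                and contributors >= min_contributors):
            return name
    return "experimental"
```

Note the conservative behavior: a 5-year-old project with a single contributor still scores `experimental`, because it never clears all three thresholds of any higher tier.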

## Phase 2: Parallel Deep Exploration

Direct mode (≤80k tokens): SKIP this entire phase. Proceed to Phase 3; Opus reads files directly during synthesis.
Launch Sonnet subagents using the `Task` tool. All subagents must be launched in a single message.
Assign files to each agent based on the token budget from Phase 0.5, using the scanner output to determine which files go to which agent.

### File Assignment Strategy

If workspaces detected (monorepo):
  1. Group files by workspace package
  2. Assign complete packages to agents (never split a package across agents)
  3. Agent A gets packages with core business logic
  4. Agent B gets packages with infrastructure/shared libraries
  5. Agent C gets packages with CLI/API/SDK surface + docs
If no workspaces (single project):
  1. Sort all files by path
  2. Group by top-level directory
  3. Assign groups to agents based on their responsibility:
    • Agent A gets: core source files (src/lib, core/, models/, types/) + README, CHANGELOG
    • Agent B gets: architecture files (routes/, middleware/, config/, entry points) + tests/
    • Agent C gets: integration files (API, CLI, SDK, examples/, docs/) + .github/
  4. If files don't fit neatly, distribute the remainder to whichever agents are still under their token budget
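For the single-project case, steps 1-4 can be sketched as a greedy pass. The `assign_groups` helper is hypothetical; the responsibility-based routing (src/ vs routes/ vs docs/) is omitted here, and every group simply goes to the least-loaded agent still under budget:

```python
from collections import defaultdict

def assign_groups(file_tokens: dict[str, int],
                  budgets: dict[str, int]) -> dict[str, list[str]]:
    """Group files by top-level directory, then assign whole groups to agents."""
    group_tokens: dict[str, int] = defaultdict(int)
    group_files: dict[str, list[str]] = defaultdict(list)
    for path in sorted(file_tokens):                 # step 1: sort by path
        top = path.split("/", 1)[0]                  # step 2: group by top-level dir
        group_tokens[top] += file_tokens[path]
        group_files[top].append(path)
    load = {agent: 0 for agent in budgets}
    assigned: dict[str, list[str]] = {agent: [] for agent in budgets}
    # largest groups first; a group is never split across agents
    for top in sorted(group_tokens, key=group_tokens.get, reverse=True):
        under = [a for a in load if load[a] + group_tokens[top] <= budgets[a]]
        agent = min(under or load, key=load.get)     # step 4: lightest agent under budget
        load[agent] += group_tokens[top]
        assigned[agent].extend(group_files[top])
    return assigned
```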

### Agent A: Core Abstractions + Design Decisions

Task prompt for Agent A — subagent_type: "general-purpose", model: "sonnet"

#### Mission

Identify the most architecturally significant abstractions AND key design decisions in this codebase.

#### Files to Read

{LIST_OF_ASSIGNED_FILES} Also read: README.md, CHANGELOG.md (if they exist and not already assigned)

#### Output Format

##### Part 1: Core Abstractions

Report the TOP 10-15 most architecturally significant abstractions, ranked by fan-in (how many other files reference them). If the project has fewer than 15 meaningful abstractions, report all.
For EACH abstraction:

**{Name}**

  • Purpose: {≤15 words}
  • Defined in: `{file_path}:ClassName` or `{file_path}:function_name`
  • Type: {class / interface / type / trait / struct / protocol}
  • Public methods/fields: {exact_count}
  • Adapters/implementations: {count} — {names with file paths}
  • Imported by: {count} files
  • Key pattern: {factory / singleton / strategy / observer / none}

##### Part 2: Design Decisions

For EACH decision (identify 3-5):

**{Decision Title}**

  • Problem: {what needed solving}
  • Choice made: {what was chosen}
  • Evidence: `{file_path}:ClassName` or `{file_path}:function_name` — {relevant code pattern}
  • Alternatives NOT chosen: {what else could have been done}
  • Why not: {concrete reason — performance / complexity / ecosystem / team preference}
  • Tradeoff: {what is gained} vs. {what is lost}

##### Part 3: Architecture Risks

For EACH risk (identify 2-4):
  • Risk: {specific description}
  • Location: `{file_path}:SymbolName`
  • Impact: {what breaks if this goes wrong}
  • Mitigation: {how to fix or reduce risk}

##### Part 4: Recommendations

For EACH recommendation (identify 2-4):
  • Current state: `{file_path}` — {what exists now}
  • Problem: {specific issue — not "could be better"}
  • Fix: {concrete action — not "consider refactoring"}
  • Effect: {measurable outcome}

#### Rules

  • Every number must come from actual code (count imports, count methods)
  • No subjective language (no "well-designed", "elegant", "robust", "clean", "優雅", "完美", "強大")
  • Every claim needs a `file:SymbolName` reference (NOT line numbers — they break on the next commit)
  • Each decision must have a "why NOT the alternative" answer
  • Report the TOTAL count of abstractions found

### Agent B: Architecture + Code Quality Patterns

Task prompt for Agent B — subagent_type: "general-purpose", model: "sonnet"

#### Mission

Map the system topology, layer boundaries, data flow paths, AND code quality patterns.

#### Files to Read

{LIST_OF_ASSIGNED_FILES}

#### Output Format

##### Part 1: Topology

  • Architecture style: {monolith / microservices / serverless / library / CLI tool / plugin system}
  • Entry points: {list with file paths}
  • Layer count: {N}

##### Part 2: Layers (table)

| Layer | Modules | Files | Responsibility |
|---|---|---|---|

##### Part 3: Data Flow Paths

For each major user-facing operation:
  1. {Operation name}: {step1_module} → {step2_module} → ... → {result}
    • Evidence: `{file:SymbolName}` for each step

##### Part 4: Mermaid Diagram Elements

Provide raw data for Mermaid diagrams:
  • Nodes: {module_name} — {file_path}
  • Edges: {from} → {to} — {relationship_type: imports/calls/extends}

##### Part 5: Module Dependencies (structured)

For each module:
  • {module_name} (`{path}`): imports [{dep1}, {dep2}, ...]

##### Part 6: Boundary Violations

List any cases where a lower layer imports from a higher layer.

##### Part 7: Code Quality Patterns

  • Error handling: {strategy and consistency — e.g., "try/catch at controller layer, custom AppError class"}
  • Logging: {framework and coverage — e.g., "winston, structured JSON, covers all API routes"}
  • Testing: {framework, coverage level, patterns — e.g., "vitest, 47 test files, unit + integration"}
  • Type safety: {strict / partial / none — e.g., "strict TypeScript with no `any` casts"}

#### Rules

  • Every number must come from actual code
  • No subjective language (no "well-designed", "elegant", "robust", "clean", "優雅", "完美", "強大")
  • Every claim needs a `file:SymbolName` reference (NOT line numbers)
  • Focus on HOW data moves, not WHAT the code does

### Agent C: Usage + Deployment + Security

Task prompt for Agent C — subagent_type: "general-purpose", model: "sonnet"

#### Mission

Document all consumption interfaces, deployment modes, security surface, and AI agent integration points.

#### Files to Read

{LIST_OF_ASSIGNED_FILES}

#### Output Format

##### Part 1: Consumption Interfaces

For each interface found:
  • Type: {Python SDK / TS SDK / REST API / MCP / CLI / Vercel AI SDK / Library import}
  • Entry point: `{file_path}:ClassName` or `{file_path}:function_name`
  • Public surface: {N} exported functions/classes/endpoints
  • Example usage: {minimal code snippet from docs/examples or inferred from exports}

##### Part 2: Configuration

| Source | Path | Key Settings |
|---|---|---|

##### Part 3: Deployment Modes

| Mode | Evidence | Prerequisites |
|---|---|---|

##### Part 4: AI Agent Integration

  • MCP tools: {count and names, if any}
  • Function calling schemas: {count, if any}
  • Tool definitions: {count, if any}
  • SDK integration: {Vercel AI SDK / LangChain / LlamaIndex / custom}

##### Part 5: Security Surface

  • API key handling: {how and where}
  • Auth mechanism: {type and file}
  • CORS config: {if applicable}
  • Data at rest: {encrypted / plaintext / N/A}
  • PII handling: {anonymized / logged / none detected}

##### Part 6: Performance & Cost Indicators

| Metric | Value | Source |
|---|---|---|
| {LLM calls per request} | {N} | `{file:SymbolName}` |
| {Cache strategy} | {type} | `{file:SymbolName}` |
| {Rate limiting} | {config} | `{file:SymbolName}` |

#### Rules

  • Every number must come from actual code
  • No subjective language (no "well-designed", "elegant", "robust", "clean", "優雅", "完美", "強大")
  • Every claim needs a `file:SymbolName` reference (NOT line numbers)
  • Include BOTH documented and undocumented interfaces

---

## Phase 3: Conditional Section Detection

Read the scanner's `detected_sections` output from Phase 0.2. This is the primary detection source — the scanner checks dependency manifests and file presence automatically.
Cross-reference with subagent reports (skip in direct mode) for additional evidence richness. If a subagent reports a pattern not caught by the scanner (e.g., concurrency via raw `Promise.all` without a library dependency), add it.
Refer to `references/section-detection-rules.md` for the full pattern reference.
Record results as a checklist:
- [x] Storage Layer — scanner detected: prisma in dependencies
- [ ] Embedding Pipeline — not detected
- [x] Infrastructure Layer — scanner detected: Dockerfile present
- [ ] Knowledge Graph — not detected
- [ ] Scalability — not detected
- [x] Concurrency — Agent B reported: Promise.all pattern in src/worker.ts
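The cross-referencing step can be sketched as a dictionary union. `merge_detections` is a hypothetical helper; both inputs map section name → evidence string:

```python
def merge_detections(scanner: dict[str, str], agents: dict[str, str]) -> dict[str, str]:
    """Union of scanner-detected sections and subagent-reported patterns.

    The scanner is the primary source, so its evidence wins on overlap;
    agent-only findings (e.g. a raw Promise.all pattern) are added on top.
    """
    merged = dict(agents)      # start from agent-reported patterns
    merged.update(scanner)     # scanner evidence overrides on conflict
    return merged
```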


## Phase 4: Synthesis & Draft

### 4.1 Merge Reports

Subagent mode: Combine all subagent outputs into a working document.
Direct mode: Read key files on demand as you write each section. Do NOT pre-read all files; for each section, read only the files relevant to that section's analysis.
Cross-validate:
  • Core abstractions ↔ Architecture layers: each abstraction belongs to a layer
  • Architecture data flow ↔ Usage interfaces: flows end at documented interfaces
  • Design decisions ↔ Code evidence: decisions are backed by found patterns

### 4.2 Generate Mermaid Diagrams + Structured Dependencies

Using Agent B's raw data (or direct file analysis in direct mode), create:

Architecture Topology (`graph TB`):
  • Each node = actual module/directory
  • Each edge = import/dependency relationship
  • Label edges with the relationship type
  • Group nodes by layer using `subgraph`

Data Flow (`sequenceDiagram`):
  • Each participant = actual module
  • Each arrow = actual function call or event
  • Cover the primary user-facing operation

Structured Module Dependencies (text, below each Mermaid diagram):
  • Provide a machine-parseable dependency list as a fallback for LLM readers
  • Format: `- **{module_name}** ({path}): imports [{dep1}, {dep2}, ...]`

### 4.3 Fill Output Template

Follow `references/output-template.md` exactly. Fill each section:

| Section | Primary Source | Secondary Source |
|---|---|---|
| 1. Project Identity | Scanner metadata + Phase 1 | Git metadata |
| 2. Architecture | Agent B (Parts 1-6) | Agent A (abstractions per layer) |
| 3. Core Abstractions | Agent A (Part 1) | Agent B (layer context) |
| 4. Conditional | Phase 3 detection + relevant agents | |
| 5. Usage Guide | Agent C (Parts 1-4) | Scanner entry_points |
| 6. Performance & Cost | Agent C (Part 6) + Agent B | |
| 7. Security & Privacy | Agent C (Part 5) | |
| 8. Design Decisions | Agent A (Part 2) | Agent B (architecture context) |
| 8.5 Code Quality & Patterns | Agent B (Part 7) | Agent A (supporting observations) |
| 9. Recommendations | Agent A (Part 4) | Agents B/C (supporting evidence) |

### 4.4 Write Output

Write the profile to `docs/{project-name}.md` using the Write tool.

## Phase 5: Quality Gate

Read `references/quality-checklist.md` and verify the output.

### 5.1 Banned Language Scan

Search the written file for any word from the banned list:

English:

```
well-designed, elegant, elegantly, robust, clean, impressive,
state-of-the-art, cutting-edge, best-in-class, beautifully,
carefully crafted, thoughtfully, well-thought-out, well-architected,
nicely, cleverly, sophisticated, powerful, seamless, seamlessly,
intuitive, intuitively
```

Chinese:

```
優雅、完美、強大、直觀、無縫、精心、巧妙、出色、卓越、先進、高效、靈活、穩健、簡潔
```

If found → replace with verifiable descriptions and re-write.
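The scan itself is a plain substring check. A minimal sketch with a hypothetical `banned_hits` helper and an abbreviated word list (a real pass would use the full lists above, and a substring match will also flag words like "cleanup"):

```python
BANNED = ["well-designed", "elegant", "robust", "clean", "seamless",
          "優雅", "完美", "強大"]  # abbreviated; use the full lists above

def banned_hits(text: str) -> list[str]:
    """Return every banned word found in the profile text (case-insensitive)."""
    low = text.lower()
    return [w for w in BANNED if w.lower() in low]
```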

### 5.2 Number Audit

Scan for all numeric claims. Each must have a traceable source. Remove or fix any "approximately", "around", "roughly", "several", "many", "numerous".

### 5.3 Structure Verification

  • Every `##` section starts with a `>` blockquote summary
  • No directory tree duplicated from CODEBASE_MAP.md
  • No file-extension enumeration (use percentages)
  • No generic concluding paragraph
  • At least one Mermaid diagram in the Architecture section
  • A structured module dependency list below each Mermaid diagram
  • All Mermaid nodes reference actual modules

### 5.4 Core Question Test

For each of the 4 core questions, locate the specific answer in the output:
  1. Core abstractions → Section 3
  2. Module to modify → Section 2 Layer Boundaries table
  3. Biggest risk → Section 9 first recommendation
  4. When to use/not use → Section 1 positioning line

### 5.5 Evidence Audit

  • Section 3: every abstraction has a `file:SymbolName` reference
  • Section 8: every decision has `file:SymbolName` + alternative + tradeoff
  • Section 8.5: code quality patterns have framework names + coverage facts
  • Section 9: every recommendation has `file_path` + specific problem + concrete fix

If any check fails → fix the issue in the file and re-verify.


## Output

After all phases complete, report to the user:

```
Profile generated: docs/{project-name}.md
- {total_files} files scanned ({total_tokens} tokens)
- {N} core abstractions identified
- {N} design decisions documented
- {N} recommendations
- Conditional sections: {list of included sections or "none"}
```