acquire-codebase-knowledge

Acquire Codebase Knowledge

Produces seven populated documents in `docs/codebase/` covering everything needed to work effectively on the project. Only document what is verifiable from files or terminal output; never infer or assume.

Output Contract (Required)

Before finishing, all of the following must be true:

Exactly these files exist in `docs/codebase/`: `STACK.md`, `STRUCTURE.md`, `ARCHITECTURE.md`, `CONVENTIONS.md`, `INTEGRATIONS.md`, `TESTING.md`, `CONCERNS.md`.
Every claim is traceable to source files, config, or terminal output.
Unknowns are marked as `[TODO]`; intent-dependent decisions are marked `[ASK USER]`.
Every document includes a short "evidence" list with concrete file paths.
Final response includes numbered `[ASK USER]` questions and intent-vs-reality divergences.
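
The file-existence part of the contract can be checked mechanically. The following is a minimal sketch, not part of the skill itself: the helper name `check_required_docs` is hypothetical, and it assumes that only `*.md` files count toward "exactly these files", since the scan output `.codebase-scan.txt` is written to the same directory.

```python
from pathlib import Path

# The seven file names required by the output contract.
REQUIRED_DOCS = {
    "STACK.md", "STRUCTURE.md", "ARCHITECTURE.md", "CONVENTIONS.md",
    "INTEGRATIONS.md", "TESTING.md", "CONCERNS.md",
}


def check_required_docs(docs_dir):
    """Return (missing, unexpected) Markdown file names found in docs_dir.

    Assumption: only *.md files are compared, so the scan output
    (.codebase-scan.txt) in the same directory is ignored.
    """
    present = {p.name for p in Path(docs_dir).glob("*.md")}
    missing = sorted(REQUIRED_DOCS - present)
    unexpected = sorted(present - REQUIRED_DOCS)
    return missing, unexpected
```

Calling `check_required_docs("docs/codebase")` and getting two empty lists would indicate that this part of the contract holds; the traceability and evidence requirements still need a manual read.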

Workflow

Copy and track this checklist:

- [ ] Phase 1: Run scan, read intent documents
- [ ] Phase 2: Investigate each documentation area
- [ ] Phase 3: Populate all seven docs in docs/codebase/
- [ ] Phase 4: Validate docs, present findings, resolve all [ASK USER] items

Focus Area Mode

If the user supplies a focus area (for example: "architecture only" or "testing and concerns"):

Always run Phase 1 in full.
Fully complete focus-area documents first.
For non-focus documents not yet analyzed, keep required sections present and mark unknowns as `[TODO]`.
Still run the Phase 4 validation loop on all seven documents before final output.

Phase 1: Scan and Read Intent

Run the scan script from the target project root:

```bash
python3 "$SKILL_ROOT/scripts/scan.py" --output docs/codebase/.codebase-scan.txt
```

Where `$SKILL_ROOT` is the absolute path to the skill folder. Works on Windows, macOS, and Linux.

Quick start: if you already know the skill's absolute path, pass it inline:

```bash
python3 /absolute/path/to/skills/acquire-codebase-knowledge/scripts/scan.py --output docs/codebase/.codebase-scan.txt
```

Search for `PRD`, `TRD`, `README`, `ROADMAP`, `SPEC`, `DESIGN` files and read them.
Summarise the stated project intent before reading any source code.
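
The search for intent documents can be approximated with a short directory walk. This is a rough sketch under stated assumptions: `find_intent_docs` and `SKIP_DIRS` are hypothetical names, matching is done on whole tokens of the file name (so `inspector.py` does not match `SPEC`), and the skipped directories are a guess at what is safe to ignore. Test files such as `user.spec.ts` will still match, so treat the result as candidates to review.

```python
import os
import re

# Keywords named in the step above.
INTENT_KEYWORDS = {"PRD", "TRD", "README", "ROADMAP", "SPEC", "DESIGN"}
# Assumption: these directories never hold the project's own intent documents.
SKIP_DIRS = {".git", "node_modules"}


def find_intent_docs(root):
    """Return sorted paths whose file name contains an intent keyword as a token."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            stem = os.path.splitext(name)[0].upper()
            tokens = set(re.split(r"[^A-Z0-9]+", stem))
            if tokens & INTENT_KEYWORDS:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```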

Phase 2: Investigate

Use the scan output to answer questions for each of the seven templates. Load `references/inquiry-checkpoints.md` for the full per-template question list.

If the stack is ambiguous (multiple manifest files, unfamiliar file types, no `package.json`), load `references/stack-detection.md`.

Phase 3: Populate Templates

Copy each template from `assets/templates/` into `docs/codebase/`. Fill in this order:

STACK.md — language, runtime, frameworks, all dependencies
STRUCTURE.md — directory layout, entry points, key files
ARCHITECTURE.md — layers, patterns, data flow
CONVENTIONS.md — naming, formatting, error handling, imports
INTEGRATIONS.md — external APIs, databases, auth, monitoring
TESTING.md — frameworks, file organization, mocking strategy
CONCERNS.md — tech debt, bugs, security risks, perf bottlenecks

Use `[TODO]` for anything that cannot be determined from code. Use `[ASK USER]` where the right answer requires team intent.
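
The copy step can be scripted so the fill order stays explicit. The sketch below is illustrative only: `copy_templates` is a hypothetical helper, and skipping files that already exist is an assumption made here to avoid overwriting partially filled documents.

```python
import shutil
from pathlib import Path

# Fill order as listed above.
FILL_ORDER = [
    "STACK.md", "STRUCTURE.md", "ARCHITECTURE.md", "CONVENTIONS.md",
    "INTEGRATIONS.md", "TESTING.md", "CONCERNS.md",
]


def copy_templates(templates_dir, docs_dir):
    """Copy each template into docs_dir in fill order; return the names copied.

    Assumption: an existing target file is left untouched rather than overwritten.
    """
    docs = Path(docs_dir)
    docs.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in FILL_ORDER:
        target = docs / name
        if not target.exists():
            shutil.copyfile(Path(templates_dir) / name, target)
            copied.append(name)
    return copied
```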

Phase 4: Validate, Repair, Verify

Run this mandatory validation loop before finalizing:

Validate each doc against `references/inquiry-checkpoints.md`.
For each non-trivial claim, confirm at least one evidence reference exists.
If any required section is missing or unsupported:

Fix the document.
Re-run validation.

Repeat until all seven docs pass.

Then present a summary of all seven documents, list every `[ASK USER]` item as a numbered question, and highlight any Intent vs. Reality divergences from Phase 1.

Validation pass criteria:

No unsupported claims.
No empty required sections.
Unknowns use `[TODO]` rather than assumptions.
Team-intent gaps are explicitly marked `[ASK USER]`.
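
Collecting the markers for the final summary is easy to automate. This is a small sketch, assuming the markers appear literally as `[TODO]` and `[ASK USER]` in the generated Markdown; `collect_markers` and `numbered_questions` are hypothetical names, and the output format is only one possible way to present the questions.

```python
import re
from pathlib import Path

MARKER = re.compile(r"\[(TODO|ASK USER)\]")


def collect_markers(docs_dir):
    """Return {"TODO": [...], "ASK USER": [...]} of (file, line_no, text) tuples."""
    found = {"TODO": [], "ASK USER": []}
    for path in sorted(Path(docs_dir).glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for line_no, line in enumerate(lines, start=1):
            # A line is recorded once per marker kind, even if the marker repeats.
            for kind in sorted(set(MARKER.findall(line))):
                found[kind].append((path.name, line_no, line.strip()))
    return found


def numbered_questions(found):
    """Format every [ASK USER] entry as a numbered line for the final response."""
    return [
        f"{i}. {name}:{line_no}: {text}"
        for i, (name, line_no, text) in enumerate(found["ASK USER"], start=1)
    ]
```

The `TODO` list doubles as a quick check that unknowns were marked rather than guessed; it does not verify that claims are supported by evidence, which still requires reading each document.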

Gotchas

Monorepos: The root `package.json` may have no source of its own; check for a `workspaces` entry, or for `packages/` or `apps/` directories. Each workspace may have independent dependencies and conventions. Map each sub-package separately.
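
Listing the sub-packages to map can start from the root manifest. The sketch below is a simplified illustration: `workspace_dirs` is a hypothetical helper, it handles the array form and the object form (`{"packages": [...]}`) of the `workspaces` field, and it ignores negated patterns and other workspace mechanisms such as a separate workspace file.

```python
import json
from pathlib import Path


def workspace_dirs(root):
    """Return workspace directories (relative, POSIX style) that contain a package.json."""
    root = Path(root)
    manifest = root / "package.json"
    if not manifest.is_file():
        return []
    workspaces = json.loads(manifest.read_text(encoding="utf-8")).get("workspaces", [])
    if isinstance(workspaces, dict):
        # Object form: the glob patterns live under the "packages" key.
        workspaces = workspaces.get("packages", [])
    dirs = set()
    for pattern in workspaces:
        if pattern.startswith("!"):
            continue  # Simplification: negated patterns are not applied.
        for match in root.glob(pattern):
            if (match / "package.json").is_file():
                dirs.add(match.relative_to(root).as_posix())
    return sorted(dirs)
```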

Outdated README: README often describes intended architecture, not the current one. Cross-reference with actual file structure before treating any README claim as fact.

TypeScript path aliases: The `paths` config in `tsconfig.json` means imports like `@/foo` don't map directly to the filesystem. Map aliases to real paths before documenting structure.
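
To make the alias mapping concrete, here is a simplified sketch. `resolve_alias` is a hypothetical helper that takes the `paths` mapping as an already-parsed dictionary (real `tsconfig.json` files may contain comments that a plain JSON parser rejects). It returns the first matching pattern in dictionary order, which is a simplification of how the TypeScript compiler chooses among overlapping patterns.

```python
import posixpath


def resolve_alias(import_path, paths, base_url="."):
    """Map an aliased import to candidate paths relative to the tsconfig directory.

    paths has the shape of tsconfig "paths", e.g. {"@/*": ["src/*"]}.
    Returns an empty list when no pattern matches.
    """
    for pattern, targets in paths.items():
        if "*" in pattern:
            prefix, suffix = pattern.split("*", 1)
            long_enough = len(import_path) >= len(prefix) + len(suffix)
            if long_enough and import_path.startswith(prefix) and import_path.endswith(suffix):
                # The text matched by "*" is substituted into each target.
                captured = import_path[len(prefix):len(import_path) - len(suffix)]
                return [
                    posixpath.normpath(posixpath.join(base_url, t.replace("*", captured)))
                    for t in targets
                ]
        elif import_path == pattern:
            return [posixpath.normpath(posixpath.join(base_url, t)) for t in targets]
    return []
```

For example, with `{"@/*": ["src/*"]}` the import `@/foo` resolves to `src/foo`, which is the path that belongs in `STRUCTURE.md`.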

Generated/compiled output: Never document patterns from `dist/`, `build/`, `generated/`, `.next/`, `out/`, or `__pycache__/`. These are artefacts; document source conventions only.
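
A path filter keeps these directories out of any convention analysis. This is a minimal sketch with a hypothetical helper name, `is_generated`; it assumes POSIX-style relative paths and matches on directory names alone, so a source directory that happens to be called `build` would also be excluded.

```python
from pathlib import PurePosixPath

# Directory names listed in the gotcha above.
GENERATED_DIRS = {"dist", "build", "generated", ".next", "out", "__pycache__"}


def is_generated(path):
    """Return True if any component of the path is a known output directory."""
    return any(part in GENERATED_DIRS for part in PurePosixPath(path).parts)
```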

`.env.example` reveals required config: Secrets are never committed. Read `.env.example`, `.env.template`, or `.env.sample` to discover required environment variables.
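
Extracting the variable names from such a file takes only a few lines. The sketch below assumes the common `NAME=value` layout with optional `export` prefixes and `#` comments; `env_var_names` is a hypothetical helper, and it deliberately returns names only, never values.

```python
import re

# NAME=... at the start of a line, optionally prefixed with "export".
ENV_LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def env_var_names(text):
    """Return variable names in order of first appearance, skipping comments."""
    names = []
    for line in text.splitlines():
        match = ENV_LINE.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names
```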

`devDependencies` ≠ production stack: Only `dependencies` (or equivalent, e.g. `[tool.poetry.dependencies]`) run in production. Document linters, formatters, and test frameworks separately as dev tooling.
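
For a Node.js manifest the split can be read directly from `package.json`. This is a small sketch with a hypothetical helper name, `split_dependencies`; it covers `package.json` only, and other ecosystems would need their own manifest parsing.

```python
import json


def split_dependencies(package_json_text):
    """Separate production dependencies from dev tooling in a package.json string."""
    manifest = json.loads(package_json_text)
    return {
        "production": sorted(manifest.get("dependencies", {})),
        "dev_tooling": sorted(manifest.get("devDependencies", {})),
    }
```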

Test TODOs ≠ production debt: TODOs inside `test/`, `tests/`, `__tests__/`, or `spec/` are coverage gaps, not production technical debt. Separate them in `CONCERNS.md`.

High-churn files = fragile areas: Files appearing most in recent git history have the highest modification rate and likely hidden complexity. Always note them in `CONCERNS.md`.
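
Churn can be estimated by counting how often each path appears in the log. The sketch below assumes the input is the text printed by `git log --name-only --pretty=format:` (optionally limited with `--since`), which lists one changed path per line with blank lines between commits; `churn_ranking` is a hypothetical helper and does not follow renames.

```python
from collections import Counter


def churn_ranking(git_log_output, top=10):
    """Return the most frequently changed paths as (path, commit_count) pairs."""
    counts = Counter(
        line.strip() for line in git_log_output.splitlines() if line.strip()
    )
    return counts.most_common(top)
```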

Anti-Patterns

| ❌ Don't | ✅ Do instead |
| --- | --- |
| "Uses Clean Architecture with Domain/Data layers." (when no such directories exist) | State only what directory structure actually shows. |
| "This is a Next.js project." (without checking `package.json`) | Check `dependencies` first. State what's actually there. |
| Guess the database from a variable name like `dbUrl` | Check manifest for `pg`, `mysql2`, `mongoose`, `prisma`, etc. |
| Document `dist/` or `build/` naming patterns as conventions | Source files only. |
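
The database row can be turned into a manifest check. This is a rough sketch: `database_evidence` is a hypothetical helper, the package names come from the table, and the labels are this example's own gloss rather than anything the skill defines. A hit is evidence that a client library is declared, not proof of which database is deployed.

```python
import json

# Package names from the table above; the labels are illustrative only.
DB_CLIENTS = {
    "pg": "PostgreSQL client",
    "mysql2": "MySQL client",
    "mongoose": "MongoDB ODM",
    "prisma": "Prisma ORM (check its schema for the actual database)",
}


def database_evidence(package_json_text):
    """Return the declared database-related packages found in a package.json string."""
    manifest = json.loads(package_json_text)
    declared = {
        **manifest.get("devDependencies", {}),
        **manifest.get("dependencies", {}),
    }
    return {name: DB_CLIENTS[name] for name in sorted(declared) if name in DB_CLIENTS}
```

An empty result means the database is still unknown and should be recorded as `[TODO]` rather than guessed.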

Enhanced Scan Output Sections

The `scan.py` script now produces the following sections in addition to the original output:

CODE METRICS — Total files, lines of code by language, largest files (complexity signals)
CI/CD PIPELINES — Detected GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.
CONTAINERS & ORCHESTRATION — Docker, Docker Compose, Kubernetes, Vagrant configs
SECURITY & COMPLIANCE — Snyk, Dependabot, SECURITY.md, SBOM, security policies
PERFORMANCE & TESTING — Benchmark configs, profiling markers, load testing tools

Use these sections during Phase 2 to inform investigation questions and identify tool-specific patterns.

Bundled Assets

| Asset | When to load |
| --- | --- |
| `scripts/scan.py` | Phase 1: run first, before reading any code (Python 3.8+ required) |
| `references/inquiry-checkpoints.md` | Phase 2: load for per-template investigation questions |
| `references/stack-detection.md` | Phase 2: only if stack is ambiguous |
| `assets/templates/STACK.md` | Phase 3 step 1 |
| `assets/templates/STRUCTURE.md` | Phase 3 step 2 |
| `assets/templates/ARCHITECTURE.md` | Phase 3 step 3 |
| `assets/templates/CONVENTIONS.md` | Phase 3 step 4 |
| `assets/templates/INTEGRATIONS.md` | Phase 3 step 5 |
| `assets/templates/TESTING.md` | Phase 3 step 6 |
| `assets/templates/CONCERNS.md` | Phase 3 step 7 |

Template usage mode:

Default mode: complete only the "Core Sections (Required)" in each template.
Extended mode: add optional sections only when the repo complexity justifies them.
