acquire-codebase-knowledge

Acquire Codebase Knowledge

Produces seven populated documents in `docs/codebase/` covering everything needed to work effectively on the project. Only document what is verifiable from files or terminal output; never infer or assume.

Output Contract (Required)

Before finishing, all of the following must be true:
  1. Exactly these files exist in `docs/codebase/`: `STACK.md`, `STRUCTURE.md`, `ARCHITECTURE.md`, `CONVENTIONS.md`, `INTEGRATIONS.md`, `TESTING.md`, `CONCERNS.md`.
  2. Every claim is traceable to source files, config, or terminal output.
  3. Unknowns are marked as `[TODO]`; intent-dependent decisions are marked `[ASK USER]`.
  4. Every document includes a short "evidence" list with concrete file paths.
  5. The final response includes numbered `[ASK USER]` questions and intent-vs-reality divergences.
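A quick mechanical check for item 1 can be sketched in Python. This is a minimal sketch, not part of the skill's bundled scripts: `check_file_set` and `check_docs_dir` are hypothetical helpers, and the sketch assumes dotfiles (such as the `.codebase-scan.txt` written in Phase 1) are exempt from the "exactly these files" rule.

```python
from __future__ import annotations

from pathlib import Path

# The seven documents required by the output contract.
REQUIRED_DOCS = {
    "STACK.md",
    "STRUCTURE.md",
    "ARCHITECTURE.md",
    "CONVENTIONS.md",
    "INTEGRATIONS.md",
    "TESTING.md",
    "CONCERNS.md",
}


def check_file_set(names: list[str]) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) document names.

    Dotfiles such as the scan output are ignored; that exemption is an
    assumption of this sketch, not a rule stated by the skill.
    """
    visible = {n for n in names if not n.startswith(".")}
    missing = REQUIRED_DOCS - visible
    unexpected = visible - REQUIRED_DOCS
    return missing, unexpected


def check_docs_dir(root: str = "docs/codebase") -> tuple[set[str], set[str]]:
    """Apply check_file_set to the files actually present under root."""
    names = [p.name for p in Path(root).iterdir() if p.is_file()]
    return check_file_set(names)
```

Two empty sets mean item 1 holds; items 2 to 5 still need the Phase 4 review.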

Workflow

Copy and track this checklist:
- [ ] Phase 1: Run scan, read intent documents
- [ ] Phase 2: Investigate each documentation area
- [ ] Phase 3: Populate all seven docs in docs/codebase/
- [ ] Phase 4: Validate docs, present findings, resolve all [ASK USER] items

Focus Area Mode

If the user supplies a focus area (for example: "architecture only" or "testing and concerns"):
  1. Always run Phase 1 in full.
  2. Fully complete focus-area documents first.
  3. For non-focus documents not yet analyzed, keep all required sections in place and mark unknowns as `[TODO]`.
  4. Still run the Phase 4 validation loop on all seven documents before final output.
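The ordering rules above can be sketched in Python, assuming the user's focus has already been mapped to document names (for example, "testing and concerns" to `TESTING.md` and `CONCERNS.md`). `plan_docs` and its `"complete"`/`"skeleton"` labels are hypothetical names chosen for illustration.

```python
from __future__ import annotations

# Canonical Phase 3 fill order for the seven documents.
DOC_ORDER = [
    "STACK.md",
    "STRUCTURE.md",
    "ARCHITECTURE.md",
    "CONVENTIONS.md",
    "INTEGRATIONS.md",
    "TESTING.md",
    "CONCERNS.md",
]


def plan_docs(focus: set[str]) -> list[tuple[str, str]]:
    """Return (document, mode) pairs: focus documents first, then the rest.

    "complete" means fully investigate; "skeleton" means keep every required
    section and mark unknowns as [TODO].
    """
    unknown = focus - set(DOC_ORDER)
    if unknown:
        raise ValueError(f"unknown documents: {sorted(unknown)}")
    focused = [(d, "complete") for d in DOC_ORDER if d in focus]
    others = [(d, "skeleton") for d in DOC_ORDER if d not in focus]
    # All seven documents are always returned, so Phase 4 still validates each one.
    return focused + others
```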

Phase 1: Scan and Read Intent

  1. Run the scan script from the target project root:
    ```bash
    python3 "$SKILL_ROOT/scripts/scan.py" --output docs/codebase/.codebase-scan.txt
    ```
    where `$SKILL_ROOT` is the absolute path to the skill folder. The script works on Windows, macOS, and Linux.
    Quick start: if you already know the absolute path, inline it:
    ```bash
    python3 /absolute/path/to/skills/acquire-codebase-knowledge/scripts/scan.py --output docs/codebase/.codebase-scan.txt
    ```
  2. Search for `PRD`, `TRD`, `README`, `ROADMAP`, `SPEC`, and `DESIGN` files and read them.
  3. Summarise the stated project intent before reading any source code.
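Step 2 can be approximated with a short Python sketch. It assumes intent documents are recognised by a case-insensitive file-name prefix and that the directories in `SKIP_DIRS` can be ignored; both choices and the helper names are illustrative, not how `scan.py` works.

```python
from __future__ import annotations

import os
from pathlib import Path

# Name stems listed in Phase 1, step 2.
INTENT_STEMS = ("PRD", "TRD", "README", "ROADMAP", "SPEC", "DESIGN")
# Directories skipped during the walk (an assumption of this sketch).
SKIP_DIRS = {".git", "node_modules", "dist", "build", "__pycache__"}


def is_intent_doc(filename: str) -> bool:
    """True if the file name, minus extension, starts with an intent stem.

    Matching is case-insensitive, so README.md, readme.rst and SPEC-v2.md all
    match. Prefix matching can also catch names like "specimens.txt", so
    review the hits by hand.
    """
    stem = Path(filename).stem.upper()
    return stem.startswith(INTENT_STEMS)


def find_intent_docs(root: str = ".") -> list[str]:
    """Walk root and return relative paths of candidate intent documents."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if is_intent_doc(name):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)
```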

Phase 2: Investigate

Use the scan output to answer questions for each of the seven templates. Load `references/inquiry-checkpoints.md` for the full per-template question list.
If the stack is ambiguous (multiple manifest files, unfamiliar file types, or no `package.json`), load `references/stack-detection.md`.
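The first two ambiguity triggers are mechanical enough to sketch in Python. The manifest table below is an assumption (it is not taken from `references/stack-detection.md`), and `needs_stack_detection` and `detected_stacks` are hypothetical helpers.

```python
from __future__ import annotations

# A non-exhaustive map of manifest files to the stack they suggest (assumption).
MANIFESTS = {
    "package.json": "Node.js",
    "pyproject.toml": "Python",
    "requirements.txt": "Python",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
    "pom.xml": "Java (Maven)",
    "build.gradle": "JVM (Gradle)",
    "Gemfile": "Ruby",
    "composer.json": "PHP",
}


def detected_stacks(root_files: list[str]) -> list[str]:
    """Stacks suggested by the manifests found at the repository root."""
    return sorted({MANIFESTS[f] for f in root_files if f in MANIFESTS})


def needs_stack_detection(root_files: list[str]) -> bool:
    """Apply the Phase 2 triggers: more than one manifest, or no package.json.

    The third trigger (unfamiliar file types) needs judgement and is not checked.
    """
    found = [f for f in root_files if f in MANIFESTS]
    return len(found) > 1 or "package.json" not in root_files
```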

Phase 3: Populate Templates

Copy each template from `assets/templates/` into `docs/codebase/`. Fill them in this order:
  1. STACK.md — language, runtime, frameworks, all dependencies
  2. STRUCTURE.md — directory layout, entry points, key files
  3. ARCHITECTURE.md — layers, patterns, data flow
  4. CONVENTIONS.md — naming, formatting, error handling, imports
  5. INTEGRATIONS.md — external APIs, databases, auth, monitoring
  6. TESTING.md — frameworks, file organization, mocking strategy
  7. CONCERNS.md — tech debt, bugs, security risks, perf bottlenecks
Use `[TODO]` for anything that cannot be determined from code. Use `[ASK USER]` where the right answer requires team intent.
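The copy step can be scripted. This Python sketch assumes the layout named above (`assets/templates/` under the skill root, `docs/codebase/` under the project root) and, as its own assumption, skips documents that already exist so a re-run does not wipe partially filled files. `pending_templates` and `copy_templates` are hypothetical names.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Phase 3 fill order; templates are copied in the same order.
FILL_ORDER = [
    "STACK.md",
    "STRUCTURE.md",
    "ARCHITECTURE.md",
    "CONVENTIONS.md",
    "INTEGRATIONS.md",
    "TESTING.md",
    "CONCERNS.md",
]


def pending_templates(existing: set[str]) -> list[str]:
    """Templates still to copy, in fill order, skipping docs that already exist."""
    return [name for name in FILL_ORDER if name not in existing]


def copy_templates(skill_root: str, project_root: str = ".") -> list[str]:
    """Copy missing templates into docs/codebase/ and return their names."""
    src_dir = Path(skill_root) / "assets" / "templates"
    dst_dir = Path(project_root) / "docs" / "codebase"
    dst_dir.mkdir(parents=True, exist_ok=True)
    existing = {p.name for p in dst_dir.iterdir()}
    todo = pending_templates(existing)
    for name in todo:
        shutil.copyfile(src_dir / name, dst_dir / name)
    return todo
```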

Phase 4: Validate, Repair, Verify

Run this mandatory validation loop before finalizing:
  1. Validate each doc against `references/inquiry-checkpoints.md`.
  2. For each non-trivial claim, confirm that at least one evidence reference exists.
  3. If any required section is missing or unsupported:
     • Fix the document.
     • Re-run validation.
  4. Repeat until all seven docs pass.
Then present a summary of all seven documents, list every `[ASK USER]` item as a numbered question, and highlight any intent-vs-reality divergences found in Phase 1.
Validation pass criteria:
  • No unsupported claims.
  • No empty required sections.
  • Unknowns use `[TODO]` rather than assumptions.
  • Team-intent gaps are explicitly marked `[ASK USER]`.
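Parts of this loop can be mechanised. The Python sketch below assumes the filled templates use `##`/`###` Markdown headings, treats every such section as required, and expects a section whose title contains "evidence" with at least one bullet; none of these structural details come from the bundled templates, and the function names are hypothetical. It cannot judge whether a claim is actually supported, so step 2 stays a manual review.

```python
from __future__ import annotations

import re

# Second- and third-level Markdown headings (assumed template structure).
HEADING = re.compile(r"^#{2,3}\s+(.*)$")


def split_sections(markdown: str) -> dict[str, str]:
    """Map each ##/### heading to the text beneath it, up to the next heading."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def validate_doc(markdown: str) -> list[str]:
    """Return the problems found in one document; an empty list means it passes."""
    problems = []
    sections = split_sections(markdown)
    for title, body in sections.items():
        if not body.strip():  # a [TODO] marker counts as content
            problems.append(f"empty section: {title}")
    evidence = [b for t, b in sections.items() if "evidence" in t.lower()]
    if not any(re.search(r"^\s*[-*]\s+\S", b, re.M) for b in evidence):
        problems.append("missing evidence list")
    return problems


def ask_user_questions(docs: dict[str, str]) -> list[str]:
    """Number every [ASK USER] line across all documents for the final reply."""
    items = []
    for name, text in docs.items():
        for line in text.splitlines():
            if "[ASK USER]" in line:
                items.append(f"{name}: {line.strip()}")
    return [f"{i}. {item}" for i, item in enumerate(items, start=1)]
```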

Gotchas

Monorepos: The root `package.json` may not cover any source code of its own; check it for a `workspaces` field and look for `packages/` or `apps/` directories. Each workspace may have independent dependencies and conventions, so map each sub-package separately.
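A minimal Python sketch of the monorepo check, assuming the root `package.json` text and the list of root directory names are already in hand. It reads both the array form of `workspaces` and Yarn classic's object form; pnpm declares workspaces in `pnpm-workspace.yaml` instead, which this sketch does not read. The helper names are hypothetical.

```python
from __future__ import annotations

import json


def workspace_patterns(package_json_text: str) -> list[str]:
    """Return the workspace globs declared in a root package.json, if any."""
    data = json.loads(package_json_text)
    ws = data.get("workspaces", [])
    if isinstance(ws, dict):  # Yarn classic: {"packages": [...], "nohoist": [...]}
        ws = ws.get("packages", [])
    return list(ws)


def looks_like_monorepo(package_json_text: str, root_dirs: list[str]) -> bool:
    """True if workspaces are declared or packages/ or apps/ exist at the root."""
    return bool(workspace_patterns(package_json_text)) or bool(
        {"packages", "apps"} & set(root_dirs)
    )
```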
Outdated README: The README often describes the intended architecture, not the current one. Cross-reference it with the actual file structure before treating any README claim as fact.
TypeScript path aliases: A `paths` entry under `compilerOptions` in `tsconfig.json` means imports like `@/foo` don't map directly to the filesystem. Map aliases to real paths before documenting structure.
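A rough Python sketch of that mapping, assuming `compilerOptions.paths` has already been loaded into a dict (note that `tsconfig.json` allows comments, so it cannot always be read with a plain JSON parser). It is intended to mirror TypeScript's order (exact keys first, then the wildcard key with the longest prefix) but does not try file extensions or index files; `resolve_alias` is a hypothetical helper.

```python
from __future__ import annotations


def _join(base_url: str, target: str) -> str:
    """Prefix target with base_url, dropping a redundant leading "./"."""
    if target.startswith("./"):
        target = target[2:]
    if base_url in ("", "."):
        return target
    return base_url.rstrip("/") + "/" + target


def resolve_alias(specifier: str, paths: dict[str, list[str]], base_url: str = ".") -> list[str]:
    """Map an import specifier to candidate paths (without extensions)."""
    if specifier in paths:  # exact keys take priority
        return [_join(base_url, t) for t in paths[specifier]]
    best = None
    for pattern in paths:
        if pattern.count("*") != 1:
            continue
        prefix, suffix = pattern.split("*")
        fits = (
            specifier.startswith(prefix)
            and specifier.endswith(suffix)
            and len(specifier) >= len(prefix) + len(suffix)
        )
        # Prefer the wildcard pattern with the longest prefix before "*".
        if fits and (best is None or len(prefix) > len(best[0])):
            best = (prefix, suffix, pattern)
    if best is None:
        return []  # not an alias: resolve as a relative or package import
    prefix, suffix, pattern = best
    captured = specifier[len(prefix):len(specifier) - len(suffix)]
    return [_join(base_url, t.replace("*", captured)) for t in paths[pattern]]
```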
Generated/compiled output: Never document patterns from `dist/`, `build/`, `generated/`, `.next/`, `out/`, or `__pycache__/`. These are artefacts; document source conventions only.
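When sampling files for conventions, a simple path filter keeps these artefacts out. This Python sketch matches the directory names anywhere in a path, which is an assumption: a hand-written source directory that happens to be called `out/` or `build/` would be excluded too, so check the result. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Build and cache directories named in the gotcha above.
GENERATED_DIRS = {"dist", "build", "generated", ".next", "out", "__pycache__"}


def is_source_path(path: str) -> bool:
    """False if any directory component of path is a generated-output directory."""
    # Normalise Windows separators, then drop the file name itself.
    parts = PurePosixPath(path.replace("\\", "/")).parts[:-1]
    return not any(part in GENERATED_DIRS for part in parts)


def source_files(paths: list[str]) -> list[str]:
    """Keep only paths worth sampling for conventions."""
    return [p for p in paths if is_source_path(p)]
```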
`.env.example` reveals required config: Secrets should never be committed, so read `.env.example`, `.env.template`, or `.env.sample` to discover the required environment variables.
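Collecting the variable names is a small parsing job. This Python sketch assumes the common `KEY=value` layout with `#` comments and an optional `export` prefix; multi-line values and other dotenv extensions are not handled, and `env_var_names` is a hypothetical helper.

```python
from __future__ import annotations

import re

# KEY=value lines with an optional "export" prefix; keys are shell-style names.
ENV_LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def env_var_names(text: str) -> list[str]:
    """Return the variable names declared in a .env.example-style file, in order.

    Comment lines are skipped and values are ignored, since example files hold
    placeholders rather than real configuration.
    """
    names = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = ENV_LINE.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names
```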
`devDependencies` ≠ production stack: Only `dependencies` (or the equivalent, e.g. `[tool.poetry.dependencies]`) run in production. Document linters, formatters, and test frameworks separately as dev tooling.
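For npm-style projects the split can be read straight from the manifest. This minimal Python sketch assumes a standard `package.json` and ignores `peerDependencies` and `optionalDependencies`; a Poetry `pyproject.toml` would need a TOML parser (`tomllib` is only in the standard library from Python 3.11). `split_dependencies` is a hypothetical helper.

```python
from __future__ import annotations

import json


def split_dependencies(package_json_text: str) -> dict[str, list[str]]:
    """Separate runtime dependencies from dev tooling in a package.json."""
    data = json.loads(package_json_text)
    return {
        # sorted() over a dict yields its keys, i.e. the package names.
        "production": sorted(data.get("dependencies", {})),
        "dev_tooling": sorted(data.get("devDependencies", {})),
    }
```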
Test TODOs ≠ production debt: TODOs inside `test/`, `tests/`, `__tests__/`, or `spec/` are coverage gaps, not production technical debt. Separate them in `CONCERNS.md`.
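A Python sketch of that separation, assuming file contents are already loaded into a dict keyed by repository-relative path. Only the directory names listed above are treated as test locations, so test files that sit beside source (for example `*.test.ts`) would need an extra file-name rule; `classify_todos` is a hypothetical helper.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

TEST_DIRS = {"test", "tests", "__tests__", "spec"}
TODO = re.compile(r"\bTODO\b")


def classify_todos(files: dict[str, str]) -> dict[str, list[str]]:
    """Split TODO lines into test gaps and production debt as "path:line" hits."""
    result = {"test_gaps": [], "production_debt": []}
    for path, text in files.items():
        # Directory components only; the file name itself is not checked.
        parts = PurePosixPath(path.replace("\\", "/")).parts[:-1]
        bucket = "test_gaps" if TEST_DIRS & set(parts) else "production_debt"
        for number, line in enumerate(text.splitlines(), start=1):
            if TODO.search(line):
                result[bucket].append(f"{path}:{number}")
    return result
```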
High-churn files = fragile areas: The files that change most often in recent git history are the likeliest to hide complexity. Always note them in `CONCERNS.md`.
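One way to surface high-churn files is to count how often each path appears in `git log --name-only`. In this Python sketch the six-month window and top-10 cut-off are arbitrary assumptions, renamed files are not merged with their old paths, and the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
from collections import Counter


def churn_from_log(log_output: str, top: int = 10) -> list[tuple[str, int]]:
    """Count how often each path appears in `git log --name-only` output."""
    counts = Counter(line.strip() for line in log_output.splitlines() if line.strip())
    return counts.most_common(top)


def recent_churn(since: str = "6 months ago", top: int = 10) -> list[tuple[str, int]]:
    """Run git in the current repository and return the most-changed files."""
    log = subprocess.run(
        # An empty --pretty format leaves only the changed file names per commit.
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return churn_from_log(log, top)
```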

Anti-Patterns

| ❌ Don't | ✅ Do instead |
|---|---|
| "Uses Clean Architecture with Domain/Data layers." (when no such directories exist) | State only what the directory structure actually shows. |
| "This is a Next.js project." (without checking `package.json`) | Check `dependencies` first. State what's actually there. |
| Guess the database from a variable name like `dbUrl` | Check the manifest for `pg`, `mysql2`, `mongoose`, `prisma`, etc. |
| Document `dist/` or `build/` naming patterns as conventions | Document source files only. |
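The manifest check in the third row can be sketched in Python for npm projects. The package-to-database table is illustrative and incomplete, only runtime `dependencies` are read (following the `devDependencies` gotcha), and `databases_from_manifest` is a hypothetical helper.

```python
from __future__ import annotations

import json

# Manifest package name -> database it implies. Illustrative, not exhaustive.
DB_PACKAGES = {
    "pg": "PostgreSQL",
    "mysql2": "MySQL",
    "mongoose": "MongoDB",
    "prisma": "Prisma ORM (check schema.prisma for the provider)",
    "@prisma/client": "Prisma ORM (check schema.prisma for the provider)",
}


def databases_from_manifest(package_json_text: str) -> list[str]:
    """Infer databases from declared runtime dependencies, never from variable names."""
    deps = json.loads(package_json_text).get("dependencies", {})
    return sorted({DB_PACKAGES[name] for name in deps if name in DB_PACKAGES})
```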

Enhanced Scan Output Sections

The `scan.py` script now produces the following sections in addition to its original output:
  • CODE METRICS — Total files, lines of code by language, largest files (complexity signals)
  • CI/CD PIPELINES — Detected GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.
  • CONTAINERS & ORCHESTRATION — Docker, Docker Compose, Kubernetes, Vagrant configs
  • SECURITY & COMPLIANCE — Snyk, Dependabot, SECURITY.md, SBOM, security policies
  • PERFORMANCE & TESTING — Benchmark configs, profiling markers, load testing tools
Use these sections during Phase 2 to inform investigation questions and identify tool-specific patterns.
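To cross-check the CI/CD PIPELINES section, a Python sketch can look for each system's conventional config location. The paths below are the usual defaults for these tools, not `scan.py`'s actual detection rules, and GitLab and Jenkins both allow a custom config path, so an empty result is not proof that no pipeline exists. `detect_ci` is a hypothetical helper.

```python
from __future__ import annotations


def detect_ci(paths: list[str]) -> list[str]:
    """Name the CI systems whose default config files appear in paths.

    paths are repository-relative and use forward slashes.
    """
    found = set()
    for p in paths:
        if p.startswith(".github/workflows/") and p.endswith((".yml", ".yaml")):
            found.add("GitHub Actions")
        elif p == ".gitlab-ci.yml":
            found.add("GitLab CI")
        elif p == "Jenkinsfile" or p.endswith("/Jenkinsfile"):
            found.add("Jenkins")
        elif p == ".circleci/config.yml":
            found.add("CircleCI")
    return sorted(found)
```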

Bundled Assets

| Asset | When to load |
|---|---|
| `scripts/scan.py` | Phase 1: run first, before reading any code (Python 3.8+ required) |
| `references/inquiry-checkpoints.md` | Phase 2: load for per-template investigation questions |
| `references/stack-detection.md` | Phase 2: only if the stack is ambiguous |
| `assets/templates/STACK.md` | Phase 3, step 1 |
| `assets/templates/STRUCTURE.md` | Phase 3, step 2 |
| `assets/templates/ARCHITECTURE.md` | Phase 3, step 3 |
| `assets/templates/CONVENTIONS.md` | Phase 3, step 4 |
| `assets/templates/INTEGRATIONS.md` | Phase 3, step 5 |
| `assets/templates/TESTING.md` | Phase 3, step 6 |
| `assets/templates/CONCERNS.md` | Phase 3, step 7 |
Template usage mode:
  • Default mode: complete only the "Core Sections (Required)" in each template.
  • Extended mode: add optional sections only when the repo complexity justifies them.