harness-step2-fill-docs

Harness Step 2: Fill in docs/ Knowledge Base Content

Goal

Read the project code closely and write the architectural knowledge, naming conventions, and technical decisions that are implicit in the code into the files under docs/, so that an agent can grasp the whole project quickly in any session.
Core principle: cite the source of every inferred statement, mark anything you cannot determine as "To be supplemented", and never paper over a gap with a vague placeholder.

Execution Steps

Step 1: In-depth Scanning

Before writing any documentation, make sure you understand the project. Run the following in order:

    # 1. Confirm the docs/ skeleton exists
    ls docs/

    # 2. Understand the directory structure (3 levels)
    find . -maxdepth 3 \
      -not -path '*/node_modules/*' -not -path '*/.git/*' \
      -not -path '*/__pycache__/*' -not -path '*/dist/*' \
      -not -path '*/.next/*' -not -path '*/build/*' | sort

    # 3. Read the main entry files
    # (choose by tech stack: main.ts / main.py / app.go / index.js, etc.)

    # 4. Read module boundaries (the index file or first file of each main directory)
    # Goal: clarify the responsibility of each directory

    # 5. Read dependency declarations
    cat package.json 2>/dev/null || cat pyproject.toml 2>/dev/null ||
      cat go.mod 2>/dev/null || cat Cargo.toml 2>/dev/null

    # 6. Read existing documentation (reuse, do not duplicate)
    cat README.md 2>/dev/null
    cat AGENTS.md 2>/dev/null

Scanning goals: before writing documentation, you must be able to answer these questions:
- What are the main modules of this project? What does each module do?
- What does the call chain look like? (UI → ? → ? → data layer)
- Which main libraries/frameworks are used? Can you infer why they were chosen?
- What patterns do file names follow? What patterns do variable names follow?
- What causes tests to fail? What are the acceptance criteria?
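
The dependency-declaration command above stops at the first manifest it finds, so a repository that holds both a package.json and a pyproject.toml would show only one of them. A minimal sketch of a helper that lists every manifest present follows; the function name `detect_manifests` is hypothetical and the list of file names is simply the four mentioned in this step.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): list which dependency
# manifests exist in a project root, instead of stopping at the first one.
detect_manifests() {
  local root="${1:-.}" f
  for f in package.json pyproject.toml go.mod Cargo.toml; do
    # -f is true only for regular files, so missing manifests are skipped
    if [ -f "$root/$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Calling `detect_manifests .` from the repository root prints one manifest name per line, which tells you which of the `cat` commands are worth running.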

---

Step 2: Write docs/ARCHITECTURE.md

What to write: module division, dependency direction, and the main data flows. Describe what the structure is and why it is divided this way, not the specific implementation.
⚠️ Mandatory requirement: verify imports before describing component/module relationships
Before asserting anything like "A is used by B", "A embeds B", or "page A contains component C", confirm the actual import with grep. Do not guess from file names or directory locations.

    # Verify whether a component is actually referenced by a page
    # (the path is quoted so the shell does not treat [locale] and [bookCode] as glob patterns)
    grep -r "ChatInterface" "frontend/src/app/[locale]/book/[bookCode]/" 2>/dev/null

    # Find which files actually reference a component
    grep -rl "ComponentName" src/ 2>/dev/null

If grep returns no results, there is no reference relationship. Even when the component sits in the same directory, you cannot assert that it is used. Mark every unverified relationship as "To be verified: No import found, please confirm manually".
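
The import check described above can be wrapped so that the fallback marker is produced automatically. This is only a sketch under stated assumptions: `check_import` is a hypothetical name, and the marker wording (with the component name added) is adapted from the text above. A match means the name is mentioned in a file, which may be a definition rather than an import, so the matching lines still need to be read.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which files under a directory mention a component,
# or print the "To be verified" marker when none do.
# Usage: check_import <ComponentName> <search-dir>
check_import() {
  local name="$1" dir="$2" hits
  # -r recurse, -l file names only, -w whole-word match, -F literal string
  hits=$(grep -rlwF -- "$name" "$dir" 2>/dev/null)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
  else
    printf 'To be verified: No import found for %s, please confirm manually\n' "$name"
  fi
}
```

For example, `check_import ChatInterface "frontend/src/app/[locale]/book/[bookCode]/"` either lists the files to inspect or yields a line that can be pasted into the "To be Supplemented" section.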

**Format Template**:

```markdown
# Architecture Description

## Overall Structure

[Describe the overall layering in prose, then use a directory tree to illustrate]
[Directory tree, down to the key levels only; do not list every file]

## Dependency Direction Rules

[Use an arrow diagram or a list to show which layers may reference which]
Key constraints:
- [Constraint 1, with the reason]
- [Constraint 2, with the reason]

## Main Data Flow

[Describe the 1-2 most important request/data flows, from entry point to database]

## To be Supplemented

- [Content that could not be determined during scanning]
```

**Writing Requirements**:
- Dependency rules must be specific; do not write empty statements like "maintain clear layering"
- Give the reason for each constraint (e.g., "Do not call the DB from the UI layer because...")
- Clearly mark content that cannot be inferred from code as "To be supplemented: Need manual confirmation"

---

Step 3: Write docs/CONVENTIONS.md

What to write: the naming and file-organization patterns observed in the code.
Scanning method:

    # Check file naming patterns
    find src -name "*.ts" -o -name "*.py" -o -name "*.go" 2>/dev/null | head -30

    # Check function/variable naming (sample a few files)
    head -50 [main source file path]
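
Reading thirty file names by eye makes it easy to miss a minority style. The sketch below tallies file names by naming style so that inconsistencies show up as counts. The function names, the five style labels, and the set of extensions are all assumptions for illustration, not conventions taken from any particular project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a file's base name (extension removed) by style.
naming_style() {
  local base
  base=$(basename "$1")
  base=${base%%.*}   # drop everything from the first dot: index.test.ts -> index
  case "$base" in
    *-*)            echo kebab-case ;;
    *_*)            echo snake_case ;;
    [[:upper:]]*)   echo PascalCase ;;
    *[[:upper:]]*)  echo camelCase ;;
    *)              echo lowercase ;;
  esac
}

# Hypothetical helper: count source files per naming style, most common first.
tally_naming_styles() {
  find "${1:-src}" -type f \
    \( -name '*.ts' -o -name '*.tsx' -o -name '*.py' -o -name '*.go' \) 2>/dev/null |
    while IFS= read -r f; do naming_style "$f"; done |
    sort | uniq -c | sort -rn
}
```

Running `tally_naming_styles src` gives one line per style with its count; a long tail of minority styles is a candidate for the "Currently inconsistent" note in CONVENTIONS.md.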

**Format Template**:

```markdown
# Code Conventions

## File Naming

- [Rule 1]: example `XxxYyy.tsx`
- [Rule 2]: example `xxx-yyy.ts`

## Variable and Function Naming

- Variables/Functions: [rule + example]
- Classes/Components: [rule + example]
- Constants: [rule + example]

## Directory Organization

[What types of files go in each main directory]

## Git Commit Format

[Summarize from git log, or write a recommended format]
type(scope): description
type is one of: feat / fix / docs / refactor / test

## To be Supplemented

- [Conventions that cannot be inferred from code]
```

**Writing Requirements**:
- Back each rule with an example observed in the project's code
- If naming in the code is inconsistent, record that truthfully and mark it "Currently inconsistent, it is recommended to unify as..."
- Do not invent conventions that do not exist in the project

---

Step 4: Write docs/TECH_DECISIONS.md

What to write: the reasons behind technology choices. This is the hardest document to write, because the reasons are often not in the code.
Scanning method:

    # Check all direct dependencies
    grep -A 50 '"dependencies"' package.json 2>/dev/null

    # Or (-F matches the section header literally; unescaped [ ] would be a regex bracket expression)
    grep -F -A 30 '[tool.poetry.dependencies]' pyproject.toml 2>/dev/null
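
A fixed `-A 50` window can run past the end of the dependencies block or cut it short. As a rough sketch, the helper below prints only the package names inside the `"dependencies"` block. It assumes a pretty-printed package.json with one entry per line (the layout npm normally writes); `list_direct_deps` is a hypothetical name. Where `jq` is installed, `jq -r '.dependencies // {} | keys[]' package.json` is the more robust choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names in the "dependencies" block of a
# pretty-printed package.json (assumes one "name": "version" pair per line).
list_direct_deps() {
  awk '
    /^[[:space:]]*"dependencies"[[:space:]]*:[[:space:]]*[{]/ {
      if ($0 !~ /[}]/) inside = 1     # an empty {} on the same line has no entries
      next
    }
    inside && /[}]/ { exit }          # closing brace ends the block
    inside && match($0, /"[^"]+"/) {  # first quoted string on the line is the name
      print substr($0, RSTART + 1, RLENGTH - 2)
    }
  ' "${1:-package.json}" 2>/dev/null
}
```

The resulting list is a starting point for deciding which entries are major enough to deserve a section in TECH_DECISIONS.md.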

**Format Template**:

```markdown
# Technical Decision Records

## [Framework/Library Name]

- Purpose: [What this library/framework is used for]
- Reason for selection: [Reasons that can be inferred, or mark "To be supplemented"]
- Alternatives: [If there are obvious alternatives, list them and explain why they were not chosen]
- Notes: [Points that need special attention during use]

## To be Supplemented

- [Libraries whose selection reasons cannot be inferred from code and need a manual explanation]
```

**Writing Requirements**:
- Cover only the major frameworks and libraries; do not list every tooling dependency
- Write down what can be inferred; where nothing can be inferred, clearly mark "To be supplemented: Original selection reason unknown, please supplement"
- Do not fabricate selection reasons

---

Step 5: Write docs/QUALITY.md

What to write: what "done" means, plus the code review checklist.
Scanning method:

    # Check test file patterns
    find . -name "*.test.*" -o -name "*.spec.*" -o -name "*_test.*" 2>/dev/null | head -10

    # Check CI configuration (if any)
    cat .github/workflows/*.yml 2>/dev/null | head -60
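
To see which test-file convention actually dominates, rather than judging from the first ten hits, the three patterns from the scan can be counted separately. The following is a small sketch; the function name `count_test_patterns` and the choice to skip node_modules are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count test files per naming pattern under a directory.
count_test_patterns() {
  local dir="${1:-.}" pat n
  for pat in '*.test.*' '*.spec.*' '*_test.*'; do
    # $(( ... )) normalises the padded number that some wc implementations print
    n=$(( $(find "$dir" -type f -name "$pat" -not -path '*/node_modules/*' 2>/dev/null | wc -l) ))
    printf '%s\t%d\n' "$pat" "$n"
  done
}
```

A pattern with zero hits should not appear as a testing convention in QUALITY.md; if all three are zero, that belongs under "To be Supplemented".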

**Format Template**:

```markdown
# Quality Standards

## Definition of Done

A task counts as done only when all of the following hold:
- The feature runs correctly locally
- Matching tests are written (covering the normal path plus at least one failure path)
- [Add project-specific items, e.g. type checking passes, lint reports no errors]
- The git commit message is clear
- If the architecture or conventions changed, docs/ has been updated to match

## Code Review Checklist

Correctness
- [Project-specific correctness checks, e.g. multi-tenant isolation, permission verification]

Maintainability
- Does the naming conform to CONVENTIONS.md?
- Is there duplicated code that could be extracted?
- Is the business logic in the correct layer? (See ARCHITECTURE.md)

## Testing Requirements

[Testing conventions summarized from existing test files, or a recommended standard]

## To be Supplemented

- [Acceptance criteria that cannot be inferred from code]
```

---

Step 6: Write docs/exec-plans/tech-debt-tracker.md

What to write: potential problems and technical debt discovered during scanning.
Format:

    # Technical Debt Tracker

    Format of each entry:
    [Priority: High/Medium/Low] Issue description - scope of impact

    ## Current Debts

    [Issues found during scanning; record them honestly]

    ## Resolved

    (Empty)

**Clues for Identifying Debt**:
- Duplicate code (the same logic appearing in multiple places)
- Inconsistent naming (multiple names for the same concept)
- TODO / FIXME comments
- Overly large files (more than 300 lines)
- Core modules without tests
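
Two of the clues above, oversized files and TODO / FIXME comments, can be collected mechanically. The helpers below are a sketch under stated assumptions: the names `large_files` and `todo_comments` are hypothetical, the 300-line default comes from the list above, and skipping node_modules and .git is an added assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files longer than a line threshold (default 300),
# largest first, as "<lines><TAB><path>".
large_files() {
  local dir="${1:-src}" limit="${2:-300}"
  find "$dir" -type f -not -path '*/node_modules/*' -not -path '*/.git/*' 2>/dev/null |
    while IFS= read -r f; do
      lines=$(( $(wc -l < "$f") ))
      if [ "$lines" -gt "$limit" ]; then
        printf '%s\t%s\n' "$lines" "$f"
      fi
    done | sort -rn
}

# Hypothetical helper: list TODO / FIXME comments as "<path>:<line>:<text>".
todo_comments() {
  grep -rnE --exclude-dir=node_modules --exclude-dir=.git 'TODO|FIXME' "${1:-src}" 2>/dev/null
}
```

Each output line still needs a human judgement on priority and scope of impact before it becomes an entry in the tracker.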

---

Quality Inspection

After writing each file, check it against this list:
  • Are there any "To be supplemented" items? → Compile them into a list for the user
  • Is anything fabricated? → Delete it and replace it with "To be supplemented"
  • Are the dependency rules in ARCHITECTURE.md specific and actionable?
  • Are the patterns in CONVENTIONS.md backed by examples from the code?
  • Does the DoD in QUALITY.md include project-specific checks?
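
The first check in the list above, gathering every "To be supplemented" item, can be done with a single recursive grep. This is a minimal sketch: `list_open_items` is a hypothetical name, and the default marker string assumes the docs were written with the English marker; pass a different marker as the second argument if the docs use another wording.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every line in the Markdown files under a directory
# that carries the marker, as "<path>:<line>:<text>".
list_open_items() {
  local dir="${1:-docs}" marker="${2:-To be supplemented}"
  # -F literal match (case-sensitive), -n line numbers, --include limits to *.md
  grep -rnF --include='*.md' -- "$marker" "$dir" 2>/dev/null
}
```

The output of `list_open_items docs` can be pasted directly into the summary for the user.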

Notify User After Completion

Output a summary with three parts:
Content written: list what was written to each file
"To be supplemented" items needing manual confirmation: collect every entry marked "To be supplemented" across all files; this is the part the user most needs to look at
Next steps:
  • Once the "To be supplemented" items have been filled in manually, the knowledge base is ready for use
  • Then run the harness-step3-state-management skill to set up cross-session state management (progress file + tasks.json)