explore-codebase

Explore Codebase

You are building a mental map of an unfamiliar or large codebase region. The goal is structural comprehension — understanding what the code does, how it's organized, and what the key entry points and invariants are — without loading everything into your main context window.
This skill exists because codebases are too large to read in one prompt. The pattern (Recursive Language Model — see the RLM substrate section in CLAUDE.md) is: keep your root context small, dispatch sub-agents to read individual files or modules, and have each sub-agent return a structured summary. Synthesize the summaries into a coherent map.
This skill covers any codebase the agent encounters: forks, dependencies, unfamiliar regions, user projects. For analyzing yoyo's own source to find bugs and gaps, use `self-assess` instead.

When to use

Trigger this skill when ANY of these hold:
  • A planned task touches >5 files you haven't recently worked on
  • A community issue references a feature or module you don't have a mental map for
  • You're investigating a bug whose surface spans multiple modules you haven't read recently
  • A new dependency is being introduced and you want to know its public API + key invariants before integrating
  • A user explicitly asks you to explore, understand, or map a codebase
  • `/add` brought in a large project and you need structural context before acting
  • You're working in a fork and need to understand what the fork changed relative to upstream

When NOT to use

  • Small known regions. A single file ≤300 lines, or a function you wrote yesterday — just read it directly. Sub-agent overhead exceeds the savings.
  • Precise edits across files. Refactoring needs mutual context (seeing all pieces at once), not summaries. Summaries lose the fidelity you need for surgical edits.
  • Sequential workflows with strong mutual context. When each step depends on the full output of the previous step, fan-out doesn't help — you need serial reading.
  • The region is already known. If you or the user can name the exact module and its API from memory, direct read is faster. Don't explore what you already understand.
  • You're inside a sub-agent at depth 3. Stop. Return what you have. Do not dispatch further.

Procedure

1. Identify the region

Define the exploration scope — be specific:
  • A directory: `src/format/`
  • A list of files: `src/agent_builder.rs src/main.rs src/tools.rs`
  • A dependency: `~/.cargo/registry/src/*/yoagent-*/src/`
  • A commit range: `git diff main..feature-branch --name-only`
If the scope is vague ("understand this project"), start with orientation (Step 2). If the scope is precise (a named set of files), skip to Step 3.

2. Orient — build a rough map

Before dispatching sub-agents, gather cheap structural signals directly:
```bash
# Project structure
find <root> -type f -name '*.rs' | head -50
# or use /map if available in the REPL

# README / docs
cat <root>/README.md 2>/dev/null | head -100

# File sizes (to plan dispatch)
wc -l <root>/src/*.rs | sort -rn | head -20

# Recent activity
git log --oneline -20 -- <root>/src/
```

From this, build a **file inventory** with rough sizes. Files >300 lines are candidates for sub-agent exploration. Files ≤300 lines can be read directly if needed later.
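Below is a minimal Python sketch of the same inventory. It assumes a local checkout with `.rs` sources, and `FileEntry` and `build_inventory` are illustrative names, not part of any yoyo tool. Unlike the `src/*.rs` glob above, it walks the tree recursively. It counts newlines the way `wc -l` does and flags files over the 300-line threshold.

```python
import os
from dataclasses import dataclass
from typing import List

# Step 2 heuristic: files over 300 lines are candidates for sub-agent exploration.
SUB_AGENT_LINE_THRESHOLD = 300


@dataclass
class FileEntry:
    path: str
    lines: int       # newline count, matching `wc -l`
    size_bytes: int

    @property
    def sub_agent_candidate(self) -> bool:
        return self.lines > SUB_AGENT_LINE_THRESHOLD


def build_inventory(root: str, suffix: str = ".rs") -> List[FileEntry]:
    """Collect every file under `root` ending in `suffix`, largest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            entries.append(FileEntry(path, data.count(b"\n"), len(data)))
    # Same ordering as `wc -l ... | sort -rn`: most lines first.
    entries.sort(key=lambda e: e.lines, reverse=True)
    return entries


if __name__ == "__main__":
    import sys

    for e in build_inventory(sys.argv[1] if len(sys.argv) > 1 else "src"):
        mode = "sub-agent" if e.sub_agent_candidate else "direct"
        print(f"{e.lines:6d} lines {e.size_bytes:8d} B  {mode:9s} {e.path}")
```

Byte sizes are kept alongside line counts because Step 3 decides by size in KB rather than by lines.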

3. Decide: direct read or sub-agent?

For each file in the region:
  • ≤5KB (roughly ≤150 lines): Read directly with `read_file` or `bash`. No sub-agent needed.
  • >5KB: Dispatch a sub-agent (Step 4). Don't load large files into your main context.
If the total region is small enough to read directly (≤5 files, all ≤5KB), skip sub-agents entirely: just read and synthesize in your main context.
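The two Step 3 rules can be written as a small planning helper. This is a sketch. `plan_reads` and `skip_sub_agents` are hypothetical names, and reading "5KB" as 5 * 1024 bytes is an assumption.

```python
from typing import Dict, List

# Step 3 thresholds. Reading "5KB" as 5 * 1024 bytes is an assumption.
DIRECT_READ_MAX_BYTES = 5 * 1024
SMALL_REGION_MAX_FILES = 5


def plan_reads(file_sizes: Dict[str, int]) -> Dict[str, List[str]]:
    """Split {path: size_in_bytes} into files to read directly and files
    that each get their own sub-agent."""
    direct = sorted(p for p, size in file_sizes.items() if size <= DIRECT_READ_MAX_BYTES)
    dispatch = sorted(p for p, size in file_sizes.items() if size > DIRECT_READ_MAX_BYTES)
    return {"direct": direct, "sub_agent": dispatch}


def skip_sub_agents(file_sizes: Dict[str, int]) -> bool:
    """True when the whole region is small enough to read in the main context."""
    return len(file_sizes) <= SMALL_REGION_MAX_FILES and all(
        size <= DIRECT_READ_MAX_BYTES for size in file_sizes.values()
    )


sizes = {"src/lib.rs": 2_000, "src/tools.rs": 48_000, "src/main.rs": 5_120}
print(plan_reads(sizes))
# {'direct': ['src/lib.rs', 'src/main.rs'], 'sub_agent': ['src/tools.rs']}
print(skip_sub_agents(sizes))  # False: tools.rs is over 5KB
```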

4. Dispatch per-file sub-agents

Store each file's content in shared state, then dispatch a sub-agent to summarize it. One file per sub-agent — files are the natural unit of structure in code.

4a. Store the artifact

shared_state set key="explore.<region>.<filename>" value="<file contents>"
Namespace convention: `explore.<region>.<filename>` (e.g., `explore.format.markdown`, `explore.yoagent.agent`).
If a single file exceeds ~120KB (about 120,000 bytes), chunk it before storing:
  • Split into chunks of ~80KB with 8KB overlap between consecutive chunks
  • Store as `explore.<region>.<filename>.chunk-1`, `.chunk-2`, etc.
  • Dispatch one sub-agent per chunk (same as analyze-trajectory's Section 3.5)
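Here is a sketch of the Step 4a storage layout. It assumes the file is already loaded as bytes. The chunking threshold is taken as 120,000 bytes, the byte figure given above. The 80KB and 8KB sizes are read as multiples of 1024 bytes, which is also an assumption. `artifact_key` and `chunk_artifact` are hypothetical helper names. Consecutive chunks share exactly `OVERLAP` bytes, so a sub-agent reading chunk 2 still sees the tail of chunk 1.

```python
from typing import Dict

# Step 4a sizes. Treating the threshold as 120,000 bytes and 80KB/8KB as
# multiples of 1024 bytes are assumptions.
CHUNK_THRESHOLD = 120_000
CHUNK_SIZE = 80 * 1024
OVERLAP = 8 * 1024


def artifact_key(region: str, filename: str) -> str:
    """Namespace convention: explore.<region>.<filename>."""
    return f"explore.{region}.{filename}"


def chunk_artifact(region: str, filename: str, data: bytes,
                   chunk_size: int = CHUNK_SIZE, overlap: int = OVERLAP,
                   threshold: int = CHUNK_THRESHOLD) -> Dict[str, bytes]:
    """Map each shared-state key to the bytes to store under it."""
    base = artifact_key(region, filename)
    if len(data) <= threshold:
        return {base: data}  # small enough: one key, one sub-agent
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk re-covers the last `overlap` bytes of the previous one
    chunks: Dict[str, bytes] = {}
    start, n = 0, 1
    while True:
        end = min(start + chunk_size, len(data))
        chunks[f"{base}.chunk-{n}"] = data[start:end]
        if end == len(data):
            break
        start += step
        n += 1
    return chunks
```

Each returned key would then be written with `shared_state set` and handed to its own sub-agent.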

4b. Dispatch the sub-agent

sub_agent: You are exploring a source file to build a structural summary.

The file is stored in shared state under key "explore.<region>.<filename>".
Read it with: shared_state get key="explore.<region>.<filename>"

Describe this file's structure in a JSON response. Reply with ONLY a JSON object (no markdown fences, no prose):
{
  "file": "<filename>",
  "purpose": "1 sentence: what this file does",
  "public_api": ["list of exported functions/structs/traits with 1-line descriptions"],
  "key_invariants": ["non-obvious behaviors, constraints, or assumptions"],
  "dependencies": ["other modules/crates this file depends on"],
  "dependents": ["who calls into this file, if visible from imports/use statements"],
  "complexity": "low|medium|high",
  "deeper_question": "a follow-up question if something is unclear, or null"
}
Skills do not chain. Sub-agents don't load this skill or any other; include the full question and shared-state key reference directly in the sub-agent's prompt.

4c. Handle sub-agent responses

Parse each sub-agent's response as JSON:
  1. Valid JSON with all fields: Store the summary in shared state under `explore.<region>.<filename>.summary` for the synthesis step.
  2. Malformed JSON but readable text: Extract what you can and construct a partial summary: `{"file": "<filename>", "purpose": "<first 200 chars of response>", "public_api": [], "key_invariants": [], "dependencies": [], "dependents": [], "complexity": "unknown", "deeper_question": null}`.
  3. Empty or errored: Fall back to reading the file's first and last 50 lines directly, and produce a low-confidence summary manually.
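One way to implement the three cases, sketched in Python with only the standard library. The convention of `parse_summary` returning `None` to mean "case 3: fall back to a direct read" is hypothetical. Treating valid but incomplete JSON like case 2 (while keeping whatever fields it does carry) is an assumption, because the procedure above does not cover that case.

```python
import json
from typing import Any, Dict, Optional

REQUIRED_FIELDS = (
    "file", "purpose", "public_api", "key_invariants",
    "dependencies", "dependents", "complexity", "deeper_question",
)


def partial_summary(filename: str, text: str) -> Dict[str, Any]:
    """Case 2 template: keep the first 200 characters as the purpose."""
    return {
        "file": filename, "purpose": text[:200], "public_api": [],
        "key_invariants": [], "dependencies": [], "dependents": [],
        "complexity": "unknown", "deeper_question": None,
    }


def parse_summary(filename: str, response: Optional[str]) -> Optional[Dict[str, Any]]:
    """Return a summary dict, or None to signal case 3 (fall back to a direct read)."""
    if response is None or not response.strip():
        return None  # case 3: empty or errored
    text = response.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return partial_summary(filename, text)  # case 2: not JSON at all
    if isinstance(parsed, dict) and all(k in parsed for k in REQUIRED_FIELDS):
        return parsed  # case 1: complete summary
    summary = partial_summary(filename, text)
    if isinstance(parsed, dict):
        # Assumption: keep the fields an incomplete JSON object does provide.
        summary.update({k: parsed[k] for k in REQUIRED_FIELDS if k in parsed})
    return summary


def head_and_tail(source: str, n: int = 50) -> str:
    """Case 3 fallback input: the file's first and last `n` lines."""
    lines = source.splitlines()
    if len(lines) <= 2 * n:
        return "\n".join(lines)
    return "\n".join(lines[:n] + ["..."] + lines[-n:])
```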

5. Recurse on deeper questions

If a sub-agent returns a non-null `deeper_question` and `complexity` is `"high"`:
  1. Dispatch another sub-agent with the narrower question, referencing the same shared-state key.
  2. Merge the answer into the existing summary.
Hard cap: recursion depth = 3. That's: initial dispatch → 1st recursion → 2nd recursion. After depth 3, accept whatever you have. If you find yourself wanting depth 4, your initial scope was probably too broad — narrow the region and retry.
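The recursion rule and its depth cap are sketched below. A hypothetical `dispatch(key, question)` callable stands in for a sub-agent call. The procedure does not specify how the answer is "merged", so attaching it under a `follow_ups` field is an assumption.

```python
from typing import Any, Callable, Dict

MAX_DEPTH = 3  # initial dispatch -> 1st recursion -> 2nd recursion

# Hypothetical dispatcher: (shared_state_key, question) -> parsed summary dict.
Dispatch = Callable[[str, str], Dict[str, Any]]


def explore_with_followups(key: str, question: str, dispatch: Dispatch,
                           depth: int = 1) -> Dict[str, Any]:
    """Dispatch once, then recurse on `deeper_question` while complexity is
    "high" and the depth cap has not been reached."""
    summary = dispatch(key, question)
    follow_up = summary.get("deeper_question")
    if follow_up and summary.get("complexity") == "high" and depth < MAX_DEPTH:
        # Same shared-state key, narrower question.
        answer = explore_with_followups(key, follow_up, dispatch, depth + 1)
        # Assumed merge strategy: attach the answer instead of overwriting fields.
        summary.setdefault("follow_ups", []).append({"question": follow_up, "answer": answer})
    return summary
```

Even a dispatcher that always reports high complexity gets at most three calls per file.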

6. Synthesize into a mental map

After all per-file summaries are collected, dispatch a synthesis sub-agent (or do this in your main context if the total summary data is small enough, ≤5KB):
sub_agent: You are synthesizing per-file summaries into a structural map of a codebase region.

The following shared-state keys contain per-file summaries:
- explore.<region>.<file1>.summary
- explore.<region>.<file2>.summary
...

Read each summary, then produce a structural map as a JSON object:
{
  "region": "<region name>",
  "overview": "2-3 sentences: what this region does as a whole",
  "module_graph": ["<file-A> -> <file-B>: <relationship>", ...],
  "entry_points": ["the key functions/structs a caller would use"],
  "invariants": ["cross-file constraints or assumptions"],
  "risk_areas": ["files or interactions that look fragile or complex"],
  "open_questions": ["things the summaries couldn't resolve"]
}
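When the summaries are small enough to synthesize in the main context, the mechanical part of the map (the `module_graph` edges) can be derived from each summary's `dependencies` field. This sketch assumes dependency names match region file names up to the extension (for example `tools` for `tools.rs`). `fits_main_context` and `module_graph` are hypothetical helpers. The overview, invariants, and risk areas still need judgment.

```python
import json
import os
from typing import Any, Dict, List

MAIN_CONTEXT_MAX_BYTES = 5 * 1024  # Step 6: "≤5KB" of summary data (1KB = 1024 assumed)


def fits_main_context(summaries: List[Dict[str, Any]]) -> bool:
    """True if the serialized summaries are small enough to synthesize locally."""
    total = sum(len(json.dumps(s).encode("utf-8")) for s in summaries)
    return total <= MAIN_CONTEXT_MAX_BYTES


def module_graph(summaries: List[Dict[str, Any]]) -> List[str]:
    """Edges of the form "<file-A> -> <file-B>: depends on", kept only when
    the dependency is itself a file in the region."""
    # Assumption: a dependency named "tools" refers to region file "tools.rs".
    by_stem = {os.path.splitext(s["file"])[0]: s["file"] for s in summaries}
    edges = set()
    for s in summaries:
        for dep in s.get("dependencies", []):
            target = by_stem.get(os.path.splitext(dep)[0])
            if target and target != s["file"]:
                edges.add(f"{s['file']} -> {target}: depends on")
    return sorted(edges)
```

External crates such as `serde` drop out of the graph because no region file matches them.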

7. Use the map

The mental map is your working context for the rest of the session. Reference it when:
  • Planning which files to modify for a task
  • Estimating the blast radius of a change
  • Deciding whether a refactor is safe
  • Explaining code structure to a user or in a journal entry
Store the final map in shared state under `explore.<region>.map` so sub-agents in later steps can reference it without re-exploring.
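Reusing a stored map instead of re-exploring is sketched below, with a plain dict standing in for shared state. `get_or_explore` and the `explore` callable are hypothetical. Only the `explore.<region>.map` key comes from the procedure above.

```python
from typing import Any, Callable, Dict


def map_key(region: str) -> str:
    return f"explore.{region}.map"


def get_or_explore(state: Dict[str, Any], region: str,
                   explore: Callable[[str], Any]) -> Any:
    """Return the stored map for `region`, exploring only on first use.

    `state` is a dict standing in for the shared_state tool; `explore` is a
    hypothetical callable that runs Steps 1-6 for a region.
    """
    key = map_key(region)
    if key not in state:
        state[key] = explore(region)  # first request this session: fan out once
    return state[key]  # later steps read the same map, no new dispatches
```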

Pitfalls

  • Don't explore what you already know. If you wrote the code recently or have it in active memory, skip this skill. It's for building new understanding, not confirming existing knowledge.
  • Don't ask sub-agents to make decisions. They summarize structure; you decide what to do with it. Sub-agents that plan or recommend tend to drift.
  • Don't dump multiple files to one sub-agent. One file per dispatch keeps the JSON output reliable and the summary focused. The synthesis step is where cross-file reasoning happens.
  • Don't forget the recursion cap. 3 is the hard limit. If your region needs depth 4, the region is too broad — split it.
  • Don't explore before acting on small tasks. If the task is "fix this one function," reading that function directly is faster than exploring the whole module. Match the tool to the task size.
  • Don't re-explore within the same session. If you've already explored a region, the summaries are in shared state. Read them with `shared_state get` instead of re-dispatching sub-agents.
  • Per-file, not per-byte. Unlike analyze-trajectory (which chunks CI logs by byte offset), this skill fans out by file. Files are the natural structural unit in codebases. Only chunk within a file if it exceeds the ~120KB (about 120,000 bytes) threshold from Step 4a.

Verification

An exploration is "good enough" when ALL of:
  • The map names concrete files and functions (not "some module that handles X")
  • Each file in the region has a summary (even if low-confidence for some)
  • The module graph shows how files relate (who calls whom, who depends on whom)
  • Entry points are identified — a caller knows where to start
  • The total exploration used ≤ N+2 sub-agent dispatches where N is the number of files explored (N per-file + 1 synthesis + 1 possible recursion)
  • The work stayed within the depth-3 recursion cap
If the map fails any of these, narrow the region and re-explore the gap, or accept the partial result and document open questions.
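The mechanical parts of this checklist can be scripted, but whether the map names concrete files and functions still needs a read-through. This sketch uses hypothetical names and applies the N+2 dispatch budget and depth-3 cap stated above.

```python
from typing import Any, Dict, Iterable, List

MAX_DEPTH = 3


def verification_problems(explored_files: Iterable[str], summaries: Dict[str, Any],
                          dispatches: int, max_depth_used: int,
                          region_map: Dict[str, Any]) -> List[str]:
    """Return the checklist items that fail; an empty list means "good enough"."""
    files = sorted(set(explored_files))
    problems = []
    missing = [f for f in files if f not in summaries]
    if missing:
        problems.append(f"no summary for: {', '.join(missing)}")
    if not region_map.get("module_graph"):
        problems.append("module graph is empty")
    if not region_map.get("entry_points"):
        problems.append("no entry points identified")
    budget = len(files) + 2  # N per-file + 1 synthesis + 1 possible recursion
    if dispatches > budget:
        problems.append(f"{dispatches} dispatches exceeds budget of {budget}")
    if max_depth_used > MAX_DEPTH:
        problems.append(f"recursion depth {max_depth_used} exceeds cap of {MAX_DEPTH}")
    return problems
```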

What this skill deliberately does NOT do

  • Does not modify code. Exploration produces understanding, not changes. The actual edits are a separate task.
  • Does not find bugs. That's `self-assess`. This skill builds the map; `self-assess` uses the map to find problems.
  • Does not auto-create documentation. If the map is worth preserving as docs, that's a separate decision outside this skill's scope.
  • Does not write to the audit-log branch. The exploration results live in shared state for the current session only.