research-codebase

Research Codebase

You are tasked with conducting comprehensive research across the codebase to answer user questions by spawning parallel sub-agents and synthesizing their findings.
The user's research question/request is: $ARGUMENTS

Steps to follow after receiving the research query:

<EXTREMELY_IMPORTANT>
  • OPTIMIZE the research question using your prompt-engineer skill to refine phrasing and structure for maximum clarity and precision.
  • After research is complete and the research artifact(s) are generated, provide an executive summary of the research and path to the research document(s) to the user, and ask if they have any follow-up questions or need clarification.
</EXTREMELY_IMPORTANT>
  1. Read any directly mentioned files first:
    • If the user mentions specific files (tickets, docs, or other notes), read them FULLY first
    • IMPORTANT: Use the `readFile` tool WITHOUT limit/offset parameters to read entire files
    • CRITICAL: Read these files yourself in the main context before spawning any sub-tasks
    • This ensures you have full context before decomposing the research
  2. Analyze and decompose the research question:
    • Break the research question down into composable research areas
    • Take time to ultrathink about the underlying patterns, connections, and architectural implications the user might be seeking
    • Identify specific components, patterns, or concepts to investigate
    • Create a research plan using TodoWrite to track all subtasks
    • Consider which directories, files, or architectural patterns are relevant
  3. Spawn parallel sub-agent tasks:
    • Create multiple Task agents to research different aspects concurrently
    • We now have specialized agents that know how to do specific research tasks:
    For codebase research:
    • Use the codebase-locator agent to find WHERE files and components live
    • Use the codebase-analyzer agent to understand HOW specific code works (without critiquing it)
    • Use the codebase-pattern-finder agent to find examples of existing patterns (without evaluating them)
    • Output directory: `research/docs/`
    • Examples:
      • The database logic is found and can be documented in `research/docs/2024-01-10-database-implementation.md`
      • The authentication flow is found and can be documented in `research/docs/2024-01-11-authentication-flow.md`
    IMPORTANT: All agents are documentarians, not critics. They will describe what exists without suggesting improvements or identifying issues.
    For research directory:
    • Use the codebase-research-locator agent to discover what documents exist about the topic
    • Use the codebase-research-analyzer agent to extract key insights from specific documents (only the most relevant ones)
    For online search:
    • VERY IMPORTANT: In case you discover external libraries as dependencies, use the codebase-online-researcher agent for external documentation and resources
      • The agent fetches live web content using the playwright-cli skill (or `bunx @playwright/cli` / `curl`). Instruct it to apply the token-efficient fetch order: (1) try `curl https://<site>/llms.txt` for an AI-friendly index (see llmstxt.org), (2) try `curl <url> -H "Accept: text/markdown"` to get pre-converted Markdown (supported on Cloudflare-hosted docs via Markdown for Agents), (3) fall back to HTML parsing via `playwright-cli`
      • Instruct the agent to return LINKS with their findings and INCLUDE those links in the research document
      • The agent should persist reusable source documents under `research/web/<YYYY-MM-DD>-<kebab-case-topic>.md` (with frontmatter noting `source_url`, `fetched_at`, and `fetch_method`) so future research can reuse them without re-fetching
      • Output directory for the synthesized research artifact: `research/docs/`
      • Examples:
        • If researching Redis locks usage, the agent might find relevant usage and create a document `research/docs/2024-01-15-redis-locks-usage.md` with internal links to Redis docs and code references (and cache the fetched Redis docs under `research/web/`)
        • If researching OAuth flows, the agent might find relevant external articles and create a document `research/docs/2024-01-16-oauth-flows.md` with links to those articles
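The fetch order and cache location described above can be sketched as a small planner. This is only an illustration under stated assumptions: `fetch_plan`, `cache_path`, and the method labels are hypothetical names, and the third step is left as a bare `playwright-cli` label because its exact invocation is not specified here.

```python
from datetime import date
from urllib.parse import urlsplit


def fetch_plan(url: str) -> list[tuple[str, str]]:
    """Return the fetch attempts for `url`, cheapest first (hypothetical helper)."""
    parts = urlsplit(url)
    site = f"{parts.scheme}://{parts.netloc}"
    return [
        # (1) AI-friendly index at the site root
        ("llms_txt", f"curl {site}/llms.txt"),
        # (2) pre-converted Markdown via the Accept header
        ("markdown_accept", f'curl {url} -H "Accept: text/markdown"'),
        # (3) HTML parsing fallback; the concrete invocation is left to the agent
        ("html", "playwright-cli"),
    ]


def cache_path(fetched_on: date, topic: str) -> str:
    """Build research/web/<YYYY-MM-DD>-<kebab-case-topic>.md for a fetched source."""
    slug = "-".join(topic.lower().split())
    return f"research/web/{fetched_on.isoformat()}-{slug}.md"
```

The method label of whichever attempt succeeds is a natural value for the `fetch_method` frontmatter field of the cached document.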
    The key is to use these agents intelligently:
    • Start with locator agents to find what exists
    • Then use analyzer agents on the most promising findings to document how they work
    • Run multiple agents in parallel when they're searching for different things
    • Each agent knows its job - just tell it what you're looking for
    • Don't write detailed prompts about HOW to search - the agents already know
    • Remind agents they are documenting, not evaluating or improving
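The locate-then-analyze ordering with parallel agents inside each phase can be pictured roughly as follows. `run_agent` is a hypothetical stand-in for spawning a Task agent, not a real API, and the request strings are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def run_agent(agent: str, request: str) -> str:
    """Hypothetical stand-in for spawning one sub-agent task."""
    return f"[{agent}] documented: {request}"


def run_phase(jobs: list[tuple[str, str]]) -> list[str]:
    """Run independent (agent, request) jobs concurrently, keeping input order."""
    if not jobs:
        return []
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(lambda job: run_agent(*job), jobs))


# Phase 1: locator agents find what exists.
located = run_phase([
    ("codebase-locator", "where the authentication flow lives"),
    ("codebase-research-locator", "existing documents about authentication"),
])

# Phase 2: analyzer agents document the most promising findings.
analyzed = run_phase([
    ("codebase-analyzer", "how the authentication flow works"),
])
```

Each request states only what to look for; how to search is left to the agent, as the guidance above asks.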
  4. Wait for all sub-agents to complete and synthesize:
    • IMPORTANT: Wait for ALL sub-agent tasks to complete before proceeding
    • Compile all sub-agent results (both codebase and research findings)
    • Prioritize live codebase findings as primary source of truth
    • Use research findings as supplementary historical context
    • Connect findings across different components
    • Include specific file paths and line numbers for reference
    • Highlight patterns, connections, and architectural decisions
    • Answer the user's research question with concrete evidence
    • If findings reveal the original question was misframed (e.g., the system works differently than assumed, or the components don't exist where expected), flag this to the user before finalizing the document. This is valuable signal — don't bury it.
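One way to keep live codebase findings ahead of historical research during synthesis is a stable sort on the source. The `Finding` type, its field names, and the `"codebase"` / `"research"` labels are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str  # "codebase" (live) or "research" (historical); labels are assumed
    ref: str     # file path, with line number where available
    note: str    # description of what exists, without evaluation


def order_findings(findings: list[Finding]) -> list[Finding]:
    """Put live codebase findings first; otherwise keep the original order."""
    # sorted() is stable, and False sorts before True.
    return sorted(findings, key=lambda f: f.source != "codebase")
```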
  5. Generate research document:
    • Follow the directory structure for research documents:
research/
├── tickets/
│   └── YYYY-MM-DD-XXXX-description.md
├── docs/
│   └── YYYY-MM-DD-topic.md
├── notes/
│   └── YYYY-MM-DD-meeting.md
└── ...
  • Naming conventions:
    • YYYY-MM-DD is today's date
    • topic is a brief kebab-case description of the research topic
    • meeting is a brief kebab-case description of the meeting topic
    • XXXX is the ticket number (omit if no ticket)
    • description is a brief kebab-case description of the research topic
    • Examples:
      • With ticket: `2025-01-08-1478-parent-child-tracking.md`
      • Without ticket: `2025-01-08-authentication-flow.md`
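The naming conventions above can be expressed as a small helper. `kebab` and `research_filename` are hypothetical names, and the slug rule (lowercase alphanumeric runs joined by hyphens) is an assumption about what "kebab-case" should mean here.

```python
import re
from datetime import date
from typing import Optional


def kebab(text: str) -> str:
    """Lowercase `text` and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def research_filename(day: date, description: str, ticket: Optional[str] = None) -> str:
    """Build YYYY-MM-DD[-XXXX]-description.md; the ticket part is omitted if absent."""
    parts = [day.isoformat()]
    if ticket:
        parts.append(ticket)
    parts.append(kebab(description))
    return "-".join(parts) + ".md"
```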
  • Structure the document with YAML frontmatter followed by content:
    ---
    date: !`date '+%Y-%m-%d %H:%M:%S %Z'`
    researcher: [Researcher name from thoughts status]
    git_commit: !`git rev-parse --verify HEAD 2>/dev/null || echo "no-commits"`
    branch: !`git branch --show-current 2>/dev/null || git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unborn"`
    repository: !`basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown-repo"`
    topic: "[User's Question/Topic]"
    tags: [research, codebase, relevant-component-names]
    status: complete
    last_updated: !`date '+%Y-%m-%d'`
    last_updated_by: [Researcher name]
    ---
    
    # Research
    
    ## Research Question
    
    [Original user query]
    
    ## Summary
    
    [High-level documentation of what was found, answering the user's question by describing what exists]
    
    ## Detailed Findings
    
    ### [Component/Area 1]
    
    - Description of what exists ([file.ext:line](link))
    - How it connects to other components
    - Current implementation details (without evaluation)
    
    ### [Component/Area 2]
    
    ...
    
    ## Code References
    
    - `path/to/file.py:123` - Description of what's there
    - `another/file.ts:45-67` - Description of the code block
    
    ## Architecture Documentation
    
    [Current patterns, conventions, and design implementations found in the codebase]
    
    ## Historical Context (from research/)
    
    [Relevant insights from research/ directory with references]
    
    - `research/docs/YYYY-MM-DD-topic.md` - Information about module X
    - `research/notes/YYYY-MM-DD-meeting.md` - Past notes from internal engineering, customer, etc. discussions
    - ...
    
    ## Related Research
    
    [Links to other research documents in research/]
    
    ## Open Questions
    
    [Any areas that need further investigation]
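The frontmatter commands in the template fall back to fixed strings when git has nothing to report. A rough Python equivalent, assuming the hypothetical helpers `git_or` and `gather_metadata`, might look like this. It simplifies the branch lookup to a single command and also treats empty output as a failure.

```python
import os
import subprocess
from datetime import datetime


def git_or(default: str, *args: str) -> str:
    """Run `git <args>`; return trimmed stdout, or `default` on failure or empty output."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return default
    return result.stdout.strip() or default


def gather_metadata() -> dict[str, str]:
    """Collect the values the frontmatter needs before the document is written."""
    toplevel = git_or("", "rev-parse", "--show-toplevel")
    return {
        "date": datetime.now().astimezone().strftime("%Y-%m-%d %H:%M:%S %Z"),
        "git_commit": git_or("no-commits", "rev-parse", "--verify", "HEAD"),
        "branch": git_or("unborn", "branch", "--show-current"),
        "repository": os.path.basename(toplevel) or "unknown-repo",
    }
```

Gathering these values first means the document is never written with placeholder metadata.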
  6. Add GitHub permalinks (if applicable):
    • Check if on main branch or if commit is pushed: `git branch --show-current` and `git status`
    • If on main/master or pushed, generate GitHub permalinks:
      • Get repo info: `gh repo view --json owner,name`
      • Create permalinks: `https://github.com/{owner}/{repo}/blob/{commit}/{file}#L{line}`
    • Replace local file references with permalinks in the document
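The permalink pattern above maps directly onto a formatting helper. `github_permalink` is a hypothetical name. The optional `end_line` uses GitHub's `#L<start>-L<end>` range anchor, which goes beyond the single-line pattern given in this step.

```python
from typing import Optional


def github_permalink(owner: str, repo: str, commit: str, path: str,
                     line: int, end_line: Optional[int] = None) -> str:
    """Build a commit-pinned GitHub link to a line or a line range."""
    url = f"https://github.com/{owner}/{repo}/blob/{commit}/{path}#L{line}"
    if end_line is not None and end_line != line:
        url += f"-L{end_line}"
    return url
```

Pinning to the commit hash rather than a branch name is what keeps the link stable as the file changes.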
  7. Present findings:
    • Present a concise summary of findings to the user
    • Include key file references for easy navigation
    • Ask if they have follow-up questions or need clarification
  8. Handle follow-up questions:
    • If the user has follow-up questions, append to the same research document
    • Update the frontmatter fields `last_updated` and `last_updated_by` to reflect the update
    • Add `last_updated_note: "Added follow-up research for [brief description]"` to frontmatter
    • Add a new section: `## Follow-up Research [timestamp]`
    • Spawn new sub-agents as needed for additional investigation
    • Continue updating the document and syncing
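The follow-up bookkeeping above amounts to rewriting three frontmatter fields and appending one section. A minimal sketch, assuming the document begins with a `---` delimited frontmatter block and using the hypothetical helper `add_follow_up` (the date string doubles as the timestamp):

```python
def add_follow_up(doc: str, today: str, researcher: str, note: str, body: str) -> str:
    """Update the frontmatter fields and append a follow-up section to `doc`."""
    head, frontmatter, content = doc.split("---\n", 2)
    updates = {
        "last_updated": today,
        "last_updated_by": researcher,
        "last_updated_note": f'"{note}"',
    }
    lines = []
    for line in frontmatter.splitlines():
        key = line.split(":", 1)[0].strip()
        if key in updates:
            line = f"{key}: {updates.pop(key)}"
        lines.append(line)
    # Fields that were not present yet (e.g. last_updated_note) are appended.
    lines += [f"{key}: {value}" for key, value in updates.items()]
    new_front = "\n".join(lines) + "\n"
    section = f"\n## Follow-up Research {today}\n\n{body}\n"
    return f"{head}---\n{new_front}---\n{content.rstrip()}\n{section}"
```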

Important notes:

  • Please DO NOT implement anything in this stage, just create the comprehensive research document
  • Always use parallel Task agents to maximize efficiency and minimize context usage
  • Always run fresh codebase research - never rely solely on existing research documents
  • The `research/` directory provides historical context to supplement live findings
  • Focus on finding concrete file paths and line numbers for developer reference
  • Research documents should be self-contained with all necessary context
  • Each sub-agent prompt should be specific and focused on read-only documentation operations
  • Document cross-component connections and how systems interact
  • Include temporal context (when the research was conducted)
  • Link to GitHub when possible for permanent references
  • Keep the main agent focused on synthesis, not deep file reading
  • Have sub-agents document examples and usage patterns as they exist
  • Explore the entire research/ directory, not just a single subdirectory
  • CRITICAL: You and all sub-agents are documentarians, not evaluators
  • REMEMBER: Document what IS, not what SHOULD BE
  • NO RECOMMENDATIONS: Only describe the current state of the codebase
  • File reading: Always read mentioned files FULLY (no limit/offset) before spawning sub-tasks
  • Critical ordering: Follow the numbered steps exactly
    • ALWAYS read mentioned files first before spawning sub-tasks (step 1)
    • ALWAYS wait for all sub-agents to complete before synthesizing (step 4)
    • ALWAYS gather metadata (the frontmatter values in step 5) before writing the document
    • NEVER write the research document with placeholder values
  • Frontmatter consistency:
    • Always include frontmatter at the beginning of research documents
    • Keep frontmatter fields consistent across all research documents
    • Update frontmatter when adding follow-up research
    • Use snake_case for multi-word field names (e.g., `last_updated`, `git_commit`)
    • Tags should be relevant to the research topic and components studied
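The snake_case rule for field names could be checked mechanically. The helper name `non_snake_case_fields` and the exact regular expression are assumptions for this sketch, which only looks at top-level `key: value` lines.

```python
import re

# Lowercase words separated by single underscores, e.g. last_updated.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def non_snake_case_fields(frontmatter: str) -> list[str]:
    """Return top-level field names that are not snake_case."""
    bad = []
    for line in frontmatter.splitlines():
        # Skip nested entries, list items, comments, and lines without a key.
        if ":" not in line or line.startswith((" ", "-", "#")):
            continue
        key = line.split(":", 1)[0]
        if not SNAKE_CASE.fullmatch(key):
            bad.append(key)
    return bad
```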

Final Output

  • A collection of research files with comprehensive research findings, properly formatted and linked, ready for consumption to create detailed specifications or design documents.
  • IMPORTANT: DO NOT generate any other artifacts or files OUTSIDE of the `research/` directory.