
# Research Deep — Batch Item Research

Read a research outline (`outline.yaml` + `fields.yaml`) produced by `/research`, then research each item in parallel batches, producing one structured JSON file per item and a final consolidated markdown report.

## Variables

| Variable | Source | Description |
|---|---|---|
| `{topic}` | `outline.yaml` | The `topic` field |
| `{outline_dir}` | Discovered | Directory containing `outline.yaml` and `fields.yaml` |
| `{output_dir}` | `outline.yaml` | `execution.output_dir` (default: `./results`) resolved relative to `{outline_dir}` |
| `{batch_size}` | `outline.yaml` | `execution.batch_size` — max parallel agents per batch |
| `{items_per_agent}` | `outline.yaml` | `execution.items_per_agent` — items assigned to each agent |
| `{fields_path}` | Derived | Absolute path to `{outline_dir}/fields.yaml` |
| `{item_name}` | Per item | The item's `name` field from `outline.yaml` |
| `{item_slug}` | Derived | Slugified item name: lowercase, spaces to underscores, strip non-alphanumeric except underscores, collapse consecutive underscores. E.g. "GitHub Copilot" becomes `github_copilot` |
| `{output_path}` | Derived | `{output_dir}/{item_slug}.json` |
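The `{item_slug}` rule above can be sketched as a small helper (a minimal illustration of the stated transformation, not the skill's actual implementation):

```python
import re

def slugify(name: str) -> str:
    """Lowercase, spaces to underscores, strip non-alphanumerics
    except underscores, collapse consecutive underscores."""
    s = name.lower().replace(" ", "_")
    s = re.sub(r"[^a-z0-9_]", "", s)
    return re.sub(r"_+", "_", s)

print(slugify("GitHub Copilot"))  # github_copilot
```

Note that aggressive stripping is what makes collisions possible (see Gotchas): "C++" and "C" both slugify to `c`.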

## Step 1: Locate Outline

Search for `*/outline.yaml` in the current working directory.

- If exactly one is found: read it along with the sibling `fields.yaml`. Store the containing directory as `{outline_dir}`.
- If multiple are found: list them and ask the user which to use.
- If none is found: tell the user to run `/research` first and stop.

Read both files. Extract the items list and execution config. Report to the user:

- Topic: `{topic}`
- Items count: N items
- Batch config: `{batch_size}` parallel agents, `{items_per_agent}` items each
- Output directory: `{output_dir}`
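The discovery branch above can be sketched as follows (a sketch only; the `/research` message and candidate listing stand in for real user interaction):

```python
from pathlib import Path

def find_outline(root: str = ".") -> list[Path]:
    """Return every */outline.yaml one directory below root."""
    return sorted(Path(root).glob("*/outline.yaml"))

matches = find_outline()
if len(matches) == 1:
    outline_dir = matches[0].parent          # becomes {outline_dir}
elif matches:
    ...  # list the candidates and ask the user which to use
else:
    ...  # tell the user to run /research first, then stop
```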

## Step 2: Resume Check

Check `{output_dir}` for existing `.json` files. For each existing JSON file:

1. Parse the filename back to an item name (reverse the slug: `github_copilot.json` -> match against the items list)
2. Run the validation script to check completeness:

   ```bash
   python3 scripts/validate_json.py -f {fields_path} -j {output_path}
   ```

3. If validation passes (exit code 0): mark the item as completed — skip it
4. If validation fails (exit code 1): mark the item as incomplete — include it in the run

Report resume status to the user:

- "Found N/{total} completed items. Resuming with {remaining} items."
- List the completed items so the user can verify

If all items are already completed, report this and stop.
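The resume loop above can be sketched as follows (a sketch, not the skill's implementation; `items` is a hypothetical list of item names, and the validator is invoked exactly as in step 2):

```python
import re
import subprocess
from pathlib import Path

def slugify(name: str) -> str:
    s = re.sub(r"[^a-z0-9_]", "", name.lower().replace(" ", "_"))
    return re.sub(r"_+", "_", s)

def resume_check(output_dir: str, fields_path: str, items: list[str]):
    """Split items into (completed, remaining) by validating any existing output."""
    completed, remaining = [], []
    for name in items:
        out = Path(output_dir) / f"{slugify(name)}.json"
        # Completed only if the file exists AND the validator exits 0.
        ok = out.exists() and subprocess.run(
            ["python3", "scripts/validate_json.py", "-f", fields_path, "-j", str(out)]
        ).returncode == 0
        (completed if ok else remaining).append(name)
    return completed, remaining
```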

## Step 3: Batch Execution

Partition the remaining items into batches:

- Each agent handles up to `{items_per_agent}` items
- Launch up to `{batch_size}` agents in parallel per batch

Before launching each batch, show the user which items are in this batch and ask for approval:

- "Batch {N}/{total_batches}: items [list]. Launch?"

For each agent, build the prompt from the template below. Preserve the structure and goals; only substitute the `{variables}`.

Read `references/web-search-guide.md` for search methodology guidance to include in the agent context.
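The partitioning rule can be sketched as follows (a minimal illustration; the five-item list is hypothetical):

```python
def make_batches(items: list, items_per_agent: int, batch_size: int) -> list:
    """Group items into per-agent chunks, then group chunks into
    batches of at most batch_size parallel agents."""
    chunks = [items[i:i + items_per_agent]
              for i in range(0, len(items), items_per_agent)]
    return [chunks[i:i + batch_size]
            for i in range(0, len(chunks), batch_size)]

# 5 items, 2 per agent, 2 agents per batch -> 3 agents across 2 batches
batches = make_batches(["a", "b", "c", "d", "e"], 2, 2)
```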
Sub-agent prompt template:

### Task

Research the following item(s) and output structured JSON.

Topic: {topic}

### Items to Research

{for each item assigned to this agent:}
- name: {item_name}
  description: {item_description}
{end for}

### Field Definitions

Read the field definitions file to understand what data to collect for each item: {fields_path}

Use all field categories and fields defined in that file. Each item gets its own JSON object with every field populated.

### Research Instructions

- Search for authoritative, current information on each item
- Use 2-3 search query variations per item
- Prefer official sources (project websites, documentation, release announcements)
- Cross-reference claims across multiple sources when possible
- Note publication dates — flag anything older than 12 months

### Output Format

For each item, write a JSON file to its output path:

{for each item:}
- {item_name} -> {output_path}
{end for}

Each JSON file must follow this structure:

```json
{
  "name": "{item_name}",
  "category_name": {
    "field_name": "value",
    "field_name": "value"
  },
  "another_category": {
    "field_name": "value"
  },
  "uncertain": ["field_name_1", "field_name_2"],
  "sources": [
    {"description": "Source description", "url": "https://..."}
  ]
}
```

Field value rules:

- Populate every field defined in fields.yaml
- If a value cannot be confidently determined, write your best estimate and append "[uncertain]" to the string value
- Add the field name to the top-level "uncertain" array
- All values must be in English
- Use the detail_level from fields.yaml to calibrate response length:
  - brief: single value or short phrase
  - moderate: 1-3 sentences
  - detailed: full paragraph or structured breakdown

### Validation

After writing each JSON file, run:

```bash
python3 {validate_script_path} -f {fields_path} -j {output_path}
```

If validation fails, read the error output, fix the JSON, and re-run until it passes. The task is complete only after all items pass validation.

**One-shot example** (single item, topic "AI Coding History"):

### Task

Research the following item(s) and output structured JSON.

Topic: AI Coding History

### Items to Research

- name: GitHub Copilot
  description: Developed by Microsoft/GitHub, first mainstream AI coding assistant

### Field Definitions

Read the field definitions file to understand what data to collect for each item: /Users/you/ai-coding-history/fields.yaml

Use all field categories and fields defined in that file. Each item gets its own JSON object with every field populated.

### Research Instructions

- Search for authoritative, current information on each item
- Use 2-3 search query variations per item
- Prefer official sources (project websites, documentation, release announcements)
- Cross-reference claims across multiple sources when possible
- Note publication dates — flag anything older than 12 months

### Output Format

For each item, write a JSON file to its output path:

- GitHub Copilot -> /Users/you/ai-coding-history/results/github_copilot.json

Each JSON file must follow this structure:

```json
{
  "name": "GitHub Copilot",
  "basic_info": {
    "release_date": "2021-06-29",
    "company": "Microsoft / GitHub"
  },
  "technical_features": {
    "underlying_model": "OpenAI Codex (initially), GPT-4 (current)",
    "context_window": "Varies by tier; up to 128k tokens in Copilot Enterprise [uncertain]"
  },
  "uncertain": ["context_window"],
  "sources": [
    {"description": "GitHub Copilot official documentation", "url": "https://docs.github.com/copilot"}
  ]
}
```

Field value rules:

- Populate every field defined in fields.yaml
- If a value cannot be confidently determined, write your best estimate and append "[uncertain]" to the string value
- Add the field name to the top-level "uncertain" array
- All values must be in English
- Use the detail_level from fields.yaml to calibrate response length:
  - brief: single value or short phrase
  - moderate: 1-3 sentences
  - detailed: full paragraph or structured breakdown

### Validation

After writing each JSON file, run:

```bash
python3 /Users/you/agent-skills/skills/research-deep/scripts/validate_json.py -f /Users/you/ai-coding-history/fields.yaml -j /Users/you/ai-coding-history/results/github_copilot.json
```

If validation fails, read the error output, fix the JSON, and re-run until it passes. The task is complete only after all items pass validation.

## Step 4: Monitor and Continue

After launching a batch:

1. Wait for all agents in the batch to complete
2. Collect results: for each agent, check that its output JSON files exist and pass validation
3. Handle failures:
   - If an agent fails entirely (no output): log the item names and add them to a retry list
   - If validation fails after the agent finishes: log which fields are missing/invalid
   - Retry failed items once in the next batch. If they fail again, mark them as failed and move on.
4. Report batch progress: "Batch {N} complete: {succeeded}/{total} items succeeded."
5. Launch the next batch (with user approval)
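The retry-once bookkeeping above can be sketched as follows (a sketch only; `results` is a hypothetical mapping of item name to whether its output exists and passed validation):

```python
def collect_batch(results: dict[str, bool], retried: set[str]):
    """Sort one batch's outcomes into succeeded / retry-next-batch / failed."""
    succeeded, retry, failed = [], [], []
    for name, ok in results.items():
        if ok:
            succeeded.append(name)
        elif name in retried:
            failed.append(name)   # second failure: mark failed and move on
        else:
            retried.add(name)
            retry.append(name)    # first failure: retry once in the next batch
    return succeeded, retry, failed
```

Carrying the `retried` set across batches is what enforces the "never retry more than once" rule.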

## Step 5: Summary Report

After all batches complete, output:

### Research Complete

Topic: {topic}
Output directory: {output_dir}

### Results

- Completed: {count} / {total} items
- Failed: {count} items {list names if any}
- Items with uncertain fields: {count}

### Uncertain Fields Summary

{For each item with uncertain fields:}
- {item_name}: {list of uncertain field names}
{end for}

### Failed Items

{If any:}
- {item_name}: {reason for failure}
{end for}

## Step 6: Generate Report

After Step 5's summary (or after the resume check finds all items already completed), generate a markdown report.

Ask the user: "Which fields should appear as summary columns in the table of contents? (Pick from the available fields — e.g. release_date, company, github_stars)"

To help the user choose, scan the completed JSON files and list fields that have short values (single numbers, dates, short strings) — these work well as TOC columns.

Run the report generation script:

```bash
python3 scripts/generate_report.py \
  -f {fields_path} \
  -d {output_dir} \
  -o {outline_dir}/report.md \
  --toc-fields field1,field2,field3
```

If the script exits with an error, show the error output to the user and stop. Otherwise, confirm: "Report written to `{outline_dir}/report.md`" and show the first ~30 lines as a preview.

## Rules

- NEVER modify `outline.yaml` or `fields.yaml` — they are read-only inputs
- NEVER skip the user approval step before each batch
- NEVER retry a failed item more than once
- Always run the validation script after writing each JSON — do not mark an item as complete until validation passes
- Write JSON files atomically: write to `{output_path}.tmp` first, then rename to `{output_path}` after validation passes
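The atomic-write rule can be sketched as follows (a minimal illustration; `data` and the path are hypothetical):

```python
import json
import os

def write_json_atomic(output_path: str, data: dict) -> None:
    """Write to {output_path}.tmp first, then rename into place.
    os.replace is atomic on the same filesystem, so a crash mid-write
    never leaves a truncated final file."""
    tmp_path = output_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    # Per the rule above, validation runs against the .tmp file;
    # only rename once it passes.
    os.replace(tmp_path, output_path)
```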

## Gotchas

- Slug collisions: two items could slug to the same filename (e.g. "C++" and "C" could both become `c`). If detected, append a numeric suffix: `c.json`, `c_2.json`.
- Large item counts: if there are 50+ items, warn the user about total agent cost before starting.
- fields.yaml changes: if the user modifies fields.yaml between runs, previously completed items won't have the new fields. The validation script will catch this — those items will be re-researched on resume.
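The collision rule above can be sketched as follows (a sketch; the `used` set tracks slugs already assigned in the current run):

```python
def dedupe_slug(slug: str, used: set[str]) -> str:
    """Append _2, _3, ... when a slug has already been assigned."""
    if slug not in used:
        used.add(slug)
        return slug
    n = 2
    while f"{slug}_{n}" in used:
        n += 1
    unique = f"{slug}_{n}"
    used.add(unique)
    return unique

used = set()
dedupe_slug("c", used)   # "c"
dedupe_slug("c", used)   # "c_2"
```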