research-deep
Research Deep — Batch Item Research
Read a research outline (`outline.yaml` + `fields.yaml`) produced by `/research`, then research each item in parallel batches, producing one structured JSON file per item and a final consolidated markdown report.

Variables
| Variable | Source | Description |
|---|---|---|
| `{topic}` | `outline.yaml` | The research topic |
| `{outline_dir}` | Discovered | Directory containing `outline.yaml` |
| `{output_dir}` | `outline.yaml` | Directory where per-item JSON files are written |
| `{batch_size}` | `outline.yaml` | Number of agents launched in parallel per batch |
| `{items_per_agent}` | `outline.yaml` | Maximum number of items assigned to each agent |
| `{fields_path}` | Derived | Absolute path to `fields.yaml` |
| `{item_name}` | Per item | The item's `name` from the outline |
| `{slug}` | Derived | Slugified item name: lowercase, spaces to underscores, strip non-alphanumeric except underscores, collapse consecutive underscores. E.g. "GitHub Copilot" becomes `github_copilot` |
| `{output_path}` | Derived | `{output_dir}/{slug}.json` |
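The slug rule can be sketched in Python. This is a minimal illustration; the function name `slugify` is an assumption, not something the skill defines:

```python
import re

def slugify(name: str) -> str:
    """Lowercase, spaces to underscores, strip non-alphanumerics
    except underscores, collapse consecutive underscores."""
    slug = name.lower().replace(" ", "_")
    slug = re.sub(r"[^a-z0-9_]", "", slug)  # drop everything else
    slug = re.sub(r"_+", "_", slug)         # collapse runs of underscores
    return slug
```

For example, `slugify("GitHub Copilot")` yields `"github_copilot"`, and `slugify("C++")` collapses to `"c"` (which is why the slug-collision gotcha below matters).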
Step 1: Locate Outline
Search for `*/outline.yaml` in the current working directory.

- If exactly one is found: read it along with the sibling `fields.yaml`. Store the containing directory as `{outline_dir}`.
- If multiple are found: list them and ask the user which to use.
- If none found: tell the user to run `/research` first and stop.
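The discovery step amounts to a single glob; a sketch (the helper name `find_outlines` is illustrative):

```python
from pathlib import Path

def find_outlines(cwd: str) -> list[Path]:
    """Collect every outline.yaml one directory below cwd.
    Zero, one, or many results map to the three cases above."""
    return sorted(Path(cwd).glob("*/outline.yaml"))
```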
Read both files. Extract the items list and execution config. Report to the user:

- Topic: {topic}
- Items count: N items
- Batch config: {batch_size} parallel agents, {items_per_agent} items each
- Output directory: {output_dir}
Step 2: Resume Check
Check for existing `.json` files in `{output_dir}`.

For each existing JSON file:

- Parse the filename back to an item name (reverse the slug: `github_copilot.json` -> match against the items list)
- Run the validation script to check completeness:

  ```bash
  python3 scripts/validate_json.py -f {fields_path} -j {output_path}
  ```

- If validation passes (exit code 0): mark the item as completed — skip it
- If validation fails (exit code 1): mark the item as incomplete — include it in the run
Report resume status to the user:
- "Found N/{total} completed items. Resuming with {remaining} items."
- List the completed items so the user can verify
If all items are already completed, report this and stop.
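The resume partition can be sketched as below. The `slug` and `validate` callables stand in for the slug rule and the validation script (whose exit code drives skip/include); both parameter names are assumptions for illustration:

```python
from pathlib import Path
from typing import Callable

def resume_partition(items: list[str], output_dir: str,
                     slug: Callable[[str], str],
                     validate: Callable[[Path], int]) -> tuple[list[str], list[str]]:
    """Split items into (completed, remaining). An item counts as
    completed only if its JSON exists AND the validator returns 0."""
    completed, remaining = [], []
    for name in items:
        path = Path(output_dir) / f"{slug(name)}.json"
        if path.exists() and validate(path) == 0:
            completed.append(name)
        else:
            remaining.append(name)
    return completed, remaining
```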
Step 3: Batch Execution
Partition the remaining items into batches:

- Each agent handles up to {items_per_agent} items
- Launch up to {batch_size} agents in parallel per batch
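The two-level partition can be sketched as follows (a minimal illustration; the function name is an assumption):

```python
def make_batches(items: list[str], items_per_agent: int,
                 batch_size: int) -> list[list[list[str]]]:
    """Group items into agent workloads of up to items_per_agent,
    then group workloads into batches of up to batch_size agents."""
    workloads = [items[i:i + items_per_agent]
                 for i in range(0, len(items), items_per_agent)]
    return [workloads[i:i + batch_size]
            for i in range(0, len(workloads), batch_size)]
```

With 7 items, `items_per_agent=2`, and `batch_size=2`, this yields two batches: the first with two agents of two items each, the second with a two-item agent and a one-item agent.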
Before launching each batch, show the user which items are in this batch and ask for approval:
- "Batch {N}/{total_batches}: items [list]. Launch?"
For each agent, build the prompt from the template below. Preserve the structure and goals; only substitute the {variables}.

Read `references/web-search-guide.md` for search methodology guidance to include in the agent context.

Sub-agent prompt template:
````markdown
# Task

Research the following item(s) and output structured JSON.
Topic: {topic}

# Items to Research

{for each item assigned to this agent:}
- name: {item_name}
  description: {item_description}
{end for}

# Field Definitions

Read the field definitions file to understand what data to collect for each item:
{fields_path}

Use all field categories and fields defined in that file. Each item gets its own JSON object with every field populated.

# Research Instructions

- Search for authoritative, current information on each item
- Use 2-3 search query variations per item
- Prefer official sources (project websites, documentation, release announcements)
- Cross-reference claims across multiple sources when possible
- Note publication dates — flag anything older than 12 months

# Output Format

For each item, write a JSON file to its output path:

{for each item:}
- {item_name} -> {output_path}
{end for}

Each JSON file must follow this structure:

```json
{
  "name": "{item_name}",
  "category_name": {
    "field_name": "value",
    "field_name": "value"
  },
  "another_category": {
    "field_name": "value"
  },
  "uncertain": ["field_name_1", "field_name_2"],
  "sources": [
    {"description": "Source description", "url": "https://..."}
  ]
}
```

Field value rules:

- Populate every field defined in fields.yaml
- If a value cannot be confidently determined, write your best estimate and append "[uncertain]" to the string value
- Add the field name to the top-level "uncertain" array
- All values must be in English
- Use the detail_level from fields.yaml to calibrate response length:
  - brief: single value or short phrase
  - moderate: 1-3 sentences
  - detailed: full paragraph or structured breakdown

# Validation

After writing each JSON file, run:

```bash
python3 {validate_script_path} -f {fields_path} -j {output_path}
```

If validation fails, read the error output, fix the JSON, and re-run until it passes.
The task is complete only after all items pass validation.
````

**One-shot example** (single item, topic "AI Coding History"):
````markdown
# Task

Research the following item(s) and output structured JSON.
Topic: AI Coding History

# Items to Research

- name: GitHub Copilot
  description: Developed by Microsoft/GitHub, first mainstream AI coding assistant

# Field Definitions

Read the field definitions file to understand what data to collect for each item:
/Users/you/ai-coding-history/fields.yaml

Use all field categories and fields defined in that file. Each item gets its own JSON object with every field populated.

# Research Instructions

- Search for authoritative, current information on each item
- Use 2-3 search query variations per item
- Prefer official sources (project websites, documentation, release announcements)
- Cross-reference claims across multiple sources when possible
- Note publication dates — flag anything older than 12 months

# Output Format

For each item, write a JSON file to its output path:

- GitHub Copilot -> /Users/you/ai-coding-history/results/github_copilot.json

Each JSON file must follow this structure:

```json
{
  "name": "GitHub Copilot",
  "basic_info": {
    "release_date": "2021-06-29",
    "company": "Microsoft / GitHub"
  },
  "technical_features": {
    "underlying_model": "OpenAI Codex (initially), GPT-4 (current)",
    "context_window": "Varies by tier; up to 128k tokens in Copilot Enterprise [uncertain]"
  },
  "uncertain": ["context_window"],
  "sources": [
    {"description": "GitHub Copilot official documentation", "url": "https://docs.github.com/copilot"}
  ]
}
```

Field value rules:

- Populate every field defined in fields.yaml
- If a value cannot be confidently determined, write your best estimate and append "[uncertain]" to the string value
- Add the field name to the top-level "uncertain" array
- All values must be in English
- Use the detail_level from fields.yaml to calibrate response length:
  - brief: single value or short phrase
  - moderate: 1-3 sentences
  - detailed: full paragraph or structured breakdown

# Validation

After writing each JSON file, run:

```bash
python3 /Users/you/agent-skills/skills/research-deep/scripts/validate_json.py -f /Users/you/ai-coding-history/fields.yaml -j /Users/you/ai-coding-history/results/github_copilot.json
```

If validation fails, read the error output, fix the JSON, and re-run until it passes.
The task is complete only after all items pass validation.
````
Step 4: Monitor and Continue
After launching a batch:
- Wait for all agents in the batch to complete
- Collect results: for each agent, check that its output JSON files exist and pass validation
- Handle failures:
- If an agent fails entirely (no output): log the item names and add them to a retry list
- If validation fails after the agent finishes: log which fields are missing/invalid
- Retry failed items once in the next batch. If they fail again, mark them as failed and move on.
- Report batch progress: "Batch {N} complete: {succeeded}/{total} items succeeded."
- Launch next batch (with user approval)
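The failure-handling rules above (retry once, then mark failed) can be sketched as a pure function. The name `collect_batch` and the `item -> passed` map are assumptions for illustration:

```python
def collect_batch(results: dict[str, bool],
                  already_retried: set[str]) -> tuple[list[str], list[str], list[str]]:
    """Partition a batch's item->passed map into items that succeeded,
    items to retry once in the next batch, and items marked as
    permanently failed (they already used their one retry)."""
    succeeded = [i for i, ok in results.items() if ok]
    failed = [i for i, ok in results.items() if not ok]
    retry = [i for i in failed if i not in already_retried]
    dead = [i for i in failed if i in already_retried]
    return succeeded, retry, dead
```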
Step 5: Summary Report
After all batches complete, output:

```markdown
# Research Complete

Topic: {topic}
Output directory: {output_dir}

## Results

- Completed: {count} / {total} items
- Failed: {count} items {list names if any}
- Items with uncertain fields: {count}

## Uncertain Fields Summary

{For each item with uncertain fields:}
- {item_name}: {list of uncertain field names}
{end for}

## Failed Items

{If any:}
- {item_name}: {reason for failure}
{end for}
```
Step 6: Generate Report
After Step 5's summary (or after resume finds all items already completed), generate a markdown report.
Ask the user: "Which fields should appear as summary columns in the table of contents? (Pick from the available fields — e.g. release_date, company, github_stars)"
To help the user choose, scan the completed JSON files and list fields that have short values (single numbers, dates, short strings) — these work well as TOC columns.
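One way to surface short-valued fields, sketched under the JSON structure shown earlier (the function name and the 40-character cutoff are assumptions):

```python
import json
from pathlib import Path

def short_value_fields(output_dir: str, max_len: int = 40) -> list[str]:
    """Scan completed JSON files and return field names whose string
    values stay short in every file: good TOC column candidates."""
    lengths: dict[str, int] = {}
    for path in Path(output_dir).glob("*.json"):
        data = json.loads(path.read_text())
        for category, fields in data.items():
            if not isinstance(fields, dict):
                continue  # skip "name", "uncertain", "sources"
            for field, value in fields.items():
                if isinstance(value, str):
                    lengths[field] = max(lengths.get(field, 0), len(value))
    return sorted(f for f, n in lengths.items() if n <= max_len)
```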
Run the report generation script:

```bash
python3 scripts/generate_report.py \
  -f {fields_path} \
  -d {output_dir} \
  -o {outline_dir}/report.md \
  --toc-fields field1,field2,field3
```

If the script exits with an error, show the error output to the user and stop.
Otherwise, confirm: "Report written to `{outline_dir}/report.md`" and show the first ~30 lines as a preview.
Rules
- NEVER modify `outline.yaml` or `fields.yaml` — they are read-only inputs
- NEVER skip the user approval step before each batch
- NEVER retry a failed item more than once
- Always run the validation script after writing each JSON — do not mark an item as complete until validation passes
- Write JSON files atomically: write to `{output_path}.tmp` first, then rename to `{output_path}` after validation passes
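The atomic-write rule can be sketched as below (the validation call that should gate the rename is elided; the function name is an assumption):

```python
import json
import os

def write_json_atomic(path: str, data: dict) -> None:
    """Write to {output_path}.tmp first, then rename to {output_path},
    so a crash mid-write never leaves a truncated JSON file. In the
    skill, the rename happens only after validation passes."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, path)  # atomic rename on POSIX and Windows
```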
Gotchas
- Slug collisions: two items could slug to the same filename (e.g. "C++" and "C" could both become `c.json`). If detected, append a numeric suffix: `c_2.json`.
- Large item counts: if there are 50+ items, warn the user about total agent cost before starting.
- fields.yaml changes: if the user modifies fields.yaml between runs, previously completed items won't have the new fields. The validation script will catch this — those items will be re-researched on resume.
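The numeric-suffix rule for slug collisions can be sketched as (the helper name is an assumption):

```python
def dedupe_slug(slug: str, taken: set[str]) -> str:
    """Append a numeric suffix when a slug is already in use:
    'c' stays 'c' if free, otherwise becomes 'c_2', then 'c_3', ..."""
    if slug not in taken:
        return slug
    n = 2
    while f"{slug}_{n}" in taken:
        n += 1
    return f"{slug}_{n}"
```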