# deep-research
Hypothesis-driven research swarm. Spawns specialist agents to investigate a task,
grades every finding by evidence quality, then adversarially challenges the emerging
conclusion before delivering a structured verdict.
## When This Skill Activates
Trigger on explicit research requests:
- User says: "research", "investigate", "discover", "how should I approach..."
- User asks: "what's the best way to...", "explore options for...", "deep research"
- User wants prior art, feasibility analysis, or approach comparison
Do NOT activate automatically on every task. This is an on-demand tool, not a gate.
## Phase 1: Hypothesis Formation
Before spawning agents, frame the research:

- Parse the task: Extract the core question or goal. If ambiguous, ask the user to clarify before proceeding. Identify technology keywords (languages, frameworks, libraries mentioned or implied by the codebase).
- Identify repo context: Run `git rev-parse --show-toplevel` to get `{repo_root}`. If this fails (not a git repo), set `{repo_root}` to the current working directory. Check for `package.json`, `pyproject.toml`, `Cargo.toml`, `go.mod`, etc. to identify the language/framework stack. Pass this as `{tech_stack}`. If no manifests are found, set `{tech_stack}` to the primary file extensions present or ask the user.
- speak-memory: If `.speak-memory/index.md` exists and an active story matches the current work, read it for context. If `.speak-memory/` does not exist, skip.
- Form hypotheses: State 1-3 hypotheses to investigate. For each:
  - Question: The specific question being answered
  - Prior belief: What you expect the answer to be (best guess before research)
  - Disconfirming evidence: What evidence would change the answer

  If the task is open-ended ("how should I build X?") and you have no prior belief, that's fine — set the prior belief to "unknown" and frame the hypothesis as: "There is an established approach for X in this codebase/ecosystem." Disconfirming evidence: "No established patterns exist; this is novel."
- Set scope budget: Declare upfront: "Budget: 5 research agents + 1 adversarial challenge = 6 agent calls, investigating N hypotheses." (substitute the actual hypothesis count for N). When the budget is exhausted, synthesize what you have rather than expanding.

Present the hypotheses to the user before proceeding:

```
Hypotheses:
1. [Question] — Prior belief: [X]. Would change if: [Y].
2. ...
Budget: 6 agents, [hypothesis count] questions. Proceed?
```

If the user declines or asks to revise, update the hypotheses and re-present.
Do not spawn agents until the user confirms.
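The repo-context step can be sketched as a small helper. This is a minimal illustration, not part of the skill itself; the manifest-to-stack mapping and the function name are assumptions for the sketch:

```python
import os
import subprocess

# Manifest files and the stack each one implies (illustrative subset).
MANIFESTS = {
    "package.json": "node",
    "pyproject.toml": "python",
    "Cargo.toml": "rust",
    "go.mod": "go",
}

def repo_context():
    """Resolve {repo_root} and {tech_stack} as described in Phase 1."""
    try:
        repo_root = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not a git repo (or git unavailable): fall back to the cwd.
        repo_root = os.getcwd()

    tech_stack = [
        stack for manifest, stack in MANIFESTS.items()
        if os.path.exists(os.path.join(repo_root, manifest))
    ]
    return repo_root, tech_stack
```

If several manifests are present (say, a `package.json` alongside a `pyproject.toml`), the full list is passed through rather than picking a winner, matching the plural framing of `{tech_stack}`.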
## Phase 2: Evidence Gathering
Spawn all 5 agents in a single message (parallel Agent tool calls). Each agent
returns a JSON findings array — see `references/agent-roles.md` for full prompts.

| Agent | Subagent Type | Model | Focus |
|---|---|---|---|
|---|---|---|---|
| codebase | Explore | opus | Existing patterns, utilities, similar implementations, conventions |
| web-research | general-purpose | opus | Solutions, libraries, best practices, documentation |
| tools-mcp | general-purpose | opus | Available MCP servers, tools, and resources |
| skills | general-purpose | opus | Installed skills and marketplace matches |
| dependencies | general-purpose | opus | Installed packages, version constraints, compatibility |
Agent tool grants:
- codebase: `Read, Glob, Grep` (read-only codebase exploration)
- web-research: `WebSearch, WebFetch` (external research)
- tools-mcp: `ListMcpResourcesTool, ReadMcpResourceTool` (tool discovery)
- skills: `Read, Glob, Grep, Bash` (read skills + `npx skills find`)
- dependencies: `Read, Glob, Grep, Bash` (read manifests + `npm ls`/`pip list`/`cargo tree`)
Pass each agent the hypotheses so they can focus their search, but instruct them
to also report unexpected relevant findings outside the hypothesis scope.
Required output format (every finding must include `evidence_tier`):

```json
[
  {
    "evidence_tier": "primary|secondary|speculative",
    "relevance": "high|medium|low",
    "source": "where this was found (file path, URL, tool name)",
    "finding": "what was discovered",
    "supports_hypothesis": "which hypothesis this relates to, or 'unexpected'",
    "recommendation": "how this finding applies to the task",
    "references": ["file paths or URLs for further reading"]
  }
]
```

Evidence tiers:
- primary: Direct from authoritative source — code you read, API response, official documentation, test output
- secondary: Reputable third-party — blog post with evidence, SO answer with code examples, well-maintained library README
- speculative: Inference, analogy, or "I think" — no direct source confirms this
Return an empty array `[]` if no relevant findings exist in that domain.

Instruct each agent: "Return your top 10 findings maximum, prioritized by relevance.
Tag every finding with an evidence_tier. Never report speculative findings as primary."
Error handling: If an agent returns non-JSON output, strip code fences and attempt
JSON extraction. If extraction fails or the agent times out, treat it as an empty
array `[]` and log a warning. Continue with results from agents that succeeded.
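The fence-stripping fallback can be sketched as follows; this is one possible recovery routine, and real agent output may need more repair than it attempts:

```python
import json
import re

def extract_findings(raw: str) -> list:
    """Parse an agent's findings array; fall back to [] on any failure."""
    # Remove markdown code-fence markers such as ```json ... ```
    cleaned = re.sub(r"```(?:json)?", "", raw).strip()
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        # Last resort: grab the outermost [...] span, if one exists.
        match = re.search(r"\[.*\]", cleaned, re.DOTALL)
        if not match:
            return []  # caller logs the warning and continues
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            return []
    return data if isinstance(data, list) else []
```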
## Phase 3: Synthesis
Merge all 5 finding arrays. Apply the synthesis rules from `references/synthesis.md`:
- Merge: Collect all 5 JSON arrays into a single flat array. Tag each finding with its source agent name. Note which agents failed or returned empty.
- Deduplicate: Merge identical findings; note corroborating sources.
- Resolve conflicts: Flag when agents disagree; note the resolution.
- Grade evidence: Flag any conclusion that rests entirely on speculative evidence. A conclusion needs at least one primary or secondary source to be credible.
- Rank: Sort by evidence_tier (primary first), then relevance, then corroboration. Cap the merged array at 30 findings — drop low-relevance speculative findings first to keep context manageable.
- Form preliminary conclusion: Based on the ranked findings, form a preliminary answer to each hypothesis. State whether the prior belief was confirmed or changed.
- Generate approaches: Propose 2-3 approaches with trade-offs. Each approach must reference the evidence that supports it with tier tags.
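The rank-and-cap step can be sketched as below, assuming findings use the Phase 2 schema; the `corroborations` count is an assumed field produced by the dedup step, not part of the agent output format:

```python
TIER_ORDER = {"primary": 0, "secondary": 1, "speculative": 2}
RELEVANCE_ORDER = {"high": 0, "medium": 1, "low": 2}

def rank_findings(findings: list, cap: int = 30) -> list:
    """Sort by evidence tier, then relevance, then corroboration count."""
    ranked = sorted(
        findings,
        key=lambda f: (
            TIER_ORDER.get(f.get("evidence_tier"), 3),       # primary first
            RELEVANCE_ORDER.get(f.get("relevance"), 3),      # high first
            -f.get("corroborations", 0),                     # most-corroborated first
        ),
    )
    # Low-relevance speculative findings sort last, so capping the list
    # drops them first, as the rule requires.
    return ranked[:cap]
```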
## Phase 4: Adversarial Challenge
Before delivering findings, actively try to disprove the emerging conclusion.
Spawn one devil's advocate agent (`subagent_type: "general-purpose"`, `model: "opus"`).
Give it:
- The preliminary conclusion from Phase 3
- The recommended approach
- The hypotheses and their current status (confirmed/changed)
The devil's advocate agent's job:
- Search for disconfirming evidence using `WebSearch, WebFetch, Read, Glob, Grep`
- Find reasons the recommended approach would fail
- Identify assumptions that haven't been tested
- Look for alternatives the research may have missed

See `references/agent-roles.md` for the full devil's advocate prompt.

Handle the devil's advocate result:
- `conclusion_holds`: The conclusion survives — it's stronger. Note any speculative counterarguments as dissent but don't change the recommendation.
- `conclusion_weakened`: The devil's advocate found credible disconfirming evidence (primary or secondary tier). Revise the conclusion, note the revision, and re-generate approaches (re-run synthesis step 7) to reflect the updated position.
- `conclusion_overturned`: The recommended approach is fundamentally flawed. Discard it, revise all affected hypothesis conclusions, and re-generate approaches from scratch based on the combined original + adversarial evidence.
Error handling: If the devil's advocate agent fails (non-JSON output, timeout, or
invalid structure), log a warning: "Devil's advocate failed — conclusion not adversarially
tested. Reduce confidence by one level." Continue to Phase 5 with the untested conclusion.
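The three-outcome dispatch plus the failure path can be sketched as a single routine. The flag names (`dissent_noted`, `needs_revision`, `discarded`) are hypothetical bookkeeping for the sketch, not fields defined by this skill:

```python
def apply_challenge(verdict: str, conclusion: dict) -> dict:
    """Fold the devil's advocate verdict back into the working conclusion."""
    if verdict == "conclusion_holds":
        conclusion["dissent_noted"] = True     # record dissent, don't revise
    elif verdict == "conclusion_weakened":
        conclusion["needs_revision"] = True    # re-run synthesis step 7
    elif verdict == "conclusion_overturned":
        conclusion["discarded"] = True         # regenerate approaches from scratch
    else:
        # Agent failed (non-JSON, timeout, invalid structure):
        # conclusion is untested, so reduce confidence by one level.
        levels = ["low", "medium", "high"]
        idx = levels.index(conclusion.get("confidence", "medium"))
        conclusion["confidence"] = levels[max(0, idx - 1)]
    return conclusion
```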
## Phase 5: Verdict
Output a structured verdict (use `assets/report-template.md` for formatting):

```
VERDICT: [one sentence answer]
CONFIDENCE: high|medium|low — [justification]
EVIDENCE:
1. [primary] source — finding
2. [secondary] source — finding
3. ...
DISSENT: [strongest counterargument from devil's advocate]
ACTION: [what to do next — implement X, investigate Y further, do nothing]
```

After the verdict, output the structured artifact for downstream skills:
```json
{
  "task": "original task description",
  "tech_stack": ["identified technologies"],
  "hypotheses": [
    {
      "question": "...",
      "prior_belief": "...",
      "conclusion": "confirmed|changed|inconclusive",
      "conclusion_detail": "what we found"
    }
  ],
  "metadata": {
    "agents_completed": ["codebase", "web-research", "tools-mcp", "skills", "dependencies", "devils-advocate"],
    "agents_failed": [],
    "total_findings": 0,
    "speculative_only_conclusions": 0,
    "timestamp": "ISO-8601"
  },
  "findings": [... merged findings array (high/medium only, with evidence_tier) ...],
  "conflicts": [
    {
      "description": "what conflicts",
      "agents": ["which agents disagree"],
      "resolution": "how resolved, or null if user must decide"
    }
  ],
  "verdict": {
    "summary": "one sentence",
    "confidence": "high|medium|low",
    "confidence_justification": "why this confidence level",
    "dissent": "strongest counterargument"
  },
  "approaches": [
    {
      "name": "Approach A",
      "summary": "one-line description",
      "pros": ["..."],
      "cons": ["..."],
      "recommended": true,
      "evidence": ["[primary] source — finding", "..."],
      "relevant_files": ["paths from codebase agent"],
      "relevant_tools": ["MCP tools/skills discovered"],
      "dependencies_needed": ["new packages, if any"],
      "estimated_complexity": "low|medium|high"
    }
  ]
}
```

No-findings edge case: If all agents return empty arrays, output the structured
artifact with `findings: []`, `approaches: []`, and `verdict.confidence: "low"`.
Set `verdict.summary` to: "Research found no directly relevant prior art. Recommend
exploratory implementation or breaking the task into smaller sub-problems."

speak-memory: If an active story was loaded in Phase 1, use Write/Edit to update
the story file — append to Recent Activity and update Current Context.
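A downstream consumer could sanity-check the artifact with a sketch like the following. The required-keys set comes from the schema above; the single-recommendation check is an extra assumption of this sketch, since the spec does not say how many approaches may set `recommended`:

```python
REQUIRED_KEYS = {
    "task", "tech_stack", "hypotheses", "metadata",
    "findings", "conflicts", "verdict", "approaches",
}

def validate_artifact(artifact: dict) -> list:
    """Return a list of problems; an empty list means the artifact looks sane."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - artifact.keys())]
    verdict = artifact.get("verdict", {})
    if verdict.get("confidence") not in {"high", "medium", "low"}:
        problems.append("verdict.confidence must be high|medium|low")
    # Assumed convention: at most one approach carries recommended: true.
    recommended = [a for a in artifact.get("approaches", []) if a.get("recommended")]
    if len(recommended) > 1:
        problems.append("more than one recommended approach")
    return problems
```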
## Key Constraints
- All research agents: Opus (`model: "opus"`) for maximum research quality.
- Agent execution caps: research agents `max_turns: 30`; devil's advocate `max_turns: 20`.
- Max 2 levels of nesting: orchestrator → specialist. Specialists never spawn agents.
- Scope budget: 5 research agents + 1 devil's advocate = 6 total. Do not expand.
- Findings cap: max 30 findings enter synthesis (after merge + dedup). Drop low-relevance speculative findings first.
- All sub-agents are read-only — no code modifications, no git changes. The orchestrator may write to `.speak-memory/` only.
- Bash (sub-agents only) is limited to dependency queries (`npm ls`, `pip list`, `cargo tree`) and skill search (`npx skills find`). No other Bash commands.
- Conclusions resting entirely on speculative evidence must be flagged as low confidence.
- The structured artifact stays in conversation context — no file writing.
## Closing Checklist
Do not declare the research done until all boxes are checked:
- [ ] Hypotheses stated with prior beliefs and disconfirming criteria
- [ ] All 5 research agents completed (returned valid JSON) or failed (logged as warning)
- [ ] Every finding tagged with evidence_tier (primary/secondary/speculative)
- [ ] Preliminary conclusion formed and approaches generated
- [ ] Devil's advocate challenged the conclusion
- [ ] Structured verdict delivered (verdict, confidence, evidence, dissent, action)
- [ ] Structured artifact output for downstream consumption
- [ ] speak-memory story updated (if applicable)
## Reference Files
Load only when needed:
- `references/agent-roles.md` — Full prompt templates for each research agent + devil's advocate
- `references/synthesis.md` — Evidence grading, dedup, conflict resolution, approach generation
- `assets/report-template.md` — Verdict and report format