Deep Research
General-purpose deep research with multi-source synthesis, confidence scoring, and anti-hallucination verification. Adopts SOTA patterns from OpenAI Deep Research (multi-agent triage pipeline), Google Gemini Deep Research (user-reviewable plans), STORM (perspective-guided conversations), Perplexity (source confidence ratings), and LangChain ODR (supervisor-researcher with reflection).
Vocabulary
| Term | Definition |
|---|---|
| query | The user's research question or topic; the unit of investigation |
| claim | A discrete assertion to be verified; extracted from sources or user input |
| source | A specific origin of information: URL, document, database record, or API response |
| evidence | A source-backed datum supporting or contradicting a claim; always has provenance |
| provenance | The chain from evidence to source: tool used, URL, access timestamp, excerpt |
| confidence | Score 0.0-1.0 per claim; based on evidence strength and cross-validation |
| cross-validation | Verifying a claim across 2+ independent sources; the core anti-hallucination mechanism |
| triangulation | Confirming a finding using 3+ methodologically diverse sources |
| contradiction | When two credible sources assert incompatible claims; must be surfaced explicitly |
| synthesis | The final research product: not a summary but a novel integration of evidence with analysis |
| journal | The saved markdown record of a research session, stored in `~/.claude/research/` |
| sweep | Wave 1: broad parallel search across multiple tools and sources |
| deep dive | Wave 2: targeted follow-up on specific leads from the sweep |
| lead | A promising source or thread identified during the sweep, warranting deeper investigation |
| tier | Complexity classification: Quick (0-2), Standard (3-5), Deep (6-8), Exhaustive (9-10) |
| finding | A verified claim with evidence chain, confidence score, and provenance; the atomic unit of output |
| gap | An identified area where evidence is insufficient, contradictory, or absent |
| bias marker | An explicit flag on a finding indicating potential bias (recency, authority, LLM prior, etc.) |
| degraded mode | Operation when research tools are unavailable; confidence ceilings applied |
Dispatch
| Trigger | Action |
|---|---|
| Question or topic text (has a verb or `?`) | Investigate — classify complexity, execute wave pipeline |
| Vague input (<5 words, no verb, no `?`) | Intake — ask 2-3 clarifying questions, then classify |
| | Fact-check — verify claim against 3+ search engines |
| | Compare — structured comparison with decision matrix output |
| | Survey — landscape mapping, annotated bibliography |
| | Track — load prior journal, search for updates since last session |
| | Resume — resume a saved research session |
| | List — show journal metadata table |
| | Archive — move journals older than 90 days |
| | Delete — delete journal N with confirmation |
| | Export — render HTML dashboard for journal N (default: current) |
| Empty | Gallery — show topic examples + "ask me anything" prompt |
Auto-Detection Heuristic
If no mode keyword matches:
- Ends with `?` or starts with a question word (who/what/when/where/why/how/is/are/can/does/should/will) → Investigate
- Contains `vs`, `versus`, `compared to`, or `or` between noun phrases → Compare
- Declarative statement with a factual claim, no question syntax → Fact-check
- Broad field name with no specific question → ask: "Investigate a specific question, or survey the entire field?"
- Ambiguous → ask: "Would you like me to investigate this question, verify this claim, or survey this field?"
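The heuristic above can be sketched as a small dispatcher. This is an illustrative sketch only — the function name, the five-word threshold for "declarative with a factual claim", and the marker list are assumptions, not the skill's actual implementation:

```python
QUESTION_WORDS = ("who", "what", "when", "where", "why", "how",
                  "is", "are", "can", "does", "should", "will")
COMPARE_MARKERS = (" vs ", " versus ", " compared to ")

def detect_mode(text: str) -> str:
    """Best-effort mode detection when no explicit mode keyword matches."""
    t = text.strip().lower()
    if not t:
        return "gallery"
    if t.endswith("?") or t.split()[0] in QUESTION_WORDS:
        return "investigate"
    if any(m in f" {t} " for m in COMPARE_MARKERS):
        return "compare"
    if len(t.split()) >= 5:
        return "fact-check"   # declarative sentence long enough to carry a claim
    return "ask-user"         # broad or ambiguous → clarify with the user
```

In practice the ambiguous branch surfaces the clarifying question to the user rather than guessing.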
Gallery (Empty Arguments)
Present research examples spanning domains:
| # | Domain | Example | Likely Tier |
|---|---|---|---|
| 1 | Technology | "What are the current best practices for LLM agent architectures?" | Deep |
| 2 | Academic | "What is the state of evidence on intermittent fasting for longevity?" | Standard |
| 3 | Market | "How does the competitive landscape for vector databases compare?" | Deep |
| 4 | Fact-check | "Is it true that 90% of startups fail within the first year?" | Standard |
| 5 | Architecture | "When should you choose event sourcing over CRUD?" | Standard |
| 6 | Trends | "What emerging programming languages gained traction in 2025-2026?" | Standard |
Pick a number, paste your own question, or type `guide me`.
Skill Awareness
Before starting research, check if another skill is a better fit:
| Signal | Redirect |
|---|---|
| Code review, PR review, diff analysis | Suggest |
| Strategic decision with adversaries, game theory | Suggest |
| Multi-perspective expert debate | Suggest |
| Prompt optimization, model-specific prompting | Suggest |
If the user confirms they want general research, proceed.
Complexity Classification
Score the query on 5 dimensions (0-2 each, total 0-10):
| Dimension | 0 | 1 | 2 |
|---|---|---|---|
| Scope breadth | Single fact/definition | Multi-faceted, 2-3 domains | Cross-disciplinary, 4+ domains |
| Source difficulty | Top search results suffice | Specialized databases or multiple source types | Paywalled, fragmented, or conflicting sources |
| Temporal sensitivity | Stable/historical | Evolving field (months matter) | Fast-moving (days/weeks matter), active controversy |
| Verification complexity | Easily verifiable (official docs) | 2-3 independent sources needed | Contested claims, expert disagreement, no consensus |
| Synthesis demand | Answer is a fact or list | Compare/contrast viewpoints | Novel integration of conflicting threads |
| Total | Tier | Strategy |
|---|---|---|
| 0-2 | Quick | Inline, 1-2 searches, fire-and-forget |
| 3-5 | Standard | Subagent wave, 3-5 parallel searchers, report delivered |
| 6-8 | Deep | Agent team (TeamCreate), 3-5 teammates, interactive session |
| 9-10 | Exhaustive | Agent team, 4-6 teammates + nested subagent waves, interactive |
Present the scoring to the user. The user can override the tier with `--depth <tier>`.
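The rubric maps directly to code. A minimal sketch (the function and dimension key names are illustrative assumptions):

```python
TIERS = [(2, "Quick"), (5, "Standard"), (8, "Deep"), (10, "Exhaustive")]

def classify(scores: dict) -> tuple[int, str]:
    """Sum the five 0-2 dimension scores and map the total to a tier."""
    assert len(scores) == 5 and all(0 <= v <= 2 for v in scores.values())
    total = sum(scores.values())
    for ceiling, tier in TIERS:
        if total <= ceiling:
            return total, tier
```

For example, a query scoring 1 on each of scope, sources, verification, and synthesis but 0 on temporal sensitivity totals 4 and lands in Standard.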
Wave Pipeline
All non-Quick research follows this 5-wave pipeline. Quick merges Waves 0+1+4 inline.
Wave 0: Triage (always inline, never parallelized)
- Run `!uv run python skills/research/scripts/research-scanner.py "$ARGUMENTS"` for a deterministic pre-scan
- Decompose the query into 2-5 sub-questions
- Score complexity on the 5-dimension rubric
- Check tool availability — probe key MCP tools; set degraded-mode flags and confidence ceilings per `references/source-selection.md`
- Select tools per domain signals — read `references/source-selection.md`
- Check for existing journals — if `track` or `resume`, load prior state
- Present triage to the user — show: complexity score, sub-questions, planned strategy, estimated tier. The user may override.
Wave 1: Broad Sweep (parallel)
Scale by tier:

Quick (inline): 1-2 tool calls sequentially. No subagents.

Standard (subagent wave): Dispatch 3-5 parallel subagents via the Task tool:

Subagent A → brave-search + duckduckgo-search for sub-question 1
Subagent B → exa + g-search for sub-question 2
Subagent C → context7 / deepwiki / arxiv / semantic-scholar for technical specifics
Subagent D → wikipedia / wikidata for factual grounding
[Subagent E → PubMed / openalex if academic domain detected]

Deep (agent team): TeamCreate `research-{slug}`:

Lead: triage (Wave 0), orchestrate, judge/reconcile (Wave 3), synthesize (Wave 4)
|-- web-researcher: brave-search, duckduckgo-search, exa, g-search
|-- tech-researcher: context7, deepwiki, arxiv, semantic-scholar, package-version
|-- content-extractor: fetcher, trafilatura, docling, wikipedia, wayback
|-- [academic-researcher: arxiv, semantic-scholar, openalex, crossref, PubMed]
|-- [adversarial-reviewer: devil's advocate — counter-search all emerging findings]

Spawn academic-researcher if domain signals include academic/scientific. Spawn adversarial-reviewer for the Exhaustive tier or if verification complexity >= 2.

Exhaustive: Deep team + each teammate runs nested subagent waves internally.
Each subagent/teammate returns structured findings:

```json
{
  "sub_question": "...",
  "findings": [{"claim": "...", "source_url": "...", "source_tool": "...", "excerpt": "...", "confidence_raw": 0.6}],
  "leads": ["url1", "url2"],
  "gaps": ["could not find data on X"]
}
```
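This contract is easy to shape-check before findings enter later waves. A hedged sketch — `validate_report` and its error messages are illustrative assumptions, not part of the skill:

```python
import json

REQUIRED = {"sub_question": str, "findings": list, "leads": list, "gaps": list}
FINDING_KEYS = {"claim", "source_url", "source_tool", "excerpt", "confidence_raw"}

def validate_report(raw: str) -> dict:
    """Parse one subagent's JSON report and verify the findings contract."""
    report = json.loads(raw)
    for key, typ in REQUIRED.items():
        if not isinstance(report.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key}")
    for f in report["findings"]:
        missing = FINDING_KEYS - f.keys()
        if missing:
            raise ValueError(f"finding missing keys: {sorted(missing)}")
        if not 0.0 <= f["confidence_raw"] <= 1.0:
            raise ValueError("confidence_raw out of range")
    return report
```

Rejecting malformed reports at the wave boundary keeps the accounting rule (N dispatched = N accounted for) honest.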
Wave 1.5: Perspective Expansion (Deep/Exhaustive only)
STORM-style perspective-guided conversation. Spawn 2-4 perspective subagents:
| Perspective | Focus | Question Style |
|---|---|---|
| Skeptic | What could be wrong? What's missing? | "What evidence would disprove this?" |
| Domain Expert | Technical depth, nuance, edge cases | "What do practitioners actually encounter?" |
| Practitioner | Real-world applicability, trade-offs | "What matters when you actually build this?" |
| Theorist | First principles, abstractions, frameworks | "What underlying model explains this?" |
Each perspective agent reviews Wave 1 findings and generates 2-3 additional sub-questions from their viewpoint. These sub-questions feed into Wave 2.
Wave 2: Deep Dive (parallel, targeted)
- Rank leads from Wave 1 by potential value (citation frequency, source authority, relevance)
- Dispatch deep-read subagents — use fetcher/trafilatura/docling to extract full content from top leads
- Follow citation chains — if a source cites another, fetch the original
- Fill gaps — for each gap identified in Wave 1, dispatch targeted searches
- Use thinking MCPs:
  - `cascade-thinking` for multi-perspective analysis of complex findings
  - `structured-thinking` for tracking evidence chains and contradictions
  - `think-strategies` for complex question decomposition (Standard+ only)
Wave 3: Cross-Validation (parallel)
The anti-hallucination wave. Read `references/confidence-rubric.md` and `references/self-verification.md`.

For every claim surviving Waves 1-2:
- Independence check — are supporting sources truly independent? Sources citing each other are NOT independent.
- Counter-search — explicitly search for evidence AGAINST each major claim using a different search engine
- Freshness check — verify sources are current (flag if >1 year old for time-sensitive topics)
- Contradiction scan — read `references/contradiction-protocol.md`, identify and classify disagreements
- Confidence scoring — assign 0.0-1.0 per `references/confidence-rubric.md`
- Bias sweep — check each finding against 10 bias categories (7 core + 3 LLM-specific) per `references/bias-detection.md`

Self-Verification (when 3+ findings survive): Spawn a devil's advocate subagent per `references/self-verification.md`:
For each finding, attempt to disprove it. Search for counterarguments. Check whether the evidence is outdated. Verify that claims actually follow from the cited evidence. Flag LLM confabulations.

Adjust confidence: survives +0.05, weakened -0.10, disproven set to 0.0.
Adjustments are subject to hard caps — single-source claims remain capped at 0.60 even after a survival adjustment.
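The adjustment rules and the hard cap compose as follows — a minimal sketch, assuming a plain verdict string and an independent-source count (names are illustrative):

```python
SINGLE_SOURCE_CAP = 0.60

def adjust(confidence: float, verdict: str, independent_sources: int) -> float:
    """Apply the devil's-advocate verdict, then re-apply hard caps."""
    delta = {"survives": 0.05, "weakened": -0.10, "disproven": None}[verdict]
    if delta is None:
        return 0.0                      # disproven findings drop to zero
    c = min(max(confidence + delta, 0.0), 1.0)
    if independent_sources < 2:
        c = min(c, SINGLE_SOURCE_CAP)   # single-source claims stay capped
    return round(c, 2)
```

Note the cap is applied after the survival bonus, so a single-source claim at 0.60 stays at 0.60 even when it survives counter-search.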
Wave 4: Synthesis (always inline, lead only)
Produce the final research product. Read `references/output-formats.md` for templates.
The synthesis is NOT a summary. It must:
- Answer directly — answer the user's question clearly
- Map evidence — all verified findings with confidence and citations
- Surface contradictions — where sources disagree, with analysis of why
- Show confidence landscape — what is known confidently, what is uncertain, what is unknown
- Audit biases — biases detected during research
- Identify gaps — what evidence is missing, what further research would help
- Distill takeaways — 3-7 numbered key findings
- Cite sources — full bibliography with provenance
Output format adapts to mode:
- Investigate → Research Brief (Standard) or Deep Report (Deep/Exhaustive)
- Fact-check → Quick Answer with verdict + evidence
- Compare → Decision Matrix
- Survey → Annotated Bibliography
- The user can override with `--format brief|deep|bib|matrix`
Confidence Scoring
| Score | Basis |
|---|---|
| 0.9-1.0 | Official docs + 2 independent sources agree, no contradictions |
| 0.7-0.8 | 2+ independent sources agree, minor qualifications |
| 0.5-0.6 | Single authoritative source, or 2 sources with partial agreement |
| 0.3-0.4 | Single non-authoritative source, or conflicting evidence |
| 0.2-0.3 | Multiple non-authoritative sources with partial agreement, or single source with significant caveats |
| 0.1-0.2 | LLM reasoning only, no external evidence found |
| 0.0 | Actively contradicted by evidence |
Hard rules:
- No claim reported at >= 0.7 unless supported by 2+ independent sources
- Single-source claims cap at 0.6 regardless of source authority
- Degraded mode (all research tools unavailable): max confidence 0.4, all findings labeled "unverified"
Merged confidence (for claims supported by multiple sources):

c_merged = 1 - (1-c1)(1-c2)...(1-cN), capped at 0.99
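In code, the merge formula and cap might look like this (a sketch; `merged_confidence` is an illustrative name, and the rounding is a presentational choice):

```python
import math

def merged_confidence(scores: list[float]) -> float:
    """c_merged = 1 - product(1 - c_i), capped at 0.99."""
    if not scores:
        return 0.0
    c = 1.0 - math.prod(1.0 - c for c in scores)
    return min(round(c, 4), 0.99)
```

Merging only aggregates evidence; it does not bypass the hard rules above — a merged score from non-independent sources still falls under the single-source cap.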
Evidence Chain Structure
Every finding carries this structure:

```
FINDING RR-{seq:03d}: [claim statement]
CONFIDENCE: [0.0-1.0]
EVIDENCE:
  1. [source_tool] [url] [access_timestamp] — [relevant excerpt, max 100 words]
  2. [source_tool] [url] [access_timestamp] — [relevant excerpt, max 100 words]
CROSS-VALIDATION: [agrees|contradicts|partial] across [N] independent sources
BIAS MARKERS: [none | list of detected biases with category]
GAPS: [none | what additional evidence would strengthen this finding]
```

Use `!uv run python skills/research/scripts/finding-formatter.py --format markdown` to normalize.
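A hedged sketch of what a renderer for this structure could look like — the real normalizer is `skills/research/scripts/finding-formatter.py`; the dataclasses below are illustrative, not its actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source_tool: str
    url: str
    access_timestamp: str
    excerpt: str

@dataclass
class Finding:
    seq: int
    claim: str
    confidence: float
    evidence: list[Evidence] = field(default_factory=list)
    cross_validation: str = "partial"
    bias_markers: list[str] = field(default_factory=list)
    gaps: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Emit the FINDING block in the template shape above."""
        lines = [f"FINDING RR-{self.seq:03d}: {self.claim}",
                 f"CONFIDENCE: {self.confidence:.2f}",
                 "EVIDENCE:"]
        for i, e in enumerate(self.evidence, 1):
            lines.append(f"  {i}. [{e.source_tool}] [{e.url}] [{e.access_timestamp}] — {e.excerpt}")
        lines.append(f"CROSS-VALIDATION: {self.cross_validation} across {len(self.evidence)} independent sources")
        lines.append(f"BIAS MARKERS: {', '.join(self.bias_markers) or 'none'}")
        lines.append(f"GAPS: {', '.join(self.gaps) or 'none'}")
        return "\n".join(lines)
```

Keeping findings as structured records until the last moment makes the Wave 3 checks (independence, caps, bias sweep) straightforward to automate.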
Source Selection
Read `references/source-selection.md` during Wave 0 for the full tool-to-domain mapping. Summary:
| Domain Signal | Primary Tools | Secondary Tools |
|---|---|---|
| Library/API docs | context7, deepwiki, package-version | brave-search |
| Academic/scientific | arxiv, semantic-scholar, PubMed, openalex | crossref, brave-search |
| Current events/trends | brave-search, exa, duckduckgo-search, g-search | fetcher, trafilatura |
| GitHub repos/OSS | deepwiki, repomix | brave-search |
| General knowledge | wikipedia, wikidata, brave-search | fetcher |
| Historical content | wayback, brave-search | fetcher |
| Fact-checking | 3+ search engines mandatory | wikidata for structured claims |
| PDF/document analysis | docling | trafilatura |
Multi-engine protocol: For any claim requiring verification, use minimum 2 different search engines. Different engines have different indices and biases. Agreement across engines increases confidence.
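The minimum-two-engines rule can be checked mechanically. A sketch, assuming each evidence record carries a `source_tool` name and a boolean `supports` flag (the flag is an assumption, not part of the findings contract):

```python
def engines_agreeing(evidence: list[dict]) -> int:
    """Count distinct search engines whose results support the claim."""
    return len({e["source_tool"] for e in evidence if e.get("supports")})

def meets_multi_engine_rule(evidence: list[dict]) -> bool:
    """A claim needs supporting hits from at least 2 distinct engines."""
    return engines_agreeing(evidence) >= 2
```

Counting distinct tools (a set, not a list) is the point: two hits from the same engine share one index and its biases, so they count once.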
Bias Detection
Check every finding against 10 bias categories. Read `references/bias-detection.md` for full detection signals and mitigation strategies.
| Bias | Detection Signal | Mitigation |
|---|---|---|
| LLM prior | Matches common training patterns, lacks fresh evidence | Flag; require fresh source confirmation |
| Recency | Overweighting recent results, ignoring historical context | Search for historical perspective |
| Authority | Uncritically accepting prestigious sources | Cross-validate even authoritative claims |
| Confirmation | Queries constructed to confirm initial hypothesis | Use neutral queries; search for counterarguments |
| Survivorship | Only finding successful examples | Search for failures/counterexamples |
| Selection | Search engine bubble, English-only | Use multiple engines; note coverage limitations |
| Anchoring | First source disproportionately shapes interpretation | Document first source separately; seek contrast |
State Management
- Journal path: `~/.claude/research/`
- Archive path: `~/.claude/research/archive/`
- Filename convention: `{YYYY-MM-DD}-{domain}-{slug}.md`
  - `{domain}`: `tech`, `academic`, `market`, `policy`, `factcheck`, `compare`, `survey`, `track`, or `general`
  - `{slug}`: 3-5 word semantic summary, kebab-case
  - Collision: append `-v2`, `-v3`
- Format: YAML frontmatter + markdown body + `<!-- STATE -->` blocks

Save protocol:
- Quick: save once at the end with `status: Complete`
- Standard/Deep/Exhaustive: save after Wave 1 with `status: In Progress`, update after each wave, finalize after synthesis

Resume protocol:
- `resume` (no args): find `status: In Progress` journals. One → auto-resume. Multiple → show list.
- `resume N`: Nth journal from `list` output (reverse chronological).
- `resume keyword`: search frontmatter `query` and `domain_tags` for a match.

Use `!uv run python skills/research/scripts/journal-store.py` for all journal operations.

State snapshot (appended after each wave save):

```html
<!-- STATE
wave_completed: 2
findings_count: 12
leads_pending: ["url1", "url2"]
gaps: ["topic X needs more sources"]
contradictions: 1
next_action: "Wave 3: cross-validate top 8 findings"
-->
```
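A journal consumer can recover the latest snapshot with a naive parser. This is a sketch only — the real operations live in `skills/research/scripts/journal-store.py`; `last_state` and its line-by-line `key: value` parsing are assumptions:

```python
import json
import re

STATE_RE = re.compile(r"<!-- STATE\n(.*?)\n-->", re.DOTALL)

def last_state(journal_text: str) -> dict:
    """Return the most recent STATE snapshot from a journal as a dict."""
    blocks = STATE_RE.findall(journal_text)
    if not blocks:
        return {}
    state = {}
    for line in blocks[-1].splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        try:
            state[key.strip()] = json.loads(value)  # numbers, lists, quoted strings
        except json.JSONDecodeError:
            state[key.strip()] = value              # bare strings pass through
    return state
```

Taking the last block rather than the first is what makes resume-after-interruption pick up at the most recently completed wave.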
In-Session Commands (Deep/Exhaustive)
Available during active research sessions:
| Command | Effect |
|---|---|
| | Deep dive into a specific finding with more sources |
| | Redirect research to a new sub-question |
| | Explicitly search for evidence against a finding |
| | Render HTML dashboard |
| | Show current research state without advancing |
| | List all sources consulted so far |
| | Show confidence distribution across findings |
| | List identified knowledge gaps |
| | Show command menu |
Read `references/session-commands.md` for full protocols.
Reference File Index
| File | Content | Read When |
|---|---|---|
| `references/source-selection.md` | Tool-to-domain mapping, multi-engine protocol, degraded mode | Wave 0 (selecting tools) |
| `references/confidence-rubric.md` | Scoring rubric, cross-validation rules, independence checks | Wave 3 (assigning confidence) |
| | Finding template, provenance format, citation standards | Any wave (structuring evidence) |
| `references/bias-detection.md` | 10 bias categories (7 core + 3 LLM-specific), detection signals, mitigation strategies | Wave 3 (bias audit) |
| `references/contradiction-protocol.md` | 4 contradiction types, resolution framework | Wave 3 (contradiction detection) |
| `references/self-verification.md` | Devil's advocate protocol, hallucination detection | Wave 3 (self-verification) |
| `references/output-formats.md` | Templates for all 5 output formats | Wave 4 (formatting output) |
| | Team archetypes, subagent prompts, perspective agents | Wave 0 (designing team) |
| `references/session-commands.md` | In-session command protocols | When user issues in-session command |
| | JSON data contract for HTML dashboard | |
Loading rule: Load ONE reference at a time per the "Read When" column. Do not preload.
Critical Rules
- No claim >= 0.7 unless supported by 2+ independent sources — single-source claims cap at 0.6
- Never fabricate citations — if URL, author, title, or date cannot be verified, use vague attribution ("a study in this tradition") rather than inventing specifics
- Always surface contradictions explicitly — never silently resolve disagreements; present both sides with evidence
- Always present triage scoring before executing research — user must see and can override complexity tier
- Save journal after every wave in Deep/Exhaustive mode — enables resume after interruption
- Never skip Wave 3 (cross-validation) for Standard/Deep/Exhaustive tiers — this is the anti-hallucination mechanism
- Multi-engine search is mandatory for fact-checking — use minimum 2 different search tools (e.g., brave-search + duckduckgo-search)
- Apply the Accounting Rule after every parallel dispatch — N dispatched = N accounted for before proceeding to next wave
- Distinguish facts from interpretations in all output — factual claims carry evidence; interpretive claims are explicitly labeled as analysis
- Flag all LLM-prior findings — claims matching common training data but lacking fresh evidence must be flagged with bias marker
- Max confidence 0.4 in degraded mode — when all research tools are unavailable, report all findings as "unverified — based on training knowledge"
- Load ONE reference file at a time — do not preload all references into context
- Track mode must load prior journal before searching — avoid re-researching what is already known
- The synthesis is not a summary — it must integrate findings into novel analysis, identify patterns across sources, and surface emergent insights not present in any single source
- PreToolUse Edit hook is non-negotiable — the research skill never modifies source files; it only creates/updates journals in `~/.claude/research/`