# Deep Research

General-purpose deep research with multi-source synthesis, confidence scoring, and anti-hallucination verification. Adopts SOTA patterns from OpenAI Deep Research (multi-agent triage pipeline), Google Gemini Deep Research (user-reviewable plans), STORM (perspective-guided conversations), Perplexity (source confidence ratings), and LangChain ODR (supervisor-researcher with reflection).

## Vocabulary

| Term | Definition |
|------|------------|
| query | The user's research question or topic; the unit of investigation |
| claim | A discrete assertion to be verified; extracted from sources or user input |
| source | A specific origin of information: URL, document, database record, or API response |
| evidence | A source-backed datum supporting or contradicting a claim; always has provenance |
| provenance | The chain from evidence to source: tool used, URL, access timestamp, excerpt |
| confidence | Score 0.0-1.0 per claim; based on evidence strength and cross-validation |
| cross-validation | Verifying a claim across 2+ independent sources; the core anti-hallucination mechanism |
| triangulation | Confirming a finding using 3+ methodologically diverse sources |
| contradiction | When two credible sources assert incompatible claims; must be surfaced explicitly |
| synthesis | The final research product: not a summary but a novel integration of evidence with analysis |
| journal | The saved markdown record of a research session, stored in `~/.claude/research/` |
| sweep | Wave 1: broad parallel search across multiple tools and sources |
| deep dive | Wave 2: targeted follow-up on specific leads from the sweep |
| lead | A promising source or thread identified during the sweep, warranting deeper investigation |
| tier | Complexity classification: Quick (0-2), Standard (3-5), Deep (6-8), Exhaustive (9-10) |
| finding | A verified claim with evidence chain, confidence score, and provenance; the atomic unit of output |
| gap | An identified area where evidence is insufficient, contradictory, or absent |
| bias marker | An explicit flag on a finding indicating potential bias (recency, authority, LLM prior, etc.) |
| degraded mode | Operation when research tools are unavailable; confidence ceilings applied |

## Dispatch

| `$ARGUMENTS` | Action |
|--------------|--------|
| Question or topic text (has verb or `?`) | Investigate — classify complexity, execute wave pipeline |
| Vague input (<5 words, no verb, no `?`) | Intake — ask 2-3 clarifying questions, then classify |
| `check <claim>` or `verify <claim>` | Fact-check — verify claim against 3+ search engines |
| `compare <A> vs <B> [vs <C>...]` | Compare — structured comparison with decision matrix output |
| `survey <field or topic>` | Survey — landscape mapping, annotated bibliography |
| `track <topic>` | Track — load prior journal, search for updates since last session |
| `resume [number or keyword]` | Resume — resume a saved research session |
| `list [active\|domain\|tier]` | List — show journal metadata table |
| `archive` | Archive — move journals older than 90 days |
| `delete <N>` | Delete — delete journal N with confirmation |
| `export [N]` | Export — render HTML dashboard for journal N (default: current) |
| Empty | Gallery — show topic examples + "ask me anything" prompt |

## Auto-Detection Heuristic

If no mode keyword matches:

1. Ends with `?` or starts with a question word (who/what/when/where/why/how/is/are/can/does/should/will) → Investigate
2. Contains `vs`, `versus`, `compared to`, or `or` between noun phrases → Compare
3. Declarative statement with a factual claim, no question syntax → Fact-check
4. Broad field name with no specific question → ask: "Investigate a specific question, or survey the entire field?"
5. Ambiguous → ask: "Would you like me to investigate this question, verify this claim, or survey this field?"
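The ordering above can be sketched as a small dispatcher. This is an illustrative approximation, not the skill's actual implementation: the regex, the 5-word threshold, and the `ask-user` fallback are assumptions, and the "between noun phrases" and broad-field checks are omitted for brevity.

```python
import re

QUESTION_WORDS = {"who", "what", "when", "where", "why", "how",
                  "is", "are", "can", "does", "should", "will"}
COMPARE_RE = re.compile(r"\bvs\.?\b|\bversus\b|\bcompared to\b", re.IGNORECASE)

def detect_mode(text: str) -> str:
    """Apply rules 1-5 in order; thresholds and regexes are illustrative."""
    t = text.strip()
    words = t.lower().split()
    if not words:
        return "gallery"                      # empty input -> gallery
    if t.endswith("?") or words[0] in QUESTION_WORDS:
        return "investigate"                  # rule 1: question syntax
    if COMPARE_RE.search(t):
        return "compare"                      # rule 2 (noun-phrase check omitted)
    if len(words) >= 5:
        return "fact-check"                   # rule 3: declarative claim
    return "ask-user"                         # rules 4-5: ask a clarifying question
```

Because the rules are checked in order, a comparison phrased as a question ("Is X better than Y?") routes to Investigate, which matches the table above.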

## Gallery (Empty Arguments)

Present research examples spanning domains:

| # | Domain | Example | Likely Tier |
|---|--------|---------|-------------|
| 1 | Technology | "What are the current best practices for LLM agent architectures?" | Deep |
| 2 | Academic | "What is the state of evidence on intermittent fasting for longevity?" | Standard |
| 3 | Market | "How does the competitive landscape for vector databases compare?" | Deep |
| 4 | Fact-check | "Is it true that 90% of startups fail within the first year?" | Standard |
| 5 | Architecture | "When should you choose event sourcing over CRUD?" | Standard |
| 6 | Trends | "What emerging programming languages gained traction in 2025-2026?" | Standard |

Pick a number, paste your own question, or type `guide me`.

## Skill Awareness

Before starting research, check if another skill is a better fit:

| Signal | Redirect |
|--------|----------|
| Code review, PR review, diff analysis | Suggest `/honest-review` |
| Strategic decision with adversaries, game theory | Suggest `/wargame` |
| Multi-perspective expert debate | Suggest `/host-panel` |
| Prompt optimization, model-specific prompting | Suggest `/prompt-engineer` |

If the user confirms they want general research, proceed.

## Complexity Classification

Score the query on 5 dimensions (0-2 each, total 0-10):

| Dimension | 0 | 1 | 2 |
|-----------|---|---|---|
| Scope breadth | Single fact/definition | Multi-faceted, 2-3 domains | Cross-disciplinary, 4+ domains |
| Source difficulty | Top search results suffice | Specialized databases or multiple source types | Paywalled, fragmented, or conflicting sources |
| Temporal sensitivity | Stable/historical | Evolving field (months matter) | Fast-moving (days/weeks matter), active controversy |
| Verification complexity | Easily verifiable (official docs) | 2-3 independent sources needed | Contested claims, expert disagreement, no consensus |
| Synthesis demand | Answer is a fact or list | Compare/contrast viewpoints | Novel integration of conflicting threads |

| Total | Tier | Strategy |
|-------|------|----------|
| 0-2 | Quick | Inline, 1-2 searches, fire-and-forget |
| 3-5 | Standard | Subagent wave, 3-5 parallel searchers, report delivered |
| 6-8 | Deep | Agent team (TeamCreate), 3-5 teammates, interactive session |
| 9-10 | Exhaustive | Agent team, 4-6 teammates + nested subagent waves, interactive |

Present the scoring to the user. User can override tier with `--depth <tier>`.
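The rubric-to-tier mapping is deterministic and can be sketched directly. The function name and dimension keys are illustrative, not the skill's actual API:

```python
def classify_tier(scores: dict[str, int]) -> tuple[int, str]:
    """Sum the five 0-2 dimension scores and map the total to a tier."""
    assert len(scores) == 5 and all(0 <= v <= 2 for v in scores.values())
    total = sum(scores.values())
    if total <= 2:
        tier = "Quick"
    elif total <= 5:
        tier = "Standard"
    elif total <= 8:
        tier = "Deep"
    else:
        tier = "Exhaustive"
    return total, tier

# A fast-moving, multi-faceted question lands in the Deep tier:
total, tier = classify_tier({
    "scope": 1, "sources": 1, "temporal": 2,
    "verification": 1, "synthesis": 1,
})
print(total, tier)  # 6 Deep
```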

## Wave Pipeline

All non-Quick research follows this 5-wave pipeline. Quick merges Waves 0+1+4 inline.

### Wave 0: Triage (always inline, never parallelized)

1. Run `!uv run python skills/research/scripts/research-scanner.py "$ARGUMENTS"` for a deterministic pre-scan
2. Decompose the query into 2-5 sub-questions
3. Score complexity on the 5-dimension rubric
4. Check tool availability — probe key MCP tools; set degraded-mode flags and confidence ceilings per `references/source-selection.md`
5. Select tools per domain signals — read `references/source-selection.md`
6. Check for existing journals — if `track` or `resume`, load prior state
7. Present triage to the user — show: complexity score, sub-questions, planned strategy, estimated tier. User may override.

### Wave 1: Broad Sweep (parallel)

Scale by tier:

Quick (inline): 1-2 tool calls sequentially. No subagents.

Standard (subagent wave): Dispatch 3-5 parallel subagents via the Task tool:

- Subagent A → brave-search + duckduckgo-search for sub-question 1
- Subagent B → exa + g-search for sub-question 2
- Subagent C → context7 / deepwiki / arxiv / semantic-scholar for technical specifics
- Subagent D → wikipedia / wikidata for factual grounding
- [Subagent E → PubMed / openalex if academic domain detected]

Deep (agent team): TeamCreate `"research-{slug}"`:

```
Lead: triage (Wave 0), orchestrate, judge reconcile (Wave 3), synthesize (Wave 4)
  |-- web-researcher:       brave-search, duckduckgo-search, exa, g-search
  |-- tech-researcher:      context7, deepwiki, arxiv, semantic-scholar, package-version
  |-- content-extractor:    fetcher, trafilatura, docling, wikipedia, wayback
  |-- [academic-researcher: arxiv, semantic-scholar, openalex, crossref, PubMed]
  |-- [adversarial-reviewer: devil's advocate — counter-search all emerging findings]
```

Spawn academic-researcher if domain signals include academic/scientific. Spawn adversarial-reviewer for the Exhaustive tier or if verification complexity >= 2.

Exhaustive: Deep team + each teammate runs nested subagent waves internally.

Each subagent/teammate returns structured findings:

```json
{
  "sub_question": "...",
  "findings": [{"claim": "...", "source_url": "...", "source_tool": "...", "excerpt": "...", "confidence_raw": 0.6}],
  "leads": ["url1", "url2"],
  "gaps": ["could not find data on X"]
}
```
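Before merging results into the wave state, the lead can sanity-check each returned payload against this contract. A minimal shape check, assuming the JSON schema above; the function itself is an illustrative sketch, not part of the skill:

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems with a subagent's structured findings."""
    problems = []
    if not isinstance(payload.get("sub_question"), str):
        problems.append("missing sub_question")
    for i, finding in enumerate(payload.get("findings", [])):
        # Every finding must carry its claim plus full provenance fields.
        for key in ("claim", "source_url", "source_tool", "excerpt"):
            if not finding.get(key):
                problems.append(f"finding {i}: missing {key}")
        c = finding.get("confidence_raw")
        if not isinstance(c, (int, float)) or not 0.0 <= c <= 1.0:
            problems.append(f"finding {i}: confidence_raw out of range")
    return problems
```

A payload with problems can be bounced back to the subagent rather than silently merged, which supports the Accounting Rule in the Critical Rules section.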

### Wave 1.5: Perspective Expansion (Deep/Exhaustive only)

STORM-style perspective-guided conversation. Spawn 2-4 perspective subagents:

| Perspective | Focus | Question Style |
|-------------|-------|----------------|
| Skeptic | What could be wrong? What's missing? | "What evidence would disprove this?" |
| Domain Expert | Technical depth, nuance, edge cases | "What do practitioners actually encounter?" |
| Practitioner | Real-world applicability, trade-offs | "What matters when you actually build this?" |
| Theorist | First principles, abstractions, frameworks | "What underlying model explains this?" |

Each perspective agent reviews Wave 1 findings and generates 2-3 additional sub-questions from its viewpoint. These sub-questions feed into Wave 2.

### Wave 2: Deep Dive (parallel, targeted)

1. Rank leads from Wave 1 by potential value (citation frequency, source authority, relevance)
2. Dispatch deep-read subagents — use fetcher/trafilatura/docling to extract full content from top leads
3. Follow citation chains — if a source cites another, fetch the original
4. Fill gaps — for each gap identified in Wave 1, dispatch targeted searches
5. Use thinking MCPs:
   - `cascade-thinking` for multi-perspective analysis of complex findings
   - `structured-thinking` for tracking evidence chains and contradictions
   - `think-strategies` for complex question decomposition (Standard+ only)

### Wave 3: Cross-Validation (parallel)

The anti-hallucination wave. Read `references/confidence-rubric.md` and `references/self-verification.md`.

For every claim surviving Waves 1-2:

1. Independence check — are supporting sources truly independent? Sources citing each other are NOT independent.
2. Counter-search — explicitly search for evidence AGAINST each major claim using a different search engine
3. Freshness check — verify sources are current (flag if >1 year old for time-sensitive topics)
4. Contradiction scan — read `references/contradiction-protocol.md`, identify and classify disagreements
5. Confidence scoring — assign 0.0-1.0 per `references/confidence-rubric.md`
6. Bias sweep — check each finding against 10 bias categories (7 core + 3 LLM-specific) per `references/bias-detection.md`

Self-Verification (3+ findings survive): Spawn a devil's advocate subagent per `references/self-verification.md`:

For each finding, attempt to disprove it. Search for counterarguments. Check whether evidence is outdated. Verify that claims actually follow from the cited evidence. Flag LLM confabulations.

Adjust confidence: Survives +0.05, Weakened -0.10, Disproven set to 0.0. Adjustments are subject to hard caps — single-source claims remain capped at 0.60 even after a survival adjustment.
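The adjustment rule composes with the hard caps from the confidence rubric. A sketch, assuming the +0.05/-0.10/0.0 deltas and the 0.60 single-source cap stated above; the function signature is illustrative:

```python
SINGLE_SOURCE_CAP = 0.60  # hard cap from the confidence-scoring rules

def adjust_confidence(confidence: float, outcome: str, n_sources: int) -> float:
    """Apply the devil's-advocate adjustment, then re-apply hard caps."""
    if outcome == "disproven":
        return 0.0  # actively contradicted: drop to zero
    delta = {"survives": +0.05, "weakened": -0.10}[outcome]
    adjusted = max(0.0, min(1.0, confidence + delta))
    if n_sources < 2:
        adjusted = min(adjusted, SINGLE_SOURCE_CAP)  # cap survives the adjustment
    return adjusted

print(adjust_confidence(0.60, "survives", n_sources=1))  # 0.6
```

Note how a single-source claim at 0.60 does not climb to 0.65 even when it survives the devil's advocate: the cap is re-applied after the delta.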

### Wave 4: Synthesis (always inline, lead only)

Produce the final research product. Read `references/output-formats.md` for templates.

The synthesis is NOT a summary. It must:

1. Answer directly — answer the user's question clearly
2. Map evidence — all verified findings with confidence and citations
3. Surface contradictions — where sources disagree, with analysis of why
4. Show the confidence landscape — what is known confidently, what is uncertain, what is unknown
5. Audit biases — biases detected during research
6. Identify gaps — what evidence is missing, what further research would help
7. Distill takeaways — 3-7 numbered key findings
8. Cite sources — full bibliography with provenance

Output format adapts to mode:

- Investigate → Research Brief (Standard) or Deep Report (Deep/Exhaustive)
- Fact-check → Quick Answer with verdict + evidence
- Compare → Decision Matrix
- Survey → Annotated Bibliography
- User can override with `--format brief|deep|bib|matrix`

## Confidence Scoring

| Score | Basis |
|-------|-------|
| 0.9-1.0 | Official docs + 2 independent sources agree, no contradictions |
| 0.7-0.8 | 2+ independent sources agree, minor qualifications |
| 0.5-0.6 | Single authoritative source, or 2 sources with partial agreement |
| 0.3-0.4 | Single non-authoritative source, or conflicting evidence |
| 0.2-0.3 | Multiple non-authoritative sources with partial agreement, or a single source with significant caveats |
| 0.1-0.2 | LLM reasoning only, no external evidence found |
| 0.0 | Actively contradicted by evidence |

Hard rules:

- No claim reported at >= 0.7 unless supported by 2+ independent sources
- Single-source claims cap at 0.6 regardless of source authority
- Degraded mode (all research tools unavailable): max confidence 0.4, all findings labeled "unverified"

Merged confidence (for claims supported by multiple sources): `c_merged = 1 - (1-c1)(1-c2)...(1-cN)`, capped at 0.99
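The merge formula treats each source's confidence as an independent probability of being right, so the merged score is one minus the probability that every source is wrong. A direct transcription:

```python
import math

def merged_confidence(scores: list[float]) -> float:
    """c_merged = 1 - prod(1 - c_i), capped at 0.99."""
    survival = math.prod(1.0 - c for c in scores)  # chance all sources are wrong
    return min(1.0 - survival, 0.99)

print(round(merged_confidence([0.6, 0.5]), 2))  # 0.8
```

Two middling sources (0.6 and 0.5) merge to 0.8, and the 0.99 cap keeps even many strong sources from ever reading as certainty.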

## Evidence Chain Structure

Every finding carries this structure:

```
FINDING RR-{seq:03d}: [claim statement]
  CONFIDENCE: [0.0-1.0]
  EVIDENCE:
    1. [source_tool] [url] [access_timestamp] — [relevant excerpt, max 100 words]
    2. [source_tool] [url] [access_timestamp] — [relevant excerpt, max 100 words]
  CROSS-VALIDATION: [agrees|contradicts|partial] across [N] independent sources
  BIAS MARKERS: [none | list of detected biases with category]
  GAPS: [none | what additional evidence would strengthen this finding]
```

Use `!uv run python skills/research/scripts/finding-formatter.py --format markdown` to normalize.
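The template maps naturally onto a small record type. A hypothetical dataclass mirroring the fields; this is a sketch for illustration, not the `finding-formatter.py` API:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source_tool: str
    url: str
    access_timestamp: str
    excerpt: str  # max 100 words per the template

@dataclass
class Finding:
    seq: int
    claim: str
    confidence: float
    evidence: list[Evidence] = field(default_factory=list)
    cross_validation: str = "partial"  # agrees | contradicts | partial
    bias_markers: list[str] = field(default_factory=list)
    gaps: list[str] = field(default_factory=list)

    @property
    def finding_id(self) -> str:
        return f"RR-{self.seq:03d}"

f = Finding(seq=7, claim="X outperforms Y on Z", confidence=0.72)
print(f.finding_id)  # RR-007
```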

## Source Selection

Read `references/source-selection.md` during Wave 0 for the full tool-to-domain mapping. Summary:

| Domain Signal | Primary Tools | Secondary Tools |
|---------------|---------------|-----------------|
| Library/API docs | context7, deepwiki, package-version | brave-search |
| Academic/scientific | arxiv, semantic-scholar, PubMed, openalex | crossref, brave-search |
| Current events/trends | brave-search, exa, duckduckgo-search, g-search | fetcher, trafilatura |
| GitHub repos/OSS | deepwiki, repomix | brave-search |
| General knowledge | wikipedia, wikidata, brave-search | fetcher |
| Historical content | wayback, brave-search | fetcher |
| Fact-checking | 3+ search engines mandatory | wikidata for structured claims |
| PDF/document analysis | docling | trafilatura |

Multi-engine protocol: For any claim requiring verification, use a minimum of 2 different search engines. Different engines have different indices and biases; agreement across engines increases confidence.

## Bias Detection

Check every finding against 10 bias categories. Read `references/bias-detection.md` for full detection signals and mitigation strategies.

| Bias | Detection Signal | Mitigation |
|------|------------------|------------|
| LLM prior | Matches common training patterns, lacks fresh evidence | Flag; require fresh source confirmation |
| Recency | Overweighting recent results, ignoring historical context | Search for historical perspective |
| Authority | Uncritically accepting prestigious sources | Cross-validate even authoritative claims |
| Confirmation | Queries constructed to confirm the initial hypothesis | Use neutral queries; search for counterarguments |
| Survivorship | Only finding successful examples | Search for failures/counterexamples |
| Selection | Search-engine bubble, English-only sources | Use multiple engines; note coverage limitations |
| Anchoring | First source disproportionately shapes interpretation | Document the first source separately; seek contrast |

## State Management

- Journal path: `~/.claude/research/`
- Archive path: `~/.claude/research/archive/`
- Filename convention: `{YYYY-MM-DD}-{domain}-{slug}.md`
  - `{domain}`: `tech`, `academic`, `market`, `policy`, `factcheck`, `compare`, `survey`, `track`, `general`
  - `{slug}`: 3-5 word semantic summary, kebab-case
  - Collision: append `-v2`, `-v3`
- Format: YAML frontmatter + markdown body + `<!-- STATE -->` blocks
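A filename generator following the convention above might look like this. The helper is hypothetical, not part of `journal-store.py`, and the five-token slug rule is a simplifying assumption:

```python
import re
from datetime import date

def journal_filename(domain: str, query: str, existing: set[str]) -> str:
    """{YYYY-MM-DD}-{domain}-{slug}.md with -v2/-v3 appended on collision."""
    # Kebab-case slug from the first few alphanumeric tokens of the query.
    slug = "-".join(re.findall(r"[a-z0-9]+", query.lower())[:5])
    base = f"{date.today():%Y-%m-%d}-{domain}-{slug}"
    name, version = f"{base}.md", 2
    while name in existing:          # collision: bump the -vN suffix
        name = f"{base}-v{version}.md"
        version += 1
    return name
```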
Save protocol:

- Quick: save once at the end with `status: Complete`
- Standard/Deep/Exhaustive: save after Wave 1 with `status: In Progress`, update after each wave, finalize after synthesis

Resume protocol:

1. `resume` (no args): find `status: In Progress` journals. One → auto-resume. Multiple → show list.
2. `resume N`: Nth journal from `list` output (reverse chronological).
3. `resume keyword`: search frontmatter `query` and `domain_tags` for a match.

Use `!uv run python skills/research/scripts/journal-store.py` for all journal operations.

State snapshot (appended after each wave save):

```html
<!-- STATE
wave_completed: 2
findings_count: 12
leads_pending: ["url1", "url2"]
gaps: ["topic X needs more sources"]
contradictions: 1
next_action: "Wave 3: cross-validate top 8 findings"
-->
```
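On resume, the most recent `<!-- STATE -->` block tells the lead where to pick up. A minimal parser for flat `key: value` lines (illustrative only; values stay strings here, and a real implementation would likely parse the block as YAML):

```python
import re

STATE_RE = re.compile(r"<!-- STATE\n(.*?)\n-->", re.DOTALL)

def last_state(journal_text: str) -> dict:
    """Parse the most recent STATE block into a flat string-valued dict."""
    blocks = STATE_RE.findall(journal_text)
    if not blocks:
        return {}
    state = {}
    for line in blocks[-1].splitlines():
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state

journal = """# Notes
<!-- STATE
wave_completed: 2
findings_count: 12
-->
"""
print(last_state(journal)["wave_completed"])  # 2
```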

## In-Session Commands (Deep/Exhaustive)

Available during active research sessions:

| Command | Effect |
|---------|--------|
| `drill <finding #>` | Deep dive into a specific finding with more sources |
| `pivot <new angle>` | Redirect research to a new sub-question |
| `counter <finding #>` | Explicitly search for evidence against a finding |
| `export` | Render HTML dashboard |
| `status` | Show current research state without advancing |
| `sources` | List all sources consulted so far |
| `confidence` | Show confidence distribution across findings |
| `gaps` | List identified knowledge gaps |
| `?` | Show command menu |

Read `references/session-commands.md` for full protocols.

## Reference File Index

| File | Content | Read When |
|------|---------|-----------|
| `references/source-selection.md` | Tool-to-domain mapping, multi-engine protocol, degraded mode | Wave 0 (selecting tools) |
| `references/confidence-rubric.md` | Scoring rubric, cross-validation rules, independence checks | Wave 3 (assigning confidence) |
| `references/evidence-chain.md` | Finding template, provenance format, citation standards | Any wave (structuring evidence) |
| `references/bias-detection.md` | 10 bias categories (7 core + 3 LLM-specific), detection signals, mitigation strategies | Wave 3 (bias audit) |
| `references/contradiction-protocol.md` | 4 contradiction types, resolution framework | Wave 3 (contradiction detection) |
| `references/self-verification.md` | Devil's advocate protocol, hallucination detection | Wave 3 (self-verification) |
| `references/output-formats.md` | Templates for all 5 output formats | Wave 4 (formatting output) |
| `references/team-templates.md` | Team archetypes, subagent prompts, perspective agents | Wave 0 (designing team) |
| `references/session-commands.md` | In-session command protocols | When the user issues an in-session command |
| `references/dashboard-schema.md` | JSON data contract for the HTML dashboard | `export` command |

Loading rule: Load ONE reference at a time per the "Read When" column. Do not preload.

## Critical Rules

1. No claim >= 0.7 unless supported by 2+ independent sources — single-source claims cap at 0.6
2. Never fabricate citations — if a URL, author, title, or date cannot be verified, use vague attribution ("a study in this tradition") rather than inventing specifics
3. Always surface contradictions explicitly — never silently resolve disagreements; present both sides with evidence
4. Always present triage scoring before executing research — the user must see, and can override, the complexity tier
5. Save the journal after every wave in Deep/Exhaustive mode — enables resume after interruption
6. Never skip Wave 3 (cross-validation) for Standard/Deep/Exhaustive tiers — this is the anti-hallucination mechanism
7. Multi-engine search is mandatory for fact-checking — use a minimum of 2 different search tools (e.g., brave-search + duckduckgo-search)
8. Apply the Accounting Rule after every parallel dispatch — N dispatched = N accounted for before proceeding to the next wave
9. Distinguish facts from interpretations in all output — factual claims carry evidence; interpretive claims are explicitly labeled as analysis
10. Flag all LLM-prior findings — claims matching common training data but lacking fresh evidence must carry a bias marker
11. Max confidence 0.4 in degraded mode — when all research tools are unavailable, report all findings as "unverified — based on training knowledge"
12. Load ONE reference file at a time — do not preload all references into context
13. Track mode must load the prior journal before searching — avoid re-researching what is already known
14. The synthesis is not a summary — it must integrate findings into novel analysis, identify patterns across sources, and surface emergent insights not present in any single source
15. The PreToolUse Edit hook is non-negotiable — the research skill never modifies source files; it only creates/updates journals in `~/.claude/research/`