Deep Research
General-purpose deep research with multi-source synthesis, confidence scoring, and anti-hallucination verification. Adopts SOTA patterns from OpenAI Deep Research (multi-agent triage pipeline), Google Gemini Deep Research (user-reviewable plans), STORM (perspective-guided conversations), Perplexity (source confidence ratings), and LangChain ODR (supervisor-researcher with reflection).
Vocabulary
| Term | Definition |
|---|---|
| query | The user's research question or topic; the unit of investigation |
| claim | A discrete assertion to be verified; extracted from sources or user input |
| source | A specific origin of information: URL, document, database record, or API response |
| evidence | A source-backed datum supporting or contradicting a claim; always has provenance |
| provenance | The chain from evidence to source: tool used, URL, access timestamp, excerpt |
| confidence | Score 0.0-1.0 per claim; based on evidence strength and cross-validation |
| cross-validation | Verifying a claim across 2+ independent sources; the core anti-hallucination mechanism |
| triangulation | Confirming a finding using 3+ methodologically diverse sources |
| contradiction | When two credible sources assert incompatible claims; must be surfaced explicitly |
| synthesis | The final research product: not a summary but a novel integration of evidence with analysis |
| journal | The saved markdown record of a research session, stored in `~/.claude/research/` |
| sweep | Wave 1: broad parallel search across multiple tools and sources |
| deep dive | Wave 2: targeted follow-up on specific leads from the sweep |
| lead | A promising source or thread identified during the sweep, warranting deeper investigation |
| tier | Complexity classification: Quick (0-2), Standard (3-5), Deep (6-8), Exhaustive (9-10) |
| finding | A verified claim with evidence chain, confidence score, and provenance; the atomic unit of output |
| gap | An identified area where evidence is insufficient, contradictory, or absent |
| bias marker | An explicit flag on a finding indicating potential bias (recency, authority, LLM prior, etc.) |
| degraded mode | Operation when research tools are unavailable; confidence ceilings applied |
Dispatch
| Trigger | Action |
|---|---|
| Question or topic text (has a verb or `?`) | Investigate — classify complexity, execute wave pipeline |
| Vague input (<5 words, no verb, no `?`) | Intake — ask 2-3 clarifying questions, then classify |
| | Fact-check — verify claim against 3+ search engines |
| | Compare — structured comparison with decision matrix output |
| | Survey — landscape mapping, annotated bibliography |
| | Track — load prior journal, search for updates since last session |
| | Resume — resume a saved research session |
| | List — show journal metadata table |
| | Archive — move journals older than 90 days |
| | Delete — delete journal N with confirmation |
| | Export — render HTML dashboard for journal N (default: current) |
| Empty | Gallery — show topic examples + "ask me anything" prompt |
Auto-Detection Heuristic
If no mode keyword matches:
- Ends with `?` or starts with a question word (who/what/when/where/why/how/is/are/can/does/should/will) → Investigate
- Contains `vs`, `versus`, `compared to`, or `or` between noun phrases → Compare
- Declarative statement with a factual claim, no question syntax → Fact-check
- Broad field name with no specific question → ask: "Investigate a specific question, or survey the entire field?"
- Ambiguous → ask: "Would you like me to investigate this question, verify this claim, or survey this field?"
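The heuristic above can be sketched as a small dispatcher. This is an illustrative sketch only — the function name, the five-word threshold for "declarative with a factual claim", and the marker list are assumptions, not the skill's actual implementation:

```python
QUESTION_WORDS = ("who", "what", "when", "where", "why", "how",
                  "is", "are", "can", "does", "should", "will")
COMPARE_MARKERS = (" vs ", " versus ", " compared to ")

def detect_mode(text: str) -> str:
    """Best-effort mode detection when no explicit mode keyword matches."""
    t = text.strip().lower()
    if not t:
        return "gallery"
    if t.endswith("?") or t.split()[0] in QUESTION_WORDS:
        return "investigate"
    if any(m in f" {t} " for m in COMPARE_MARKERS):
        return "compare"
    if len(t.split()) >= 5:
        return "fact-check"   # declarative sentence long enough to carry a claim
    return "ask-user"         # broad or ambiguous → clarify with the user
```

In practice the ambiguous branch surfaces the clarifying question to the user rather than guessing.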
Gallery (Empty Arguments)
Present research examples spanning domains:
| # | Domain | Example | Likely Tier |
|---|---|---|---|
| 1 | Technology | "What are the current best practices for LLM agent architectures?" | Deep |
| 2 | Academic | "What is the state of evidence on intermittent fasting for longevity?" | Standard |
| 3 | Market | "How does the competitive landscape for vector databases compare?" | Deep |
| 4 | Fact-check | "Is it true that 90% of startups fail within the first year?" | Standard |
| 5 | Architecture | "When should you choose event sourcing over CRUD?" | Standard |
| 6 | Trends | "What emerging programming languages gained traction in 2025-2026?" | Standard |
Pick a number, paste your own question, or type `guide me`.
Skill Awareness
Before starting research, check if another skill is a better fit:
| Signal | Redirect |
|---|---|
| Code review, PR review, diff analysis | Suggest |
| Strategic decision with adversaries, game theory | Suggest |
| Multi-perspective expert debate | Suggest |
| Prompt optimization, model-specific prompting | Suggest |
If the user confirms they want general research, proceed.
Complexity Classification
Score the query on 5 dimensions (0-2 each, total 0-10):
| Dimension | 0 | 1 | 2 |
|---|---|---|---|
| Scope breadth | Single fact/definition | Multi-faceted, 2-3 domains | Cross-disciplinary, 4+ domains |
| Source difficulty | Top search results suffice | Specialized databases or multiple source types | Paywalled, fragmented, or conflicting sources |
| Temporal sensitivity | Stable/historical | Evolving field (months matter) | Fast-moving (days/weeks matter), active controversy |
| Verification complexity | Easily verifiable (official docs) | 2-3 independent sources needed | Contested claims, expert disagreement, no consensus |
| Synthesis demand | Answer is a fact or list | Compare/contrast viewpoints | Novel integration of conflicting threads |
| Total | Tier | Strategy |
|---|---|---|
| 0-2 | Quick | Inline, 1-2 searches, fire-and-forget |
| 3-5 | Standard | Subagent wave, 3-5 parallel searchers, report delivered |
| 6-8 | Deep | Agent team (TeamCreate), 3-5 teammates, interactive session |
| 9-10 | Exhaustive | Agent team, 4-6 teammates + nested subagent waves, interactive |
Present the scoring to the user. The user can override the tier with `--depth <tier>`.
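The rubric maps directly to code. A minimal sketch (the function and dimension key names are illustrative assumptions):

```python
TIERS = [(2, "Quick"), (5, "Standard"), (8, "Deep"), (10, "Exhaustive")]

def classify(scores: dict) -> tuple[int, str]:
    """Sum the five 0-2 dimension scores and map the total to a tier."""
    assert len(scores) == 5 and all(0 <= v <= 2 for v in scores.values())
    total = sum(scores.values())
    for ceiling, tier in TIERS:
        if total <= ceiling:
            return total, tier
```

For example, a query scoring 1 on each of scope, sources, verification, and synthesis but 0 on temporal sensitivity totals 4 and lands in Standard.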
Wave Pipeline
All non-Quick research follows this 5-wave pipeline. Quick merges Waves 0+1+4 inline.
Wave 0: Triage (always inline, never parallelized)
- Run `!uv run python skills/research/scripts/research-scanner.py "$ARGUMENTS"` for a deterministic pre-scan
- Decompose the query into 2-5 sub-questions
- Score complexity on the 5-dimension rubric
- Check tool availability — probe key MCP tools; set degraded-mode flags and confidence ceilings per `references/source-selection.md`
- Select tools per domain signals — read `references/source-selection.md`
- Check for existing journals — if `track` or `resume`, load prior state
- Present triage to the user — show: complexity score, sub-questions, planned strategy, estimated tier. The user may override.
Wave 1: Broad Sweep (parallel)
Scale by tier:

Quick (inline): 1-2 tool calls sequentially. No subagents.

Standard (subagent wave): Dispatch 3-5 parallel subagents via the Task tool:

Subagent A → brave-search + duckduckgo-search for sub-question 1
Subagent B → exa + g-search for sub-question 2
Subagent C → context7 / deepwiki / arxiv / semantic-scholar for technical specifics
Subagent D → wikipedia / wikidata for factual grounding
[Subagent E → PubMed / openalex if academic domain detected]

Deep (agent team): TeamCreate `research-{slug}`:

Lead: triage (Wave 0), orchestrate, judge/reconcile (Wave 3), synthesize (Wave 4)
|-- web-researcher: brave-search, duckduckgo-search, exa, g-search
|-- tech-researcher: context7, deepwiki, arxiv, semantic-scholar, package-version
|-- content-extractor: fetcher, trafilatura, docling, wikipedia, wayback
|-- [academic-researcher: arxiv, semantic-scholar, openalex, crossref, PubMed]
|-- [adversarial-reviewer: devil's advocate — counter-search all emerging findings]

Spawn academic-researcher if domain signals include academic/scientific. Spawn adversarial-reviewer for the Exhaustive tier or if verification complexity >= 2.

Exhaustive: Deep team + each teammate runs nested subagent waves internally.
Each subagent/teammate returns structured findings:

```json
{
  "sub_question": "...",
  "findings": [{"claim": "...", "source_url": "...", "source_tool": "...", "excerpt": "...", "confidence_raw": 0.6}],
  "leads": ["url1", "url2"],
  "gaps": ["could not find data on X"]
}
```
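This contract is easy to shape-check before findings enter later waves. A hedged sketch — `validate_report` and its error messages are illustrative assumptions, not part of the skill:

```python
import json

REQUIRED = {"sub_question": str, "findings": list, "leads": list, "gaps": list}
FINDING_KEYS = {"claim", "source_url", "source_tool", "excerpt", "confidence_raw"}

def validate_report(raw: str) -> dict:
    """Parse one subagent's JSON report and verify the findings contract."""
    report = json.loads(raw)
    for key, typ in REQUIRED.items():
        if not isinstance(report.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key}")
    for f in report["findings"]:
        missing = FINDING_KEYS - f.keys()
        if missing:
            raise ValueError(f"finding missing keys: {sorted(missing)}")
        if not 0.0 <= f["confidence_raw"] <= 1.0:
            raise ValueError("confidence_raw out of range")
    return report
```

Rejecting malformed reports at the wave boundary keeps the accounting rule (N dispatched = N accounted for) honest.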
Wave 1.5: Perspective Expansion (Deep/Exhaustive only)
STORM-style perspective-guided conversation. Spawn 2-4 perspective subagents:
| Perspective | Focus | Question Style |
|---|---|---|
| Skeptic | What could be wrong? What's missing? | "What evidence would disprove this?" |
| Domain Expert | Technical depth, nuance, edge cases | "What do practitioners actually encounter?" |
| Practitioner | Real-world applicability, trade-offs | "What matters when you actually build this?" |
| Theorist | First principles, abstractions, frameworks | "What underlying model explains this?" |
Each perspective agent reviews Wave 1 findings and generates 2-3 additional sub-questions from their viewpoint. These sub-questions feed into Wave 2.
Wave 2: Deep Dive (parallel, targeted)
- Rank leads from Wave 1 by potential value (citation frequency, source authority, relevance)
- Dispatch deep-read subagents — use fetcher/trafilatura/docling to extract full content from top leads
- Follow citation chains — if a source cites another, fetch the original
- Fill gaps — for each gap identified in Wave 1, dispatch targeted searches
- Use thinking MCPs:
  - `cascade-thinking` for multi-perspective analysis of complex findings
  - `structured-thinking` for tracking evidence chains and contradictions
  - `think-strategies` for complex question decomposition (Standard+ only)
Wave 3: Cross-Validation (parallel)
The anti-hallucination wave. Read `references/confidence-rubric.md` and `references/self-verification.md`.

For every claim surviving Waves 1-2:
- Independence check — are supporting sources truly independent? Sources citing each other are NOT independent.
- Counter-search — explicitly search for evidence AGAINST each major claim using a different search engine
- Freshness check — verify sources are current (flag if >1 year old for time-sensitive topics)
- Contradiction scan — read `references/contradiction-protocol.md`, identify and classify disagreements
- Confidence scoring — assign 0.0-1.0 per `references/confidence-rubric.md`
- Bias sweep — check each finding against 10 bias categories (7 core + 3 LLM-specific) per `references/bias-detection.md`

Self-Verification (when 3+ findings survive): Spawn a devil's advocate subagent per `references/self-verification.md`:
For each finding, attempt to disprove it. Search for counterarguments. Check whether the evidence is outdated. Verify that claims actually follow from the cited evidence. Flag LLM confabulations.

Adjust confidence: survives +0.05, weakened -0.10, disproven set to 0.0.
Adjustments are subject to hard caps — single-source claims remain capped at 0.60 even after a survival adjustment.
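The adjustment rules and the hard cap compose as follows — a minimal sketch, assuming a plain verdict string and an independent-source count (names are illustrative):

```python
SINGLE_SOURCE_CAP = 0.60

def adjust(confidence: float, verdict: str, independent_sources: int) -> float:
    """Apply the devil's-advocate verdict, then re-apply hard caps."""
    delta = {"survives": 0.05, "weakened": -0.10, "disproven": None}[verdict]
    if delta is None:
        return 0.0                      # disproven findings drop to zero
    c = min(max(confidence + delta, 0.0), 1.0)
    if independent_sources < 2:
        c = min(c, SINGLE_SOURCE_CAP)   # single-source claims stay capped
    return round(c, 2)
```

Note the cap is applied after the survival bonus, so a single-source claim at 0.60 stays at 0.60 even when it survives counter-search.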
Wave 4: Synthesis (always inline, lead only)
Produce the final research product. Read `references/output-formats.md` for templates.
The synthesis is NOT a summary. It must:
- Answer directly — answer the user's question clearly
- Map evidence — all verified findings with confidence and citations
- Surface contradictions — where sources disagree, with analysis of why
- Show confidence landscape — what is known confidently, what is uncertain, what is unknown
- Audit biases — biases detected during research
- Identify gaps — what evidence is missing, what further research would help
- Distill takeaways — 3-7 numbered key findings
- Cite sources — full bibliography with provenance
Output format adapts to mode:
- Investigate → Research Brief (Standard) or Deep Report (Deep/Exhaustive)
- Fact-check → Quick Answer with verdict + evidence
- Compare → Decision Matrix
- Survey → Annotated Bibliography
- The user can override with `--format brief|deep|bib|matrix`
Confidence Scoring
| Score | Basis |
|---|---|
| 0.9-1.0 | Official docs + 2 independent sources agree, no contradictions |
| 0.7-0.8 | 2+ independent sources agree, minor qualifications |
| 0.5-0.6 | Single authoritative source, or 2 sources with partial agreement |
| 0.3-0.4 | Single non-authoritative source, or conflicting evidence |
| 0.2-0.3 | Multiple non-authoritative sources with partial agreement, or single source with significant caveats |
| 0.1-0.2 | LLM reasoning only, no external evidence found |
| 0.0 | Actively contradicted by evidence |
Hard rules:
- No claim reported at >= 0.7 unless supported by 2+ independent sources
- Single-source claims cap at 0.6 regardless of source authority
- Degraded mode (all research tools unavailable): max confidence 0.4, all findings labeled "unverified"
Merged confidence (for claims supported by multiple sources):

c_merged = 1 - (1-c1)(1-c2)...(1-cN), capped at 0.99
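In code, the merge formula and cap might look like this (a sketch; `merged_confidence` is an illustrative name, and the rounding is a presentational choice):

```python
import math

def merged_confidence(scores: list[float]) -> float:
    """c_merged = 1 - product(1 - c_i), capped at 0.99."""
    if not scores:
        return 0.0
    c = 1.0 - math.prod(1.0 - c for c in scores)
    return min(round(c, 4), 0.99)
```

Merging only aggregates evidence; it does not bypass the hard rules above — a merged score from non-independent sources still falls under the single-source cap.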
Evidence Chain Structure
Every finding carries this structure:

```
FINDING RR-{seq:03d}: [claim statement]
CONFIDENCE: [0.0-1.0]
EVIDENCE:
  1. [source_tool] [url] [access_timestamp] — [relevant excerpt, max 100 words]
  2. [source_tool] [url] [access_timestamp] — [relevant excerpt, max 100 words]
CROSS-VALIDATION: [agrees|contradicts|partial] across [N] independent sources
BIAS MARKERS: [none | list of detected biases with category]
GAPS: [none | what additional evidence would strengthen this finding]
```

Use `!uv run python skills/research/scripts/finding-formatter.py --format markdown` to normalize.
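A hedged sketch of what a renderer for this structure could look like — the real normalizer is `skills/research/scripts/finding-formatter.py`; the dataclasses below are illustrative, not its actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source_tool: str
    url: str
    access_timestamp: str
    excerpt: str

@dataclass
class Finding:
    seq: int
    claim: str
    confidence: float
    evidence: list[Evidence] = field(default_factory=list)
    cross_validation: str = "partial"
    bias_markers: list[str] = field(default_factory=list)
    gaps: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Emit the FINDING block in the template shape above."""
        lines = [f"FINDING RR-{self.seq:03d}: {self.claim}",
                 f"CONFIDENCE: {self.confidence:.2f}",
                 "EVIDENCE:"]
        for i, e in enumerate(self.evidence, 1):
            lines.append(f"  {i}. [{e.source_tool}] [{e.url}] [{e.access_timestamp}] — {e.excerpt}")
        lines.append(f"CROSS-VALIDATION: {self.cross_validation} across {len(self.evidence)} independent sources")
        lines.append(f"BIAS MARKERS: {', '.join(self.bias_markers) or 'none'}")
        lines.append(f"GAPS: {', '.join(self.gaps) or 'none'}")
        return "\n".join(lines)
```

Keeping findings as structured records until the last moment makes the Wave 3 checks (independence, caps, bias sweep) straightforward to automate.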
Source Selection
Read `references/source-selection.md` during Wave 0 for the full tool-to-domain mapping. Summary:
| Domain Signal | Primary Tools | Secondary Tools |
|---|---|---|
| Library/API docs | context7, deepwiki, package-version | brave-search |
| Academic/scientific | arxiv, semantic-scholar, PubMed, openalex | crossref, brave-search |
| Current events/trends | brave-search, exa, duckduckgo-search, g-search | fetcher, trafilatura |
| GitHub repos/OSS | deepwiki, repomix | brave-search |
| General knowledge | wikipedia, wikidata, brave-search | fetcher |
| Historical content | wayback, brave-search | fetcher |
| Fact-checking | 3+ search engines mandatory | wikidata for structured claims |
| PDF/document analysis | docling | trafilatura |
Multi-engine protocol: For any claim requiring verification, use minimum 2 different search engines. Different engines have different indices and biases. Agreement across engines increases confidence.
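The minimum-two-engines rule can be checked mechanically. A sketch, assuming each evidence record carries a `source_tool` name and a boolean `supports` flag (the flag is an assumption, not part of the findings contract):

```python
def engines_agreeing(evidence: list[dict]) -> int:
    """Count distinct search engines whose results support the claim."""
    return len({e["source_tool"] for e in evidence if e.get("supports")})

def meets_multi_engine_rule(evidence: list[dict]) -> bool:
    """A claim needs supporting hits from at least 2 distinct engines."""
    return engines_agreeing(evidence) >= 2
```

Counting distinct tools (a set, not a list) is the point: two hits from the same engine share one index and its biases, so they count once.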
Bias Detection
Check every finding against 10 bias categories. Read `references/bias-detection.md` for full detection signals and mitigation strategies.
| Bias | Detection Signal | Mitigation |
|---|---|---|
| LLM prior | Matches common training patterns, lacks fresh evidence | Flag; require fresh source confirmation |
| Recency | Overweighting recent results, ignoring historical context | Search for historical perspective |
| Authority | Uncritically accepting prestigious sources | Cross-validate even authoritative claims |
| Confirmation | Queries constructed to confirm initial hypothesis | Use neutral queries; search for counterarguments |
| Survivorship | Only finding successful examples | Search for failures/counterexamples |
| Selection | Search engine bubble, English-only | Use multiple engines; note coverage limitations |
| Anchoring | First source disproportionately shapes interpretation | Document first source separately; seek contrast |
State Management
- Journal path: `~/.claude/research/`
- Archive path: `~/.claude/research/archive/`
- Filename convention: `{YYYY-MM-DD}-{domain}-{slug}.md`
  - `{domain}`: `tech`, `academic`, `market`, `policy`, `factcheck`, `compare`, `survey`, `track`, or `general`
  - `{slug}`: 3-5 word semantic summary, kebab-case
  - Collision: append `-v2`, `-v3`
- Format: YAML frontmatter + markdown body + `<!-- STATE -->` blocks

Save protocol:
- Quick: save once at the end with `status: Complete`
- Standard/Deep/Exhaustive: save after Wave 1 with `status: In Progress`, update after each wave, finalize after synthesis

Resume protocol:
- `resume` (no args): find `status: In Progress` journals. One → auto-resume. Multiple → show list.
- `resume N`: Nth journal from `list` output (reverse chronological).
- `resume keyword`: search frontmatter `query` and `domain_tags` for a match.

Use `!uv run python skills/research/scripts/journal-store.py` for all journal operations.

State snapshot (appended after each wave save):

```html
<!-- STATE
wave_completed: 2
findings_count: 12
leads_pending: ["url1", "url2"]
gaps: ["topic X needs more sources"]
contradictions: 1
next_action: "Wave 3: cross-validate top 8 findings"
-->
```
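A journal consumer can recover the latest snapshot with a naive parser. This is a sketch only — the real operations live in `skills/research/scripts/journal-store.py`; `last_state` and its line-by-line `key: value` parsing are assumptions:

```python
import json
import re

STATE_RE = re.compile(r"<!-- STATE\n(.*?)\n-->", re.DOTALL)

def last_state(journal_text: str) -> dict:
    """Return the most recent STATE snapshot from a journal as a dict."""
    blocks = STATE_RE.findall(journal_text)
    if not blocks:
        return {}
    state = {}
    for line in blocks[-1].splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        try:
            state[key.strip()] = json.loads(value)  # numbers, lists, quoted strings
        except json.JSONDecodeError:
            state[key.strip()] = value              # bare strings pass through
    return state
```

Taking the last block rather than the first is what makes resume-after-interruption pick up at the most recently completed wave.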
In-Session Commands (Deep/Exhaustive)
Available during active research sessions:
| Command | Effect |
|---|---|
| | Deep dive into a specific finding with more sources |
| | Redirect research to a new sub-question |
| | Explicitly search for evidence against a finding |
| | Render HTML dashboard |
| | Show current research state without advancing |
| | List all sources consulted so far |
| | Show confidence distribution across findings |
| | List identified knowledge gaps |
| | Show command menu |
Read `references/session-commands.md` for full protocols.
Reference File Index
| File | Content | Read When |
|---|---|---|
| `references/source-selection.md` | Tool-to-domain mapping, multi-engine protocol, degraded mode | Wave 0 (selecting tools) |
| `references/confidence-rubric.md` | Scoring rubric, cross-validation rules, independence checks | Wave 3 (assigning confidence) |
| | Finding template, provenance format, citation standards | Any wave (structuring evidence) |
| `references/bias-detection.md` | 10 bias categories (7 core + 3 LLM-specific), detection signals, mitigation strategies | Wave 3 (bias audit) |
| `references/contradiction-protocol.md` | 4 contradiction types, resolution framework | Wave 3 (contradiction detection) |
| `references/self-verification.md` | Devil's advocate protocol, hallucination detection | Wave 3 (self-verification) |
| `references/output-formats.md` | Templates for all 5 output formats | Wave 4 (formatting output) |
| | Team archetypes, subagent prompts, perspective agents | Wave 0 (designing team) |
| `references/session-commands.md` | In-session command protocols | When user issues in-session command |
| | JSON data contract for HTML dashboard | |
Loading rule: Load ONE reference at a time per the "Read When" column. Do not preload.
Critical Rules
- No claim >= 0.7 unless supported by 2+ independent sources — single-source claims cap at 0.6
- Never fabricate citations — if URL, author, title, or date cannot be verified, use vague attribution ("a study in this tradition") rather than inventing specifics
- Always surface contradictions explicitly — never silently resolve disagreements; present both sides with evidence
- Always present triage scoring before executing research — user must see and can override complexity tier
- Save journal after every wave in Deep/Exhaustive mode — enables resume after interruption
- Never skip Wave 3 (cross-validation) for Standard/Deep/Exhaustive tiers — this is the anti-hallucination mechanism
- Multi-engine search is mandatory for fact-checking — use minimum 2 different search tools (e.g., brave-search + duckduckgo-search)
- Apply the Accounting Rule after every parallel dispatch — N dispatched = N accounted for before proceeding to next wave
- Distinguish facts from interpretations in all output — factual claims carry evidence; interpretive claims are explicitly labeled as analysis
- Flag all LLM-prior findings — claims matching common training data but lacking fresh evidence must be flagged with bias marker
- Max confidence 0.4 in degraded mode — when all research tools are unavailable, report all findings as "unverified — based on training knowledge"
- Load ONE reference file at a time — do not preload all references into context
- Track mode must load prior journal before searching — avoid re-researching what is already known
- The synthesis is not a summary — it must integrate findings into novel analysis, identify patterns across sources, and surface emergent insights not present in any single source
- PreToolUse Edit hook is non-negotiable — the research skill never modifies source files; it only creates/updates journals in `~/.claude/research/`