tooluniverse-literature-deep-research

Literature Deep Research Strategy (Enhanced)

A systematic approach to comprehensive literature research that starts with target disambiguation to prevent missing details, uses evidence grading to separate signal from noise, and produces a content-focused report with mandatory completeness sections.

KEY PRINCIPLES:

Target disambiguation FIRST - Resolve IDs, synonyms, naming collisions before literature search
Right-size the deliverable - Use Factoid / Verification Mode for single, answerable questions; use full report mode for “deep research”
Report-first output - Default deliverable is a report file; an inline answer is allowed (and recommended) for Factoid / Verification Mode
Evidence grading - Grade every claim by evidence strength (mechanistic paper vs screen hit vs review vs text-mined)
Mandatory completeness - All checklist sections must exist, even if "unknown/limited evidence"
Source attribution - Every piece of information traceable to database/tool
English-first queries - Always use English terms for literature searches and tool calls, even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language

Workflow Overview

User Query
  ↓
Phase 0: CLARIFY + MODE SELECT (factoid vs deep report)
  ↓
Phase 1: TARGET DISAMBIGUATION + PROFILE (default ON for biological targets)
  ├─ Resolve official IDs (Ensembl, UniProt, HGNC)
  ├─ Gather synonyms/aliases + known naming collisions
  ├─ Get protein length, isoforms, domain architecture
  ├─ Get subcellular location, expression, GO terms, pathways
  └─ Output: Target Profile section + Collision-aware search plan
  ↓
Phase 2: LITERATURE SEARCH (internal methodology, not shown)
  ├─ High-precision seed queries (build mechanistic core)
  ├─ Citation network expansion from seeds
  ├─ Collision-filtered broader queries
  └─ Theme clustering + evidence grading
  ↓
Phase 3: REPORT SYNTHESIS
  ├─ Progressive writing to [topic]_report.md
  ├─ Mandatory completeness checklist validation
  └─ Biological model + testable hypotheses
  ↓
Optional: methods_appendix.md (only if user requests)

Phase 0: Initial Clarification

Mandatory Questions

Target type: Is this a biological target (gene/protein), a general topic, or a disease?
Scope: Is this a single factoid to verify (“Which antibiotic?”, “Which strain?”, “Which year?”) or a comprehensive/deep review?
Known aliases: Any specific gene symbols or protein names you use?
Constraints: Open access only? Include preprints? Specific organisms?
Methods appendix: Do you want methodology details in a separate file?

Mode Selection (CRITICAL)

Pick exactly one mode based on the user’s intent and the question structure:

Factoid / Verification Mode (single concrete question; answer should be a short phrase/sentence)
Mini-review Mode (narrow topic; 1–3 pages of synthesis)
Full Deep-Research Mode (use the full template + completeness checklist)

Heuristic:

If the user asks “X has been evolved to be resistant to which antibiotic?” → Factoid / Verification Mode
If the user asks “What does the literature say about X?” → Full Deep-Research Mode
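The heuristic above can be approximated with a few keyword rules when triaging many queries. This is a hypothetical sketch: `select_mode` and its cue phrases are illustrative assumptions, not part of ToolUniverse, and an ambiguous question should still be resolved with the user in Phase 0.

```python
# Hypothetical cue phrases; adjust them to the questions you actually receive.
DEEP_PHRASES = (
    "what does the literature say",
    "comprehensive",
    "deep research",
    "literature review",
)
FACTOID_WORDS = ("which", "what year", "how many", "when was", "when did")


def select_mode(question: str, narrow_topic: bool = False) -> str:
    """Return 'full', 'factoid', or 'mini-review' for a user question."""
    q = question.lower().strip()
    # An explicit request for a broad synthesis always gets the full report.
    if any(phrase in q for phrase in DEEP_PHRASES):
        return "full"
    # A single question with a concrete interrogative cue is treated as a factoid.
    if q.endswith("?") and any(word in q for word in FACTOID_WORDS):
        return "factoid"
    # Otherwise: narrow topics get a mini-review, broad ones the full report.
    return "mini-review" if narrow_topic else "full"


print(select_mode("X has been evolved to be resistant to which antibiotic?"))  # factoid
```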

Factoid / Verification Mode (Fast Path)

Goal: Provide a correct, source-verified single answer, with minimal but explicit evidence attribution.

Deliverables (still file-backed):

- `[topic]_factcheck_report.md` (≤ 1 page)
- `[topic]_bibliography.json` (+ CSV) containing the key paper(s)

Fact-check report template:

[TOPIC]: Fact-check Report

Generated: [Date] | Evidence cutoff: [Date]

Question

[User question]

Answer

[One-sentence answer] [Evidence: ★★★/★★☆/★☆☆/☆☆☆]

Source(s)

[Primary paper citation: journal/year/PMID/DOI as available]

Verification Notes

[1–3 bullets: where in the paper the statement appears (Abstract/Results/Methods), and any key constraints]

Limitations

[If full text not available, or if only review evidence exists]


**Required verification behavior**:
- Prefer ToolUniverse literature tools (Europe PMC / PubMed / PMC / Semantic Scholar) over general web browsing.
- Use full-text snippet verification when possible (Europe PMC auto-snippet tier is ideal).
- Avoid adding extra claims (e.g., “not X”) unless the paper explicitly supports them.

**Suggested tool pattern**:
- `EuropePMC_search_articles(query=..., extract_terms_from_fulltext=[...])` to pull OA full-text snippets for the key terms.
- If OA snippets unavailable: fall back to `PMC_search_papers` (if in PMC) or `SemanticScholar_search_papers` → `SemanticScholar_get_pdf_snippets`.

**Evidence grading (factoid)**:
- If the statement is explicitly made in a primary experimental paper (Results/Methods/Abstract): label **T1 (★★★)**.
- If it’s only in a review: label **T4 (☆☆☆)** and try to locate the primary source.
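The two factoid grading rules reduce to a small lookup. A minimal sketch, assuming you record a source type (`"primary"` or `"review"`) and the section where the statement was found; both label vocabularies and the `grade_factoid` name are hypothetical. Anything the two rules do not cover returns `None` so it gets graded by hand against the full tier table.

```python
from typing import Optional

# Sections of a primary paper that qualify for T1 under the factoid rule.
PRIMARY_SECTIONS = {"abstract", "results", "methods"}


def grade_factoid(source_type: str, section: Optional[str] = None) -> Optional[str]:
    """Apply the two factoid grading rules; None means 'grade manually'."""
    if source_type == "primary" and section and section.lower() in PRIMARY_SECTIONS:
        return "T1 (★★★)"
    if source_type == "review":
        # Still try to locate the primary source and re-grade against it.
        return "T4 (☆☆☆)"
    return None
```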

Detect Target Type

Query Pattern	Type	Action
Gene symbol (EGFR, TP53, ATP6V1A)	Biological target	Phase 1 required
Protein name ("V-ATPase", "kinase")	Biological target	Phase 1 required
UniProt ID (P00533, Q93050)	Biological target	Phase 1 required
Disease, pathway, method	General topic	Phase 1 optional
"Literature on X"	Depends on X	Assess X

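For batch inputs, the first three rows of the table above can be checked mechanically. A sketch under stated assumptions: the UniProt pattern follows the accession format UniProt documents, while the gene-symbol pattern is a loose heuristic assumed here (it will also accept acronyms such as `GWAS`), and protein names like "V-ATPase" still need manual assessment. `detect_target_type` is a hypothetical helper.

```python
import re

# UniProt accession format (6- or 10-character accessions).
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Loose heuristic for HGNC-style symbols: uppercase letter, then letters/digits/hyphens.
GENE_SYMBOL_RE = re.compile(r"[A-Z][A-Z0-9-]{1,14}")


def detect_target_type(query: str) -> str:
    """Classify a query per the Detect Target Type table."""
    q = query.strip()
    # Check accessions first: "P00533" would also pass the loose symbol pattern.
    if UNIPROT_RE.fullmatch(q):
        return "target: UniProt ID"
    if GENE_SYMBOL_RE.fullmatch(q):
        return "target: gene symbol"
    # Protein names, diseases, pathways, methods: decide by hand.
    return "assess"


for query in ("EGFR", "P00533", "V-ATPase", "autophagy in neurons"):
    print(query, "->", detect_target_type(query))
```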
Phase 1: Target Disambiguation + Profile (Default ON)

CRITICAL: This phase prevents "missing target details" when literature is sparse or noisy.

1.1 Resolve Official Identifiers

Use these tools to establish canonical identity:

UniProt_search → Get UniProt accession for human protein
UniProt_get_entry_by_accession → Full entry with cross-references
UniProt_id_mapping → Map between ID types
ensembl_lookup_gene → Ensembl gene ID, biotype
MyGene_get_gene_annotation → NCBI Gene ID, aliases, summary

Output for report:

Target Identity

Identifier	Value	Source
Official Symbol	ATP6V1A	HGNC
UniProt	P38606	UniProt
Ensembl Gene	ENSG00000114573	Ensembl
NCBI Gene ID	523	NCBI
ChEMBL Target	CHEMBL2364682	ChEMBL

Full Name: V-type proton ATPase catalytic subunit A
Synonyms/Aliases: ATP6A1, VPP2, Vma1, VA68

1.2 Identify Naming Collisions

CRITICAL: Many gene names have collisions. Examples:

TRAG: T-cell regulatory gene vs bacterial TraG conjugation protein
WDR7-7: Could match gene WDR7 vs lncRNA
JAK: Janus kinase vs Just Another Kinase
CAT: Catalase vs chloramphenicol acetyltransferase

Detection strategy:

1. Search PubMed for `"[SYMBOL]"[Title]` and review the first 20 titles.
2. If more than 20% are off-topic, identify the collision terms.
3. Build a negative filter: `NOT [collision_term1] NOT [collision_term2]`
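The detection steps above can be partly automated. A minimal sketch, assuming the first 20 PubMed titles are already fetched as strings and that "on-topic" can be approximated by a short keyword list you choose (a hypothetical stand-in for reading the titles yourself); the 20% threshold and the NOT-filter format come from the steps above, while the function names are illustrative.

```python
def off_topic_fraction(titles: list[str], on_topic_terms: list[str]) -> float:
    """Share of titles that mention none of the expected on-topic terms."""
    if not titles:
        return 0.0
    terms = [t.lower() for t in on_topic_terms]
    off = sum(1 for title in titles if not any(t in title.lower() for t in terms))
    return off / len(titles)


def needs_collision_filter(titles: list[str], on_topic_terms: list[str],
                           threshold: float = 0.20, sample: int = 20) -> bool:
    """Step 2: more than 20% of the first 20 titles are off-topic."""
    return off_topic_fraction(titles[:sample], on_topic_terms) > threshold


def negative_filter(collision_terms: list[str]) -> str:
    """Step 3: chain the collision terms into a NOT filter."""
    return " ".join(f"NOT {term}" for term in collision_terms)
```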

Output for report:

Known Naming Collisions

Symbol "ATP6V1A" is unambiguous (no major collisions detected)
Related but distinct: ATP6V0A1-4 (V0 subunits vs V1 subunits)
Search filter applied: Include "vacuolar" OR "V-ATPase", exclude "V0 domain" when V1-specific

1.3 Protein Architecture & Domains

Use annotation tools (not literature):

InterPro_get_protein_domains → Domain architecture
UniProt_get_ptm_processing_by_accession → PTMs, active sites
proteins_api_get_protein → Additional protein features

Output for report:

Protein Architecture

Domain	Position	InterPro ID	Function
V-ATPase A subunit, N-terminal	1-90	IPR022879	ATP binding
V-ATPase A subunit, catalytic	91-490	IPR005725	Catalysis
V-ATPase A subunit, C-terminal	491-617	IPR022878	Complex assembly

Length: 617 aa | Isoforms: 2 (canonical P38606-1, variant P38606-2 missing aa 1-45)
Active sites: Lys-168 (ATP binding), Glu-261 (catalytic)

Sources: InterPro, UniProt

1.4 Subcellular Location

HPA_get_subcellular_location → Human Protein Atlas localization
UniProt_get_subcellular_location_by_accession → UniProt annotation

Output for report:

Subcellular Localization

Location	Confidence	Source
Lysosome membrane	High	HPA + UniProt
Endosome membrane	High	UniProt
Golgi apparatus	Medium	HPA
Plasma membrane (subset)	Low	Literature

Primary location: Lysosomal/endosomal membranes (vacuolar ATPase complex)
Sources: Human Protein Atlas, UniProt

1.5 Baseline Expression

GTEx_get_median_gene_expression → Tissue expression (TPM)
HPA_get_rna_expression_by_source → HPA expression data

Output for report:

Baseline Tissue Expression

Tissue	Expression (TPM)	Specificity
Kidney cortex	145.3	Elevated
Liver	98.7	Medium
Brain - Cerebellum	87.2	Medium
Lung	76.4	Medium
Ubiquitous baseline	~50	Broad

Tissue Specificity: Low (τ = 0.28) - broadly expressed housekeeping gene
Source: GTEx v8
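The τ value in the example above is a tissue-specificity index. A common definition (Yanai et al., 2005) is τ = Σᵢ(1 − xᵢ/max x)/(n − 1) over n tissues, which gives 0 for uniform expression and 1 for expression in a single tissue. The sketch below computes it; applying log2(TPM + 1) first is a widespread convention but an assumption here, and GTEx or HPA may report their own specificity scores computed differently.

```python
import math


def tau_index(expression: list[float], log_transform: bool = True) -> float:
    """Tissue-specificity index: 0 = uniform across tissues, 1 = single tissue."""
    # Optional log2(TPM + 1) transform (assumed convention, see lead-in).
    values = [math.log2(x + 1) if log_transform else x for x in expression]
    peak = max(values, default=0)
    if len(values) < 2 or peak <= 0:
        raise ValueError("need at least 2 tissues and some non-zero expression")
    # Sum of (1 - x_i / max) over tissues, normalized by n - 1
    return sum(1 - v / peak for v in values) / (len(values) - 1)
```

Run it over all tissues in the expression source, not only the top rows shown in a report table.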

1.6 GO Terms & Pathway Placement

GO_get_annotations_for_gene → GO annotations
Reactome_map_uniprot_to_pathways → Reactome pathways
kegg_get_gene_info → KEGG pathways
OpenTargets_get_target_gene_ontology_by_ensemblID → Open Targets GO

Output for report:

Functional Annotations (GO)

Molecular Function:

ATP hydrolysis activity (GO:0016887) [Evidence: IDA]
Proton-transporting ATPase activity (GO:0046961) [Evidence: IDA]

Biological Process:

Lysosomal lumen acidification (GO:0007042) [Evidence: IMP]
Autophagy (GO:0006914) [Evidence: IMP]
Bone resorption (GO:0045453) [Evidence: IMP]

Cellular Component:

Vacuolar proton-transporting V-type ATPase, V1 domain (GO:0000221) [Evidence: IDA]

Pathway Involvement

Pathway	Database	Significance
Lysosome	KEGG hsa04142	Core component
Phagosome	KEGG hsa04145	Acidification
Autophagy - animal	Reactome R-HSA-9612973	mTORC1 regulation

Sources: GO Consortium, Reactome, KEGG

---

Phase 2: Literature Search (Internal Methodology)

NOTE: This methodology is kept internal. The report shows findings, not process.

2.1 Query Strategy: Collision-Aware Synonym Plan

Step 1: High-Precision Seed Queries (Build Mechanistic Core)

Query 1: "[GENE_SYMBOL]"[Title] AND (mechanism OR function OR structure)
Query 2: "[FULL_PROTEIN_NAME]"[Title] 
Query 3: "[UNIPROT_ID]" (catches supplementary materials)

Purpose: Get 15-30 high-confidence, mechanistic papers that are definitely on-target.
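Step 1 can be scripted so every target gets the same three queries. A minimal sketch: the query strings follow the patterns above (PubMed `[Title]` field tags), while `merge_seed_hits` is a hypothetical helper that unions PMIDs across queries in first-seen order and caps the core at 30 papers, the upper end of the stated 15-30 range.

```python
def seed_queries(symbol: str, full_name: str, uniprot_id: str) -> list[str]:
    """The three high-precision seed queries from Step 1."""
    return [
        f'"{symbol}"[Title] AND (mechanism OR function OR structure)',
        f'"{full_name}"[Title]',
        f'"{uniprot_id}"',
    ]


def merge_seed_hits(*result_lists: list[str], cap: int = 30) -> list[str]:
    """Union PMIDs from several queries, keeping first-seen order, capped at `cap`."""
    merged: list[str] = []
    seen: set[str] = set()
    for results in result_lists:
        for pmid in results:
            if pmid not in seen:
                seen.add(pmid)
                merged.append(pmid)
    return merged[:cap]
```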

Step 2: Citation Network Expansion (Especially for Sparse Targets)

Once you have 5-15 core PMIDs:

PubMed_get_cited_by → Papers citing each seed
PubMed_get_related → Computationally related papers  
EuropePMC_get_citations → Alternative citation source
EuropePMC_get_references → Backward citations from seeds

Citation-network first option: For older targets with deprecated terminology, citation expansion often outperforms keyword searching.
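Once the tools above have returned neighbors for each seed, ranking candidates by how many seeds they connect to is one reasonable way to prioritize reading. A sketch under the assumption that the cited-by, reference, and related-article results have already been collected into a plain `{seed_pmid: [pmid, ...]}` dict; that structure and `expand_citation_network` are hypothetical, not the tools' raw output format.

```python
from collections import Counter


def expand_citation_network(seeds: list[str],
                            neighbors_by_seed: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank non-seed papers by how many seeds they are linked to."""
    seed_set = set(seeds)
    counts: Counter = Counter()
    for seed in seeds:
        # set() so a paper reached twice from the same seed is counted once
        for pmid in set(neighbors_by_seed.get(seed, [])):
            if pmid not in seed_set:
                counts[pmid] += 1
    # Most-connected first; ties broken by PMID for a stable order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```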

Step 3: Collision-Filtered Broader Queries

Broader query: "[GENE_SYMBOL]" AND ([pathway1] OR [pathway2] OR [function])
Apply collision filter: NOT [collision_term1] NOT [collision_term2]

Example for bacterial TraG collision:

"TRAG" AND (T-cell OR immune OR cancer) NOT plasmid NOT conjugation NOT bacterial

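Step 3's template can be assembled programmatically so the collision filter is never forgotten. A minimal sketch; `broader_query` is a hypothetical helper, and the call below reproduces the TraG example query.

```python
def broader_query(symbol: str, context_terms: list[str], collision_terms: list[str]) -> str:
    """Step 3 template: symbol AND (context terms), then one NOT per collision term."""
    query = f'"{symbol}" AND ({" OR ".join(context_terms)})'
    for term in collision_terms:
        query += f" NOT {term}"
    return query


# Prints the TraG example query shown above.
print(broader_query("TRAG", ["T-cell", "immune", "cancer"], ["plasmid", "conjugation", "bacterial"]))
```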
2.2 Database Tools

Literature Search (use all relevant):

- `PubMed_search_articles` - Primary biomedical
- `PMC_search_papers` - Full-text
- `EuropePMC_search_articles` - Europe PMC index (PubMed records plus OA full text and preprints)
- `openalex_literature_search` - Broad academic
- `Crossref_search_works` - DOI registry
- `SemanticScholar_search_papers` - AI-ranked
- `BioRxiv_search_preprints`, `MedRxiv_search_preprints` - Preprints

Citation Tools (with failure handling):

- `PubMed_get_cited_by` - Primary (NCBI elink can be flaky)
- `EuropePMC_get_citations` - Fallback when PubMed fails
- `PubMed_get_related` - Related articles
- `EuropePMC_get_references` - Reference lists

Annotation Tools (not literature, but fill gaps):

- `UniProt_*` tools - Protein data
- `InterPro_get_protein_domains` - Domains
- `GTEx_*` tools - Expression
- `HPA_*` tools - Human Protein Atlas
- `OpenTargets_*` tools - Target-disease associations
- `GO_get_annotations_for_gene` - GO terms

2.3 Full-Text Verification Strategy

WHEN TO USE: Abstracts lack critical experimental details (exact drugs, cell lines, concentrations, specific protocols).

Three-Tier Strategy:

Tier 1: Auto-Snippet Mode (Europe PMC) - FASTEST

Use for: Exploratory queries with 3-5 specific terms

```python
results = EuropePMC_search_articles(
    query="bacterial antibiotic resistance evolution",
    limit=10,
    extract_terms_from_fulltext=["ciprofloxacin", "meropenem", "A. baumannii", "MIC"],
)

# Check which articles have full-text snippets
for article in results:
    if "fulltext_snippets" in article:
        # Snippets automatically extracted from OA full text
        for snippet in article["fulltext_snippets"]:
            # Use snippet["term"] and snippet["snippet"] for verification
            pass
```


**Advantages**:
- ✅ Single tool call (search + snippets)
- ✅ Bounded latency (max 3 OA articles, ~3-5 seconds total)
- ✅ No manual URL extraction
- ✅ Max 5 search terms

**Limitations**:
- ❌ Only works for OA articles with fullTextXML
- ❌ Limited to first 3 OA articles
- ❌ Europe PMC coverage only (~30-40% OA)

**When to use**: Initial exploration, quick verification of 1-2 papers

Tier 2: Manual Two-Step (Semantic Scholar, ArXiv) - TARGETED

Use for: Specific high-value papers you identified from search

```python
# Step 1: Search
papers = SemanticScholar_search_papers(query="machine learning interpretability", limit=10)

# Step 2: Extract from specific OA papers
for paper in papers:
    if paper.get("open_access_pdf_url"):
        snippets = SemanticScholar_get_pdf_snippets(
            open_access_pdf_url=paper["open_access_pdf_url"],
            terms=["SHAP", "gradient attribution", "layer-wise relevance"],
            window_chars=300,
        )
        if snippets["status"] == "success":
            # Process snippets["snippets"]
            pass
```


**ArXiv variant** (100% OA, no paywall):

```python
# All arXiv papers are freely available
snippets = ArXiv_get_pdf_snippets(
    arxiv_id="2301.12345",
    terms=["attention mechanism", "self-attention", "layer normalization"],
    max_snippets_per_term=5,
)
```


**Advantages**:
- ✅ Full control over which papers to process
- ✅ Adjustable window size (20-2000 chars)
- ✅ Works for Semantic Scholar (~15-20% OA PDFs) and ArXiv (100%)
- ✅ Can process any number of papers

**Limitations**:
- ❌ Two tool calls per article (search → extract)
- ❌ Manual loop needed
- ❌ Slower than auto-snippet mode

**When to use**: Thorough review of key papers, preprint analysis

Tier 3: Manual Download + Parse (Fallback) - SLOWEST

Use for: Paywalled content via institutional access

```python
# For paywalled PDFs accessible via institution
webpage_text = get_webpage_text_from_url(
    url="https://doi.org/10.1016/...",  # Requires institutional proxy or VPN
)

# Extract relevant sections manually
if "Methods" in webpage_text:
    # Parse methods section
    pass
```


**Limitations**:
- ❌ Requires institutional access
- ❌ No snippet extraction (full HTML)
- ❌ Quality varies by publisher
- ❌ Slowest approach

**When to use**: Last resort for critical paywalled papers

Decision Matrix

Scenario	Recommended Tier	Rationale
Quick verification ("Which antibiotic?")	Tier 1 (Auto-snippet)	Fast, single call
Preprint deep-dive (arXiv, bioRxiv)	Tier 2 (Manual ArXiv)	100% coverage, no paywall
High-value paper deep analysis	Tier 2 (Manual S2)	Precise control
Systematic review (50+ papers)	Tier 1 + Tier 2	Auto for OA, manual for key papers
Paywalled critical paper	Tier 3 (Manual download)	Only option

Best Practices

1. Limit search terms to 3-5 specific keywords:

✅ Good: `["ciprofloxacin 5 μg/mL", "HEK293 cells", "RNA-seq"]`

❌ Bad (too broad): `["drug", "method", "significant"]`

2. Check OA status before extraction:

```python
if article.get("open_access") and article.get("fulltext_xml_url"):
    # Proceed with extraction
    pass
```

3. Adjust window size for context:

Methods: 400-500 chars (full sentences)
Quick verification: 150-200 chars
Default: 220 chars (balanced)

4. Handle failures gracefully:

```python
if "fulltext_snippets" not in article:
    # Fallback: use abstract or skip
    print(f"No full text available: {article['title']}")
```

5. Document full-text sources in report:

Methods Verification

Antibiotic concentrations (verified from full text):

Study A: Ciprofloxacin 5 μg/mL [PMC12345, Methods section]
Study B: Meropenem 8 μg/mL [arXiv:2301.12345, Experimental Design]

Note: Full-text verification performed on 8/15 OA papers (53% coverage)
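To make the window sizes in practice 3 concrete, here is a local stand-in for snippet extraction that returns about `window_chars` characters of context centered on each hit. This `extract_snippets` helper is hypothetical and purely illustrative; the ToolUniverse snippet tools may measure the window differently (for example, per side rather than in total).

```python
def extract_snippets(text: str, terms: list[str], window_chars: int = 220,
                     max_per_term: int = 3) -> dict[str, list[str]]:
    """Return {term: [snippet, ...]} with about window_chars of context per hit."""
    half = window_chars // 2
    lower = text.lower()
    out: dict[str, list[str]] = {}
    for term in terms:
        hits: list[str] = []
        start = 0
        while len(hits) < max_per_term:
            idx = lower.find(term.lower(), start)  # case-insensitive search
            if idx == -1:
                break
            lo = max(0, idx - half)
            hi = min(len(text), idx + len(term) + half)
            hits.append(text[lo:hi].strip())
            start = idx + len(term)  # continue after this hit
        out[term] = hits
    return out
```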

2.5 Tool Failure Handling

Automatic retry strategy:

Attempt 1: Call tool
If timeout/error:
  Wait 2 seconds
  Attempt 2: Retry
If still fails:
  Wait 5 seconds  
  Attempt 3: Try fallback tool
If fallback fails:
  Document "Data unavailable" in report

Fallback chains:

Primary Tool	Fallback 1	Fallback 2
`PubMed_get_cited_by`	`EuropePMC_get_citations`	OpenAlex citations
`PubMed_get_related`	SemanticScholar recommendations	Manual keyword search
`GTEx_get_median_gene_expression`	`HPA_get_rna_expression_by_source`	Document as unavailable
`Unpaywall_check_oa_status`	Europe PMC OA flags	OpenAlex OA field
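The retry-then-fallback strategy can be written as one generic wrapper. A minimal sketch: `call_with_fallback` is hypothetical, each tool is represented as a zero-argument callable (wrap real calls with `functools.partial` or a lambda), any exception is treated as a timeout or tool error, and `sleep` is injectable so the waits can be tested without actually pausing.

```python
import time
from typing import Any, Callable, Sequence


def call_with_fallback(primary: Callable[[], Any],
                       fallbacks: Sequence[Callable[[], Any]] = (),
                       waits: Sequence[float] = (2, 5),
                       sleep: Callable[[float], Any] = time.sleep):
    """Attempt 1, wait 2 s, attempt 2 (retry), wait 5 s, then each fallback in turn.

    Returns (result, source label) or (None, "Data unavailable").
    """
    attempts = [("primary", primary), ("primary", primary)]
    attempts += [(f"fallback {i + 1}", fn) for i, fn in enumerate(fallbacks)]
    for i, (label, fn) in enumerate(attempts):
        if i > 0:
            # 2 s before the retry, 5 s before each fallback
            sleep(waits[min(i - 1, len(waits) - 1)])
        try:
            return fn(), label
        except Exception:  # timeout or tool error: move to the next attempt
            continue
    return None, "Data unavailable"
```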

2.6 Open Access Handling (Best-Effort)

If Unpaywall email provided: Check OA status for all papers with DOIs

If no Unpaywall email: Use best-effort OA signals:

- Europe PMC: `isOpenAccess` field
- PMC: full text is free to read (only the PMC Open Access Subset is openly licensed for reuse)
- OpenAlex: `is_oa` field
- DOAJ: all articles in DOAJ-indexed journals are OA

Label in report:

*OA Status: Best-effort (Unpaywall not configured)*
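When Unpaywall is not configured, the best-effort signals can be checked in order on whatever record you have. A sketch that assumes Europe PMC results carry `isOpenAccess` as `"Y"`/`"N"` and OpenAlex works nest `is_oa` under `open_access`; the `pmcid` and `in_doaj` keys are hypothetical names for an already-normalized record, so adapt them to the real tool output. `best_effort_oa` is likewise a hypothetical helper.

```python
def best_effort_oa(record: dict) -> tuple:
    """Return (is_oa, signal); is_oa is True, False, or None when unknown."""
    # Europe PMC search results: isOpenAccess is "Y" or "N" (assumed encoding)
    if record.get("isOpenAccess") in ("Y", "N"):
        return record["isOpenAccess"] == "Y", "Europe PMC isOpenAccess"
    # OpenAlex works: open_access.is_oa is a boolean
    oa = record.get("open_access") or {}
    if isinstance(oa, dict) and isinstance(oa.get("is_oa"), bool):
        return oa["is_oa"], "OpenAlex is_oa"
    # Hypothetical normalized keys: a PMC ID means free full text in PMC
    if record.get("pmcid"):
        return True, "PMC full text"
    if record.get("in_doaj"):
        return True, "DOAJ journal"
    return None, "unknown"
```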

Phase 3: Evidence Grading

CRITICAL: Grade every claim by evidence strength to prevent low-signal mentions from diluting the report.

Evidence Tiers

Tier	Label	Description	Example
T1	★★★ Mechanistic	On-target mechanistic study with direct experimental evidence	CRISPR KO + rescue
T2	★★☆ Functional	Functional study showing role (may be in pathway context)	siRNA knockdown phenotype
T3	★☆☆ Association	Screen hit, GWAS association, correlation	High-throughput screen
T4	☆☆☆ Mention	Review mention, text-mined interaction, peripheral reference	Review article

How to Apply

In report, label sections and claims:

Mechanism of Action

ATP6V1A is the catalytic subunit responsible for ATP hydrolysis in the V-ATPase complex [★★★ Mechanistic: PMID:12345678]. Loss-of-function mutations cause vacuolar pH dysregulation [★★★: PMID:23456789].

The target has been implicated in mTORC1 signaling through lysosomal amino acid sensing [★★☆ Functional: PMID:34567890], though direct interaction data is limited.

A genome-wide screen identified ATP6V1A as essential in cancer cell lines [★☆☆ Association: PMID:45678901, DepMap].

Theme-Level Grading

For each theme section, summarize evidence quality:

3.1 Lysosomal Acidification (12 papers)

Evidence Quality: Strong (8 mechanistic, 3 functional, 1 association)

[Theme content...]

---
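The theme-level summary line can be generated from per-paper tiers, which keeps the counts consistent with the claim labels. A sketch: the tier names follow the Evidence Tiers table, but the Strong/Moderate/Limited cut-offs (at least half T1 for Strong, any T1 or T2 for Moderate) are hypothetical, since this strategy does not define them. `theme_quality` is an illustrative name.

```python
from collections import Counter

TIER_NAMES = {"T1": "mechanistic", "T2": "functional", "T3": "association", "T4": "mention"}


def theme_quality(tiers: list[str]) -> str:
    """Summarize per-paper tiers into an 'Evidence Quality' line (hypothetical cut-offs)."""
    counts = Counter(tiers)
    n = len(tiers)
    if n and counts["T1"] / n >= 0.5:
        label = "Strong"
    elif counts["T1"] + counts["T2"] > 0:
        label = "Moderate"
    else:
        label = "Limited"
    # Only list tiers that actually occur, in T1..T4 order
    parts = [f"{counts[t]} {TIER_NAMES[t]}" for t in ("T1", "T2", "T3", "T4") if counts[t]]
    return f"Evidence Quality: {label} ({', '.join(parts)})"
```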

Report Structure: Mandatory Completeness Checklist

CRITICAL: This checklist/template applies to Full Deep-Research Mode. For Factoid / Verification Mode, use a short fact-check report (see Phase 0) and do not force the full 15-section template.
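Before delivering a Full Deep-Research report, the completeness check can be automated by scanning for the numbered section headings. A minimal sketch: `missing_sections` is hypothetical, it only checks that each heading exists (not that its content is adequate), and the list covers sections 1-11 of the report template; extend it with the remaining sections of the 15-section template.

```python
import re

# Sections 1-11 of the report template; extend with the remaining sections.
MANDATORY_SECTIONS = [
    "Target Identity & Aliases",
    "Protein Architecture",
    "Complexes & Interaction Partners",
    "Subcellular Localization",
    "Expression Profile",
    "Core Mechanisms",
    "Model Organism Evidence",
    "Human Genetics & Variants",
    "Disease Links",
    "Pathogen Involvement",
    "Key Assays & Readouts",
]


def missing_sections(report_md: str) -> list[str]:
    """Return mandatory section titles that do not appear as numbered headings."""
    missing = []
    for title in MANDATORY_SECTIONS:
        # Accept '## 6. Core Mechanisms' or '6. Core Mechanisms' on its own line.
        pattern = rf"^(#+\s*)?\d+\.\s+{re.escape(title)}\s*$"
        if not re.search(pattern, report_md, flags=re.MULTILINE):
            missing.append(title)
    return missing
```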

Output Files

- `[topic]_report.md` - Main narrative report (Full Deep-Research Mode)
- `[topic]_factcheck_report.md` - Short verification report (Factoid / Verification Mode)
- `[topic]_bibliography.json` - Full deduplicated bibliography (always created)
- `methods_appendix.md` - Methodology details (ONLY if user requests)
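The bibliography needs deduplication across tools, because the same paper often comes back from several of the search tools listed in 2.2. A sketch assuming each record is a dict with optional `pmid`, `doi`, `title`, and `year` keys (hypothetical field names): records sharing a case-insensitive DOI or a PMID are merged by keeping the first, and the result is serialized to JSON and CSV text for `[topic]_bibliography.json` and its CSV companion. Both function names are illustrative.

```python
import csv
import io
import json


def dedupe_bibliography(records: list[dict]) -> list[dict]:
    """Drop records that share a DOI (case-insensitive) or PMID with an earlier one."""
    seen: set = set()
    out = []
    for rec in records:
        keys = set()
        if rec.get("doi"):
            keys.add(("doi", rec["doi"].lower()))
        if rec.get("pmid"):
            keys.add(("pmid", str(rec["pmid"])))
        if keys & seen:
            continue  # duplicate of an earlier record
        seen |= keys
        out.append(rec)
    return out


def to_json_and_csv(records: list[dict], fields=("pmid", "doi", "title", "year")):
    """Serialize the bibliography as JSON text and CSV text (missing fields left blank)."""
    json_text = json.dumps(records, indent=2, ensure_ascii=False)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return json_text, buf.getvalue()
```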

Report Template

[TARGET/TOPIC]: Comprehensive Research Report

Generated: [Date] | Evidence cutoff: [Date] | Total unique papers: [N]

Executive Summary

[2-3 paragraphs synthesizing key findings across all sections]

Bottom Line: [One-sentence actionable conclusion]

1. Target Identity & Aliases

[MANDATORY - even for non-target topics, clarify scope]

1.1 Official Identifiers

[Table of IDs or scope definition]

1.2 Synonyms and Aliases

[List all known names - critical for complete literature coverage]

1.3 Known Naming Collisions

[Document collisions and how they were handled]

2. Protein Architecture

[MANDATORY for protein targets; state "N/A - not a protein target" otherwise]

2.1 Domain Structure

[Table of domains with positions, InterPro IDs]

2.2 Isoforms

[List isoforms, functional differences if known]

2.3 Key Structural Features

[Active sites, binding sites, PTMs]

2.4 Available Structures

[PDB entries, AlphaFold availability]

3. Complexes & Interaction Partners

[MANDATORY]

3.1 Known Complexes

[List complexes the protein participates in]

3.2 Direct Interactors

[Table of top interactors with evidence type and scores]

3.3 Functional Interaction Network

[Describe network context]

4. Subcellular Localization

[MANDATORY]

[Table of locations with confidence levels and sources]

5. Expression Profile

[MANDATORY]

5.1 Tissue Expression

[Table of top tissues with TPM values]

5.2 Cell-Type Expression

[If single-cell data available]

5.3 Disease-Specific Expression

[Expression changes in disease contexts]

6. Core Mechanisms

[MANDATORY - this is the heart of the report]

6.1 Molecular Function

[What the protein does biochemically]
Evidence Quality: [Strong/Moderate/Limited]

6.2 Biological Role

[Role in cellular/organismal context]
Evidence Quality: [Strong/Moderate/Limited]

6.3 Key Pathways

[Pathway involvement with evidence grades]

6.4 Regulation

[How the target is regulated]

7. Model Organism Evidence

[MANDATORY]

7.1 Mouse Models

[Knockout/knockin phenotypes, if any]

7.2 Other Model Organisms

[Yeast, fly, zebrafish, worm data if relevant]

7.3 Cross-Species Conservation

[Conservation and functional studies]

8. Human Genetics & Variants

[MANDATORY]

8.1 Constraint Scores

[pLI, LOEUF, missense Z - with interpretation]
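Most of the interpretation this subsection asks for comes down to applying agreed cut-offs to the raw scores. Below is a minimal sketch. The thresholds are commonly cited heuristics, not rules defined by this document: pLI ≥ 0.9 for loss-of-function (LoF) intolerance, LOEUF < 0.35 as recommended for gnomAD v2.1, and missense Z ≥ 3.09 for missense constraint. Newer gnomAD releases may recommend a different LOEUF cut-off, so pass the value that matches the version you queried. The `Constraint` container is hypothetical and is not the return type of `gnomad_get_gene_constraints`.

```python
from dataclasses import dataclass
from typing import Optional

# Commonly cited heuristic cut-offs (assumed; check the gnomAD version you query).
PLI_INTOLERANT = 0.9            # pLI >= 0.9: likely LoF-intolerant
LOEUF_CONSTRAINED = 0.35        # gnomAD v2.1 recommendation; may differ in later releases
MISSENSE_Z_CONSTRAINED = 3.09   # missense Z >= 3.09 often treated as constrained


@dataclass
class Constraint:
    """Hypothetical container for the three scores named in section 8.1."""
    pli: Optional[float] = None
    loeuf: Optional[float] = None
    mis_z: Optional[float] = None


def interpret_constraint(c: Constraint, loeuf_cutoff: float = LOEUF_CONSTRAINED) -> list[str]:
    """Turn raw constraint metrics into short report sentences."""
    notes = []
    if c.pli is not None:
        verdict = "likely intolerant" if c.pli >= PLI_INTOLERANT else "not flagged as intolerant"
        notes.append(f"pLI = {c.pli:.2f}: {verdict} to loss-of-function")
    if c.loeuf is not None:
        verdict = "constrained" if c.loeuf < loeuf_cutoff else "not strongly constrained"
        notes.append(f"LOEUF = {c.loeuf:.2f}: {verdict} against loss-of-function (cut-off {loeuf_cutoff})")
    if c.mis_z is not None:
        if c.mis_z >= MISSENSE_Z_CONSTRAINED:
            verdict = "constrained against missense variation"
        else:
            verdict = "no strong missense constraint"
        notes.append(f"missense Z = {c.mis_z:.2f}: {verdict}")
    if not notes:
        # Keeps the section mandatory-complete even when the lookup fails.
        notes.append("Constraint scores unavailable (state 'limited evidence')")
    return notes
```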

8.2 Disease-Associated Variants

[ClinVar pathogenic variants]

8.3 Population Variants

[gnomAD notable variants]

8.4 GWAS Associations

[Any GWAS hits for the locus]

9. Disease Links

[MANDATORY - include evidence strength]

9.1 Strong Evidence (Genetic + Functional)

[Diseases with causal evidence]

9.2 Moderate Evidence (Association + Mechanism)

[Diseases with supporting evidence]

9.3 Weak Evidence (Association Only)

[Diseases with correlation/association only]

9.4 Evidence Summary Table

Disease	Evidence Type	Score	Key Papers	Grade
[Disease 1]	Genetic + Functional	0.85	PMID:xxx	★★★
[Disease 2]	GWAS + Expression	0.45	PMID:yyy	★★☆
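The Grade column can be derived from the evidence types using the 9.1-9.3 tiers. Below is a minimal sketch with several assumptions. The label groups are hypothetical. Counting expression changes as mechanism-supporting evidence is a choice made here so that the "GWAS + Expression" row above grades ★★☆. Using ★☆☆ for association-only links is inferred, not stated in the template. The numeric Score column is ignored.

```python
# Hypothetical label groups; adapt them to the evidence types your tools actually return.
GENETIC = {"genetic", "mendelian", "clinvar"}
FUNCTIONAL = {"functional", "knockout", "mechanistic"}
ASSOCIATION = {"gwas", "association"} | GENETIC
MECHANISM = {"mechanism", "expression", "pathway"} | FUNCTIONAL


def parse_evidence_type(cell: str) -> set[str]:
    """Split an Evidence Type cell such as 'GWAS + Expression' into lower-case labels."""
    return {part.strip().lower() for part in cell.split("+") if part.strip()}


def grade_disease_link(evidence: set[str]) -> str:
    """Apply the 9.1-9.3 tiers to a set of evidence labels."""
    if evidence & GENETIC and evidence & FUNCTIONAL:
        return "★★★"  # 9.1 Strong: genetic + functional
    if evidence & ASSOCIATION and evidence & MECHANISM:
        return "★★☆"  # 9.2 Moderate: association + mechanism
    if evidence & ASSOCIATION:
        return "★☆☆"  # 9.3 Weak: association only
    return "ungraded"  # nothing recognised; flag for manual review
```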

10. Pathogen Involvement

[MANDATORY - state "None identified" if not applicable]

10.1 Viral Interactions

[Any viral exploitation or targeting]

10.2 Bacterial Interactions

[Any bacterial relevance]

10.3 Host Defense Role

[Role in immune response if any]

11. Key Assays & Readouts

[MANDATORY]

11.1 Biochemical Assays

[Available assays for target activity]

11.2 Cellular Readouts

[Cell-based assays and phenotypes]

11.3 In Vivo Models

[Animal models and endpoints]

12. Research Themes

[MANDATORY - structured theme extraction]

12.1 [Theme 1 Name] (N papers)

Evidence Quality: [Strong/Moderate/Limited]

Representative Papers: [≥3 papers or state "insufficient"]

[Theme description with evidence-graded citations]

12.2 [Theme 2 Name] (N papers)

[Same structure]

[Continue for all themes - require ≥3 representative papers per theme, or state "limited evidence"]

13. Open Questions & Research Gaps

[MANDATORY]

13.1 Mechanistic Unknowns

[What we don't understand about the target]

13.2 Therapeutic Unknowns

[What we don't know for drug development]

13.3 Suggested Priority Questions

[Ranked list of important unanswered questions]

14. Biological Model & Testable Hypotheses

[MANDATORY - synthesis section]

14.1 Integrated Biological Model

[3-5 paragraph synthesis integrating all evidence into a coherent model]

14.2 Testable Hypotheses

#	Hypothesis	Perturbation	Readout	Expected Result	Priority
1	[Hypothesis]	[Experiment]	[Measure]	[Prediction]	HIGH
2	[Hypothesis]	[Experiment]	[Measure]	[Prediction]	HIGH
3	[Hypothesis]	[Experiment]	[Measure]	[Prediction]	MEDIUM

14.3 Suggested Experiments

[Brief description of key experiments to test hypotheses]

15. Conclusions & Recommendations

[MANDATORY]

15.1 Key Takeaways

[Bullet points of most important findings]

15.2 Confidence Assessment

[Overall confidence in the findings: High/Medium/Low with justification]

15.3 Recommended Next Steps

[Prioritized action items]

References

[Summary reference list in report - full bibliography in separate file]

Key Papers (Must-Read)

[Citation with PMID] - [Why important] [Grade: ★★★]
...

By Theme

[Organized reference lists]

Data Limitations

[Any databases that failed or returned no data]
[Any known gaps in coverage]
[OA status method used]

Full methodology available in methods_appendix.md upon request.

---

Bibliography File Format

File: `[topic]_bibliography.json`

```json
{
  "metadata": {
    "generated": "2026-02-04",
    "query": "ATP6V1A",
    "total_papers": 342,
    "unique_after_dedup": 287
  },
  "papers": [
    {
      "pmid": "12345678",
      "doi": "10.1038/xxx",
      "title": "Paper Title",
      "authors": ["Smith A", "Jones B"],
      "year": 2024,
      "journal": "Nature",
      "source_databases": ["PubMed", "OpenAlex"],
      "evidence_tier": "T1",
      "themes": ["lysosomal_acidification", "autophagy"],
      "oa_status": "gold",
      "oa_url": "https://...",
      "citation_count": 45,
      "in_core_set": true
    }
  ]
}
```
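The `total_papers` and `unique_after_dedup` counts imply that hits from different databases are merged before the file is written. Below is a minimal sketch of one way to do this. It assumes each hit is already a dict that uses the field names shown above. Matching on PMID or normalised DOI is also an assumption: records with neither identifier are kept as separate papers, and conflicting identifiers are not reconciled.

```python
from typing import Any, Optional


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip common URL/prefix forms."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def merge_records(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-database hits into one record per paper, keyed on PMID or DOI."""
    papers: list[dict[str, Any]] = []
    index: dict[str, dict[str, Any]] = {}  # "pmid:..." / "doi:..." -> merged record
    for raw in records:
        pmid = str(raw["pmid"]) if raw.get("pmid") else None
        doi = normalize_doi(raw.get("doi"))
        rec = {**raw, "pmid": pmid, "doi": doi}
        keys = [k for k in (pmid and f"pmid:{pmid}", doi and f"doi:{doi}") if k]
        target = next((index[k] for k in keys if k in index), None)
        if target is None:
            target = {"source_databases": []}
            papers.append(target)
        for field, value in rec.items():
            if field == "source_databases":
                for db in value:  # union of databases that returned this paper
                    if db not in target["source_databases"]:
                        target["source_databases"].append(db)
            elif value not in (None, "", []) and target.get(field) in (None, "", []):
                target[field] = value  # fill gaps; the first non-empty value wins
        for k in keys:
            index[k] = target
    return {
        "metadata": {"total_papers": len(records), "unique_after_dedup": len(papers)},
        "papers": papers,
    }
```

The `generated` and `query` metadata fields are left for the caller to add.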

Also generate `[topic]_bibliography.csv` with the same data in tabular format.
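List-valued fields such as `authors`, `source_databases` and `themes` need to be flattened before they fit in a CSV cell. Below is a minimal sketch using Python's standard `csv` module. Joining list values with `"; "` is an assumption, and the column order simply follows the JSON example.

```python
import csv
import io

# Column order mirrors the JSON example; any other keys in a paper record are ignored.
FIELDS = ["pmid", "doi", "title", "authors", "year", "journal", "source_databases",
          "evidence_tier", "themes", "oa_status", "oa_url", "citation_count", "in_core_set"]


def bibliography_to_csv(bib: dict, list_sep: str = "; ") -> str:
    """Render bib["papers"] as CSV text, joining list values with list_sep."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for paper in bib.get("papers", []):
        row = {k: (list_sep.join(v) if isinstance(v, list) else v) for k, v in paper.items()}
        writer.writerow(row)  # missing fields are written as empty cells
    return buf.getvalue()
```

To write the file, open it with `open(path, "w", newline="", encoding="utf-8")` and write the returned text. This keeps the CSV module's own line endings intact and preserves non-ASCII author names.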

Theme Extraction Protocol

Standardized Theme Clustering

Extract keywords from titles and abstracts
Cluster into themes using semantic similarity
Require minimum N papers per theme (default N=3)
Label themes with standardized names
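The four steps above can be sketched in Python. This is a deliberately simple stand-in. It replaces semantic similarity with keyword-overlap (Jaccard) similarity and greedy single-pass clustering, so results depend on paper order and on the hypothetical `threshold`. An embedding model would be the natural substitute for real use. `min_papers=3` mirrors the default N=3, and `standard_labels` is a hypothetical vocabulary that maps standardized theme names to indicative keywords.

```python
import re
from collections import Counter
from typing import Optional

STOPWORDS = {"the", "and", "for", "with", "from", "into", "after", "role", "study"}


def keywords(text: str) -> list[str]:
    """Step 1: unique lower-case tokens (>= 4 chars, not stopwords), in order of appearance."""
    tokens = re.findall(r"[a-z0-9][a-z0-9-]+", text.lower())
    return list(dict.fromkeys(t for t in tokens if len(t) >= 4 and t not in STOPWORDS))


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_cluster(kw: Counter, standard_labels: dict) -> str:
    """Step 4: prefer the standardized theme whose vocabulary overlaps most; else top keywords."""
    if standard_labels:
        name = max(standard_labels, key=lambda n: sum(kw[w] for w in standard_labels[n]))
        if sum(kw[w] for w in standard_labels[name]) > 0:
            return name
    return "_".join(w for w, _ in kw.most_common(2))


def cluster_themes(papers, threshold=0.2, min_papers=3, standard_labels: Optional[dict] = None):
    """papers: list of (paper_id, title + abstract). Returns {theme_label: [paper_ids]}."""
    clusters = []  # each cluster: {"kw": Counter of keywords, "ids": [paper ids]}
    for pid, text in papers:
        kw = keywords(text)
        best, best_sim = None, 0.0
        for cluster in clusters:  # Step 2: join the most similar existing cluster
            sim = jaccard(set(kw), set(cluster["kw"]))
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["kw"].update(kw)
            best["ids"].append(pid)
        else:
            clusters.append({"kw": Counter(kw), "ids": [pid]})

    themes: dict[str, list] = {}
    for cluster in clusters:
        if len(cluster["ids"]) < min_papers:
            continue  # Step 3: below N papers, report as limited evidence or merge by hand
        label = label_cluster(cluster["kw"], standard_labels or {})
        themes.setdefault(label, []).extend(cluster["ids"])
    return themes
```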

Standard Theme Categories (adapt to target)

Example for a V-ATPase target:

`lysosomal_acidification` - Core function
`autophagy_regulation` - mTORC1 signaling
`bone_resorption` - Osteoclast function
`cancer_metabolism` - Tumor acidification
`viral_infection` - Viral entry mechanism
`neurodegenerative` - Neuronal dysfunction
`kidney_function` - Renal acid-base balance
`methodology` - Assays/tools papers

Theme Quality Requirements

Papers	Theme Status
≥10	Major theme (full section)
3-9	Minor theme (subsection)
<3	Insufficient (note in "limited evidence" or merge)
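The thresholds in this table translate directly into a small helper. The function names are hypothetical, and the cut-offs are the ones in the table.

```python
def theme_status(n_papers: int) -> str:
    """Classify a theme by its paper count, following the thresholds above."""
    if n_papers >= 10:
        return "major"         # full section
    if n_papers >= 3:
        return "minor"         # subsection
    return "insufficient"      # note under "limited evidence" or merge into another theme


def plan_theme_sections(themes: dict[str, list[str]]) -> dict[str, str]:
    """Map each theme name to its status, largest themes first."""
    ordered = sorted(themes.items(), key=lambda item: len(item[1]), reverse=True)
    return {name: theme_status(len(ids)) for name, ids in ordered}
```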

Completeness Checklist (Verify Before Delivery)

ALL boxes must be checked or explicitly marked "N/A" or "Limited evidence"

Identity & Context

Mechanism & Function

Disease & Clinical

Human genetic variants documented
Constraint scores with interpretation
Disease links with evidence strength grades
Pathogen involvement (or "none identified")

Synthesis

Technical

All claims have source attribution
Evidence grades applied throughout
Bibliography file generated
Data limitations documented
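Most checklist items require judgement, but one structural rule can be checked mechanically before delivery: every mandatory section must be present and non-empty. Below is a minimal sketch. It assumes the report uses the plain `N. Title` headings of the template above. `REQUIRED_SECTIONS` lists only the template sections 2-15 and should be extended with any earlier mandatory sections. A numbered list item inside a section body that looks like a top-level heading would confuse the parser.

```python
import re

# Top-level sections of the report template (sections 2-15); extend as needed.
# Titles must match the report headings exactly.
REQUIRED_SECTIONS = [
    "Protein Architecture", "Complexes & Interaction Partners", "Subcellular Localization",
    "Expression Profile", "Core Mechanisms", "Model Organism Evidence",
    "Human Genetics & Variants", "Disease Links", "Pathogen Involvement",
    "Key Assays & Readouts", "Research Themes", "Open Questions & Research Gaps",
    "Biological Model & Testable Hypotheses", "Conclusions & Recommendations",
]

# Matches "9. Disease Links" but not subsection headings such as "9.1 Strong Evidence".
HEADING = re.compile(r"^\d+\.[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def check_completeness(report: str, required=REQUIRED_SECTIONS) -> list[str]:
    """List missing or empty top-level sections ("N/A" or "Limited evidence" counts as filled)."""
    matches = list(HEADING.finditer(report))
    bodies = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        bodies[match.group(1)] = report[match.end():end].strip()
    problems = []
    for title in required:
        if title not in bodies:
            problems.append(f"missing section: {title}")
        elif not bodies[title]:
            problems.append(f"empty section (fill in or mark N/A / Limited evidence): {title}")
    return problems
```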

Quick Reference: Tool Categories

Literature Tools

PubMed_search_articles

PMC_search_papers

EuropePMC_search_articles

openalex_literature_search

Crossref_search_works

SemanticScholar_search_papers

BioRxiv_search_preprints

MedRxiv_search_preprints

Citation Tools

PubMed_get_cited_by

PubMed_get_related

EuropePMC_get_citations

EuropePMC_get_references

Protein/Gene Annotation Tools

UniProt_get_entry_by_accession

UniProt_search

UniProt_id_mapping

InterPro_get_protein_domains

proteins_api_get_protein

Expression Tools

GTEx_get_median_gene_expression

GTEx_get_gene_expression

HPA_get_rna_expression_by_source

HPA_get_comprehensive_gene_details_by_ensembl_id

HPA_get_subcellular_location

Variant/Disease Tools

gnomad_get_gene_constraints

gnomad_get_gene

clinvar_search_variants

OpenTargets_get_diseases_phenotypes_by_target_ensembl

Pathway Tools

GO_get_annotations_for_gene

Reactome_map_uniprot_to_pathways

kegg_get_gene_info

OpenTargets_get_target_gene_ontology_by_ensemblID

Interaction Tools

STRING_get_protein_interactions

intact_get_interactions

OpenTargets_get_target_interactions_by_ensemblID

OA Tools

Unpaywall_check_oa_status (if an email address is provided), or use OA flags from Europe PMC/OpenAlex
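The fallback order can be made explicit so that the "OA status method used" line under Data Limitations is filled in automatically. Below is a minimal sketch that works on records the tools have already fetched. Field names follow the public Unpaywall v2 response (`oa_status`, `best_oa_location.url`), the OpenAlex work object (`open_access.oa_status`, `open_access.oa_url`) and Europe PMC's `isOpenAccess` flag (`"Y"`/`"N"`). Whether the ToolUniverse tools return these raw shapes is an assumption.

```python
from typing import Optional


def resolve_oa(unpaywall: Optional[dict] = None,
               openalex: Optional[dict] = None,
               europepmc: Optional[dict] = None) -> dict:
    """Return oa_status/oa_url from the best available source, recording which one was used."""
    if unpaywall:  # only available when an email was provided for the Unpaywall call
        best = unpaywall.get("best_oa_location") or {}  # null for closed papers
        return {"oa_status": unpaywall.get("oa_status", "unknown"),
                "oa_url": best.get("url"), "method": "unpaywall"}
    if openalex:
        oa = openalex.get("open_access") or {}
        return {"oa_status": oa.get("oa_status", "unknown"),
                "oa_url": oa.get("oa_url"), "method": "openalex"}
    if europepmc:  # Europe PMC only gives a yes/no flag, not a gold/green/... category
        is_oa = europepmc.get("isOpenAccess") == "Y"
        return {"oa_status": "open" if is_oa else "closed_or_unknown",
                "oa_url": None, "method": "europepmc"}
    return {"oa_status": "unknown", "oa_url": None, "method": "none"}
```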

Communication with User

During research (brief updates):

"Resolving target identifiers and gathering baseline profile..."
"Building core paper set with high-precision queries..."
"Expanding via citation network..."
"Clustering into themes and grading evidence..."

When the question looks like a factoid:

Ask once whether the user wants just the verified answer or a full deep-research report.
If the user doesn’t specify, default to Factoid / Verification Mode and keep the answer short and source-backed.

DO NOT expose:

Raw tool outputs
Deduplication counts
Search round details
Database-by-database results

The report is the deliverable. Methodology stays internal.

Summary

This skill produces comprehensive, evidence-graded research reports that:

Start with disambiguation to prevent naming collisions and missing details
Use annotation tools to fill gaps when literature is sparse
Grade all evidence to separate signal from noise
Require completeness even if stating "limited evidence"
Synthesize into biological models with testable hypotheses
Separate narrative from bibliography for scalability
Keep methodology internal unless explicitly requested

The result is a detailed, actionable research report that reads like an expert synthesis, not a search log.
