tooluniverse-literature-deep-research
Literature Deep Research Strategy (Enhanced)
A systematic approach to comprehensive literature research that starts with target disambiguation to prevent missing details, uses evidence grading to separate signal from noise, and produces a content-focused report with mandatory completeness sections.
KEY PRINCIPLES:
- Target disambiguation FIRST - Resolve IDs, synonyms, naming collisions before literature search
- Right-size the deliverable - Use Factoid / Verification Mode for single, answerable questions; use full report mode for “deep research”
- Report-first output - Default deliverable is a report file; an inline answer is allowed (and recommended) for Factoid / Verification Mode
- Evidence grading - Grade every claim by evidence strength (mechanistic paper vs screen hit vs review vs text-mined)
- Mandatory completeness - All checklist sections must exist, even if "unknown/limited evidence"
- Source attribution - Every piece of information traceable to database/tool
- English-first queries - Always use English terms for literature searches and tool calls, even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language
Workflow Overview
User Query
↓
Phase 0: CLARIFY + MODE SELECT (factoid vs deep report)
↓
Phase 1: TARGET DISAMBIGUATION + PROFILE (default ON for biological targets)
├─ Resolve official IDs (Ensembl, UniProt, HGNC)
├─ Gather synonyms/aliases + known naming collisions
├─ Get protein length, isoforms, domain architecture
├─ Get subcellular location, expression, GO terms, pathways
└─ Output: Target Profile section + Collision-aware search plan
↓
Phase 2: LITERATURE SEARCH (internal methodology, not shown)
├─ High-precision seed queries (build mechanistic core)
├─ Citation network expansion from seeds
├─ Collision-filtered broader queries
└─ Theme clustering + evidence grading
↓
Phase 3: REPORT SYNTHESIS
├─ Progressive writing to [topic]_report.md
├─ Mandatory completeness checklist validation
└─ Biological model + testable hypotheses
↓
Optional: methods_appendix.md (only if user requests)
Phase 0: Initial Clarification
Mandatory Questions
- Target type: Is this a biological target (gene/protein), a general topic, or a disease?
- Scope: Is this a single factoid to verify (“Which antibiotic?”, “Which strain?”, “Which year?”) or a comprehensive/deep review?
- Known aliases: Any specific gene symbols or protein names you use?
- Constraints: Open access only? Include preprints? Specific organisms?
- Methods appendix: Do you want methodology details in a separate file?
Mode Selection (CRITICAL)
Pick exactly one mode based on the user’s intent and the question structure:
- Factoid / Verification Mode (single concrete question; answer should be a short phrase/sentence)
- Mini-review Mode (narrow topic; 1–3 pages of synthesis)
- Full Deep-Research Mode (use the full template + completeness checklist)
Heuristic:
- If the user asks “X has been evolved to be resistant to which antibiotic?” → Factoid / Verification Mode
- If the user asks “What does the literature say about X?” → Full Deep-Research Mode
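The heuristic above can be sketched as a small classifier. This is only an illustration: the cue patterns and the fallback to Mini-review Mode are assumptions, since the source gives just two example questions, and a real implementation would likely confirm the mode with the user.

```python
import re

# Hypothetical cue lists (not from the source): question words that suggest a
# single answerable fact, and phrases that suggest a broad review request.
FACTOID_CUES = re.compile(r"\b(which|what year|when|who|how many)\b", re.IGNORECASE)
REVIEW_CUES = re.compile(
    r"\b(literature|review|overview|comprehensive|deep research|what is known)\b",
    re.IGNORECASE,
)


def select_mode(question: str) -> str:
    """Return 'factoid', 'mini-review', or 'full' for a user question."""
    if REVIEW_CUES.search(question):
        return "full"
    if FACTOID_CUES.search(question) and question.strip().endswith("?"):
        return "factoid"
    # Assumption: anything that is neither clearly broad nor clearly a single
    # fact is treated as a narrow topic.
    return "mini-review"
```

Review cues are checked first so that "What does the literature say about X?" is not misread as a factoid because of its question word.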
Factoid / Verification Mode (Fast Path)
Goal: Provide a correct, source-verified single answer, with minimal but explicit evidence attribution.
Deliverables (still file-backed):
- `[topic]_factcheck_report.md` (≤ 1 page)
- `[topic]_bibliography.json` (+ CSV) containing the key paper(s)
Fact-check report template:
markdown
[TOPIC]: Fact-check Report
Generated: [Date]
Evidence cutoff: [Date]
Question
[User question]
Answer
[One-sentence answer] [Evidence: ★★★/★★☆/★☆☆/☆☆☆]
Source(s)
- [Primary paper citation: journal/year/PMID/DOI as available]
Verification Notes
- [1–3 bullets: where in the paper the statement appears (Abstract/Results/Methods), and any key constraints]
Limitations
- [If full text not available, or if only review evidence exists]
**Required verification behavior**:
- Prefer ToolUniverse literature tools (Europe PMC / PubMed / PMC / Semantic Scholar) over general web browsing.
- Use full-text snippet verification when possible (Europe PMC auto-snippet tier is ideal).
- Avoid adding extra claims (e.g., “not X”) unless the paper explicitly supports them.
**Suggested tool pattern**:
- `EuropePMC_search_articles(query=..., extract_terms_from_fulltext=[...])` to pull OA full-text snippets for the key terms.
- If OA snippets unavailable: fall back to `PMC_search_papers` (if in PMC) or `SemanticScholar_search_papers` → `SemanticScholar_get_pdf_snippets`.
**Evidence grading (factoid)**:
- If the statement is explicitly made in a primary experimental paper (Results/Methods/Abstract): label **T1 (★★★)**.
- If it’s only in a review: label **T4 (☆☆☆)** and try to locate the primary source.
Detect Target Type
| Query Pattern | Type | Action |
|---|---|---|
| Gene symbol (EGFR, TP53, ATP6V1A) | Biological target | Phase 1 required |
| Protein name ("V-ATPase", "kinase") | Biological target | Phase 1 required |
| UniProt ID (P00533, Q93050) | Biological target | Phase 1 required |
| Disease, pathway, method | General topic | Phase 1 optional |
| "Literature on X" | Depends on X | Assess X |
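The first rows of this table can be approximated with pattern checks. The sketch below is a rough illustration, not a complete detector: the gene-symbol pattern is an assumption (short uppercase alphanumerics), so acronyms such as method names would be misclassified, and free-text protein names like "V-ATPase" still need a lookup through the Phase 1 tools. The accession pattern follows the format UniProt documents for its accession numbers.

```python
import re

# UniProt accession format (6 or 10 characters).
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Assumption: a gene symbol is 2-10 uppercase letters, digits, or hyphens.
GENE_SYMBOL_RE = re.compile(r"^[A-Z][A-Z0-9-]{1,9}$")


def detect_target_type(term: str) -> tuple[str, str]:
    """Map a query term to (type, action) following the table above."""
    term = term.strip()
    # Check accessions first: "P00533" would also satisfy the symbol pattern.
    if UNIPROT_RE.match(term) or GENE_SYMBOL_RE.match(term):
        return ("Biological target", "Phase 1 required")
    return ("General topic", "Phase 1 optional")
```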
Phase 1: Target Disambiguation + Profile (Default ON)
CRITICAL: This phase prevents "missing target details" when literature is sparse or noisy.
1.1 Resolve Official Identifiers
Use these tools to establish canonical identity:
UniProt_search → Get UniProt accession for human protein
UniProt_get_entry_by_accession → Full entry with cross-references
UniProt_id_mapping → Map between ID types
ensembl_lookup_gene → Ensembl gene ID, biotype
MyGene_get_gene_annotation → NCBI Gene ID, aliases, summary
Output for report:
markdown
Target Identity
| Identifier | Value | Source |
|---|---|---|
| Official Symbol | ATP6V1A | HGNC |
| UniProt | P38606 | UniProt |
| Ensembl Gene | ENSG00000114573 | Ensembl |
| NCBI Gene ID | 523 | NCBI |
| ChEMBL Target | CHEMBL2364682 | ChEMBL |
Full Name: V-type proton ATPase catalytic subunit A
Synonyms/Aliases: ATP6A1, VPP2, Vma1, VA68
1.2 Identify Naming Collisions
CRITICAL: Many gene names have collisions. Examples:
- TRAG: T-cell regulatory gene vs bacterial TraG conjugation protein
- WDR7-7: Could match gene WDR7 vs lncRNA
- JAK: Janus kinase vs Just Another Kinase
- CAT: Catalase vs chloramphenicol acetyltransferase
Detection strategy:
- Search PubMed for `"[SYMBOL]"[Title]` - review first 20 titles
- If >20% off-topic, identify collision terms
- Build negative filter:
NOT [collision_term1] NOT [collision_term2]
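The detection strategy can be sketched as follows, assuming the first 20 titles have already been retrieved. The helper names and the substring-based notion of "on-topic" are illustrative assumptions; in practice a reviewer decides which titles are off-topic and which collision terms to exclude.

```python
def off_topic_fraction(titles, on_topic_terms):
    """Share of titles that mention none of the on-topic terms."""
    if not titles:
        return 0.0
    terms = [term.lower() for term in on_topic_terms]
    off = [t for t in titles if not any(term in t.lower() for term in terms)]
    return len(off) / len(titles)


def build_negative_filter(collision_terms):
    """Render collision terms as 'NOT a NOT b'."""
    return " ".join(f"NOT {term}" for term in collision_terms)


def collision_aware_query(symbol, titles, on_topic_terms, collision_terms,
                          threshold=0.20):
    """Append the negative filter only when more than 20% of titles are off-topic."""
    query = f'"{symbol}"[Title]'
    if off_topic_fraction(titles, on_topic_terms) > threshold:
        query = f"{query} {build_negative_filter(collision_terms)}"
    return query
```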
Output for report:
markdown
Known Naming Collisions
- Symbol "ATP6V1A" is unambiguous (no major collisions detected)
- Related but distinct: ATP6V0A1-4 (V0 subunits vs V1 subunits)
- Search filter applied: Include "vacuolar" OR "V-ATPase", exclude "V0 domain" when V1-specific
1.3 Protein Architecture & Domains
Use annotation tools (not literature):
InterPro_get_protein_domains → Domain architecture
UniProt_get_ptm_processing_by_accession → PTMs, active sites
proteins_api_get_protein → Additional protein features
Output for report:
markdown
Protein Architecture
| Domain | Position | InterPro ID | Function |
|---|---|---|---|
| V-ATPase A subunit, N-terminal | 1-90 | IPR022879 | ATP binding |
| V-ATPase A subunit, catalytic | 91-490 | IPR005725 | Catalysis |
| V-ATPase A subunit, C-terminal | 491-617 | IPR022878 | Complex assembly |
Length: 617 aa | Isoforms: 2 (canonical P38606-1, variant P38606-2 missing aa 1-45)
Active sites: Lys-168 (ATP binding), Glu-261 (catalytic)
Sources: InterPro, UniProt
1.4 Subcellular Location
HPA_get_subcellular_location → Human Protein Atlas localization
UniProt_get_subcellular_location_by_accession → UniProt annotation
Output for report:
markdown
Subcellular Localization
| Location | Confidence | Source |
|---|---|---|
| Lysosome membrane | High | HPA + UniProt |
| Endosome membrane | High | UniProt |
| Golgi apparatus | Medium | HPA |
| Plasma membrane (subset) | Low | Literature |
Primary location: Lysosomal/endosomal membranes (vacuolar ATPase complex)
Sources: Human Protein Atlas, UniProt
1.5 Baseline Expression
GTEx_get_median_gene_expression → Tissue expression (TPM)
HPA_get_rna_expression_by_source → HPA expression data
Output for report:
markdown
Baseline Tissue Expression
| Tissue | Expression (TPM) | Specificity |
|---|---|---|
| Kidney cortex | 145.3 | Elevated |
| Liver | 98.7 | Medium |
| Brain - Cerebellum | 87.2 | Medium |
| Lung | 76.4 | Medium |
| Ubiquitous baseline | ~50 | Broad |
Tissue Specificity: Low (τ = 0.28) - broadly expressed housekeeping gene
Source: GTEx v8
1.6 GO Terms & Pathway Placement
GO_get_annotations_for_gene → GO annotations
Reactome_map_uniprot_to_pathways → Reactome pathways
kegg_get_gene_info → KEGG pathways
OpenTargets_get_target_gene_ontology_by_ensemblID → Open Targets GO
Output for report:
markdown
Functional Annotations (GO)
Molecular Function:
- ATP hydrolysis activity (GO:0016887) [Evidence: IDA]
- Proton-transporting ATPase activity (GO:0046961) [Evidence: IDA]
Biological Process:
- Lysosomal acidification (GO:0007041) [Evidence: IMP]
- Autophagy (GO:0006914) [Evidence: IMP]
- Bone resorption (GO:0045453) [Evidence: IMP]
Cellular Component:
- Vacuolar proton-transporting V-type ATPase, V1 domain (GO:0000221) [Evidence: IDA]
Pathway Involvement
| Pathway | Database | Significance |
|---|---|---|
| Lysosome | KEGG hsa04142 | Core component |
| Phagosome | KEGG hsa04145 | Acidification |
| Autophagy - animal | Reactome R-HSA-9612973 | mTORC1 regulation |
Sources: GO Consortium, Reactome, KEGG
---
Phase 2: Literature Search (Internal Methodology)
NOTE: This methodology is kept internal. The report shows findings, not process.
2.1 Query Strategy: Collision-Aware Synonym Plan
Step 1: High-Precision Seed Queries (Build Mechanistic Core)
Query 1: "[GENE_SYMBOL]"[Title] AND (mechanism OR function OR structure)
Query 2: "[FULL_PROTEIN_NAME]"[Title]
Query 3: "[UNIPROT_ID]" (catches supplementary materials)
Purpose: Get 15-30 high-confidence, mechanistic papers that are definitely on-target.
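The query templates in this plan (the Step 1 seeds and the collision-filtered broader form used in Step 3) are plain string patterns, so they can be generated from the Phase 1 profile. A minimal sketch, with hypothetical function names; it only builds query strings and does not call any search tool:

```python
def seed_queries(symbol, full_name, uniprot_id):
    """Step 1: the three high-precision seed queries."""
    return [
        f'"{symbol}"[Title] AND (mechanism OR function OR structure)',
        f'"{full_name}"[Title]',
        f'"{uniprot_id}"',
    ]


def broad_query(symbol, context_terms, collision_terms=()):
    """Step 3: broader query with an optional NOT filter for collisions."""
    context = " OR ".join(context_terms)
    query = f'"{symbol}" AND ({context})'
    for term in collision_terms:
        query += f" NOT {term}"
    return query
```

For example, `broad_query("TRAG", ["T-cell", "immune", "cancer"], ["plasmid", "conjugation", "bacterial"])` yields the TraG collision query used as the example in Step 3.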
Step 2: Citation Network Expansion (Especially for Sparse Targets)
Once you have 5-15 core PMIDs:
PubMed_get_cited_by → Papers citing each seed
PubMed_get_related → Computationally related papers
EuropePMC_get_citations → Alternative citation source
EuropePMC_get_references → Backward citations from seeds
Citation-network first option: For older targets with deprecated terminology, citation expansion often outperforms keyword searching.
Step 3: Collision-Filtered Broader Queries
Broader query: "[GENE_SYMBOL]" AND ([pathway1] OR [pathway2] OR [function])
Apply collision filter: NOT [collision_term1] NOT [collision_term2]
Example for bacterial TraG collision:
"TRAG" AND (T-cell OR immune OR cancer) NOT plasmid NOT conjugation NOT bacterial
2.2 Database Tools
Literature Search (use all relevant):
- `PubMed_search_articles` - Primary biomedical
- `PMC_search_papers` - Full-text
- `EuropePMC_search_articles` - European coverage
- `openalex_literature_search` - Broad academic
- `Crossref_search_works` - DOI registry
- `SemanticScholar_search_papers` - AI-ranked
- `BioRxiv_search_preprints` / `MedRxiv_search_preprints` - Preprints
Citation Tools (with failure handling):
- `PubMed_get_cited_by` - Primary (NCBI elink can be flaky)
- `EuropePMC_get_citations` - Fallback when PubMed fails
- `PubMed_get_related` - Related articles
- `EuropePMC_get_references` - Reference lists
Annotation Tools (not literature, but fill gaps):
- `UniProt_*` tools - Protein data
- `InterPro_get_protein_domains` - Domains
- `GTEx_*` tools - Expression
- `HPA_*` tools - Human Protein Atlas
- `OpenTargets_*` tools - Target-disease associations
- `GO_get_annotations_for_gene` - GO terms
2.3 Full-Text Verification Strategy
WHEN TO USE: Abstracts lack critical experimental details (exact drugs, cell lines, concentrations, specific protocols).
Three-Tier Strategy:
Tier 1: Auto-Snippet Mode (Europe PMC) - FASTEST
Use for: Exploratory queries with 3-5 specific terms
python
results = EuropePMC_search_articles(
    query="bacterial antibiotic resistance evolution",
    limit=10,
    extract_terms_from_fulltext=["ciprofloxacin", "meropenem", "A. baumannii", "MIC"]
)
# Check which articles have full-text snippets
for article in results:
    if "fulltext_snippets" in article:
        # Snippets automatically extracted from OA full text
        for snippet in article["fulltext_snippets"]:
            # Use snippet["term"] and snippet["snippet"] for verification
            pass
**Advantages**:
- ✅ Single tool call (search + snippets)
- ✅ Bounded latency (max 3 OA articles, ~3-5 seconds total)
- ✅ No manual URL extraction
- ✅ Max 5 search terms
**Limitations**:
- ❌ Only works for OA articles with fullTextXML
- ❌ Limited to first 3 OA articles
- ❌ Europe PMC coverage only (~30-40% OA)
**When to use**: Initial exploration, quick verification of 1-2 papers
Tier 2: Manual Two-Step (Semantic Scholar, ArXiv) - TARGETED
Use for: Specific high-value papers you identified from search
python
# Step 1: Search
papers = SemanticScholar_search_papers(
    query="machine learning interpretability",
    limit=10
)
# Step 2: Extract from specific OA papers
for paper in papers:
    if paper.get("open_access_pdf_url"):
        snippets = SemanticScholar_get_pdf_snippets(
            open_access_pdf_url=paper["open_access_pdf_url"],
            terms=["SHAP", "gradient attribution", "layer-wise relevance"],
            window_chars=300
        )
        if snippets["status"] == "success":
            # Process snippets["snippets"]
            pass
**ArXiv variant** (100% OA, no paywall):
python
# All arXiv papers are freely available
snippets = ArXiv_get_pdf_snippets(
    arxiv_id="2301.12345",
    terms=["attention mechanism", "self-attention", "layer normalization"],
    max_snippets_per_term=5
)
**Advantages**:
- ✅ Full control over which papers to process
- ✅ Adjustable window size (20-2000 chars)
- ✅ Works for Semantic Scholar (~15-20% OA PDFs) and ArXiv (100%)
- ✅ Can process any number of papers
**Limitations**:
- ❌ Two tool calls per article (search → extract)
- ❌ Manual loop needed
- ❌ Slower than auto-snippet mode
**When to use**: Thorough review of key papers, preprint analysis
Tier 3: Manual Download + Parse (Fallback) - SLOWEST
Use for: Paywalled content via institutional access
python
# For paywalled PDFs accessible via institution
webpage_text = get_webpage_text_from_url(
    url="https://doi.org/10.1016/...",
    # Requires institutional proxy or VPN
)
# Extract relevant sections manually
if "Methods" in webpage_text:
    # Parse methods section
    pass
**Limitations**:
- ❌ Requires institutional access
- ❌ No snippet extraction (full HTML)
- ❌ Quality varies by publisher
- ❌ Slowest approach
**When to use**: Last resort for critical paywalled papers
Decision Matrix
| Scenario | Recommended Tier | Rationale |
|---|---|---|
| Quick verification ("Which antibiotic?") | Tier 1 (Auto-snippet) | Fast, single call |
| Preprint deep-dive (arXiv, bioRxiv) | Tier 2 (Manual ArXiv) | 100% coverage, no paywall |
| High-value paper deep analysis | Tier 2 (Manual S2) | Precise control |
| Systematic review (50+ papers) | Tier 1 + Tier 2 | Auto for OA, manual for key papers |
| Paywalled critical paper | Tier 3 (Manual download) | Only option |
Best Practices
1. Limit search terms to 3-5 specific keywords:
- ✅ Good: `["ciprofloxacin 5 μg/mL", "HEK293 cells", "RNA-seq"]`
- ❌ Bad: `["drug", "method", "significant"]` (too broad)
2. Check OA status before extraction:
python
if article.get("open_access") and article.get("fulltext_xml_url"):
    # Proceed with extraction
    pass
3. Adjust window size for context:
- Methods: 400-500 chars (full sentences)
- Quick verification: 150-200 chars
- Default: 220 chars (balanced)
4. Handle failures gracefully:
python
if "fulltext_snippets" not in article:
    # Fallback: use abstract or skip
    print(f"No full text available: {article['title']}")
5. Document full-text sources in report:
markdown
Methods Verification
Antibiotic concentrations (verified from full text):
- Study A: Ciprofloxacin 5 μg/mL [PMC12345, Methods section]
- Study B: Meropenem 8 μg/mL [arXiv:2301.12345, Experimental Design]
Note: Full-text verification performed on 8/15 OA papers (53% coverage)
2.5 Tool Failure Handling
Automatic retry strategy:
Attempt 1: Call tool
If timeout/error:
  Wait 2 seconds
  Attempt 2: Retry
  If still fails:
    Wait 5 seconds
    Attempt 3: Try fallback tool
    If fallback fails:
      Document "Data unavailable" in report
Fallback chains:
| Primary Tool | Fallback 1 | Fallback 2 |
|---|---|---|
| PubMed_get_cited_by | EuropePMC_get_citations | OpenAlex citations |
| | SemanticScholar recommendations | Manual keyword search |
| | | Document as unavailable |
| | Europe PMC OA flags | OpenAlex OA field |
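The retry strategy can be sketched as a small wrapper. This is an illustration under assumptions: `primary` and `fallback` are hypothetical zero-argument callables wrapping whichever tools are in the fallback chain, any exception counts as a failure, and the `sleep` parameter exists only so the waits can be stubbed in tests.

```python
import time


def call_with_fallback(primary, fallback=None, *, waits=(2, 5), sleep=time.sleep):
    """Try `primary` twice, then `fallback` once; return (result, status)."""
    try:
        return primary(), "primary"            # Attempt 1
    except Exception:
        sleep(waits[0])                        # wait 2 seconds
    try:
        return primary(), "primary-retry"      # Attempt 2
    except Exception:
        sleep(waits[1])                        # wait 5 seconds
    if fallback is not None:
        try:
            return fallback(), "fallback"      # Attempt 3
        except Exception:
            pass
    # Nothing worked: the caller records this string in the report.
    return None, "Data unavailable"
```

Returning the status alongside the result keeps source attribution intact, because the report can state which tool in the chain actually supplied the data.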
2.6 Open Access Handling (Best-Effort)
If Unpaywall email provided: Check OA status for all papers with DOIs
If no Unpaywall email: Use best-effort OA signals:
- Europe PMC: `isOpenAccess` field
- PMC: All PMC papers are OA
- OpenAlex: `is_oa` field
- DOAJ: All DOAJ papers are OA
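These signals can be combined into one best-effort check. The sketch assumes each paper is a flat dict carrying a hypothetical `source` key that names the database it came from; the real tool outputs may be shaped differently (for instance, a raw Europe PMC record reports the flag as "Y"/"N", which is handled here alongside booleans).

```python
def best_effort_is_oa(record: dict) -> bool:
    """Best-effort open-access check when Unpaywall is not configured."""
    source = record.get("source", "")  # hypothetical key, not a tool field
    if source in {"PMC", "DOAJ"}:
        return True  # everything indexed there is treated as OA
    if source == "EuropePMC":
        return record.get("isOpenAccess") in (True, "Y")
    if source == "OpenAlex":
        return bool(record.get("is_oa"))
    return False  # unknown source: do not claim OA
```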
Label in report:
markdown
*OA Status: Best-effort (Unpaywall not configured)*
Phase 3: Evidence Grading
CRITICAL: Grade every claim by evidence strength to prevent low-signal mentions from diluting the report.
Evidence Tiers
| Tier | Label | Description | Example |
|---|---|---|---|
| T1 | ★★★ Mechanistic | In-target mechanistic study with direct experimental evidence | CRISPR KO + rescue |
| T2 | ★★☆ Functional | Functional study showing role (may be in pathway context) | siRNA knockdown phenotype |
| T3 | ★☆☆ Association | Screen hit, GWAS association, correlation | High-throughput screen |
| T4 | ☆☆☆ Mention | Review mention, text-mined interaction, peripheral reference | Review article |
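The tier table can be encoded as a lookup so that inline tags and per-theme tallies stay consistent. A minimal sketch with hypothetical helper names; assigning a tier to a paper remains a manual judgment, and the PMID in the usage note is a placeholder.

```python
from collections import Counter

# Tier -> (stars, label), copied from the table above.
TIERS = {
    "T1": ("★★★", "Mechanistic"),
    "T2": ("★★☆", "Functional"),
    "T3": ("★☆☆", "Association"),
    "T4": ("☆☆☆", "Mention"),
}


def evidence_tag(tier: str, pmid: str) -> str:
    """Inline tag for a graded claim."""
    stars, label = TIERS[tier]
    return f"[{stars} {label}: PMID:{pmid}]"


def tally(tiers) -> str:
    """Summarize a theme's papers by tier, strongest first."""
    counts = Counter(tiers)
    parts = [f"{counts[t]} {TIERS[t][1].lower()}" for t in TIERS if counts[t]]
    return ", ".join(parts)
```

For example, `evidence_tag("T1", "12345678")` gives `[★★★ Mechanistic: PMID:12345678]`.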
How to Apply
In report, label sections and claims:
markdown
Mechanism of Action
ATP6V1A is the catalytic subunit responsible for ATP hydrolysis in the V-ATPase
complex [★★★ Mechanistic: PMID:12345678]. Loss-of-function mutations cause
vacuolar pH dysregulation [★★★: PMID:23456789].
The target has been implicated in mTORC1 signaling through lysosomal amino acid
sensing [★★☆ Functional: PMID:34567890], though direct interaction data is limited.
A genome-wide screen identified ATP6V1A as essential in cancer cell lines
[★☆☆ Association: PMID:45678901, DepMap].
Theme-Level Grading
For each theme section, summarize evidence quality:
markdown
3.1 Lysosomal Acidification (12 papers)
Evidence Quality: Strong (8 mechanistic, 3 functional, 1 association)
[Theme content...]
---
Report Structure: Mandatory Completeness Checklist
CRITICAL: This checklist/template applies to Full Deep-Research Mode. For Factoid / Verification Mode, use a short fact-check report (see Phase 0) and do not force the full 15-section template.
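Completeness validation in Full Deep-Research Mode can be sketched as a heading check on the finished report. The section list below is an assumption limited to the numbered sections 1 to 11 of the report template; the full template has 15 sections, so the list would need extending. The check only confirms that a heading exists, not that its body is adequate.

```python
MANDATORY_SECTIONS = [
    "1. Target Identity & Aliases",
    "2. Protein Architecture",
    "3. Complexes & Interaction Partners",
    "4. Subcellular Localization",
    "5. Expression Profile",
    "6. Core Mechanisms",
    "7. Model Organism Evidence",
    "8. Human Genetics & Variants",
    "9. Disease Links",
    "10. Pathogen Involvement",
    "11. Key Assays & Readouts",
]


def missing_sections(report_text, required=MANDATORY_SECTIONS):
    """Return required section titles that never appear as a heading line."""
    # Strip leading '#' markers and spaces so "## 4. ..." matches "4. ...".
    headings = {line.lstrip("# ").strip() for line in report_text.splitlines()}
    return [title for title in required if title not in headings]
```

A section with no findings still passes as long as its heading is present with a stated placeholder such as "N/A - not a protein target".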
Output Files
- `[topic]_report.md` - Main narrative report (Full Deep-Research Mode)
- `[topic]_factcheck_report.md` - Short verification report (Factoid / Verification Mode)
- `[topic]_bibliography.json` - Full deduplicated bibliography (always created)
- `methods_appendix.md` - Methodology details (ONLY if user requests)
Report Template
markdown
[TARGET/TOPIC]: Comprehensive Research Report
Generated: [Date]
Evidence cutoff: [Date]
Total unique papers: [N]
Executive Summary
[2-3 paragraphs synthesizing key findings across all sections]
Bottom Line: [One-sentence actionable conclusion]
1. Target Identity & Aliases
[MANDATORY - even for non-target topics, clarify scope]
1.1 Official Identifiers
[Table of IDs or scope definition]
1.2 Synonyms and Aliases
[List all known names - critical for complete literature coverage]
1.3 Known Naming Collisions
[Document collisions and how they were handled]
2. Protein Architecture
[MANDATORY for protein targets; state "N/A - not a protein target" otherwise]
2.1 Domain Structure
[Table of domains with positions, InterPro IDs]
2.2 Isoforms
[List isoforms, functional differences if known]
2.3 Key Structural Features
[Active sites, binding sites, PTMs]
2.4 Available Structures
[PDB entries, AlphaFold availability]
3. Complexes & Interaction Partners
[MANDATORY]
3.1 Known Complexes
[List complexes the protein participates in]
3.2 Direct Interactors
[Table of top interactors with evidence type and scores]
3.3 Functional Interaction Network
[Describe network context]
4. Subcellular Localization
4. 亚细胞定位
[MANDATORY]
[Table of locations with confidence levels and sources]
[必填]
[带置信度和来源的定位表格]
5. Expression Profile
5. 表达概况
[MANDATORY]
[必填]
5.1 Tissue Expression
5.1 组织表达
[Table of top tissues with TPM values]
[带TPM值的主要组织表格]
5.2 Cell-Type Expression
5.2 细胞类型表达
[If single-cell data available]
[如有单细胞数据]
5.3 Disease-Specific Expression
5.3 疾病特异性表达
[Expression changes in disease contexts]
[疾病背景下的表达变化]
6. Core Mechanisms
[MANDATORY - this is the heart of the report]
6.1 Molecular Function
[What the protein does biochemically]
Evidence Quality: [Strong/Moderate/Limited]
6.2 Biological Role
[Role in cellular/organismal context]
Evidence Quality: [Strong/Moderate/Limited]
6.3 Key Pathways
[Pathway involvement with evidence grades]
6.4 Regulation
[How the target is regulated]
7. Model Organism Evidence
[MANDATORY]
7.1 Mouse Models
[Knockout/knockin phenotypes, if any]
7.2 Other Model Organisms
[Yeast, fly, zebrafish, worm data if relevant]
7.3 Cross-Species Conservation
[Conservation and functional studies]
8. Human Genetics & Variants
[MANDATORY]
8.1 Constraint Scores
[pLI, LOEUF, missense Z - with interpretation]
8.2 Disease-Associated Variants
[ClinVar pathogenic variants]
8.3 Population Variants
[gnomAD notable variants]
8.4 GWAS Associations
[Any GWAS hits for the locus]
9. Disease Links
[MANDATORY - include evidence strength]
9.1 Strong Evidence (Genetic + Functional)
[Diseases with causal evidence]
9.2 Moderate Evidence (Association + Mechanism)
[Diseases with supporting evidence]
9.3 Weak Evidence (Association Only)
[Diseases with correlation/association only]
9.4 Evidence Summary Table
| Disease | Evidence Type | Score | Key Papers | Grade |
|---|---|---|---|---|
| [Disease 1] | Genetic + Functional | 0.85 | PMID:xxx | ★★★ |
| [Disease 2] | GWAS + Expression | 0.45 | PMID:yyy | ★★☆ |
10. Pathogen Involvement
[MANDATORY - state "None identified" if not applicable]
10.1 Viral Interactions
[Any viral exploitation or targeting]
10.2 Bacterial Interactions
[Any bacterial relevance]
10.3 Host Defense Role
[Role in immune response if any]
11. Key Assays & Readouts
[MANDATORY]
11.1 Biochemical Assays
[Available assays for target activity]
11.2 Cellular Readouts
[Cell-based assays and phenotypes]
11.3 In Vivo Models
[Animal models and endpoints]
12. Research Themes
[MANDATORY - structured theme extraction]
12.1 [Theme 1 Name] (N papers)
Evidence Quality: [Strong/Moderate/Limited]
Representative Papers: [≥3 papers or state "insufficient"]
[Theme description with evidence-graded citations]
12.2 [Theme 2 Name] (N papers)
[Same structure]
[Continue for all themes - require ≥3 representative papers per theme, or state "limited evidence"]
13. Open Questions & Research Gaps
[MANDATORY]
13.1 Mechanistic Unknowns
[What we don't understand about the target]
13.2 Therapeutic Unknowns
[What we don't know for drug development]
13.3 Suggested Priority Questions
[Ranked list of important unanswered questions]
14. Biological Model & Testable Hypotheses
[MANDATORY - synthesis section]
14.1 Integrated Biological Model
[3-5 paragraph synthesis integrating all evidence into coherent model]
14.2 Testable Hypotheses
| # | Hypothesis | Perturbation | Readout | Expected Result | Priority |
|---|---|---|---|---|---|
| 1 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] | HIGH |
| 2 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] | HIGH |
| 3 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] | MEDIUM |
14.3 Suggested Experiments
[Brief description of key experiments to test hypotheses]
15. Conclusions & Recommendations
[MANDATORY]
15.1 Key Takeaways
[Bullet points of most important findings]
15.2 Confidence Assessment
[Overall confidence in the findings: High/Medium/Low with justification]
15.3 Recommended Next Steps
[Prioritized action items]
References
[Summary reference list in report - full bibliography in separate file]
Key Papers (Must-Read)
- [Citation with PMID] - [Why important] [Grade: ★★★]
- ...
By Theme
[Organized reference lists]
Data Limitations
- [Any databases that failed or returned no data]
- [Any known gaps in coverage]
- [OA status method used]
Full methodology available in methods_appendix.md upon request.
---
Bibliography File Format
File: [topic]_bibliography.json
```json
{
  "metadata": {
    "generated": "2026-02-04",
    "query": "ATP6V1A",
    "total_papers": 342,
    "unique_after_dedup": 287
  },
  "papers": [
    {
      "pmid": "12345678",
      "doi": "10.1038/xxx",
      "title": "Paper Title",
      "authors": ["Smith A", "Jones B"],
      "year": 2024,
      "journal": "Nature",
      "source_databases": ["PubMed", "OpenAlex"],
      "evidence_tier": "T1",
      "themes": ["lysosomal_acidification", "autophagy"],
      "oa_status": "gold",
      "oa_url": "https://...",
      "citation_count": 45,
      "in_core_set": true
    }
  ]
}
```
Also generate [topic]_bibliography.csv with the same data in tabular format.
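The two bibliography files described above could be produced from a single in-memory list of records, which keeps the JSON and CSV in sync. The following is a minimal sketch, not the skill's actual implementation: the names `write_bibliography` and `dedup_papers`, the DOI-then-PMID deduplication key, and the `"; "` separator used to flatten list fields in the CSV are all assumptions made for illustration.

```python
import csv
import json
from datetime import date
from pathlib import Path

# Column order follows the per-paper fields in the JSON example.
FIELDS = [
    "pmid", "doi", "title", "authors", "year", "journal",
    "source_databases", "evidence_tier", "themes",
    "oa_status", "oa_url", "citation_count", "in_core_set",
]


def dedup_papers(papers):
    """Keep the first record per DOI (lowercased), falling back to PMID.

    Assumption: DOI and PMID are adequate identity keys. Records with
    neither are kept as-is because they cannot be matched.
    """
    seen = set()
    unique = []
    for paper in papers:
        doi = (paper.get("doi") or "").lower()
        pmid = paper.get("pmid") or ""
        key = ("doi", doi) if doi else ("pmid", pmid) if pmid else None
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        unique.append(paper)
    return unique


def write_bibliography(topic, query, papers, out_dir="."):
    """Write <topic>_bibliography.json and .csv from the same records."""
    unique = dedup_papers(papers)
    payload = {
        "metadata": {
            "generated": date.today().isoformat(),
            "query": query,
            "total_papers": len(papers),
            "unique_after_dedup": len(unique),
        },
        "papers": unique,
    }
    out = Path(out_dir)
    json_path = out / f"{topic}_bibliography.json"
    csv_path = out / f"{topic}_bibliography.csv"

    json_path.write_text(
        json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for paper in unique:
            row = {}
            for field in FIELDS:
                value = paper.get(field, "")
                # Flatten list fields (authors, themes, ...) into one cell.
                row[field] = "; ".join(value) if isinstance(value, list) else value
            writer.writerow(row)
    return json_path, csv_path
```

One limitation of this sketch: a paper that appears once with a DOI and once with only a PMID is not merged, so a production version would likely need a cross-identifier lookup.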
Theme Extraction Protocol
Standardized Theme Clustering
- Extract keywords from titles and abstracts
- Cluster into themes using semantic similarity
- Require minimum N papers per theme (default N=3)
- Label themes with standardized names
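The clustering steps above can be approximated with a very small sketch. Note the substitution: it uses Jaccard overlap of title/abstract keywords as a crude stand-in for semantic similarity, which in practice would more likely come from text embeddings. The function names, the stopword list, and the `threshold=0.2` default are assumptions chosen for illustration; only the minimum of 3 papers per theme comes from the protocol. Labeling clusters with standardized names is left to a later step.

```python
import re

# Assumption: a tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"and", "are", "for", "from", "the", "via", "with"}


def extract_keywords(text):
    """Lowercased tokens of 3+ characters, minus stopwords."""
    tokens = re.findall(r"[a-z0-9][a-z0-9\-]{2,}", text.lower())
    return {tok for tok in tokens if tok not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_papers(papers, threshold=0.2, min_papers=3):
    """Greedy single-pass clustering on keyword overlap.

    Each paper joins the most similar existing cluster if its similarity to
    that cluster's pooled keywords reaches `threshold`; otherwise it starts
    a new cluster. Returns (themes, leftovers): clusters with at least
    `min_papers` papers, and the remaining smaller clusters.
    """
    clusters = []  # each item: {"keywords": set, "papers": list}
    for paper in papers:
        text = f"{paper.get('title', '')} {paper.get('abstract', '')}"
        kws = extract_keywords(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(kws, cluster["keywords"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["papers"].append(paper)
            best["keywords"] |= kws
        else:
            clusters.append({"keywords": set(kws), "papers": [paper]})
    themes = [c for c in clusters if len(c["papers"]) >= min_papers]
    leftovers = [c for c in clusters if len(c["papers"]) < min_papers]
    return themes, leftovers
```

Because keywords are pooled as a cluster grows, the overlap score tends to fall for later papers, and the result depends on input order. That is acceptable for a sketch but is a reason to prefer embedding-based similarity for real use.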
Standard Theme Categories (adapt to target)
For V-ATPase target example:
- lysosomal_acidification - Core function
- autophagy_regulation - mTORC1 signaling
- bone_resorption - Osteoclast function
- cancer_metabolism - Tumor acidification
- viral_infection - Viral entry mechanism
- neurodegenerative - Neuronal dysfunction
- kidney_function - Renal acid-base
- methodology - Assays/tools papers
Theme Quality Requirements
| Papers | Theme Status |
|---|---|
| ≥10 | Major theme (full section) |
| 3-9 | Minor theme (subsection) |
| <3 | Insufficient (note in "limited evidence" or merge) |
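The thresholds in the table translate directly into a small helper. This is a sketch only: the function names and the status strings `"major"`, `"minor"`, and `"insufficient"` are hypothetical labels, while the cut-offs (≥10, 3-9, <3) come from the table.

```python
def theme_status(n_papers):
    """Map a theme's paper count to the status tiers in the table."""
    if n_papers >= 10:
        return "major"          # full section
    if n_papers >= 3:
        return "minor"          # subsection
    return "insufficient"       # note as limited evidence, or merge


def plan_theme_sections(theme_counts):
    """Group theme names by status, largest themes first within each tier."""
    plan = {"major": [], "minor": [], "insufficient": []}
    ordered = sorted(theme_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, count in ordered:
        plan[theme_status(count)].append(name)
    return plan
```

For example, `plan_theme_sections({"lysosomal_acidification": 24, "bone_resorption": 5, "viral_infection": 2})` places the first theme under `"major"`, the second under `"minor"`, and the third under `"insufficient"`; the counts here are made up.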
Completeness Checklist (Verify Before Delivery)
ALL boxes must be checked or explicitly marked "N/A" or "Limited evidence"
Identity & Context
- [ ] Official identifiers resolved (UniProt, Ensembl, NCBI, ChEMBL)
- [ ] All synonyms/aliases documented
- [ ] Naming collisions identified and handled
- [ ] Protein architecture described (or N/A stated)
- [ ] Subcellular localization documented
- [ ] Baseline expression profile included
Mechanism & Function
- [ ] Core mechanism section with evidence grades
- [ ] Pathway involvement documented
- [ ] Model organism evidence (or "none found")
- [ ] Complexes/interaction partners listed
- [ ] Key assays/readouts described
Disease & Clinical
- [ ] Human genetic variants documented
- [ ] Constraint scores with interpretation
- [ ] Disease links with evidence strength grades
- [ ] Pathogen involvement (or "none identified")
Synthesis
- [ ] Research themes clustered with ≥3 papers each (or noted as limited)
- [ ] Open questions/gaps articulated
- [ ] Biological model synthesized
- [ ] ≥3 testable hypotheses with experiments
- [ ] Conclusions with confidence assessment
Technical
- [ ] All claims have source attribution
- [ ] Evidence grades applied throughout
- [ ] Bibliography file generated
- [ ] Data limitations documented
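Part of this checklist can be verified mechanically before delivery. The sketch below only checks that each mandatory section heading from the report template is present and has some content under it; a section that says "N/A" or "Limited evidence" counts as filled, in line with the rule above. The function name `check_completeness` and the exact-match heading detection are assumptions, and section 1 is left out because its title is not shown in this part of the document.

```python
import re

# Mandatory section titles as they appear in the report template (sections 2-15).
MANDATORY_SECTIONS = [
    "2. Protein Architecture",
    "3. Complexes & Interaction Partners",
    "4. Subcellular Localization",
    "5. Expression Profile",
    "6. Core Mechanisms",
    "7. Model Organism Evidence",
    "8. Human Genetics & Variants",
    "9. Disease Links",
    "10. Pathogen Involvement",
    "11. Key Assays & Readouts",
    "12. Research Themes",
    "13. Open Questions & Research Gaps",
    "14. Biological Model & Testable Hypotheses",
    "15. Conclusions & Recommendations",
]


def check_completeness(report_text):
    """Return (missing, empty).

    missing: mandatory headings that do not appear in the report.
    empty:   headings that appear but have no non-blank line before the
             next mandatory heading (or the end of the report).
    """
    lines = [ln.strip() for ln in report_text.splitlines()]
    # Strip leading markdown '#' markers so "## 4. ..." and "4. ..." both match.
    plain = [re.sub(r"^#+\s*", "", ln) for ln in lines]
    positions = {
        title: plain.index(title) for title in MANDATORY_SECTIONS if title in plain
    }
    missing = [t for t in MANDATORY_SECTIONS if t not in positions]
    starts = sorted(positions.values())
    empty = []
    for title, start in positions.items():
        later = [p for p in starts if p > start]
        end = later[0] if later else len(plain)
        body = [ln for ln in plain[start + 1:end] if ln]
        if not body:
            empty.append(title)
    return missing, empty
```

A report passes this check when both returned lists are empty. Items such as source attribution and evidence grading still need a content-level review that this sketch does not attempt.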
Quick Reference: Tool Categories
Literature Tools
- PubMed_search_articles
- PMC_search_papers
- EuropePMC_search_articles
- openalex_literature_search
- Crossref_search_works
- SemanticScholar_search_papers
- BioRxiv_search_preprints
- MedRxiv_search_preprints
Citation Tools
- PubMed_get_cited_by
- PubMed_get_related
- EuropePMC_get_citations
- EuropePMC_get_references
Protein/Gene Annotation Tools
- UniProt_get_entry_by_accession
- UniProt_search
- UniProt_id_mapping
- InterPro_get_protein_domains
- proteins_api_get_protein
Expression Tools
- GTEx_get_median_gene_expression
- GTEx_get_gene_expression
- HPA_get_rna_expression_by_source
- HPA_get_comprehensive_gene_details_by_ensembl_id
- HPA_get_subcellular_location
Variant/Disease Tools
- gnomad_get_gene_constraints
- gnomad_get_gene
- clinvar_search_variants
- OpenTargets_get_diseases_phenotypes_by_target_ensembl
Pathway Tools
- GO_get_annotations_for_gene
- Reactome_map_uniprot_to_pathways
- kegg_get_gene_info
- OpenTargets_get_target_gene_ontology_by_ensemblID
Interaction Tools
- STRING_get_protein_interactions
- intact_get_interactions
- OpenTargets_get_target_interactions_by_ensemblID
OA Tools
- Unpaywall_check_oa_status
Communication with User
During research (brief updates):
- "Resolving target identifiers and gathering baseline profile..."
- "Building core paper set with high-precision queries..."
- "Expanding via citation network..."
- "Clustering into themes and grading evidence..."
When the question looks like a factoid:
- Ask (once) if the user wants just the verified answer or a full deep-research report.
- If the user doesn’t specify, default to Factoid / Verification Mode and keep the answer short and source-backed.
DO NOT expose:
- Raw tool outputs
- Deduplication counts
- Search round details
- Database-by-database results
The report is the deliverable. Methodology stays internal.
Summary
This skill produces comprehensive, evidence-graded research reports that:
- Start with disambiguation to prevent naming collisions and missing details
- Use annotation tools to fill gaps when literature is sparse
- Grade all evidence to separate signal from noise
- Require completeness even if stating "limited evidence"
- Synthesize into biological models with testable hypotheses
- Separate narrative from bibliography for scalability
- Keep methodology internal unless explicitly requested
The result is a detailed, actionable research report that reads like an expert synthesis, not a search log.