tooluniverse-target-research
Comprehensive Target Intelligence Gatherer
Gather complete target intelligence by exploring 9 parallel research paths. Supports targets identified by gene symbol, UniProt accession, Ensembl ID, or gene name.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Tool parameter verification - Verify params via `get_tool_info` before calling unfamiliar tools
- Evidence grading - Grade all claims by evidence strength (T1-T4)
- Citation requirements - Every fact must have inline source attribution
- Mandatory completeness - All sections must exist with data minimums or explicit "No data" notes
- Disambiguation first - Resolve all identifiers before research
- Negative results documented - "No drugs found" is data; empty sections are failures
- Collision-aware literature search - Detect and filter naming collisions
- English-first queries - Always use English terms in tool calls, even if the user writes in another language. Translate gene names, disease names, and search terms to English. Only try original-language terms as a fallback if English returns no results. Respond in the user's language
Phase 0: Tool Parameter Verification (CRITICAL)
BEFORE calling ANY tool for the first time, verify its parameters:
python
# Always check tool params to prevent silent failures
tool_info = tu.tools.get_tool_info(tool_name="Reactome_map_uniprot_to_pathways")
# Reveals: takes `id`, not `uniprot_id`
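The check above can be run mechanically before a first call. The following is a minimal sketch, assuming the tool description exposes a JSON-Schema-style `parameter.properties` mapping; that layout and the helper name `unknown_params` are assumptions for illustration, not a documented contract of `get_tool_info`.

```python
def unknown_params(tool_info, kwargs):
    """Return the keyword names in kwargs that the tool does not declare.

    Assumes tool_info looks like {"parameter": {"properties": {...}}};
    adjust the lookup if get_tool_info returns a different layout.
    """
    declared = tool_info.get("parameter", {}).get("properties", {})
    return sorted(name for name in kwargs if name not in declared)


# Hypothetical description, shaped like the assumed get_tool_info output.
example_info = {
    "name": "Reactome_map_uniprot_to_pathways",
    "parameter": {"properties": {"id": {"type": "string"}}},
}

bad = unknown_params(example_info, {"uniprot_id": "P00533"})
good = unknown_params(example_info, {"id": "P00533"})
print(bad, good)
```

Rejecting a call when the returned list is non-empty turns a silent failure into an explicit one.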
Known Parameter Corrections (Updated)
| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
| | |
| | |
| | |
| | |
GTEx Versioned ID Fallback (CRITICAL)
GTEx often requires versioned Ensembl IDs. If `ENSG00000123456` returns empty:
python
# Step 1: Get gene info with version
gene_info = tu.tools.ensembl_lookup_gene(id=ensembl_id, species="human")
version = gene_info.get('version', 1)
# Step 2: Try versioned ID
versioned_id = f"{ensembl_id}.{version}"  # e.g., "ENSG00000123456.12"
result = tu.tools.GTEx_get_median_gene_expression(
    gencode_id=versioned_id,
    operation="median"
)
---
When to Use This Skill
Apply when users:
- Ask about a drug target, protein, or gene
- Need target validation or assessment
- Request druggability analysis
- Want comprehensive target profiling
- Ask "what do we know about [target]?"
- Need target-disease associations
- Request safety profile for a target
Critical Workflow Requirements
1. Report-First Approach (MANDATORY)
DO NOT show the search process or tool outputs to the user. Instead:
- Create the report file FIRST - Before any data collection:
  - File name: `[TARGET]_target_report.md`
  - Initialize with all 14 section headers
  - Add placeholder `[Researching...]` in each section
- Progressively update the report - As you gather data:
  - Update each section immediately after retrieving data
  - Replace `[Researching...]` with actual content
  - Include "No data returned" when tools return empty results
- Methodology in appendix only - If user requests methodology details, create separate `[TARGET]_methods_appendix.md`
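The report-first workflow can be sketched as two small helpers: one that writes the skeleton, one that fills a section. This is an illustrative sketch only; the function names, the `#`/`##` header layout, and the generic section titles are assumptions, since the 14 real section names are not listed here.

```python
import tempfile
from pathlib import Path

PLACEHOLDER = "[Researching...]"


def init_report(target, section_titles, out_dir):
    """Create [TARGET]_target_report.md with one placeholder per section."""
    path = Path(out_dir) / f"{target}_target_report.md"
    lines = [f"# {target} Target Report", ""]
    for number, title in enumerate(section_titles, start=1):
        lines += [f"## {number}. {title}", "", PLACEHOLDER, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def fill_section(path, number, title, content):
    """Replace the placeholder under one section header with content."""
    header = f"## {number}. {title}"
    text = path.read_text(encoding="utf-8")
    old = f"{header}\n\n{PLACEHOLDER}"
    if old not in text:
        raise ValueError(f"Section not found or already filled: {header}")
    # An empty result is still recorded explicitly, never left blank.
    body = content.strip() or "No data returned"
    path.write_text(text.replace(old, f"{header}\n\n{body}", 1), encoding="utf-8")


# Hypothetical titles; the real report uses its own 14 section names.
titles = [f"Section {i}" for i in range(1, 15)]
with tempfile.TemporaryDirectory() as tmp:
    report = init_report("EGFR", titles, tmp)
    fill_section(report, 9, "Section 9", "")
    final_text = report.read_text(encoding="utf-8")
```

Writing the skeleton first means an interrupted run still leaves a file in which every unfinished section is visibly marked.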
2. Evidence Grading System (MANDATORY)
CRITICAL: Grade every claim by evidence strength.
Evidence Tiers
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | ★★★ | Direct mechanistic evidence, human genetic proof | CRISPR KO, patient mutations, crystal structure with mechanism |
| T2 | ★★☆ | Functional studies, model organism validation | siRNA phenotype, mouse KO, biochemical assay |
| T3 | ★☆☆ | Association, screen hits, computational | GWAS hit, DepMap essentiality, expression correlation |
| T4 | ☆☆☆ | Mention, review, text-mined, predicted | Review article, database annotation, computational prediction |
Required Evidence Grading Locations
Evidence grades MUST appear in:
- Executive Summary - Key disease claims graded
- Section 8.2 Disease Associations - Every disease link graded with source type
- Section 11 Literature - Key papers table with evidence tier
- Section 13 Recommendations - Scorecard items reference evidence quality
Per-Section Evidence Summary
markdown
---
**Evidence Quality for this Section**: Strong
- Mechanistic (T1): 12 papers
- Functional (T2): 8 papers
- Association (T3): 15 papers
- Mention (T4): 23 papers
**Data Gaps**: No CRISPR data; mouse KO phenotypes limited
---
3. Citation Requirements (MANDATORY)
Every piece of information MUST include its source:
markdown
EGFR mutations cause lung adenocarcinoma [★★★: PMID:15118125, activating mutations
in patients]. *Source: ClinVar, CIViC*
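Both the inline citation format and the per-section evidence summary can be generated from tier codes, so the star symbols never drift from the tier table. The sketch below is illustrative: the helper names `cite` and `section_summary` are hypothetical, and the overall quality label is passed in by the caller because no rule for deriving it is given here.

```python
from collections import Counter

# Tier symbols and labels copied from the Evidence Tiers table.
TIERS = {
    "T1": ("★★★", "Mechanistic"),
    "T2": ("★★☆", "Functional"),
    "T3": ("★☆☆", "Association"),
    "T4": ("☆☆☆", "Mention"),
}


def cite(claim, tier, pmid, note, sources):
    """Render one claim with its evidence grade and source attribution."""
    symbol = TIERS[tier][0]
    return f"{claim} [{symbol}: PMID:{pmid}, {note}]. *Source: {', '.join(sources)}*"


def section_summary(paper_tiers, quality, data_gaps):
    """Render the per-section evidence block from a list of tier codes."""
    counts = Counter(paper_tiers)
    lines = ["---", f"**Evidence Quality for this Section**: {quality}"]
    for tier, (_, label) in TIERS.items():
        lines.append(f"- {label} ({tier}): {counts[tier]} papers")
    lines.append(f"**Data Gaps**: {'; '.join(data_gaps) or 'None identified'}")
    lines.append("---")
    return "\n".join(lines)


line = cite("EGFR mutations cause lung adenocarcinoma", "T1", 15118125,
            "activating mutations in patients", ["ClinVar", "CIViC"])
summary = section_summary(["T1", "T1", "T3"], "Moderate", ["No CRISPR data"])
print(line)
print(summary)
```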
Core Strategy: 9 Research Paths
Execute 9 research paths (Path 0 is always first):
Target Query (e.g., "EGFR" or "P00533")
│
├─ IDENTIFIER RESOLUTION (always first)
│ └─ Check if GPCR → GPCRdb_get_protein
│
├─ PATH 0: Open Targets Foundation (ALWAYS FIRST - fills gaps in all other paths)
│
├─ PATH 1: Core Identity (names, IDs, sequence, organism)
│ └─ InterProScan_scan_sequence for novel domain prediction (NEW)
├─ PATH 2: Structure & Domains (3D structure, domains, binding sites)
│ └─ If GPCR: GPCRdb_get_structures (active/inactive states)
├─ PATH 3: Function & Pathways (GO terms, pathways, biological role)
├─ PATH 4: Protein Interactions (PPI network, complexes)
├─ PATH 5: Expression Profile (tissue expression, single-cell)
├─ PATH 6: Variants & Disease (mutations, clinical significance)
│ └─ DisGeNET_search_gene for curated gene-disease associations
├─ PATH 7: Drug Interactions (known drugs, druggability, safety)
│ ├─ Pharos_get_target for TDL classification (Tclin/Tchem/Tbio/Tdark)
│ ├─ BindingDB_get_ligands_by_uniprot for known ligands (NEW)
│ ├─ PubChem_search_assays_by_target_gene for HTS data (NEW)
│ ├─ If GPCR: GPCRdb_get_ligands (curated agonists/antagonists)
│ └─ DepMap_get_gene_dependencies for target essentiality
└─ PATH 8: Literature & Research (publications, trends)
Identifier Resolution (Phase 1)
CRITICAL: Resolve ALL identifiers before any research path.
python
def resolve_target_ids(tu, query):
    """
    Resolve target query to ALL needed identifiers.
    Returns dict with: query, uniprot, ensembl, ensembl_versioned, symbol,
    entrez, chembl_target, hgnc, full_name, synonyms
    """
    ids = {
        'query': query,
        'uniprot': None,
        'ensembl': None,
        'ensembl_versioned': None,  # For GTEx
        'symbol': None,
        'entrez': None,
        'chembl_target': None,
        'hgnc': None,
        'full_name': None,
        'synonyms': []
    }
    # [Resolution logic based on input type]
    # ... (see current implementation)
    # CRITICAL: Get versioned Ensembl ID for GTEx
    if ids['ensembl']:
        gene_info = tu.tools.ensembl_lookup_gene(id=ids['ensembl'], species="human")
        if gene_info and gene_info.get('version'):
            ids['ensembl_versioned'] = f"{ids['ensembl']}.{gene_info['version']}"
        # Also get the full name for literature collision detection
        if gene_info:
            ids['full_name'] = gene_info.get('description', '').split(' [')[0]
    # Get UniProt alternative names for synonyms
    if ids['uniprot']:
        alt_names = tu.tools.UniProt_get_alternative_names_by_accession(accession=ids['uniprot'])
        if alt_names:
            ids['synonyms'].extend(alt_names)
    return ids
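The resolution logic itself is elided in the function above. One plausible first step is to classify the raw query by its shape before choosing which lookup to call. The sketch below is an assumption for illustration, not the skill's actual implementation: the helper name `classify_query` and the four category labels are hypothetical, while the two regular expressions follow the published UniProt accession format and the `ENSG` plus 11 digits form of human Ensembl gene IDs.

```python
import re

# UniProt accession format (6 or 10 characters), as documented by UniProt.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Human Ensembl gene ID, optionally carrying a ".version" suffix.
ENSEMBL_GENE_RE = re.compile(r"^ENSG\d{11}(?:\.\d+)?$")


def classify_query(query):
    """Guess the identifier type of a raw target query from its shape."""
    q = query.strip()
    if ENSEMBL_GENE_RE.match(q):
        return "ensembl"
    if UNIPROT_RE.match(q):
        return "uniprot"
    # A single token without spaces is treated as a gene symbol.
    if re.fullmatch(r"[A-Za-z0-9-]+", q):
        return "symbol"
    # Anything else (multi-word text) is treated as a gene name.
    return "name"


for example in ["P00533", "ENSG00000146648", "EGFR", "epidermal growth factor receptor"]:
    print(example, "->", classify_query(example))
```

Shape-based classification is only a hint: a few all-uppercase gene symbols can match the accession pattern, so the guess should still be confirmed against the corresponding database.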
GPCR Target Detection (NEW)
~35% of approved drugs target GPCRs. After identifier resolution, check if target is a GPCR:
python
def check_gpcr_target(tu, ids):
    """
    Check if target is a GPCR and retrieve specialized data.
    Call after identifier resolution.
    """
    # 'symbol' may be present but None, so guard before calling .lower()
    symbol = ids.get('symbol') or ''
    if not symbol:
        return {'is_gpcr': False}
    # Build GPCRdb entry name
    entry_name = f"{symbol.lower()}_human"
    gpcr_info = tu.tools.GPCRdb_get_protein(
        operation="get_protein",
        protein=entry_name
    )
    if gpcr_info and gpcr_info.get('status') == 'success':
        # Target is a GPCR - get specialized data
        # Get structures with receptor state
        structures = tu.tools.GPCRdb_get_structures(
            operation="get_structures",
            protein=entry_name
        )
        # Get known ligands (critical for binder projects)
        ligands = tu.tools.GPCRdb_get_ligands(
            operation="get_ligands",
            protein=entry_name
        )
        # Get mutation data
        mutations = tu.tools.GPCRdb_get_mutations(
            operation="get_mutations",
            protein=entry_name
        )
        data = gpcr_info.get('data', {})
        return {
            'is_gpcr': True,
            'gpcr_family': data.get('family'),
            'gpcr_class': data.get('receptor_class'),
            'structures': structures.get('data', {}).get('structures', []),
            'ligands': ligands.get('data', {}).get('ligands', []),
            'mutations': mutations.get('data', {}).get('mutations', []),
            'ballesteros_numbering': True  # GPCRdb provides this
        }
    return {'is_gpcr': False}
GPCRdb Report Section (add to Section 2 for GPCR targets):
markdown
2.x GPCR-Specific Data (GPCRdb)
Receptor Class: Class A (Rhodopsin-like)
GPCR Family: Adrenoceptors
Structures by State:
| PDB ID | State | Resolution | Ligand | Year |
|---|---|---|---|---|
| 3SN6 | Active | 3.2Å | Agonist (BI-167107) | 2011 |
| 2RH1 | Inactive | 2.4Å | Antagonist (carazolol) | 2007 |
Known Ligands: 45 agonists, 32 antagonists, 8 allosteric modulators
Key Binding Site Residues (Ballesteros-Weinstein): 3.32, 5.42, 6.48, 7.39
Collision Detection for Literature Search
Before literature search, detect naming collisions:
python
def detect_collisions(tu, symbol, full_name):
    """
    Detect if gene symbol has naming collisions in literature.
    Returns negative filter terms if collisions found.
    """
    # Search by symbol in title
    results = tu.tools.PubMed_search_articles(
        query=f'"{symbol}"[Title]',
        limit=20
    )
    articles = (results or {}).get('articles', [])
    bio_terms = ['protein', 'gene', 'cell', 'expression', 'mutation', 'kinase', 'receptor']
    # Collect titles that lack any biology/protein/gene context
    off_topic_titles = []
    for paper in articles:
        title = paper.get('title', '').lower()
        if not any(term in title for term in bio_terms):
            off_topic_titles.append(title)
    # Check if >20% are off-topic
    off_topic_terms = []
    if articles and len(off_topic_titles) / len(articles) > 0.2:
        # Extract potential collision terms from off_topic_titles
        # e.g., "JAK" might collide with "Just Another Kinase" jokes
        # e.g., "WDR7" might collide with other WDR family members in certain contexts
        pass
    # Build negative filter
    collision_filter = ""
    if off_topic_terms:
        collision_filter = " NOT " + " NOT ".join(off_topic_terms)
    return collision_filter
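The two mechanical parts of collision detection, measuring the off-topic share and assembling the negative clause, can be isolated and tested without a live PubMed call. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the sample titles are invented for illustration, and collision terms are supplied by hand because term extraction is left open above. EGFR is a convenient example because "eGFR" also abbreviates estimated glomerular filtration rate.

```python
BIO_TERMS = ("protein", "gene", "cell", "expression", "mutation", "kinase", "receptor")


def off_topic_fraction(titles, bio_terms=BIO_TERMS):
    """Share of titles that mention none of the biology context terms."""
    if not titles:
        return 0.0
    off = sum(1 for t in titles if not any(term in t.lower() for term in bio_terms))
    return off / len(titles)


def build_collision_filter(terms):
    """Turn collision terms into a PubMed-style negative title clause."""
    return "".join(f' NOT "{term}"[Title]' for term in terms)


# Invented sample titles, for illustration only.
sample_titles = [
    "EGFR mutation status in lung adenocarcinoma",
    "A receptor tyrosine kinase inhibitor screen",
    "EGFR protein levels in glioma",
    "Estimated glomerular filtration rate in chronic kidney disease",
    "eGFR decline after nephrectomy",
]

fraction = off_topic_fraction(sample_titles)
clause = build_collision_filter(["glomerular filtration"]) if fraction > 0.2 else ""
print(fraction, clause)
```

Substring matching is deliberately crude (for instance, "generation" contains "gene"), so the fraction is a screening signal rather than a classification.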
PATH 0: Open Targets Foundation (ALWAYS FIRST)
Objective: Populate baseline data for Sections 5, 6, 8, 9, 10, 11 before specialized queries.
CRITICAL: Open Targets provides the most comprehensive aggregated data. Query ALL these endpoints:
| Endpoint | Section | Data Type |
|---|---|---|
| `OpenTargets_get_diseases_phenotypes_by_target_ensemblId` | 8 | Diseases/phenotypes |
| `OpenTargets_get_target_tractability_by_ensemblId` | 9 | Druggability assessment |
| `OpenTargets_get_target_safety_profile_by_ensemblId` | 10 | Safety liabilities |
| `OpenTargets_get_target_interactions_by_ensemblId` | 6 | PPI network |
| `OpenTargets_get_target_gene_ontology_by_ensemblId` | 5 | GO annotations |
| `OpenTargets_get_publications_by_target_ensemblId` | 11 | Literature |
| `OpenTargets_get_biological_mouse_models_by_ensemblId` | 8/10 | Mouse KO phenotypes |
| `OpenTargets_get_chemical_probes_by_target_ensemblId` | 9 | Chemical probes |
| `OpenTargets_get_associated_drugs_by_target_ensemblId` | 9 | Known drugs |
Path 0 Implementation
python
def path_0_open_targets(tu, ids):
    """
    Open Targets foundation data - fills gaps for sections 5, 6, 8, 9, 10, 11.
    ALWAYS run this first.
    """
    ensembl_id = ids['ensembl']
    if not ensembl_id:
        return {'status': 'skipped', 'reason': 'No Ensembl ID'}
    results = {}
    # 1. Diseases & Phenotypes (Section 8)
    diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['diseases'] = diseases if diseases else {'note': 'No disease associations returned'}
    # 2. Tractability (Section 9)
    tractability = tu.tools.OpenTargets_get_target_tractability_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['tractability'] = tractability if tractability else {'note': 'No tractability data returned'}
    # 3. Safety Profile (Section 10)
    safety = tu.tools.OpenTargets_get_target_safety_profile_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['safety'] = safety if safety else {'note': 'No safety liabilities identified'}
    # 4. Interactions (Section 6)
    interactions = tu.tools.OpenTargets_get_target_interactions_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['interactions'] = interactions if interactions else {'note': 'No interactions returned'}
    # 5. GO Annotations (Section 5)
    go_terms = tu.tools.OpenTargets_get_target_gene_ontology_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['go_terms'] = go_terms if go_terms else {'note': 'No GO annotations returned'}
    # 6. Publications (Section 11)
    publications = tu.tools.OpenTargets_get_publications_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['publications'] = publications if publications else {'note': 'No publications returned'}
    # 7. Mouse Models (Section 8/10)
    mouse_models = tu.tools.OpenTargets_get_biological_mouse_models_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['mouse_models'] = mouse_models if mouse_models else {'note': 'No mouse model data returned'}
    # 8. Chemical Probes (Section 9)
    probes = tu.tools.OpenTargets_get_chemical_probes_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['chemical_probes'] = probes if probes else {'note': 'No chemical probes available'}
    # 9. Associated Drugs (Section 9)
    drugs = tu.tools.OpenTargets_get_associated_drugs_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['drugs'] = drugs if drugs else {'note': 'No approved/trial drugs found'}
    return results
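Because all nine calls share one shape (same argument, same "data or note" fallback), the function above could also be driven by a table. The sketch below shows the idea with three of the endpoints and a stand-in object in place of `tu.tools`. It is an illustrative alternative, not the skill's implementation: `run_endpoints` is a hypothetical helper, and catching exceptions so that one failing endpoint does not abort the rest is a design assumption added here.

```python
from types import SimpleNamespace

# (result key, tool name, note recorded when the call returns nothing),
# copied from three of the calls in path_0_open_targets.
ENDPOINTS = [
    ("tractability", "OpenTargets_get_target_tractability_by_ensemblId",
     "No tractability data returned"),
    ("chemical_probes", "OpenTargets_get_chemical_probes_by_target_ensemblId",
     "No chemical probes available"),
    ("drugs", "OpenTargets_get_associated_drugs_by_target_ensemblId",
     "No approved/trial drugs found"),
]


def run_endpoints(tools, ensembl_id, endpoints=ENDPOINTS):
    """Call each endpoint and record data, an empty note, or a failure note."""
    results = {}
    for key, tool_name, empty_note in endpoints:
        try:
            data = getattr(tools, tool_name)(ensemblId=ensembl_id)
        except Exception as exc:  # assumption: keep going after a failure
            results[key] = {"note": f"{tool_name} failed: {exc}"}
            continue
        results[key] = data if data else {"note": empty_note}
    return results


# Stand-in for tu.tools with canned responses; the third tool is missing
# on purpose to show how a failure is recorded.
fake_tools = SimpleNamespace(
    OpenTargets_get_target_tractability_by_ensemblId=lambda ensemblId: {"rows": 3},
    OpenTargets_get_chemical_probes_by_target_ensemblId=lambda ensemblId: [],
)
demo = run_endpoints(fake_tools, "ENSG00000146648")
print(demo)
```

Every key ends up populated either way, which matches the rule that an empty section must carry an explicit note.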
Negative Results Are Data
CRITICAL: Always document when a query returns empty:
markdown
9.3 Chemical Probes
Status: No validated chemical probes available for this target.
Source: OpenTargets_get_chemical_probes_by_target_ensemblId returned empty
Implication: Tool compound development would be needed for chemical biology studies.
---
PATH 2: Structure & Domains (Enhanced)
Objective: Robust structure coverage using 3-step chain.
3-Step Structure Search Chain
Do NOT rely solely on PDB text search. Use this chain:
python
def path_structure_robust(tu, ids):
    """
    Robust structure search using 3-step chain.
    """
    structures = {'pdb': [], 'alphafold': None, 'domains': [], 'method_notes': []}
    # STEP 1: UniProt PDB Cross-References (most reliable)
    if ids['uniprot']:
        entry = tu.tools.UniProt_get_entry_by_accession(accession=ids['uniprot']) or {}
        pdb_xrefs = [x for x in entry.get('uniProtKBCrossReferences', [])
                     if x.get('database') == 'PDB']
        for xref in pdb_xrefs:
            pdb_id = xref.get('id')
            # Get details for each PDB
            pdb_info = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)
            if pdb_info:
                structures['pdb'].append(pdb_info)
        structures['method_notes'].append(f"Step 1: {len(pdb_xrefs)} PDB cross-refs from UniProt")
    # STEP 2: Sequence-based PDB Search (catches missing annotations)
    if ids['uniprot'] and len(structures['pdb']) < 5:
        sequence = tu.tools.UniProt_get_sequence_by_accession(accession=ids['uniprot'])
        if sequence and len(sequence) < 1000:  # Reasonable length for search
            similar = tu.tools.PDB_search_similar_structures(
                sequence=sequence[:500],  # Use first 500 AA if long
                identity_cutoff=0.7
            )
            if similar:
                known_ids = {s.get('pdb_id') for s in structures['pdb']}
                for hit in similar[:10]:  # Top 10 similar
                    # .get avoids a KeyError on hits without a 'pdb_id' field
                    if hit.get('pdb_id') not in known_ids:
                        structures['pdb'].append(hit)
                        known_ids.add(hit.get('pdb_id'))
                structures['method_notes'].append("Step 2: Sequence search (identity ≥70%)")
    # STEP 3: Domain-based Search (for multi-domain proteins)
    if ids['uniprot']:
        domains = tu.tools.InterPro_get_protein_domains(uniprot_accession=ids['uniprot'])
        structures['domains'] = domains if domains else []
        # For proteins with few structures, search PDB by domain name
        if len(structures['pdb']) < 3 and domains:
            for domain in domains[:3]:  # Top 3 domains
                domain_name = domain.get('name', '')
                domain_hits = tu.tools.PDB_search_by_keyword(query=domain_name, limit=5)
                if domain_hits:
                    structures['method_notes'].append(f"Step 3: Domain '{domain_name}' search")
        # AlphaFold (always check when a UniProt accession is available)
        alphafold = tu.tools.alphafold_get_prediction(uniprot_accession=ids['uniprot'])
        structures['alphafold'] = alphafold if alphafold else {'note': 'No AlphaFold prediction'}
    # IMPORTANT: Document limitations
    if not structures['pdb']:
        structures['limitation'] = "No direct PDB hit does NOT mean no structure exists. Check: (1) structures under different UniProt entries, (2) homolog structures, (3) domain-only structures."
    return structures
Structure Section Output Format
markdown
4.1 Experimental Structures (PDB)
Total PDB Entries: 23 structures (Source: UniProt cross-references)
Search Method: 3-step chain (UniProt xrefs → sequence search → domain search)
| PDB ID | Resolution | Method | Ligand | Coverage | Year |
|---|---|---|---|---|---|
| 1M17 | 2.6Å | X-ray | Erlotinib | 672-998 | 2002 |
| 3POZ | 2.8Å | X-ray | Gefitinib | 696-1022 | 2010 |
Note: "No direct PDB hit" ≠ "no structure exists". Check homologs and domain structures.
---
PATH 5: Expression Profile (Enhanced)
GTEx with Versioned ID Fallback
python
def path_expression(tu, ids):
    """
    Expression data with GTEx versioned ID fallback.
    """
    results = {'gtex': None, 'hpa': None, 'failed_tools': []}
    # GTEx with fallback
    ensembl_id = ids['ensembl']
    versioned_id = ids.get('ensembl_versioned')
    # Try unversioned first
    gtex_result = tu.tools.GTEx_get_median_gene_expression(
        gencode_id=ensembl_id,
        operation="median"
    )
    # Fallback to versioned if empty
    if not gtex_result or not gtex_result.get('data'):
        if versioned_id:
            gtex_result = tu.tools.GTEx_get_median_gene_expression(
                gencode_id=versioned_id,
                operation="median"
            )
            if gtex_result and gtex_result.get('data'):
                results['gtex'] = gtex_result
                results['gtex_note'] = f"Used versioned ID: {versioned_id}"
        # Record the failure whether or not a versioned ID was available
        if not results.get('gtex'):
            results['failed_tools'].append({
                'tool': 'GTEx_get_median_gene_expression',
                'tried': [ensembl_id, versioned_id],
                'fallback': 'See HPA data below'
            })
    else:
        results['gtex'] = gtex_result
    # HPA (always query as backup)
    hpa_result = tu.tools.HPA_get_rna_expression_by_source(ensembl_id=ensembl_id)
    results['hpa'] = hpa_result if hpa_result else {'note': 'No HPA RNA data'}
    return results
Human Protein Atlas - Extended Expression (NEW)
HPA provides comprehensive protein expression data including tissue-level, cell-level, and cell line expression.
python
def get_hpa_comprehensive_expression(tu, gene_symbol):
    """
    Get comprehensive expression data from Human Protein Atlas.
    Provides:
    - Tissue expression (protein and RNA)
    - Subcellular localization
    - Cell line expression comparison
    - Tissue specificity
    """
    # 1. Search for gene to get IDs
    gene_info = tu.tools.HPA_search_genes_by_query(search_query=gene_symbol)
    if not gene_info:
        return {'error': f'Gene {gene_symbol} not found in HPA'}
    # 2. Get tissue expression with specificity
    tissue_search = tu.tools.HPA_generic_search(
        search_query=gene_symbol,
        columns="g,gs,rnat,rnatsm,scml,scal",  # Gene, synonyms, tissue specificity, subcellular
        format="json"
    )
    # 3. Compare expression in cancer cell lines vs normal tissue
    cell_lines = ['a549', 'mcf7', 'hela', 'hepg2', 'pc3']
    cell_line_expression = {}
    for cell_line in cell_lines:
        try:
            expr = tu.tools.HPA_get_comparative_expression_by_gene_and_cellline(
                gene_name=gene_symbol,
                cell_line=cell_line
            )
            cell_line_expression[cell_line] = expr
        except Exception:
            # Skip cell lines the tool cannot resolve; do not swallow KeyboardInterrupt
            continue
    return {
        'gene_info': gene_info,
        'tissue_data': tissue_search,
        'cell_line_expression': cell_line_expression,
        'source': 'Human Protein Atlas'
    }
HPA Expression Output for Report:
markdown
Tissue Expression Profile (Human Protein Atlas)
| Tissue | Protein Level | RNA nTPM | Specificity |
|---|---|---|---|
| Brain | High | 45.2 | Enriched |
| Liver | Medium | 23.1 | Enhanced |
| Kidney | Low | 8.4 | Not detected |
Subcellular Localization: Cytoplasm, Plasma membrane
Cancer Cell Line Expression
| Cell Line | Cancer Type | Expression | vs Normal |
|---|---|---|---|
| A549 | Lung | High | Elevated |
| MCF7 | Breast | Medium | Similar |
| HeLa | Cervical | High | Elevated |
Source: Human Protein Atlas via `HPA_search_genes_by_query`, `HPA_get_comparative_expression_by_gene_and_cellline`
**Why HPA for Target Research**:
- **Drug target validation** - Confirm expression in target tissue
- **Safety assessment** - Expression in essential organs
- **Biomarker potential** - Tissue-specific expression
- **Cell line selection** - Choose appropriate models
---
PATH 6: Variants & Disease (Enhanced)
6.1 ClinVar SNV vs CNV Separation
markdown
8.3 Clinical Variants (ClinVar)
Single Nucleotide Variants (SNVs)
| Variant | Clinical Significance | Condition | Review Status | PMID |
|---|---|---|---|---|
| p.L858R | Pathogenic | Lung cancer | 4 stars | 15118125 |
| p.T790M | Pathogenic | Drug resistance | 4 stars | 15737014 |
Total Pathogenic SNVs: 47
Copy Number Variants (CNVs) - Reported Separately
| Type | Region | Clinical Significance | Frequency |
|---|---|---|---|
| Amplification | 7p11.2 | Pathogenic | Common in cancer |
Note: CNV data separated as it represents different mutation mechanism
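The SNV/CNV separation shown in this template can be sketched as a small partition step. This is only a sketch: it assumes each variant record is a dict with a hypothetical `variant_type` field holding ClinVar-style type strings. The field name and the helper `split_clinvar_variants` are illustrative and are not part of any tool's documented output.

```python
# Hypothetical record shape: {'name': ..., 'variant_type': ..., 'significance': ...}
SNV_TYPES = {'single nucleotide variant'}
CNV_TYPES = {'copy number gain', 'copy number loss'}


def split_clinvar_variants(variants):
    """Partition variant records into SNVs, CNVs, and everything else."""
    groups = {'snv': [], 'cnv': [], 'other': []}
    for variant in variants:
        # Normalize the (assumed) type string; missing types fall into 'other'
        vtype = (variant.get('variant_type') or '').strip().lower()
        if vtype in SNV_TYPES:
            groups['snv'].append(variant)
        elif vtype in CNV_TYPES:
            groups['cnv'].append(variant)
        else:
            groups['other'].append(variant)
    return groups
```

Keeping an explicit `other` bucket means indels and untyped records are not silently counted as SNVs or CNVs.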
6.2 DisGeNET Integration (NEW)
DisGeNET provides curated gene-disease associations with evidence scores. Requires: `DISGENET_API_KEY`
python
def get_disgenet_associations(tu, ids):
    """
    Get gene-disease associations from DisGeNET.
    Complements Open Targets with curated association scores.
    """
    symbol = ids.get('symbol')
    if not symbol:
        return {'status': 'skipped', 'reason': 'No gene symbol'}
    # Get all disease associations for gene
    gda = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=symbol,
        limit=50
    )
    if gda.get('status') != 'success':
        return {'status': 'error', 'message': 'DisGeNET query failed'}
    associations = gda.get('data', {}).get('associations', [])
    # Categorize by evidence strength
    strong = []    # score >= 0.7
    moderate = []  # 0.4 <= score < 0.7
    weak = []      # score < 0.4
    for assoc in associations:
        score = assoc.get('score') or 0  # treat a missing or null score as 0
        disease_name = assoc.get('disease_name', '')
        umls_cui = assoc.get('disease_id', '')
        entry = {
            'disease': disease_name,
            'umls_cui': umls_cui,
            'score': score,
            'evidence_index': assoc.get('ei'),
            'dsi': assoc.get('dsi'),  # Disease Specificity Index
            'dpi': assoc.get('dpi')   # Disease Pleiotropy Index
        }
        if score >= 0.7:
            strong.append(entry)
        elif score >= 0.4:
            moderate.append(entry)
        else:
            weak.append(entry)
    return {
        'total_associations': len(associations),
        'strong_associations': strong,
        'moderate_associations': moderate,
        'weak_associations': weak[:10],  # Limit weak
        'disease_pleiotropy': len(associations)  # Number of linked diseases (capped by limit)
    }
DisGeNET Report Section (add to Section 8 - Disease Associations):
markdown
8.x DisGeNET Gene-Disease Associations (NEW)
Total Diseases Associated: 47
Disease Pleiotropy Index: High (gene linked to many disease types)
Strong Associations (Score ≥0.7)
| Disease | UMLS CUI | Score | Evidence Index |
|---|---|---|---|
| Non-small cell lung cancer | C0007131 | 0.85 | 0.92 |
| Glioblastoma | C0017636 | 0.78 | 0.88 |
Moderate Associations (Score 0.4-0.7)
| Disease | UMLS CUI | Score | DSI |
|---|---|---|---|
| Breast cancer | C0006142 | 0.62 | 0.45 |
Note: DisGeNET score integrates curated databases, GWAS, animal models, and literature
**Evidence Tier Assignment**:
- DisGeNET Score ≥0.7 → Consider T2 evidence (multiple validated sources)
- DisGeNET Score 0.4-0.7 → Consider T3 evidence
- DisGeNET Score <0.4 → T4 evidence only
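The tier assignment above can be written as a small lookup. This is a minimal sketch; the helper name `disgenet_score_to_tier` is illustrative, and mapping a missing score to T4 is an assumption that the source does not state.

```python
def disgenet_score_to_tier(score):
    """Map a DisGeNET association score to the suggested evidence tier."""
    if score is None:
        return 'T4'  # assumption: a missing score gets the weakest tier
    if score >= 0.7:
        return 'T2'  # multiple validated sources
    if score >= 0.4:
        return 'T3'
    return 'T4'
```

The boundaries match the categorization used when grouping associations: a score of exactly 0.7 counts as strong, and exactly 0.4 counts as moderate.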
---
PATH 7: Druggability & Target Validation (ENHANCED)
7.1 Pharos/TCRD - Target Development Level (NEW)
Pharos, the portal of the NIH Illuminating the Druggable Genome (IDG) program, assigns a Target Development Level (TDL) to every human protein:
python
def get_pharos_target_info(tu, ids):
    """
    Get Pharos/TCRD target development level and druggability.
    TDL Classification:
    - Tclin: Approved drug targets
    - Tchem: Targets with potent small-molecule activity
      (family-specific cutoff, e.g. <= 30 nM for kinases)
    - Tbio: Targets with biological annotations
    - Tdark: Understudied proteins
    """
    gene_symbol = ids.get('symbol')
    uniprot = ids.get('uniprot')
    # Prefer the gene symbol; fall back to the UniProt accession
    if gene_symbol:
        result = tu.tools.Pharos_get_target(
            gene=gene_symbol
        )
    elif uniprot:
        result = tu.tools.Pharos_get_target(
            uniprot=uniprot
        )
    else:
        return {'status': 'error', 'message': 'Need gene symbol or UniProt'}
    if result.get('status') == 'success' and result.get('data'):
        target = result['data']
        return {
            'name': target.get('name'),
            'symbol': target.get('sym'),
            'tdl': target.get('tdl'),     # Tclin/Tchem/Tbio/Tdark
            'family': target.get('fam'),  # Kinase, GPCR, etc.
            'novelty': target.get('novelty'),
            'description': target.get('description'),
            'publications': target.get('publicationCount'),
            'interpretation': interpret_tdl(target.get('tdl'))
        }
    return None

def interpret_tdl(tdl):
    """Interpret Target Development Level for druggability."""
    interpretations = {
        'Tclin': 'Approved drug target - highest confidence for druggability',
        'Tchem': 'Small molecule active - good chemical tractability',
        'Tbio': 'Biologically characterized - may require novel modalities',
        'Tdark': 'Understudied - limited data, high novelty potential'
    }
    return interpretations.get(tdl, 'Unknown')

def search_disease_targets(tu, disease_name):
    """Find targets associated with a disease via Pharos."""
    result = tu.tools.Pharos_get_disease_targets(
        disease=disease_name,
        top=50
    )
    if result.get('status') == 'success':
        targets = result['data'].get('targets', [])
        # Group by TDL for prioritization
        by_tdl = {'Tclin': [], 'Tchem': [], 'Tbio': [], 'Tdark': []}
        for t in targets:
            tdl = t.get('tdl', 'Unknown')
            if tdl in by_tdl:
                by_tdl[tdl].append(t)
        return by_tdl
    return None
Pharos Report Section (add to Section 9 - Druggability):
markdown
9.x Pharos/TCRD Target Classification (NEW)
Target Development Level: Tchem
Protein Family: Kinase
Novelty Score: 0.35 (moderately studied)
Publication Count: 12,456
TDL Interpretation: Target has validated small molecule activities with IC50 < 30nM. Good chemical starting points exist.
Disease Targets Analysis (for disease-centric queries):
| TDL | Count | Examples |
|---|---|---|
| Tclin | 12 | EGFR, ALK, RET |
| Tchem | 45 | KRAS, SHP2, CDK4 |
| Tbio | 78 | Novel kinases |
| Tdark | 23 | Understudied |
Source: Pharos/TCRD via `Pharos_get_target`
7.2 DepMap - Target Essentiality Validation (NEW)
CRISPR knockout data from cancer cell lines to validate target essentiality:
python
def assess_target_essentiality(tu, ids):
    """
    Is this target essential for cancer cell survival?
    Negative effect scores = gene is essential (cells die upon KO).
    DepMap gene effect is scaled so that 0 means no effect and -1 is the
    median effect of common essential genes.
    """
    gene_symbol = ids.get('symbol')
    if not gene_symbol:
        return {'status': 'error', 'message': 'Need gene symbol'}
    deps = tu.tools.DepMap_get_gene_dependencies(
        gene_symbol=gene_symbol
    )
    if deps.get('status') == 'success':
        return {
            'gene': gene_symbol,
            'data': deps.get('data', {}),
            'interpretation': 'Negative scores indicate gene is essential for cell survival',
            'note': 'Score < -0.5 is strongly essential, < -1.0 is extremely essential'
        }
    return None

def get_cancer_type_essentiality(tu, gene_symbol, cancer_type):
    """Check if gene is essential in specific cancer type."""
    # Get cell lines for cancer type
    cell_lines = tu.tools.DepMap_get_cell_lines(
        cancer_type=cancer_type,
        page_size=20
    )
    return {
        'gene': gene_symbol,
        'cancer_type': cancer_type,
        'cell_lines': cell_lines.get('data', {}).get('cell_lines', []),
        'note': 'Query individual cell lines for dependency scores via DepMap portal'
    }
DepMap Report Section (add to Section 9 - Druggability):
markdown
9.x Target Essentiality (DepMap) (NEW)
Gene Essentiality Assessment:
| Context | Effect Score | Interpretation |
|---|---|---|
| Pan-cancer | -0.42 | Moderately essential |
| Lung cancer | -0.78 | Strongly essential |
| Breast cancer | -0.21 | Weakly essential |
Selectivity: Differential essentiality suggests cancer-type selective target
Cell Lines Tested: 1,054 cancer cell lines from DepMap
Interpretation: Score < -0.5 indicates strong dependency. This target is more essential in lung cancer than other cancer types - suggesting lung-selective targeting may be feasible.
Source: DepMap via `DepMap_get_gene_dependencies`
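The interpretation labels used in the essentiality table can be produced by a simple threshold function. The -0.5 and -1.0 cutoffs come from the note in `assess_target_essentiality`. The -0.25 boundary between "weakly" and "moderately" essential is an assumption, chosen only so that the example rows (-0.42, -0.21) receive the labels shown. The helper name is illustrative.

```python
def classify_essentiality(effect_score):
    """Label a DepMap gene effect score (more negative = more essential)."""
    if effect_score < -1.0:
        return 'Extremely essential'
    if effect_score < -0.5:
        return 'Strongly essential'
    if effect_score < -0.25:  # assumed cutoff, not stated in the source
        return 'Moderately essential'
    if effect_score < 0:
        return 'Weakly essential'
    return 'Not essential'
```

Labels derived this way are a reporting convenience; the underlying score should still be shown next to the label.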
7.3 InterProScan - Novel Domain Prediction (NEW)
For uncharacterized proteins, run InterProScan to predict domains and function:
python
def predict_protein_domains(tu, sequence, title="Query protein"):
    """
    Run InterProScan for de novo domain prediction.
    Use when:
    - Protein has no InterPro annotations
    - Novel/uncharacterized protein
    - Custom sequence analysis
    """
    result = tu.tools.InterProScan_scan_sequence(
        sequence=sequence,
        title=title,
        go_terms=True,
        pathways=True
    )
    if result.get('status') == 'success':
        data = result.get('data', {})
        # Job may still be running
        if data.get('job_status') == 'RUNNING':
            return {
                'job_id': data.get('job_id'),
                'status': 'running',
                'note': 'Use InterProScan_get_job_results to retrieve when ready'
            }
        # Parse completed results
        return {
            'domains': data.get('domains', []),
            'domain_count': data.get('domain_count', 0),
            'go_annotations': data.get('go_annotations', []),
            'pathways': data.get('pathways', []),
            'sequence_length': data.get('sequence_length')
        }
    return None

def check_interproscan_job(tu, job_id):
    """Check status and get results for InterProScan job."""
    status = tu.tools.InterProScan_get_job_status(job_id=job_id)
    if status.get('data', {}).get('is_finished'):
        results = tu.tools.InterProScan_get_job_results(job_id=job_id)
        return results.get('data', {})
    return status.get('data', {})
When to use InterProScan:
- Novel/uncharacterized proteins (Tdark in Pharos)
- Custom sequences (e.g., protein variants)
- Proteins with outdated/sparse InterPro annotations
- Validating domain predictions
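Because an InterProScan job may still be running when the first call returns, a caller typically polls until it finishes. The following is a generic sketch, not a ToolUniverse API: `poll_until_finished` and its parameters are hypothetical, and `check_job` is assumed to be any zero-argument callable returning a dict with an `is_finished` flag (for example, a wrapper around the job-status call).

```python
import time


def poll_until_finished(check_job, interval_s=5.0, timeout_s=300.0, sleep=time.sleep):
    """Call check_job() until it reports completion or the timeout elapses."""
    waited = 0.0
    while True:
        data = check_job()
        if data.get('is_finished'):
            return {'status': 'finished', 'data': data}
        if waited >= timeout_s:
            # Surface the timeout explicitly so the report can document it
            return {'status': 'timeout', 'waited_s': waited, 'data': data}
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. On timeout, the function returns a status that can be written into the report instead of leaving the section empty.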
InterProScan Report Section (for novel proteins):
markdown
Domain Prediction (InterProScan) (NEW)
Used for uncharacterized protein analysis
Predicted Domains:
| Domain | Database | Start-End | E-value | InterPro Entry |
|---|---|---|---|---|
| Protein kinase domain | Pfam | 45-305 | 1.2e-89 | IPR000719 |
| SH2 domain | SMART | 320-410 | 3.4e-45 | IPR000980 |
Predicted GO Terms:
- GO:0004672 protein kinase activity
- GO:0005524 ATP binding
Predicted Pathways:
- Reactome: Signal Transduction
Source: InterProScan via `InterProScan_scan_sequence`
7.4 BindingDB - Known Ligands & Binding Data (NEW)
BindingDB provides experimental binding affinity data (Ki, IC50, Kd) for target-ligand pairs:
python
def _affinity_nm(value):
    """Parse an affinity value to float nM; missing or unparseable values sort last."""
    try:
        # Assumption: values may carry qualifiers such as '>' or '<'
        return float(str(value).lstrip('<>=~ '))
    except (TypeError, ValueError):
        return float('inf')

def get_bindingdb_ligands(tu, uniprot_id, affinity_cutoff=10000):
    """
    Get ligands with measured binding affinities from BindingDB.
    Critical for:
    - Identifying chemical starting points
    - Understanding existing chemical matter
    - Assessing tractability with small molecules
    Args:
        uniprot_id: UniProt accession (e.g., P00533 for EGFR)
        affinity_cutoff: Maximum affinity in nM (lower = more potent)
    """
    # Get ligands by UniProt
    result = tu.tools.BindingDB_get_ligands_by_uniprot(
        uniprot=uniprot_id,
        affinity_cutoff=affinity_cutoff
    )
    if result:
        ligands = []
        for entry in result:
            ligands.append({
                'smiles': entry.get('smile'),
                'affinity_type': entry.get('affinity_type'),  # Ki, IC50, Kd
                'affinity_nM': entry.get('affinity'),
                'monomer_id': entry.get('monomerid'),
                'pmid': entry.get('pmid')
            })
        # Sort by affinity (most potent first)
        ligands.sort(key=lambda x: _affinity_nm(x['affinity_nM']))
        return {
            'total_ligands': len(ligands),
            'ligands': ligands[:20],  # Top 20 most potent
            'best_affinity': ligands[0]['affinity_nM'] if ligands else None
        }
    return {'total_ligands': 0, 'ligands': [], 'note': 'No ligands found in BindingDB'}

def get_ligands_by_structure(tu, pdb_id, affinity_cutoff=10000):
    """Get ligands for a protein by PDB structure ID."""
    result = tu.tools.BindingDB_get_ligands_by_pdb(
        pdb_ids=pdb_id,
        affinity_cutoff=affinity_cutoff,
        sequence_identity=100
    )
    return result

def find_compound_targets(tu, smiles, similarity_cutoff=0.85):
    """Find other targets for a compound (polypharmacology)."""
    result = tu.tools.BindingDB_get_targets_by_compound(
        smiles=smiles,
        similarity_cutoff=similarity_cutoff
    )
    return result
BindingDB Report Section (add to Section 9 - Druggability):
markdown
Known Ligands (BindingDB) (NEW)
Total Ligands with Binding Data: 156
Best Reported Affinity: 0.3 nM (Ki)
Most Potent Ligands
| SMILES | Affinity Type | Value (nM) | Source PMID |
|---|---|---|---|
| CC(=O)Nc1ccc(cc1)c2... | Ki | 0.3 | 15737014 |
| CN(C)C/C=C/C(=O)Nc1... | IC50 | 0.8 | 15896103 |
| COc1cc2ncnc(Nc3ccc... | Kd | 2.1 | 16460808 |
Chemical Tractability Assessment:
- ✅ Tchem-level target: Multiple ligands with <30nM affinity
- ✅ Diverse chemotypes: Multiple scaffolds identified
- ✅ Published literature: Ligands have PMID references
Source: BindingDB via `BindingDB_get_ligands_by_uniprot`
**Affinity Interpretation for Druggability**:
| Affinity Range | Interpretation | Drug Development Potential |
|----------------|----------------|---------------------------|
| <1 nM | Ultra-potent | Potency typical of optimized clinical candidates |
| 1-10 nM | Highly potent | Potency in the range of advanced leads |
| 10-100 nM | Potent | Good starting point |
| 100-1000 nM | Moderate | Needs optimization |
| >1000 nM | Weak | Early hit only |
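The affinity ranges in this table can be applied programmatically when annotating ligand lists. This is a minimal sketch: the helper name `interpret_affinity` is illustrative, and the handling of boundary values (lower bound inclusive, 1000 nM counted as "Moderate") is an assumption, since the table does not specify it.

```python
def interpret_affinity(affinity_nm):
    """Map a binding affinity in nM to the interpretation labels in the table."""
    if affinity_nm < 1:
        return 'Ultra-potent'
    if affinity_nm < 10:
        return 'Highly potent'
    if affinity_nm < 100:
        return 'Potent'
    if affinity_nm <= 1000:  # assumed: exactly 1000 nM still counts as moderate
        return 'Moderate'
    return 'Weak'
```

Ki, IC50, and Kd values are not directly interchangeable, so the label is best reported together with the affinity type.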
7.5 PubChem BioAssay - Screening Data (NEW)
PubChem BioAssay provides HTS screening data and dose-response curves:
python
def get_pubchem_assays_for_target(tu, gene_symbol):
    """
    Get bioassays targeting a gene from PubChem.
    Provides:
    - HTS screening results
    - Dose-response data (IC50/EC50)
    - Active compound counts
    """
    # Search assays by target gene
    assays = tu.tools.PubChem_search_assays_by_target_gene(
        gene_symbol=gene_symbol
    )
    aids = assays.get('data', {}).get('aids', [])
    assay_info = []
    for aid in aids[:10]:  # Top 10 assays
        # Get assay details
        summary = tu.tools.PubChem_get_assay_summary(aid=aid)
        targets = tu.tools.PubChem_get_assay_targets(aid=aid)
        assay_info.append({
            'aid': aid,
            'summary': summary.get('data', {}),
            'targets': targets.get('data', {})
        })
    return {
        'total_assays': len(aids),
        'assay_details': assay_info
    }

def get_active_compounds_from_assay(tu, aid):
    """Get active compounds from a specific bioassay."""
    actives = tu.tools.PubChem_get_assay_active_compounds(aid=aid)
    cids = actives.get('data', {}).get('cids', [])
    return {
        'aid': aid,
        'active_cids': cids,
        'count': len(cids)
    }
PubChem BioAssay Report Section:
markdown
PubChem BioAssay Data (NEW)
Assays Targeting This Gene: 45
| AID | Assay Type | Active Compounds | Target Info |
|---|---|---|---|
| 1053104 | Dose-response | 12 | EGFR kinase |
| 504526 | HTS | 234 | EGFR binding |
| 651564 | Confirmatory | 8 | EGFR cellular |
Total Active Compounds Across Assays: ~500
Source: PubChem via `PubChem_search_assays_by_target_gene`, `PubChem_get_assay_active_compounds`
---
PATH 8: Literature & Research (Collision-Aware)
Collision-Aware Query Strategy
python
def path_literature_collision_aware(tu, ids):
    """
    Literature search with collision detection and filtering.
    """
    symbol = ids['symbol']
    full_name = ids.get('full_name', '')
    uniprot = ids.get('uniprot')  # optional; the UniProt query is skipped if absent
    synonyms = ids.get('synonyms', [])
    # Step 1: Detect collisions
    collision_filter = detect_collisions(tu, symbol, full_name)
    # Step 2: Build high-precision seed queries
    seed_queries = [
        f'"{symbol}"[Title] AND (protein OR gene OR expression)',  # Symbol in title
        f'"{full_name}"[Title]' if full_name else None,  # Full name in title
        f'"UniProt:{uniprot}"' if uniprot else None,  # UniProt accession
    ]
    seed_queries = [q for q in seed_queries if q]
    # Add key synonyms
    for syn in synonyms[:3]:
        seed_queries.append(f'"{syn}"[Title]')
    # Step 3: Execute seed queries and collect PMIDs
    seed_pmids = set()
    for query in seed_queries:
        if collision_filter:
            query = f"({query}){collision_filter}"
        results = tu.tools.PubMed_search_articles(query=query, limit=30)
        for article in results.get('articles', []):
            seed_pmids.add(article.get('pmid'))
    seed_pmids.discard(None)  # drop records that carry no PMID
    # Step 4: Expand via citation network (for sparse targets)
    if len(seed_pmids) < 30:
        expanded_pmids = set()
        for pmid in list(seed_pmids)[:10]:  # Up to 10 seeds (set order is arbitrary)
            # Get related articles
            related = tu.tools.PubMed_get_related(pmid=pmid, limit=20)
            for r in related.get('articles', []):
                expanded_pmids.add(r.get('pmid'))
            # Get citing articles
            citing = tu.tools.EuropePMC_get_citations(pmid=pmid, limit=20)
            for c in citing.get('citations', []):
                expanded_pmids.add(c.get('pmid'))
        expanded_pmids.discard(None)
        seed_pmids.update(expanded_pmids)
    # Step 5: Classify papers by evidence tier
    papers_by_tier = {'T1': [], 'T2': [], 'T3': [], 'T4': []}
    # ... classification logic based on title/abstract keywords
    return {
        'total_papers': len(seed_pmids),
        'collision_filter_applied': collision_filter if collision_filter else 'None needed',
        'seed_queries': seed_queries,
        'papers_by_tier': papers_by_tier
    }
Retry Logic & Fallback Chains
Retry Policy
For each critical tool, implement retry with exponential backoff:
python
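import time

def call_with_retry(tu, tool_name, params, max_retries=3):
    """
    Call a tool, retrying with exponential backoff on exceptions
    and on empty or error results.
    """
    last_error = None
    for attempt in range(max_retries):
        try:
            result = getattr(tu.tools, tool_name)(**params)
            error = result.get('error') if isinstance(result, dict) else None
            if result and not error:
                return result
            last_error = error or 'empty result'
        except Exception as e:
            last_error = str(e)
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, ...
    # Surface the failure instead of returning None
    return {'error': str(last_error), 'tool': tool_name, 'attempts': max_retries}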
Fallback Chains (CRITICAL)
| Primary Tool | Fallback 1 | Fallback 2 | Failure Action |
|---|---|---|---|
| | | Note in report |
| | | Note in report |
| | | Note in report |
| | Note as unavailable | Document in report |
| | - | Note in report |
| | | Note in report |
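A fallback chain of this kind can be sketched as a loop over candidate sources that records every failure along the way. The helper `call_with_fallbacks` is hypothetical. It takes `(name, callable)` pairs rather than specific tool names, since the concrete tools for each chain are listed in the table, not here.

```python
def call_with_fallbacks(candidates):
    """Try each (name, zero-argument callable) in order; return the first non-empty result."""
    failures = []
    for name, fn in candidates:
        try:
            result = fn()
        except Exception as exc:
            failures.append({'tool': name, 'reason': str(exc)})
            continue
        if result:
            return {'source': name, 'data': result, 'failures': failures}
        failures.append({'tool': name, 'reason': 'empty result'})
    # Every option failed: return the failure log so it can be noted in the report
    return {'source': None, 'data': None, 'failures': failures}
```

The returned `failures` list feeds the failure-surfacing requirement directly: each skipped tool and its reason can be written into the relevant report section.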
Failure Surfacing Rule
NEVER silently skip failed tools. Always document:
markdown
7.1 Tissue Expression
GTEx Data: Unavailable (API timeout after 3 attempts)
Fallback Data (HPA):
| Tissue | Expression Level | Specificity |
|---|---|---|
| Liver | High | Enhanced |
| Kidney | Medium | - |
Note: For complete GTEx data, query directly at gtexportal.org
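The lines of such a failure note can be generated from the retry outcome so that they are never forgotten. This is only a sketch; the helper name `format_unavailable_note` and its parameters are hypothetical.

```python
def format_unavailable_note(source, reason, attempts, fallback=None, direct_url=None):
    """Build the report lines documenting a failed source and its fallback."""
    lines = [f"{source} Data: Unavailable ({reason} after {attempts} attempts)"]
    if fallback:
        lines.append(f"Fallback Data ({fallback}):")
    if direct_url:
        lines.append(f"Note: For complete {source} data, query directly at {direct_url}")
    return "\n".join(lines)
```

For example, `format_unavailable_note('GTEx', 'API timeout', 3, fallback='HPA', direct_url='gtexportal.org')` yields the header, fallback, and note lines used in the template above. The fallback table itself still has to be filled in from the fallback tool's data.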
---
Per-Section Data Minimums & Completeness Audit
Minimum Data Requirements (Enforced)
| Section | Minimum Data | If Not Met |
|---|---|---|
| 6. PPIs | ≥20 interactors | Document which tools failed + why |
| 7. Expression | Top 10 tissues with TPM + HPA RNA summary | Note "limited data" with specific gaps |
| 8. Disease | Top 10 OT diseases + gnomAD constraints + ClinVar summary | Separate SNV/CNV; note if constraint unavailable |
| 9. Druggability | OT tractability + probes + drugs + DGIdb + GtoPdb fallback | "No drugs/probes" is valid data |
| 11. Literature | Total count + 5-year trend + 3-5 key papers with evidence tiers | Note if sparse (<50 papers) |
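The countable minimums in this table lend themselves to an automated check before the report is finalized. This is a minimal sketch under stated assumptions: the keys in `MINIMUMS` and the `counts` dict are hypothetical names, and only the three purely numeric thresholds (20 interactors, 10 tissues, 10 diseases) are covered.

```python
# Hypothetical keys; thresholds taken from the table above
MINIMUMS = {
    'ppi_interactors': 20,
    'expression_tissues': 10,
    'disease_associations': 10,
}


def audit_minimums(counts, minimums=MINIMUMS):
    """Return one gap record per section whose count falls below its minimum."""
    gaps = []
    for key, required in minimums.items():
        actual = counts.get(key, 0)  # a missing section counts as zero
        if actual < required:
            gaps.append({'item': key, 'expected': f'>={required}', 'actual': actual})
    return gaps
```

Each returned record can become a row in the data gap table, with the reason and alternative source filled in by hand or from the tool failure log.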
Post-Run Completeness Audit
Before finalizing the report, run this checklist:
Completeness Audit (REQUIRED)
Data Minimums Check
- PPIs: ≥20 interactors OR explanation why fewer
- Expression: Top 10 tissues with values OR explicit "unavailable"
- Diseases: Top 10 associations with scores OR "no associations"
- Constraints: All 4 scores (pLI, LOEUF, missense Z, pRec) OR "unavailable"
- Druggability: All modalities assessed; probes + drugs listed OR "none"
Negative Results Documented
- Empty tool results noted explicitly (not left blank)
- Failed tools with fallbacks documented
- "No data" sections have implications noted
Evidence Quality
- T1-T4 grades in Executive Summary disease claims
- T1-T4 grades in Disease Associations table
- Key papers table has evidence tiers
- Per-section evidence summaries included
Source Attribution
- Every data point has source tool/database cited
- Section-end source summaries present
Data Gap Table (Required if minimums not met)
15. Data Gaps & Limitations
| Section | Expected Data | Actual | Reason | Alternative Source |
|---|---|---|---|---|
| 6. PPIs | ≥20 interactors | 8 | Novel target, limited studies | Literature review needed |
| 7. Expression | GTEx TPM | None | Versioned ID not recognized | See HPA data |
| 9. Probes | Chemical probes | None | No validated probes exist | Consider tool compound dev |
Recommendations for Data Gaps:
- For PPIs: Query BioGRID with broader parameters; check yeast-2-hybrid studies
- For Expression: Query GEO directly for tissue-specific datasets
---
Report Template (Initial File)
File: [TARGET]_target_report.md
Target Intelligence Report: [TARGET NAME]
Generated: [Date] | Query: [Original query] | Status: In Progress
1. Executive Summary
[Researching...]
<!-- REQUIRED: 2-3 sentences, disease claims must have T1-T4 grades -->
2. Target Identifiers
[Researching...]
<!-- REQUIRED: UniProt, Ensembl (versioned), Entrez, ChEMBL, HGNC, Symbol -->
3. Basic Information
3.1 Protein Description
[Researching...]
3.2 Protein Function
[Researching...]
3.3 Subcellular Localization
[Researching...]
4. Structural Biology
4.1 Experimental Structures (PDB)
[Researching...]
<!-- METHOD: 3-step chain (UniProt xrefs → sequence search → domain search) -->
4.2 AlphaFold Prediction
[Researching...]
4.3 Domain Architecture
[Researching...]
4.4 Key Structural Features
[Researching...]
5. Function & Pathways
5.1 Gene Ontology Annotations
[Researching...]
<!-- REQUIRED: Evidence codes mapped to T1-T4 -->
5.2 Pathway Involvement
[Researching...]
6. Protein-Protein Interactions
[Researching...]
<!-- MINIMUM: ≥20 interactors OR explanation -->
7. Expression Profile
7.1 Tissue Expression (GTEx/HPA)
[Researching...]
<!-- NOTE: Use versioned Ensembl ID for GTEx if needed -->
7.2 Tissue Specificity
[Researching...]
<!-- MINIMUM: Top 10 tissues with TPM values -->
8. Genetic Variation & Disease
8.1 Constraint Scores
[Researching...]
<!-- REQUIRED: pLI, LOEUF, missense Z, pRec with interpretations -->
8.2 Disease Associations
[Researching...]
<!-- REQUIRED: Top 10 with OT scores; T1-T4 evidence grades -->
8.3 Clinical Variants (ClinVar)
[Researching...]
<!-- REQUIRED: Separate SNV and CNV tables -->
8.4 Mouse Model Phenotypes
[Researching...]
9. Druggability & Pharmacology
9.1 Tractability Assessment
[Researching...]
<!-- REQUIRED: All modalities (SM, Ab, PROTAC, other) -->
9.2 Known Drugs
[Researching...]
9.3 Chemical Probes
[Researching...]
<!-- NOTE: "No probes" is valid data - document explicitly -->
9.4 Clinical Pipeline
[Researching...]
9.5 ChEMBL Bioactivity
[Researching...]
10. Safety Profile
10.1 Safety Liabilities
[Researching...]
10.2 Expression-Based Toxicity Risk
[Researching...]
10.3 Mouse KO Phenotypes
[Researching...]
11. Literature & Research Landscape
11.1 Publication Metrics
[Researching...]
<!-- REQUIRED: Total, 5y, 1y, drug-related, clinical -->
11.2 Research Trend
[Researching...]
11.3 Key Publications
[Researching...]
<!-- REQUIRED: Table with PMID, title, year, evidence tier -->
11.4 Evidence Summary by Theme
[Researching...]
<!-- REQUIRED: T1-T4 breakdown per research theme -->
12. Competitive Landscape
[Researching...]
13. Summary & Recommendations
13.1 Target Validation Scorecard
[Researching...]
<!-- REQUIRED: 6 criteria, 1-5 scores, evidence quality noted -->
13.2 Strengths
[Researching...]
13.3 Challenges & Risks
[Researching...]
13.4 Recommendations
[Researching...]
<!-- REQUIRED: ≥3 prioritized (HIGH/MEDIUM/LOW) -->
14. Data Sources & Methodology
[Will be populated as research progresses...]
15. Data Gaps & Limitations
[To be populated post-audit...]
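The report-first workflow behind this template (create the file with placeholders, then fill sections as results arrive) might look like the sketch below. It is an illustration under assumptions: `create_report` and `fill_section` are hypothetical helpers, the file is written to a temporary directory, and only two headings are used to keep the example short.

```python
import tempfile
from pathlib import Path

PLACEHOLDER = "[Researching...]"


def create_report(path: Path, target: str, headings: list[str]) -> None:
    """Write the initial report: a title plus one placeholder per heading."""
    lines = [f"Target Intelligence Report: {target}"]
    for heading in headings:
        lines += [heading, PLACEHOLDER]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def fill_section(path: Path, heading: str, content: str) -> bool:
    """Replace the placeholder directly under `heading`; return True if replaced."""
    lines = path.read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines[:-1]):
        if line == heading and lines[i + 1] == PLACEHOLDER:
            lines[i + 1] = content
            path.write_text("\n".join(lines) + "\n", encoding="utf-8")
            return True
    return False


# Hypothetical target name and section content, for illustration only.
report = Path(tempfile.mkdtemp()) / "EXAMPLE_target_report.md"
create_report(report, "EXAMPLE", ["9.2 Known Drugs", "9.3 Chemical Probes"])
filled = fill_section(
    report,
    "9.3 Chemical Probes",
    "No probes found (negative result, documented explicitly).",
)
remaining = report.read_text(encoding="utf-8").count(PLACEHOLDER)
```

Counting the placeholders that remain gives a quick signal of which sections still need data or an explicit "No data" note before the audit.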
---
Quick Reference: Tool Parameters
| Tool | Parameter | Notes |
|---|---|---|
| | NOT |
| | NOT |
| | Try versioned ID if empty |
| | camelCase, not |
| | List format for IDs |
| | UniProt accession |
When NOT to Use This Skill
- Simple protein lookup → Use UniProt_get_entry_by_accession directly
- Drug information only → Use drug-focused tools
- Disease-centric query → Use disease-intelligence-gatherer skill
- Sequence retrieval → Use sequence-retrieval skill
- Structure download → Use protein-structure-retrieval skill
Use this skill for comprehensive, multi-angle target analysis with guaranteed data completeness.