tooluniverse-target-research

Comprehensive Target Intelligence Gatherer

Gather complete target intelligence by exploring 9 parallel research paths. Supports targets identified by gene symbol, UniProt accession, Ensembl ID, or gene name.
KEY PRINCIPLES:
  1. Report-first approach - Create report file FIRST, then populate progressively
  2. Tool parameter verification - Verify params via get_tool_info before calling unfamiliar tools
  3. Evidence grading - Grade all claims by evidence strength (T1-T4)
  4. Citation requirements - Every fact must have inline source attribution
  5. Mandatory completeness - All sections must exist with data minimums or explicit "No data" notes
  6. Disambiguation first - Resolve all identifiers before research
  7. Negative results documented - "No drugs found" is data; empty sections are failures
  8. Collision-aware literature search - Detect and filter naming collisions
  9. English-first queries - Always use English terms in tool calls, even if the user writes in another language. Translate gene names, disease names, and search terms to English. Only try original-language terms as a fallback if English returns no results. Respond in the user's language

Phase 0: Tool Parameter Verification (CRITICAL)

BEFORE calling ANY tool for the first time, verify its parameters:
python
# Always check tool params to prevent silent failures
tool_info = tu.tools.get_tool_info(tool_name="Reactome_map_uniprot_to_pathways")
# Reveals: takes id, not uniprot_id

Known Parameter Corrections (Updated)

| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
| Reactome_map_uniprot_to_pathways | uniprot_id | id |
| ensembl_get_xrefs | gene_id | id |
| GTEx_get_median_gene_expression | gencode_id only | gencode_id + operation="median" |
| OpenTargets_* | ensemblID | ensemblId (camelCase) |
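
The corrections above could be applied automatically before each call. The following is a minimal sketch, not part of the original skill: `PARAM_CORRECTIONS`, `REQUIRED_EXTRAS`, and `correct_params` are hypothetical names, and the mapping simply restates the table.

```python
# Hypothetical helper: rename commonly mistaken parameter names before a call.
# The mapping restates the corrections table; the helper itself is an assumption.
PARAM_CORRECTIONS = {
    "Reactome_map_uniprot_to_pathways": {"uniprot_id": "id"},
    "ensembl_get_xrefs": {"gene_id": "id"},
    "OpenTargets_": {"ensemblID": "ensemblId"},  # trailing "_" means prefix match
}
# Parameters that must accompany a call even if the caller forgot them
REQUIRED_EXTRAS = {
    "GTEx_get_median_gene_expression": {"operation": "median"},
}

def correct_params(tool_name, params):
    """Return a copy of params with known-wrong names replaced."""
    fixed = dict(params)
    for key, renames in PARAM_CORRECTIONS.items():
        matches = tool_name == key or (key.endswith("_") and tool_name.startswith(key))
        if not matches:
            continue
        for wrong, right in renames.items():
            # Never overwrite a correctly named parameter that is already present
            if wrong in fixed and right not in fixed:
                fixed[right] = fixed.pop(wrong)
    for name, value in REQUIRED_EXTRAS.get(tool_name, {}).items():
        fixed.setdefault(name, value)
    return fixed

print(correct_params("Reactome_map_uniprot_to_pathways", {"uniprot_id": "P00533"}))
```

A static mapping like this does not replace get_tool_info; it only catches the mistakes already known.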

GTEx Versioned ID Fallback (CRITICAL)

GTEx often requires versioned Ensembl IDs. If ENSG00000123456 returns empty:
python
# Step 1: Get gene info with version
gene_info = tu.tools.ensembl_lookup_gene(id=ensembl_id, species="human")
version = gene_info.get('version', 1)

# Step 2: Try versioned ID
versioned_id = f"{ensembl_id}.{version}"  # e.g., "ENSG00000123456.12"
result = tu.tools.GTEx_get_median_gene_expression(
    gencode_id=versioned_id,
    operation="median"
)

---

When to Use This Skill

Apply when users:
  • Ask about a drug target, protein, or gene
  • Need target validation or assessment
  • Request druggability analysis
  • Want comprehensive target profiling
  • Ask "what do we know about [target]?"
  • Need target-disease associations
  • Request safety profile for a target

Critical Workflow Requirements

1. Report-First Approach (MANDATORY)

DO NOT show the search process or tool outputs to the user. Instead:
  1. Create the report file FIRST - Before any data collection:
    • File name: [TARGET]_target_report.md
    • Initialize with all 14 section headers
    • Add the placeholder [Researching...] in each section
  2. Progressively update the report - As you gather data:
    • Update each section immediately after retrieving data
    • Replace [Researching...] with actual content
    • Include "No data returned" when tools return empty results
  3. Methodology in appendix only - If the user requests methodology details, create a separate [TARGET]_methods_appendix.md
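
The report-first workflow can be sketched as two small file helpers. This is an illustrative sketch only: `init_report`, `update_section`, and the header layout are assumptions, and the caller supplies the section titles because the 14 titles are not listed in this part of the document.

```python
from pathlib import Path

PLACEHOLDER = "[Researching...]"

def init_report(target, section_titles, out_dir="."):
    """Create [TARGET]_target_report.md with every section header and a placeholder."""
    lines = [f"# {target} Target Report", ""]
    for number, title in enumerate(section_titles, start=1):
        lines += [f"## {number}. {title}", "", PLACEHOLDER, ""]
    path = Path(out_dir) / f"{target}_target_report.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path

def update_section(path, number, title, content):
    """Replace one section's placeholder; empty content becomes an explicit note."""
    header = f"## {number}. {title}"
    body = content.strip() or "No data returned"
    text = path.read_text(encoding="utf-8")
    old = f"{header}\n\n{PLACEHOLDER}"
    if old not in text:
        raise ValueError(f"Section not found or already filled: {header}")
    path.write_text(text.replace(old, f"{header}\n\n{body}", 1), encoding="utf-8")
```

Writing the note for empty content inside `update_section` means a section cannot be left blank by accident once it has been updated.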

2. Evidence Grading System (MANDATORY)

CRITICAL: Grade every claim by evidence strength.

Evidence Tiers

| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | ★★★ | Direct mechanistic evidence, human genetic proof | CRISPR KO, patient mutations, crystal structure with mechanism |
| T2 | ★★☆ | Functional studies, model organism validation | siRNA phenotype, mouse KO, biochemical assay |
| T3 | ★☆☆ | Association, screen hits, computational | GWAS hit, DepMap essentiality, expression correlation |
| T4 | ☆☆☆ | Mention, review, text-mined, predicted | Review article, database annotation, computational prediction |

Required Evidence Grading Locations

Evidence grades MUST appear in:
  1. Executive Summary - Key disease claims graded
  2. Section 8.2 Disease Associations - Every disease link graded with source type
  3. Section 11 Literature - Key papers table with evidence tier
  4. Section 13 Recommendations - Scorecard items reference evidence quality

Per-Section Evidence Summary

markdown
---
**Evidence Quality for this Section**: Strong
- Mechanistic (T1): 12 papers
- Functional (T2): 8 papers
- Association (T3): 15 papers
- Mention (T4): 23 papers
**Data Gaps**: No CRISPR data; mouse KO phenotypes limited
---
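
A summary block of this shape could be generated from a list of per-paper tier codes. The sketch below is an assumption, not the skill's own code: `evidence_summary` is a hypothetical name, and the overall quality label is passed in by the caller because the document does not define how "Strong" is derived.

```python
from collections import Counter

TIER_LABELS = {"T1": "Mechanistic", "T2": "Functional", "T3": "Association", "T4": "Mention"}

def evidence_summary(paper_tiers, quality, data_gaps):
    """Render the per-section evidence block from a list of tier codes."""
    counts = Counter(paper_tiers)
    unknown = set(counts) - set(TIER_LABELS)
    if unknown:
        raise ValueError(f"Unknown evidence tiers: {sorted(unknown)}")
    lines = ["---", f"**Evidence Quality for this Section**: {quality}"]
    for tier, label in TIER_LABELS.items():
        n = counts.get(tier, 0)
        lines.append(f"- {label} ({tier}): {n} paper{'' if n == 1 else 's'}")
    lines.append(f"**Data Gaps**: {data_gaps or 'None noted'}")
    lines.append("---")
    return "\n".join(lines)

print(evidence_summary(["T1", "T1", "T3"], "Moderate", "No CRISPR data"))
```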

3. Citation Requirements (MANDATORY)

Every piece of information MUST include its source:
markdown
EGFR mutations cause lung adenocarcinoma [★★★: PMID:15118125, activating mutations 
in patients]. *Source: ClinVar, CIViC*
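
The citation format in the example above can be produced by a small formatter so that a claim is never emitted without a grade and a source. This is a sketch under assumptions: `format_claim` and `TIER_SYMBOLS` are hypothetical names; the symbols follow the evidence tier table.

```python
TIER_SYMBOLS = {"T1": "★★★", "T2": "★★☆", "T3": "★☆☆", "T4": "☆☆☆"}

def format_claim(claim, tier, pmid, note, sources):
    """Return one claim sentence with inline evidence grade and source attribution."""
    if tier not in TIER_SYMBOLS:
        raise ValueError(f"Unknown tier: {tier}")
    if not sources:
        raise ValueError("Every claim needs at least one source")
    grade = f"[{TIER_SYMBOLS[tier]}: PMID:{pmid}, {note}]"
    return f"{claim.rstrip('.')} {grade}. *Source: {', '.join(sources)}*"

print(format_claim(
    "EGFR mutations cause lung adenocarcinoma", "T1", "15118125",
    "activating mutations in patients", ["ClinVar", "CIViC"],
))
```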

Core Strategy: 9 Research Paths

Execute 9 research paths (Path 0 is always first):
Target Query (e.g., "EGFR" or "P00533")
├─ IDENTIFIER RESOLUTION (always first)
│   └─ Check if GPCR → GPCRdb_get_protein
├─ PATH 0: Open Targets Foundation (ALWAYS FIRST - fills gaps in all other paths)
├─ PATH 1: Core Identity (names, IDs, sequence, organism)
│   └─ InterProScan_scan_sequence for novel domain prediction (NEW)
├─ PATH 2: Structure & Domains (3D structure, domains, binding sites)
│   └─ If GPCR: GPCRdb_get_structures (active/inactive states)
├─ PATH 3: Function & Pathways (GO terms, pathways, biological role)
├─ PATH 4: Protein Interactions (PPI network, complexes)
├─ PATH 5: Expression Profile (tissue expression, single-cell)
├─ PATH 6: Variants & Disease (mutations, clinical significance)
│   └─ DisGeNET_search_gene for curated gene-disease associations
├─ PATH 7: Drug Interactions (known drugs, druggability, safety)
│   ├─ Pharos_get_target for TDL classification (Tclin/Tchem/Tbio/Tdark)
│   ├─ BindingDB_get_ligands_by_uniprot for known ligands (NEW)
│   ├─ PubChem_search_assays_by_target_gene for HTS data (NEW)
│   ├─ If GPCR: GPCRdb_get_ligands (curated agonists/antagonists)
│   └─ DepMap_get_gene_dependencies for target essentiality
└─ PATH 8: Literature & Research (publications, trends)

Identifier Resolution (Phase 1)

CRITICAL: Resolve ALL identifiers before any research path.
python
def resolve_target_ids(tu, query):
    """
    Resolve target query to ALL needed identifiers.
    Returns dict with: query, uniprot, ensembl, ensembl_versioned, symbol,
    entrez, chembl_target, hgnc, full_name, synonyms
    """
    ids = {
        'query': query, 
        'uniprot': None, 
        'ensembl': None, 
        'ensembl_versioned': None,  # For GTEx
        'symbol': None,
        'entrez': None,
        'chembl_target': None,
        'hgnc': None,
        'full_name': None,
        'synonyms': []
    }
    
    # [Resolution logic based on input type]
    # ... (see current implementation)
    
    # CRITICAL: Get versioned Ensembl ID for GTEx
    if ids['ensembl']:
        gene_info = tu.tools.ensembl_lookup_gene(id=ids['ensembl'], species="human")
        if gene_info and gene_info.get('version'):
            ids['ensembl_versioned'] = f"{ids['ensembl']}.{gene_info['version']}"

        # Also get the full gene name for literature collision detection
        # (guarded: the lookup may return nothing)
        if gene_info:
            ids['full_name'] = (gene_info.get('description') or '').split(' [')[0]
    
    # Get UniProt alternative names for synonyms
    if ids['uniprot']:
        alt_names = tu.tools.UniProt_get_alternative_names_by_accession(accession=ids['uniprot'])
        if alt_names:
            ids['synonyms'].extend(alt_names)
    
    return ids
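
The input-type step elided in resolve_target_ids could start from a pattern check like the one below. This is a sketch, not the current implementation: `classify_query` is a hypothetical helper, the UniProt pattern is the accession regular expression from the UniProt help pages, and the Ensembl pattern assumes human gene IDs (ENSG plus 11 digits, optional version suffix).

```python
import re

# UniProt accession pattern as documented in the UniProt help pages
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Human Ensembl gene ID, optionally with a ".version" suffix (assumption: human only)
ENSEMBL_GENE_RE = re.compile(r"^ENSG\d{11}(?:\.\d+)?$")

def classify_query(query):
    """Guess the identifier type: 'ensembl', 'uniprot', 'gene_name', or 'symbol'."""
    q = query.strip()
    upper = q.upper()
    if ENSEMBL_GENE_RE.match(upper):
        return "ensembl"
    if UNIPROT_RE.match(upper):
        return "uniprot"
    if " " in q:
        return "gene_name"  # free-text names such as "epidermal growth factor receptor"
    return "symbol"

for example in ["EGFR", "P00533", "ENSG00000146648", "epidermal growth factor receptor"]:
    print(example, "->", classify_query(example))
```

A pattern match only suggests the type; the resolved record from the corresponding database remains the authority.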

GPCR Target Detection (NEW)

~35% of approved drugs target GPCRs. After identifier resolution, check if target is a GPCR:
python
def check_gpcr_target(tu, ids):
    """
    Check if target is a GPCR and retrieve specialized data.
    Call after identifier resolution.
    """
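    symbol = ids.get('symbol') or ''  # ids['symbol'] may be None
    if not symbol:
        return {'is_gpcr': False}
    
    # Build GPCRdb entry name. GPCRdb uses UniProt-style entry names, which
    # usually, but not always, match the lowercased gene symbol
    # (e.g., HTR2A is listed as 5ht2a_human).
    entry_name = f"{symbol.lower()}_human"
    
    gpcr_info = tu.tools.GPCRdb_get_protein(
        operation="get_protein",
        protein=entry_name
    )
    
    if gpcr_info and gpcr_info.get('status') == 'success':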
        # Target is a GPCR - get specialized data
        
        # Get structures with receptor state
        structures = tu.tools.GPCRdb_get_structures(
            operation="get_structures",
            protein=entry_name
        )
        
        # Get known ligands (critical for binder projects)
        ligands = tu.tools.GPCRdb_get_ligands(
            operation="get_ligands",
            protein=entry_name
        )
        
        # Get mutation data
        mutations = tu.tools.GPCRdb_get_mutations(
            operation="get_mutations",
            protein=entry_name
        )
        
        return {
            'is_gpcr': True,
            'gpcr_family': gpcr_info['data'].get('family'),
            'gpcr_class': gpcr_info['data'].get('receptor_class'),
            'structures': structures.get('data', {}).get('structures', []),
            'ligands': ligands.get('data', {}).get('ligands', []),
            'mutations': mutations.get('data', {}).get('mutations', []),
            'ballesteros_numbering': True  # GPCRdb provides this
        }
    
    return {'is_gpcr': False}
GPCRdb Report Section (add to Section 2 for GPCR targets):
markdown
### 2.x GPCR-Specific Data (GPCRdb)

**Receptor Class**: Class A (Rhodopsin-like)
**GPCR Family**: Adrenoceptors

**Structures by State**:
| PDB ID | State | Resolution | Ligand | Year |
|---|---|---|---|---|
| 3SN6 | Active | 3.2Å | Agonist (BI-167107) | 2011 |
| 2RH1 | Inactive | 2.4Å | Antagonist (carazolol) | 2007 |

**Known Ligands**: 45 agonists, 32 antagonists, 8 allosteric modulators
**Key Binding Site Residues (Ballesteros-Weinstein)**: 3.32, 5.42, 6.48, 7.39

Collision Detection for Literature Search

Before literature search, detect naming collisions:
python
def detect_collisions(tu, symbol, full_name):
    """
    Detect if gene symbol has naming collisions in literature.
    Returns a negative filter string ("" if no collisions are found).
    """
    # Search by symbol in title
    results = tu.tools.PubMed_search_articles(
        query=f'"{symbol}"[Title]',
        limit=20
    ) or {}
    articles = results.get('articles', [])
    
    # A title with no biology/protein/gene context counts as off-topic
    bio_terms = ['protein', 'gene', 'cell', 'expression', 'mutation', 'kinase', 'receptor']
    off_topic_titles = [
        title for title in (paper.get('title', '').lower() for paper in articles)
        if not any(term in title for term in bio_terms)
    ]
    
    # Only build a filter if >20% of the sampled titles are off-topic
    off_topic_terms = []
    if articles and len(off_topic_titles) / len(articles) > 0.2:
        # Extract collision terms from off_topic_titles and append them to
        # off_topic_terms (the extraction logic is not specified here)
        # e.g., "JAK" might collide with "Just Another Kinase" jokes
        # e.g., "WDR7" might collide with other WDR family members in certain contexts
        pass
    
    # Build negative filter, e.g. " NOT term1 NOT term2"
    collision_filter = ""
    if off_topic_terms:
        collision_filter = " NOT " + " NOT ".join(off_topic_terms)
    
    return collision_filter
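
The term-extraction step that detect_collisions leaves open could be approximated with a word-frequency heuristic. The sketch below is one possible approach, not the skill's method: `extract_collision_terms`, `build_collision_filter`, and the short stopword list are assumptions, and the heuristic simply keeps words that recur across several off-topic titles.

```python
from collections import Counter
import re

# Small illustrative stopword list (an assumption; extend as needed)
STOPWORDS = {"with", "from", "that", "this", "their", "using", "study", "analysis", "based"}

def extract_collision_terms(off_topic_titles, symbol, full_name="", min_titles=2, max_terms=5):
    """Return words that recur across off-topic titles, as candidate NOT-filter terms."""
    name_words = set(re.findall(r"[a-z0-9]+", full_name.lower()))
    skip = STOPWORDS | {symbol.lower()} | name_words
    counts = Counter()
    for title in off_topic_titles:
        # Count each word once per title so one long title cannot dominate
        words = set(re.findall(r"\b[a-z][a-z0-9-]{3,}\b", title.lower()))
        counts.update(words - skip)
    recurring = [word for word, n in counts.most_common() if n >= min_titles]
    return recurring[:max_terms]

def build_collision_filter(terms):
    """Build the same ' NOT a NOT b' suffix that detect_collisions returns."""
    return "".join(f" NOT {term}" for term in terms)

titles = [
    "TRAP music and urban culture",
    "Ion TRAP quantum computing",
    "Quantum ion trap design",
]
print(build_collision_filter(extract_collision_terms(titles, "TRAP")))
```

Terms chosen this way should be reviewed before use, since an over-broad NOT term can also remove relevant papers.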

PATH 0: Open Targets Foundation (ALWAYS FIRST)

Objective: Populate baseline data for Sections 5, 6, 8, 9, 10, 11 before specialized queries.
CRITICAL: Open Targets provides the most comprehensive aggregated data. Query ALL these endpoints:
| Endpoint | Section | Data Type |
|---|---|---|
| OpenTargets_get_diseases_phenotypes_by_target_ensemblId | 8 | Diseases/phenotypes |
| OpenTargets_get_target_tractability_by_ensemblId | 9 | Druggability assessment |
| OpenTargets_get_target_safety_profile_by_ensemblId | 10 | Safety liabilities |
| OpenTargets_get_target_interactions_by_ensemblId | 6 | PPI network |
| OpenTargets_get_target_gene_ontology_by_ensemblId | 5 | GO annotations |
| OpenTargets_get_publications_by_target_ensemblId | 11 | Literature |
| OpenTargets_get_biological_mouse_models_by_ensemblId | 8/10 | Mouse KO phenotypes |
| OpenTargets_get_chemical_probes_by_target_ensemblId | 9 | Chemical probes |
| OpenTargets_get_associated_drugs_by_target_ensemblId | 9 | Known drugs |

Path 0 Implementation

python
def path_0_open_targets(tu, ids):
    """
    Open Targets foundation data - fills gaps for sections 5, 6, 8, 9, 10, 11.
    ALWAYS run this first.
    """
    ensembl_id = ids['ensembl']
    if not ensembl_id:
        return {'status': 'skipped', 'reason': 'No Ensembl ID'}
    
    results = {}
    
    # 1. Diseases & Phenotypes (Section 8)
    diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['diseases'] = diseases if diseases else {'note': 'No disease associations returned'}
    
    # 2. Tractability (Section 9)
    tractability = tu.tools.OpenTargets_get_target_tractability_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['tractability'] = tractability if tractability else {'note': 'No tractability data returned'}
    
    # 3. Safety Profile (Section 10)
    safety = tu.tools.OpenTargets_get_target_safety_profile_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['safety'] = safety if safety else {'note': 'No safety liabilities identified'}
    
    # 4. Interactions (Section 6)
    interactions = tu.tools.OpenTargets_get_target_interactions_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['interactions'] = interactions if interactions else {'note': 'No interactions returned'}
    
    # 5. GO Annotations (Section 5)
    go_terms = tu.tools.OpenTargets_get_target_gene_ontology_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['go_terms'] = go_terms if go_terms else {'note': 'No GO annotations returned'}
    
    # 6. Publications (Section 11)
    publications = tu.tools.OpenTargets_get_publications_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['publications'] = publications if publications else {'note': 'No publications returned'}
    
    # 7. Mouse Models (Section 8/10)
    mouse_models = tu.tools.OpenTargets_get_biological_mouse_models_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['mouse_models'] = mouse_models if mouse_models else {'note': 'No mouse model data returned'}
    
    # 8. Chemical Probes (Section 9)
    probes = tu.tools.OpenTargets_get_chemical_probes_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['chemical_probes'] = probes if probes else {'note': 'No chemical probes available'}
    
    # 9. Associated Drugs (Section 9)
    drugs = tu.tools.OpenTargets_get_associated_drugs_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['drugs'] = drugs if drugs else {'note': 'No approved/trial drugs found'}
    
    return results
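
The nine calls in path_0_open_targets follow one pattern, so they could be driven from a table. The sketch below is an assumption rather than the skill's implementation: `run_endpoints` is a hypothetical helper, while the endpoint names and empty-result notes are copied from the function above. It also records a failed call instead of aborting the whole path.

```python
# (result key, tool name, note used when the tool returns nothing)
OPEN_TARGETS_ENDPOINTS = [
    ("diseases", "OpenTargets_get_diseases_phenotypes_by_target_ensemblId", "No disease associations returned"),
    ("tractability", "OpenTargets_get_target_tractability_by_ensemblId", "No tractability data returned"),
    ("safety", "OpenTargets_get_target_safety_profile_by_ensemblId", "No safety liabilities identified"),
    ("interactions", "OpenTargets_get_target_interactions_by_ensemblId", "No interactions returned"),
    ("go_terms", "OpenTargets_get_target_gene_ontology_by_ensemblId", "No GO annotations returned"),
    ("publications", "OpenTargets_get_publications_by_target_ensemblId", "No publications returned"),
    ("mouse_models", "OpenTargets_get_biological_mouse_models_by_ensemblId", "No mouse model data returned"),
    ("chemical_probes", "OpenTargets_get_chemical_probes_by_target_ensemblId", "No chemical probes available"),
    ("drugs", "OpenTargets_get_associated_drugs_by_target_ensemblId", "No approved/trial drugs found"),
]

def run_endpoints(tools, ensembl_id, endpoints=OPEN_TARGETS_ENDPOINTS):
    """Call each endpoint with ensemblId and keep a note for empty or failed calls."""
    results = {}
    for key, tool_name, empty_note in endpoints:
        try:
            data = getattr(tools, tool_name)(ensemblId=ensembl_id)
        except Exception as exc:  # keep going; a failed tool is recorded, not fatal
            results[key] = {'note': f'{tool_name} failed: {exc}'}
            continue
        results[key] = data if data else {'note': empty_note}
    return results
```

With the real client this would be called as `run_endpoints(tu.tools, ids['ensembl'])`, assuming `tu.tools` exposes each tool as an attribute, as in the calls above.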

Negative Results Are Data

CRITICAL: Always document when a query returns empty:
markdown
### 9.3 Chemical Probes

**Status**: No validated chemical probes available for this target.
*Source: OpenTargets_get_chemical_probes_by_target_ensemblId returned empty*

**Implication**: Tool compound development would be needed for chemical biology studies.

---

PATH 2: Structure & Domains (Enhanced)

Objective: Robust structure coverage using 3-step chain.

3-Step Structure Search Chain

Do NOT rely solely on PDB text search. Use this chain:
python
def path_structure_robust(tu, ids):
    """
    Robust structure search using 3-step chain.
    """
    structures = {'pdb': [], 'alphafold': None, 'domains': [], 'method_notes': []}
    
    # STEP 1: UniProt PDB Cross-References (most reliable)
    if ids['uniprot']:
        entry = tu.tools.UniProt_get_entry_by_accession(accession=ids['uniprot'])
        pdb_xrefs = [x for x in entry.get('uniProtKBCrossReferences', []) 
                    if x.get('database') == 'PDB']
        for xref in pdb_xrefs:
            pdb_id = xref.get('id')
            # Get details for each PDB
            pdb_info = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)
            if pdb_info:
                structures['pdb'].append(pdb_info)
        structures['method_notes'].append(f"Step 1: {len(pdb_xrefs)} PDB cross-refs from UniProt")
    
    # STEP 2: Sequence-based PDB Search (catches missing annotations)
    if ids['uniprot'] and len(structures['pdb']) < 5:
        sequence = tu.tools.UniProt_get_sequence_by_accession(accession=ids['uniprot'])
        if sequence and len(sequence) < 1000:  # Reasonable length for search
            similar = tu.tools.PDB_search_similar_structures(
                sequence=sequence[:500],  # Use first 500 AA if long
                identity_cutoff=0.7
            )
            if similar:
                for hit in similar[:10]:  # Top 10 similar
                    if hit['pdb_id'] not in [s.get('pdb_id') for s in structures['pdb']]:
                        structures['pdb'].append(hit)
        structures['method_notes'].append(f"Step 2: Sequence search (identity ≥70%)")
    
    # STEP 3: Domain-based Search (for multi-domain proteins)
    if ids['uniprot']:
        domains = tu.tools.InterPro_get_protein_domains(uniprot_accession=ids['uniprot'])
        structures['domains'] = domains if domains else []
        
        # When few structures were found, search the PDB by domain name
        if len(structures['pdb']) < 3 and domains:
            for domain in domains[:3]:  # Top 3 domains
                domain_name = domain.get('name', '')
                if not domain_name:
                    continue
                domain_hits = tu.tools.PDB_search_by_keyword(query=domain_name, limit=5)
                if domain_hits:
                    # Keep keyword hits separate: they may belong to other proteins
                    structures.setdefault('domain_hits', []).append(
                        {'domain': domain_name, 'hits': domain_hits})
                    structures['method_notes'].append(f"Step 3: Domain '{domain_name}' search")
    
    # AlphaFold (always check when a UniProt accession is available)
    alphafold = None
    if ids['uniprot']:
        alphafold = tu.tools.alphafold_get_prediction(uniprot_accession=ids['uniprot'])
    structures['alphafold'] = alphafold if alphafold else {'note': 'No AlphaFold prediction'}
    
    # IMPORTANT: Document limitations
    if not structures['pdb']:
        structures['limitation'] = "No direct PDB hit does NOT mean no structure exists. Check: (1) structures under different UniProt entries, (2) homolog structures, (3) domain-only structures."
    
    return structures
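
Step 2 above skips hits whose PDB ID is already listed. That merge could be factored into a helper that tracks seen IDs in a set. This is a sketch under assumptions: `merge_pdb_hits` is a hypothetical name, and every record is assumed to be a dict with a 'pdb_id' key, as the Step 2 check implies.

```python
def merge_pdb_hits(existing, new_hits, limit=10):
    """Append up to `limit` new hits whose PDB ID is not already present."""
    # PDB IDs are case-insensitive, so compare them in upper case
    seen = {str(s.get('pdb_id', '')).upper() for s in existing}
    merged = list(existing)
    for hit in new_hits[:limit]:
        pdb_id = str(hit.get('pdb_id', '')).upper()
        if pdb_id and pdb_id not in seen:
            seen.add(pdb_id)
            merged.append(hit)
    return merged

print(merge_pdb_hits([{'pdb_id': '1M17'}], [{'pdb_id': '1m17'}, {'pdb_id': '3POZ'}]))
```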

Structure Section Output Format

markdown
### 4.1 Experimental Structures (PDB)

**Total PDB Entries**: 23 structures *Source: UniProt cross-references*
**Search Method**: 3-step chain (UniProt xrefs → sequence search → domain search)

| PDB ID | Resolution | Method | Ligand | Coverage | Year |
|---|---|---|---|---|---|
| 1M17 | 2.6Å | X-ray | Erlotinib | 672-998 | 2002 |
| 3POZ | 2.8Å | X-ray | Gefitinib | 696-1022 | 2010 |

**Note**: "No direct PDB hit" ≠ "no structure exists". Check homologs and domain structures.

---

PATH 5: Expression Profile (Enhanced)

GTEx with Versioned ID Fallback

python
def path_expression(tu, ids):
    """
    Expression data with GTEx versioned ID fallback.
    """
    results = {'gtex': None, 'hpa': None, 'failed_tools': []}
    
    # GTEx with fallback
    ensembl_id = ids['ensembl']
    versioned_id = ids.get('ensembl_versioned')
    
    # Try unversioned first
    gtex_result = tu.tools.GTEx_get_median_gene_expression(
        gencode_id=ensembl_id,
        operation="median"
    )
    
    # Fallback to versioned if empty (covers a missing or empty 'data' field)
    if not gtex_result or not gtex_result.get('data'):
        if versioned_id:
            gtex_result = tu.tools.GTEx_get_median_gene_expression(
                gencode_id=versioned_id,
                operation="median"
            )
            if gtex_result and gtex_result.get('data'):
                results['gtex'] = gtex_result
                results['gtex_note'] = f"Used versioned ID: {versioned_id}"
        
        if not results.get('gtex'):
            results['failed_tools'].append({
                'tool': 'GTEx_get_median_gene_expression',
                'tried': [i for i in (ensembl_id, versioned_id) if i],  # omit IDs that were never available
                'fallback': 'See HPA data below'
            })
    else:
        results['gtex'] = gtex_result
    
    # HPA (always query as backup)
    hpa_result = tu.tools.HPA_get_rna_expression_by_source(ensembl_id=ensembl_id)
    results['hpa'] = hpa_result if hpa_result else {'note': 'No HPA RNA data'}
    
    return results
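The fallback above assumes `ids` already carries both forms of the Ensembl gene ID. The sketch below shows how the two forms relate, assuming the usual `ENSG...` accession with an optional `.<version>` suffix. `split_ensembl_version` and `candidate_gencode_ids` are hypothetical helpers for illustration, not part of ToolUniverse.

```python
def split_ensembl_version(gene_id):
    """Split an Ensembl gene ID into (unversioned_id, version or None)."""
    cleaned = gene_id.strip()
    base, sep, version = cleaned.partition('.')
    if sep and version.isdigit():
        return base, int(version)
    # No numeric suffix: treat the whole string as the unversioned ID
    return cleaned, None


def candidate_gencode_ids(ids):
    """Return the IDs to try against GTEx, unversioned first, without duplicates."""
    candidates = []
    for key in ('ensembl', 'ensembl_versioned'):  # same keys as path_expression
        value = ids.get(key)
        if value and value not in candidates:
            candidates.append(value)
    return candidates
```

Iterating over `candidate_gencode_ids(ids)` would let the GTEx call sit in a single loop instead of two copies of the same call.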

Human Protein Atlas - Extended Expression (NEW)

HPA provides comprehensive protein expression data including tissue-level, cell-level, and cell line expression.
python
def get_hpa_comprehensive_expression(tu, gene_symbol):
    """
    Get comprehensive expression data from Human Protein Atlas.
    
    Provides:
    - Tissue expression (protein and RNA)
    - Subcellular localization
    - Cell line expression comparison
    - Tissue specificity
    """
    
    # 1. Search for gene to get IDs
    gene_info = tu.tools.HPA_search_genes_by_query(search_query=gene_symbol)
    
    if not gene_info:
        return {'error': f'Gene {gene_symbol} not found in HPA'}
    
    # 2. Get tissue expression with specificity
    tissue_search = tu.tools.HPA_generic_search(
        search_query=gene_symbol,
        columns="g,gs,rnat,rnatsm,scml,scal",  # Gene, synonyms, tissue specificity, subcellular
        format="json"
    )
    
    # 3. Compare expression in cancer cell lines vs normal tissue
    cell_lines = ['a549', 'mcf7', 'hela', 'hepg2', 'pc3']
    cell_line_expression = {}
    
    for cell_line in cell_lines:
        try:
            expr = tu.tools.HPA_get_comparative_expression_by_gene_and_cellline(
                gene_name=gene_symbol,
                cell_line=cell_line
            )
            cell_line_expression[cell_line] = expr
        except Exception:
            # Skip cell lines the tool cannot resolve; the others are still reported
            continue
    
    return {
        'gene_info': gene_info,
        'tissue_data': tissue_search,
        'cell_line_expression': cell_line_expression,
        'source': 'Human Protein Atlas'
    }
HPA Expression Output for Report:

Tissue Expression Profile (Human Protein Atlas)

| Tissue | Protein Level | RNA nTPM | Specificity |
|--------|---------------|----------|-------------|
| Brain | High | 45.2 | Enriched |
| Liver | Medium | 23.1 | Enhanced |
| Kidney | Low | 8.4 | Not detected |

Subcellular Localization: Cytoplasm, Plasma membrane

Cancer Cell Line Expression

| Cell Line | Cancer Type | Expression | vs Normal |
|-----------|-------------|------------|-----------|
| A549 | Lung | High | Elevated |
| MCF7 | Breast | Medium | Similar |
| HeLa | Cervical | High | Elevated |

Source: Human Protein Atlas via `HPA_search_genes_by_query`, `HPA_get_comparative_expression_by_gene_and_cellline`

**Why HPA for Target Research**:
- **Drug target validation** - Confirm expression in target tissue
- **Safety assessment** - Expression in essential organs
- **Biomarker potential** - Tissue-specific expression
- **Cell line selection** - Choose appropriate models

---

PATH 6: Variants & Disease (Enhanced)

6.1 ClinVar SNV vs CNV Separation


8.3 Clinical Variants (ClinVar)

Single Nucleotide Variants (SNVs)

| Variant | Clinical Significance | Condition | Review Status | PMID |
|---------|-----------------------|-----------|---------------|------|
| p.L858R | Pathogenic | Lung cancer | 4 stars | 15118125 |
| p.T790M | Pathogenic | Drug resistance | 4 stars | 15737014 |

Total Pathogenic SNVs: 47

Copy Number Variants (CNVs) - Reported Separately

| Type | Region | Clinical Significance | Frequency |
|------|--------|-----------------------|-----------|
| Amplification | 7p11.2 | Pathogenic | Common in cancer |

Note: CNV data separated as it represents different mutation mechanism
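The SNV/CNV separation shown in this template could be produced with a small partitioning step before the report is written. This is a sketch only: `split_clinvar_variants` is a hypothetical helper, and the `variant_type` field and its labels are assumptions about what the ClinVar tool returns, so check the actual response shape before relying on it.

```python
# Assumed ClinVar-style variant type labels (lowercased); adjust to the real tool output
SNV_TYPE = 'single nucleotide variant'
CNV_TYPES = {'copy number gain', 'copy number loss'}


def split_clinvar_variants(records):
    """Partition variant records into SNVs, CNVs, and everything else."""
    groups = {'snv': [], 'cnv': [], 'other': []}
    for record in records:
        vtype = (record.get('variant_type') or '').strip().lower()
        if vtype == SNV_TYPE:
            groups['snv'].append(record)
        elif vtype in CNV_TYPES:
            groups['cnv'].append(record)
        else:
            # Indels, untyped records, etc. are kept so nothing is silently dropped
            groups['other'].append(record)
    return groups
```

Keeping an explicit `other` bucket makes it visible when records fall outside both report tables.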

6.2 DisGeNET Integration (NEW)

DisGeNET provides curated gene-disease associations with evidence scores. Requires: `DISGENET_API_KEY`
python
def get_disgenet_associations(tu, ids):
    """
    Get gene-disease associations from DisGeNET.
    Complements Open Targets with curated association scores.
    """
    symbol = ids.get('symbol')
    if not symbol:
        return {'status': 'skipped', 'reason': 'No gene symbol'}
    
    # Get all disease associations for gene
    gda = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=symbol,
        limit=50
    )
    
    if gda.get('status') != 'success':
        return {'status': 'error', 'message': 'DisGeNET query failed'}
    
    associations = gda.get('data', {}).get('associations', [])
    
    # Categorize by evidence strength
    strong = []     # score >= 0.7
    moderate = []   # score 0.4-0.7  
    weak = []       # score < 0.4
    
    for assoc in associations:
        score = assoc.get('score') or 0  # treat a missing or null score as 0
        disease_name = assoc.get('disease_name', '')
        umls_cui = assoc.get('disease_id', '')
        
        entry = {
            'disease': disease_name,
            'umls_cui': umls_cui,
            'score': score,
            'evidence_index': assoc.get('ei'),
            'dsi': assoc.get('dsi'),  # Disease Specificity Index
            'dpi': assoc.get('dpi')   # Disease Pleiotropy Index
        }
        
        if score >= 0.7:
            strong.append(entry)
        elif score >= 0.4:
            moderate.append(entry)
        else:
            weak.append(entry)
    
    return {
        'total_associations': len(associations),
        'strong_associations': strong,
        'moderate_associations': moderate,
        'weak_associations': weak[:10],  # Limit weak
        'disease_pleiotropy': len(associations)  # How many diseases linked
    }
DisGeNET Report Section (add to Section 8 - Disease Associations):

8.x DisGeNET Gene-Disease Associations (NEW)

Total Diseases Associated: 47
Disease Pleiotropy Index: High (gene linked to many disease types)

Strong Associations (Score ≥0.7)

| Disease | UMLS CUI | Score | Evidence Index |
|---------|----------|-------|----------------|
| Non-small cell lung cancer | C0007131 | 0.85 | 0.92 |
| Glioblastoma | C0017636 | 0.78 | 0.88 |

Moderate Associations (Score 0.4-0.7)

| Disease | UMLS CUI | Score | DSI |
|---------|----------|-------|-----|
| Breast cancer | C0006142 | 0.62 | 0.45 |

Note: DisGeNET score integrates curated databases, GWAS, animal models, and literature

**Evidence Tier Assignment**:
- DisGeNET Score ≥0.7 → Consider T2 evidence (multiple validated sources)
- DisGeNET Score 0.4-0.7 → Consider T3 evidence
- DisGeNET Score <0.4 → T4 evidence only
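
The tier assignment above maps directly onto a small lookup. A minimal sketch, using the same cutoffs as `get_disgenet_associations`; `disgenet_evidence_tier` is a hypothetical helper name, and treating a missing score as T4 is an assumption.

```python
def disgenet_evidence_tier(score):
    """Map a DisGeNET association score to the suggested evidence tier."""
    if score is None:
        return 'T4'  # assumption: no score means weakest tier
    if score >= 0.7:
        return 'T2'
    if score >= 0.4:
        return 'T3'
    return 'T4'
```

The returned tier is a suggestion ("consider T2"), so a claim may still be regraded when stronger or weaker independent evidence exists.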

---

PATH 7: Druggability & Target Validation (ENHANCED)

7.1 Pharos/TCRD - Target Development Level (NEW)

Pharos, the portal of NIH's Illuminating the Druggable Genome (IDG) program, provides a Target Development Level (TDL) classification for all human proteins:
python
def get_pharos_target_info(tu, ids):
    """
    Get Pharos/TCRD target development level and druggability.
    
    TDL Classification:
    - Tclin: Approved drug targets
    - Tchem: Targets with potent small molecule activities (cutoff depends on target family, e.g. <= 30 nM for kinases)
    - Tbio: Targets with biological annotations
    - Tdark: Understudied proteins
    """
    gene_symbol = ids.get('symbol')
    uniprot = ids.get('uniprot')
    
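    if not gene_symbol and not uniprot:
        return {'status': 'error', 'message': 'Need gene symbol or UniProt'}
    
    # Try gene symbol first; fall back to UniProt if that returns no data
    result = None
    if gene_symbol:
        result = tu.tools.Pharos_get_target(gene=gene_symbol)
    if uniprot and not (result and result.get('data')):
        result = tu.tools.Pharos_get_target(uniprot=uniprot)
    result = result or {}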
    
    if result.get('status') == 'success' and result.get('data'):
        target = result['data']
        return {
            'name': target.get('name'),
            'symbol': target.get('sym'),
            'tdl': target.get('tdl'),  # Tclin/Tchem/Tbio/Tdark
            'family': target.get('fam'),  # Kinase, GPCR, etc.
            'novelty': target.get('novelty'),
            'description': target.get('description'),
            'publications': target.get('publicationCount'),
            'interpretation': interpret_tdl(target.get('tdl'))
        }
    return None

def interpret_tdl(tdl):
    """Interpret Target Development Level for druggability."""
    interpretations = {
        'Tclin': 'Approved drug target - highest confidence for druggability',
        'Tchem': 'Small molecule active - good chemical tractability',
        'Tbio': 'Biologically characterized - may require novel modalities',
        'Tdark': 'Understudied - limited data, high novelty potential'
    }
    return interpretations.get(tdl, 'Unknown')

def search_disease_targets(tu, disease_name):
    """Find targets associated with a disease via Pharos."""
    
    result = tu.tools.Pharos_get_disease_targets(
        disease=disease_name,
        top=50
    )
    
    if result.get('status') == 'success':
        targets = result['data'].get('targets', [])
        # Group by TDL for prioritization
        by_tdl = {'Tclin': [], 'Tchem': [], 'Tbio': [], 'Tdark': []}
        for t in targets:
            tdl = t.get('tdl', 'Unknown')
            if tdl in by_tdl:
                by_tdl[tdl].append(t)
        return by_tdl
    return None
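To turn the `by_tdl` grouping into the report table, something like the following could work. This is a sketch: `format_tdl_table` and `max_examples` are hypothetical names, and it assumes each target dict carries the `sym` key used in `get_pharos_target_info` (with `name` as a fallback).

```python
TDL_ORDER = ['Tclin', 'Tchem', 'Tbio', 'Tdark']


def format_tdl_table(by_tdl, max_examples=3):
    """Render TDL-grouped targets as a markdown table with a few example symbols."""
    lines = ['| TDL | Count | Examples |', '|-----|-------|----------|']
    for tdl in TDL_ORDER:
        targets = by_tdl.get(tdl, [])
        # Show only the first few symbols; the count column carries the full size
        names = [t.get('sym') or t.get('name') or '?' for t in targets[:max_examples]]
        examples = ', '.join(names) or 'None'
        lines.append(f"| {tdl} | {len(targets)} | {examples} |")
    return '\n'.join(lines)
```

Empty groups are still rendered with a count of 0, in line with the principle that negative results are documented.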
Pharos Report Section (add to Section 9 - Druggability):

9.x Pharos/TCRD Target Classification (NEW)

Target Development Level: Tchem
Protein Family: Kinase
Novelty Score: 0.35 (moderately studied)
Publication Count: 12,456
TDL Interpretation: Target has validated small molecule activities with IC50 < 30nM. Good chemical starting points exist.

Disease Targets Analysis (for disease-centric queries):

| TDL | Count | Examples |
|-----|-------|----------|
| Tclin | 12 | EGFR, ALK, RET |
| Tchem | 45 | KRAS, SHP2, CDK4 |
| Tbio | 78 | Novel kinases |
| Tdark | 23 | Understudied |

Source: Pharos/TCRD via `Pharos_get_target`

7.2 DepMap - Target Essentiality Validation (NEW)

CRISPR knockout data from cancer cell lines to validate target essentiality:
python
def assess_target_essentiality(tu, ids):
    """
    Is this target essential for cancer cell survival?
    
    Negative effect scores = gene is essential (cells die upon KO)
    """
    gene_symbol = ids.get('symbol')
    
    if not gene_symbol:
        return {'status': 'error', 'message': 'Need gene symbol'}
    
    deps = tu.tools.DepMap_get_gene_dependencies(
        gene_symbol=gene_symbol
    )
    
    if deps.get('status') == 'success':
        return {
            'gene': gene_symbol,
            'data': deps.get('data', {}),
            'interpretation': 'Negative scores indicate gene is essential for cell survival',
            'note': 'Score < -0.5 is strongly essential, < -1.0 is extremely essential'
        }
    return None

def get_cancer_type_essentiality(tu, gene_symbol, cancer_type):
    """Check if gene is essential in specific cancer type."""
    
    # Get cell lines for cancer type
    cell_lines = tu.tools.DepMap_get_cell_lines(
        cancer_type=cancer_type,
        page_size=20
    )
    
    return {
        'gene': gene_symbol,
        'cancer_type': cancer_type,
        'cell_lines': cell_lines.get('data', {}).get('cell_lines', []),
        'note': 'Query individual cell lines for dependency scores via DepMap portal'
    }
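The interpretation labels used in the report could be derived from the effect score with a helper along these lines. The -0.5 and -1.0 cutoffs come from the note in `assess_target_essentiality`; the -0.25 boundary between "moderately" and "weakly" essential is an assumption chosen for illustration, and `classify_effect_score` is a hypothetical name.

```python
def classify_effect_score(score, moderate_cutoff=-0.25):
    """Label a DepMap gene effect score; more negative means more essential."""
    if score < -1.0:
        return 'Extremely essential'
    if score < -0.5:
        return 'Strongly essential'
    if score < moderate_cutoff:  # assumed boundary, not a DepMap convention
        return 'Moderately essential'
    if score < 0:
        return 'Weakly essential'
    return 'Not essential'
```

Because the lower bands are assumptions, the report should state the raw score next to any label.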
DepMap Report Section (add to Section 9 - Druggability):

9.x Target Essentiality (DepMap) (NEW)

Gene Essentiality Assessment:

| Context | Effect Score | Interpretation |
|---------|--------------|----------------|
| Pan-cancer | -0.42 | Moderately essential |
| Lung cancer | -0.78 | Strongly essential |
| Breast cancer | -0.21 | Weakly essential |

Selectivity: Differential essentiality suggests a cancer-type-selective target.
Cell Lines Tested: 1,054 cancer cell lines from DepMap
Interpretation: A score < -0.5 indicates strong dependency. This target is more essential in lung cancer than in other cancer types, suggesting lung-selective targeting may be feasible.
Source: DepMap via `DepMap_get_gene_dependencies`

7.3 InterProScan - Novel Domain Prediction (NEW)

For uncharacterized proteins, run InterProScan to predict domains and function:
python
def predict_protein_domains(tu, sequence, title="Query protein"):
    """
    Run InterProScan for de novo domain prediction.
    
    Use when:
    - Protein has no InterPro annotations
    - Novel/uncharacterized protein
    - Custom sequence analysis
    """
    
    result = tu.tools.InterProScan_scan_sequence(
        sequence=sequence,
        title=title,
        go_terms=True,
        pathways=True
    )
    
    if result.get('status') == 'success':
        data = result.get('data', {})
        
        # Job may still be running
        if data.get('job_status') == 'RUNNING':
            return {
                'job_id': data.get('job_id'),
                'status': 'running',
                'note': 'Use InterProScan_get_job_results to retrieve when ready'
            }
        
        # Parse completed results
        return {
            'domains': data.get('domains', []),
            'domain_count': data.get('domain_count', 0),
            'go_annotations': data.get('go_annotations', []),
            'pathways': data.get('pathways', []),
            'sequence_length': data.get('sequence_length')
        }
    return None

def check_interproscan_job(tu, job_id):
    """Check status and get results for InterProScan job."""
    
    status = tu.tools.InterProScan_get_job_status(job_id=job_id)
    
    if status.get('data', {}).get('is_finished'):
        results = tu.tools.InterProScan_get_job_results(job_id=job_id)
        return results.get('data', {})
    
    return status.get('data', {})
When to use InterProScan:
  • Novel/uncharacterized proteins (Tdark in Pharos)
  • Custom sequences (e.g., protein variants)
  • Proteins with outdated/sparse InterPro annotations
  • Validating domain predictions
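
Since a scan may come back as still running, the status check usually needs to be repeated. A minimal polling sketch follows, assuming a status callable that returns a dict with an `is_finished` flag (the shape read from `status.get('data', {})` in `check_interproscan_job`). `wait_for_job` and its parameters are hypothetical names.

```python
import time


def wait_for_job(get_status, job_id, interval_s=10.0, max_attempts=30, sleep=time.sleep):
    """Poll get_status(job_id) until it reports is_finished or attempts run out.

    Returns the final status dict, or None if the job never finished.
    """
    for attempt in range(max_attempts):
        data = get_status(job_id) or {}
        if data.get('is_finished'):
            return data
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next status check
    return None
```

A possible call would pass `lambda j: tu.tools.InterProScan_get_job_status(job_id=j).get('data', {})` as `get_status`, then fetch results with `InterProScan_get_job_results` once a non-`None` status comes back. The `sleep` parameter exists so the loop can be exercised without real delays.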
InterProScan Report Section (for novel proteins):

Domain Prediction (InterProScan) (NEW)

Used for uncharacterized protein analysis

Predicted Domains:

| Domain | Database | Start-End | E-value | InterPro Entry |
|--------|----------|-----------|---------|----------------|
| Protein kinase domain | Pfam | 45-305 | 1.2e-89 | IPR000719 |
| SH2 domain | SMART | 320-410 | 3.4e-45 | IPR000980 |

Predicted GO Terms:
  • GO:0004672 protein kinase activity
  • GO:0005524 ATP binding
Predicted Pathways:
  • Reactome: Signal Transduction
Source: InterProScan via `InterProScan_scan_sequence`

7.4 BindingDB - Known Ligands & Binding Data (NEW)

BindingDB provides experimental binding affinity data (Ki, IC50, Kd) for target-ligand pairs:
python
def get_bindingdb_ligands(tu, uniprot_id, affinity_cutoff=10000):
    """
    Get ligands with measured binding affinities from BindingDB.
    
    Critical for:
    - Identifying chemical starting points
    - Understanding existing chemical matter
    - Assessing tractability with small molecules
    
    Args:
        uniprot_id: UniProt accession (e.g., P00533 for EGFR)
        affinity_cutoff: Maximum affinity in nM (lower = more potent)
    """
    
    # Get ligands by UniProt
    result = tu.tools.BindingDB_get_ligands_by_uniprot(
        uniprot=uniprot_id,
        affinity_cutoff=affinity_cutoff
    )
    
    if result:
        ligands = []
        for entry in result:
            ligands.append({
                'smiles': entry.get('smile'),
                'affinity_type': entry.get('affinity_type'),  # Ki, IC50, Kd
                'affinity_nM': entry.get('affinity'),
                'monomer_id': entry.get('monomerid'),
                'pmid': entry.get('pmid')
            })
        
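        # Sort by affinity (most potent first); missing or non-numeric values
        # such as ">10000" sort last instead of raising
        def _affinity_key(ligand):
            try:
                return float(ligand['affinity_nM'])
            except (TypeError, ValueError):
                return float('inf')
        
        ligands.sort(key=_affinity_key)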
        
        return {
            'total_ligands': len(ligands),
            'ligands': ligands[:20],  # Top 20 most potent
            'best_affinity': ligands[0]['affinity_nM'] if ligands else None
        }
    
    return {'total_ligands': 0, 'ligands': [], 'note': 'No ligands found in BindingDB'}

def get_ligands_by_structure(tu, pdb_id, affinity_cutoff=10000):
    """Get ligands for a protein by PDB structure ID."""
    
    result = tu.tools.BindingDB_get_ligands_by_pdb(
        pdb_ids=pdb_id,
        affinity_cutoff=affinity_cutoff,
        sequence_identity=100
    )
    
    return result

def find_compound_targets(tu, smiles, similarity_cutoff=0.85):
    """Find other targets for a compound (polypharmacology)."""
    
    result = tu.tools.BindingDB_get_targets_by_compound(
        smiles=smiles,
        similarity_cutoff=similarity_cutoff
    )
    
    return result
BindingDB Report Section (add to Section 9 - Druggability):

Known Ligands (BindingDB) (NEW)

Total Ligands with Binding Data: 156
Best Reported Affinity: 0.3 nM (Ki)

Most Potent Ligands

| SMILES | Affinity Type | Value (nM) | Source PMID |
|--------|---------------|------------|-------------|
| CC(=O)Nc1ccc(cc1)c2... | Ki | 0.3 | 15737014 |
| CN(C)C/C=C/C(=O)Nc1... | IC50 | 0.8 | 15896103 |
| COc1cc2ncnc(Nc3ccc... | Kd | 2.1 | 16460808 |

Chemical Tractability Assessment:
  • Tchem-level target: Multiple ligands with <30nM affinity
  • Diverse chemotypes: Multiple scaffolds identified
  • Published literature: Ligands have PMID references
Source: BindingDB via `BindingDB_get_ligands_by_uniprot`

**Affinity Interpretation for Druggability**:
| Affinity Range | Interpretation | Drug Development Potential |
|----------------|----------------|---------------------------|
| <1 nM | Ultra-potent | Clinical compound likely |
| 1-10 nM | Highly potent | Drug-like |
| 10-100 nM | Potent | Good starting point |
| 100-1000 nM | Moderate | Needs optimization |
| >1000 nM | Weak | Early hit only |
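
The affinity bands in the table above can be applied programmatically when annotating ligands. A minimal sketch: `interpret_affinity` is a hypothetical helper, and because the table leaves boundary values ambiguous, treating each upper bound as exclusive (so exactly 1000 nM counts as weak) is an assumption.

```python
# (exclusive upper bound in nM, label), ordered from most to least potent
AFFINITY_BANDS = [
    (1.0, 'Ultra-potent'),
    (10.0, 'Highly potent'),
    (100.0, 'Potent'),
    (1000.0, 'Moderate'),
]


def interpret_affinity(affinity_nm):
    """Return the potency label for an affinity value given in nM."""
    value = float(affinity_nm)
    for upper, label in AFFINITY_BANDS:
        if value < upper:
            return label
    return 'Weak'
```

The labels describe potency only; Ki, IC50, and Kd values are not directly interchangeable, so the affinity type should stay alongside the label in the report.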

7.5 PubChem BioAssay - Screening Data (NEW)

PubChem BioAssay provides HTS screening data and dose-response curves:
python
def get_pubchem_assays_for_target(tu, gene_symbol):
    """
    Get bioassays targeting a gene from PubChem.
    
    Provides:
    - HTS screening results
    - Dose-response data (IC50/EC50)
    - Active compound counts
    """
    
    # Search assays by target gene
    assays = tu.tools.PubChem_search_assays_by_target_gene(
        gene_symbol=gene_symbol
    )
    
    assay_info = []
    if assays.get('data', {}).get('aids'):
        for aid in assays['data']['aids'][:10]:  # Top 10 assays
            # Get assay details
            summary = tu.tools.PubChem_get_assay_summary(aid=aid)
            targets = tu.tools.PubChem_get_assay_targets(aid=aid)
            
            assay_info.append({
                'aid': aid,
                'summary': summary.get('data', {}),
                'targets': targets.get('data', {})
            })
    
    return {
        'total_assays': len(assays.get('data', {}).get('aids', [])),
        'assay_details': assay_info
    }

def get_active_compounds_from_assay(tu, aid):
    """Get active compounds from a specific bioassay."""
    
    actives = tu.tools.PubChem_get_assay_active_compounds(aid=aid)
    
    return {
        'aid': aid,
        'active_cids': actives.get('data', {}).get('cids', []),
        'count': len(actives.get('data', {}).get('cids', []))
    }
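The "total active compounds across assays" figure in the report is easy to overcount, since the same compound may be active in several assays. One possible way to combine per-assay results is sketched below, assuming the dict shape returned by `get_active_compounds_from_assay`; `summarize_active_compounds` is a hypothetical helper.

```python
def summarize_active_compounds(assay_results):
    """Combine per-assay active CIDs into per-assay counts and a deduplicated total."""
    unique_cids = set()
    per_assay = {}
    for result in assay_results:
        cids = result.get('active_cids') or []
        per_assay[result.get('aid')] = len(cids)
        unique_cids.update(cids)  # a CID active in several assays is counted once
    return {
        'per_assay': per_assay,
        'unique_active_cids': sorted(unique_cids),
        'unique_count': len(unique_cids),
    }
```

Reporting both the per-assay counts and the deduplicated total keeps the summary line consistent with the table rows.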
PubChem BioAssay Report Section:

PubChem BioAssay Data (NEW)

Assays Targeting This Gene: 45

| AID | Assay Type | Active Compounds | Target Info |
|-----|------------|------------------|-------------|
| 1053104 | Dose-response | 12 | EGFR kinase |
| 504526 | HTS | 234 | EGFR binding |
| 651564 | Confirmatory | 8 | EGFR cellular |

Total Active Compounds Across Assays: ~500

Source: PubChem via `PubChem_search_assays_by_target_gene`, `PubChem_get_assay_active_compounds`

---

PATH 8: Literature & Research (Collision-Aware)

Collision-Aware Query Strategy

python
def path_literature_collision_aware(tu, ids):
    """
    Literature search with collision detection and filtering.
    """
    symbol = ids['symbol']
    full_name = ids.get('full_name', '')
    uniprot = ids.get('uniprot', '')  # Optional: the seed query below is skipped when empty
    synonyms = ids.get('synonyms', [])
    
    # Step 1: Detect collisions
    collision_filter = detect_collisions(tu, symbol, full_name)
    
    # Step 2: Build high-precision seed queries
    seed_queries = [
        f'"{symbol}"[Title] AND (protein OR gene OR expression)',  # Symbol in title
        f'"{full_name}"[Title]' if full_name else None,  # Full name in title
        f'"UniProt:{uniprot}"' if uniprot else None,  # UniProt accession
    ]
    seed_queries = [q for q in seed_queries if q]
    
    # Add key synonyms
    for syn in synonyms[:3]:
        seed_queries.append(f'"{syn}"[Title]')
    
    # Step 3: Execute seed queries and collect PMIDs
    seed_pmids = set()
    for query in seed_queries:
        if collision_filter:
            query = f"({query}){collision_filter}"
        results = tu.tools.PubMed_search_articles(query=query, limit=30)
        for article in results.get('articles', []):
            if article.get('pmid'):  # Skip records that carry no PMID
                seed_pmids.add(article['pmid'])
    
    # Step 4: Expand via citation network (for sparse targets)
    if len(seed_pmids) < 30:
        expanded_pmids = set()
        for pmid in list(seed_pmids)[:10]:  # Up to 10 seeds (set order is arbitrary)
            # Get related articles
            related = tu.tools.PubMed_get_related(pmid=pmid, limit=20)
            for r in related.get('articles', []):
                if r.get('pmid'):
                    expanded_pmids.add(r['pmid'])
            
            # Get citing articles
            citing = tu.tools.EuropePMC_get_citations(pmid=pmid, limit=20)
            for c in citing.get('citations', []):
                if c.get('pmid'):
                    expanded_pmids.add(c['pmid'])
        
        seed_pmids.update(expanded_pmids)
    
    # Step 5: Classify papers by evidence tier
    papers_by_tier = {'T1': [], 'T2': [], 'T3': [], 'T4': []}
    # ... classification logic based on title/abstract keywords
    
    return {
        'total_papers': len(seed_pmids),
        'collision_filter_applied': collision_filter if collision_filter else 'None needed',
        'seed_queries': seed_queries,
        'papers_by_tier': papers_by_tier
    }
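
Step 5 above leaves the classification logic elided. One way to fill it in is a keyword pass over title and abstract, as in the minimal sketch below. The keyword lists, the `classify_tier` and `group_papers_by_tier` helpers, and the `title`/`abstract` field names are all assumptions made for illustration; the actual tier meanings should come from this skill's evidence grading rules, not from these placeholder terms.

```python
# Hypothetical keyword lists; replace with terms matching this skill's T1-T4 definitions.
TIER_KEYWORDS = {
    'T1': ['crystal structure', 'knockout', 'crispr', 'mutagenesis'],
    'T2': ['knockdown', 'overexpression', 'inhibitor', 'sirna'],
    'T3': ['gwas', 'association', 'correlat', 'screen'],
}

def classify_tier(title, abstract=''):
    """Return the first tier whose keywords appear in the text; default to T4."""
    text = f"{title} {abstract}".lower()
    for tier in ('T1', 'T2', 'T3'):
        if any(kw in text for kw in TIER_KEYWORDS[tier]):
            return tier
    return 'T4'

def group_papers_by_tier(papers):
    """Group paper dicts (assumed keys: 'pmid', 'title', 'abstract') by tier."""
    papers_by_tier = {'T1': [], 'T2': [], 'T3': [], 'T4': []}
    for paper in papers:
        tier = classify_tier(paper.get('title', ''), paper.get('abstract', ''))
        papers_by_tier[tier].append(paper.get('pmid'))
    return papers_by_tier
```

Keyword matching is crude and will misgrade some papers, so the resulting tiers are best treated as a first pass to be checked by reading the abstracts of the key papers.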

Retry Logic & Fallback Chains

Retry Policy

For each critical tool, implement retry with exponential backoff:
python
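import time

def call_with_retry(tu, tool_name, params, max_retries=3):
    """
    Call a tool, retrying with exponential backoff on exceptions or error results.
    """
    last_error = None
    for attempt in range(max_retries):
        try:
            result = getattr(tu.tools, tool_name)(**params)
            if result and not result.get('error'):
                return result
            # Empty or error-bearing result: remember why, then retry
            last_error = (result or {}).get('error', 'empty result')
        except Exception as e:
            last_error = str(e)
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, ...
    # All attempts failed: return a structured error so the caller can report it
    return {'error': str(last_error), 'tool': tool_name, 'attempts': max_retries}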

Fallback Chains (CRITICAL)

| Primary Tool | Fallback 1 | Fallback 2 | Failure Action |
|--------------|------------|------------|----------------|
| `ChEMBL_get_target_activities` | `GtoPdb_get_target_ligands` | `OpenTargets drugs` | Note in report |
| `intact_get_interactions` | `STRING_get_protein_interactions` | `OpenTargets interactions` | Note in report |
| `GO_get_annotations_for_gene` | `OpenTargets GO` | `MyGene GO` | Note in report |
| `GTEx_get_median_gene_expression` | `HPA_get_rna_expression` | Note as unavailable | Document in report |
| `gnomad_get_gene_constraints` | `OpenTargets constraint` | - | Note in report |
| `DGIdb_get_drug_gene_interactions` | `OpenTargets drugs` | `GtoPdb` | Note in report |
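
A chain from the table above can be walked with a small helper like the sketch below. `call_with_fallbacks` is a hypothetical name, not part of ToolUniverse, and the two stub functions stand in for real tool calls, whose parameters should be verified with `get_tool_info` first. The helper returns an attempt log alongside the result so that every failure can be written into the report.

```python
def call_with_fallbacks(chain):
    """
    Try each (name, callable) pair in order and return the first usable result.

    Returns (result, log); log records every attempt so failures can be
    written into the report instead of being skipped silently.
    """
    log = []
    for name, call in chain:
        try:
            result = call()
        except Exception as exc:
            log.append({'tool': name, 'status': f'error: {exc}'})
            continue
        if result and not (isinstance(result, dict) and result.get('error')):
            log.append({'tool': name, 'status': 'ok'})
            return result, log
        log.append({'tool': name, 'status': 'empty or error result'})
    return None, log

# Stub callables stand in for real tool calls in this example.
def failing_primary():
    raise TimeoutError('API timeout')

def working_fallback():
    return {'data': ['ligand-1']}

result, log = call_with_fallbacks([
    ('ChEMBL_get_target_activities', failing_primary),
    ('GtoPdb_get_target_ligands', working_fallback),
])
```

When every entry in the chain fails, `result` is `None` and `log` still lists each tool that was tried, which is the information the Failure Action column asks for.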

Failure Surfacing Rule

NEVER silently skip failed tools. Always document:
markdown

7.1 Tissue Expression

GTEx Data: Unavailable (API timeout after 3 attempts)

Fallback Data (HPA):

| Tissue | Expression Level | Specificity |
|--------|------------------|-------------|
| Liver | High | Enhanced |
| Kidney | Medium | - |

Note: For complete GTEx data, query directly at gtexportal.org

---
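
The wording of such a failure note can be generated rather than typed by hand, which keeps it consistent across sections. The sketch below is one possible shape; `format_failure_note` and its arguments are assumptions for illustration, and the fallback table itself still has to be appended by the caller.

```python
def format_failure_note(primary, reason, fallback=None, hint=None):
    """Build the lines that record a failed tool in a report section."""
    lines = [f"{primary} Data: Unavailable ({reason})"]
    if fallback:
        lines.append(f"Fallback Data ({fallback}):")
    else:
        lines.append("Fallback Data: None available")
    if hint:
        lines.append(f"Note: {hint}")
    return "\n".join(lines)

note = format_failure_note(
    'GTEx', 'API timeout after 3 attempts', fallback='HPA',
    hint='For complete GTEx data, query directly at gtexportal.org',
)
```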

Per-Section Data Minimums & Completeness Audit

Minimum Data Requirements (Enforced)

| Section | Minimum Data | If Not Met |
|---------|--------------|------------|
| 6. PPIs | ≥20 interactors | Document which tools failed + why |
| 7. Expression | Top 10 tissues with TPM + HPA RNA summary | Note "limited data" with specific gaps |
| 8. Disease | Top 10 OT diseases + gnomAD constraints + ClinVar summary | Separate SNV/CNV; note if constraint unavailable |
| 9. Druggability | OT tractability + probes + drugs + DGIdb + GtoPdb fallback | "No drugs/probes" is valid data |
| 11. Literature | Total count + 5-year trend + 3-5 key papers with evidence tiers | Note if sparse (<50 papers) |
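
The count-based minimums in this table lend themselves to an automated check. The sketch below covers only those counts; `audit_minimums` and the shape of `report_data` are assumptions for illustration, and the qualitative requirements (constraint scores, tractability modalities) still need a manual pass.

```python
def audit_minimums(report_data):
    """
    Compare collected data against the count-based per-section minimums.

    report_data is an assumed dict, e.g. {'interactors': [...], 'tissues': [...],
    'diseases': [...], 'paper_count': 12}. Returns a list of gap records.
    """
    checks = [
        ('6. PPIs', 'interactors', 20, '>=20 interactors'),
        ('7. Expression', 'tissues', 10, 'Top 10 tissues with TPM'),
        ('8. Disease', 'diseases', 10, 'Top 10 OT diseases'),
    ]
    gaps = []
    for section, key, minimum, expected in checks:
        actual = len(report_data.get(key) or [])
        if actual < minimum:
            gaps.append({'section': section, 'expected': expected, 'actual': actual})
    # Literature is flagged as sparse below 50 papers
    paper_count = report_data.get('paper_count', 0)
    if paper_count < 50:
        gaps.append({'section': '11. Literature',
                     'expected': 'Not sparse (>=50 papers)',
                     'actual': paper_count})
    return gaps
```

An empty list means the count-based minimums are met; each returned record is a candidate row for the data gaps section of the report.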

Post-Run Completeness Audit

Before finalizing the report, run this checklist:
markdown

Completeness Audit (REQUIRED)

Data Minimums Check

  • PPIs: ≥20 interactors OR explanation why fewer
  • Expression: Top 10 tissues with values OR explicit "unavailable"
  • Diseases: Top 10 associations with scores OR "no associations"
  • Constraints: All 4 scores (pLI, LOEUF, missense Z, pRec) OR "unavailable"
  • Druggability: All modalities assessed; probes + drugs listed OR "none"

Negative Results Documented

  • Empty tool results noted explicitly (not left blank)
  • Failed tools with fallbacks documented
  • "No data" sections have implications noted

Evidence Quality

  • T1-T4 grades in Executive Summary disease claims
  • T1-T4 grades in Disease Associations table
  • Key papers table has evidence tiers
  • Per-section evidence summaries included

Source Attribution

  • Every data point has source tool/database cited
  • Section-end source summaries present

Data Gap Table (Required if minimums not met)

markdown

15. Data Gaps & Limitations

| Section | Expected Data | Actual | Reason | Alternative Source |
|---------|---------------|--------|--------|--------------------|
| 6. PPIs | ≥20 interactors | 8 | Novel target, limited studies | Literature review needed |
| 7. Expression | GTEx TPM | None | Versioned ID not recognized | See HPA data |
| 9. Probes | Chemical probes | None | No validated probes exist | Consider tool compound dev |

Recommendations for Data Gaps:
  1. For PPIs: Query BioGRID with broader parameters; check yeast-2-hybrid studies
  2. For Expression: Query GEO directly for tissue-specific datasets

---
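
A table in this layout can be rendered from a list of gap records, as in the sketch below. `render_gap_table` and the record keys (`section`, `expected`, `actual`, `reason`, `alternative`) are assumptions for illustration; the reason and alternative source have to be supplied by whoever investigated the gap.

```python
def render_gap_table(gaps):
    """Render gap records as the markdown table used in section 15."""
    header = "| Section | Expected Data | Actual | Reason | Alternative Source |"
    divider = "|---|---|---|---|---|"
    rows = [
        f"| {g['section']} | {g['expected']} | {g['actual']} | "
        f"{g.get('reason', 'Not determined')} | {g.get('alternative', '-')} |"
        for g in gaps
    ]
    return "\n".join([header, divider, *rows])

table = render_gap_table([{
    'section': '6. PPIs',
    'expected': '>=20 interactors',
    'actual': 8,
    'reason': 'Novel target, limited studies',
    'alternative': 'Literature review needed',
}])
```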

Report Template (Initial File)

File: `[TARGET]_target_report.md`
markdown

Target Intelligence Report: [TARGET NAME]

Generated: [Date] | Query: [Original query] | Status: In Progress

1. Executive Summary

[Researching...]
<!-- REQUIRED: 2-3 sentences, disease claims must have T1-T4 grades -->

2. Target Identifiers

[Researching...]
<!-- REQUIRED: UniProt, Ensembl (versioned), Entrez, ChEMBL, HGNC, Symbol -->

3. Basic Information

3.1 Protein Description

[Researching...]

3.2 Protein Function

[Researching...]

3.3 Subcellular Localization

[Researching...]

4. Structural Biology

4.1 Experimental Structures (PDB)

[Researching...]
<!-- METHOD: 3-step chain (UniProt xrefs → sequence search → domain search) -->

4.2 AlphaFold Prediction

[Researching...]

4.3 Domain Architecture

[Researching...]

4.4 Key Structural Features

[Researching...]

5. Function & Pathways

5.1 Gene Ontology Annotations

[Researching...]
<!-- REQUIRED: Evidence codes mapped to T1-T4 -->

5.2 Pathway Involvement

[Researching...]

6. Protein-Protein Interactions

[Researching...]
<!-- MINIMUM: ≥20 interactors OR explanation -->

7. Expression Profile

7.1 Tissue Expression (GTEx/HPA)

[Researching...]
<!-- NOTE: Use versioned Ensembl ID for GTEx if needed -->

7.2 Tissue Specificity

[Researching...]
<!-- MINIMUM: Top 10 tissues with TPM values -->

8. Genetic Variation & Disease

8.1 Constraint Scores

[Researching...]
<!-- REQUIRED: pLI, LOEUF, missense Z, pRec with interpretations -->

8.2 Disease Associations

[Researching...]
<!-- REQUIRED: Top 10 with OT scores; T1-T4 evidence grades -->

8.3 Clinical Variants (ClinVar)

[Researching...]
<!-- REQUIRED: Separate SNV and CNV tables -->

8.4 Mouse Model Phenotypes

[Researching...]

9. Druggability & Pharmacology

9.1 Tractability Assessment

[Researching...]
<!-- REQUIRED: All modalities (SM, Ab, PROTAC, other) -->

9.2 Known Drugs

[Researching...]

9.3 Chemical Probes

[Researching...]
<!-- NOTE: "No probes" is valid data - document explicitly -->

9.4 Clinical Pipeline

[Researching...]

9.5 ChEMBL Bioactivity

[Researching...]

10. Safety Profile

10.1 Safety Liabilities

[Researching...]

10.2 Expression-Based Toxicity Risk

[Researching...]

10.3 Mouse KO Phenotypes

[Researching...]

11. Literature & Research Landscape

11.1 Publication Metrics

[Researching...]
<!-- REQUIRED: Total, 5y, 1y, drug-related, clinical -->

11.2 Research Trend

[Researching...]

11.3 Key Publications

[Researching...]
<!-- REQUIRED: Table with PMID, title, year, evidence tier -->

11.4 Evidence Summary by Theme

[Researching...]
<!-- REQUIRED: T1-T4 breakdown per research theme -->

12. Competitive Landscape

[Researching...]

13. Summary & Recommendations

13.1 Target Validation Scorecard

[Researching...]
<!-- REQUIRED: 6 criteria, 1-5 scores, evidence quality noted -->

13.2 Strengths

[Researching...]

13.3 Challenges & Risks

[Researching...]

13.4 Recommendations

[Researching...]
<!-- REQUIRED: ≥3 prioritized (HIGH/MEDIUM/LOW) -->

14. Data Sources & Methodology

[Will be populated as research progresses...]

15. Data Gaps & Limitations

[To be populated post-audit...]

---
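
The report-first approach means this file is written before any research path runs and is then filled in section by section. The sketch below shows one way to do that with the standard library. `create_report_file`, `fill_section`, and the abbreviated `SECTIONS` list are assumptions for illustration; a real run would write out the full template above.

```python
from pathlib import Path

# Abbreviated heading list; the full template above has 15 top-level sections.
SECTIONS = [
    '1. Executive Summary',
    '2. Target Identifiers',
    '15. Data Gaps & Limitations',
]

def create_report_file(target, query, date, directory='.'):
    """Write the placeholder report and return its path."""
    lines = [
        f"Target Intelligence Report: {target}",
        "",
        f"Generated: {date} | Query: {query} | Status: In Progress",
        "",
    ]
    for heading in SECTIONS:
        lines += [heading, "", "[Researching...]", ""]
    path = Path(directory) / f"{target}_target_report.md"
    path.write_text("\n".join(lines), encoding='utf-8')
    return path

def fill_section(path, heading, content):
    """Replace the placeholder under `heading` with real content."""
    text = path.read_text(encoding='utf-8')
    # The leading newline keeps '5. X' from matching inside '15. X'
    marker = f"\n{heading}\n\n[Researching...]"
    if marker not in text:
        raise ValueError(f"No placeholder found under {heading!r}")
    updated = text.replace(marker, f"\n{heading}\n\n{content}", 1)
    path.write_text(updated, encoding='utf-8')
```

Raising when the placeholder is missing makes a double fill or a misspelled heading visible immediately, rather than leaving a section that silently stays at `[Researching...]`.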

Quick Reference: Tool Parameters

| Tool | Parameter | Notes |
|------|-----------|-------|
| `Reactome_map_uniprot_to_pathways` | `id` | NOT `uniprot_id` |
| `ensembl_get_xrefs` | `id` | NOT `gene_id` |
| `GTEx_get_median_gene_expression` | `gencode_id`, `operation` | Try versioned ID if empty |
| `OpenTargets_*` | `ensemblId` | camelCase, not `ensemblID` |
| `STRING_get_protein_interactions` | `protein_ids`, `species` | List format for IDs |
| `intact_get_interactions` | `identifier` | UniProt accession |
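
The three naming pitfalls in this table can also be caught in code before a call is made. The sketch below encodes only the wrong-to-right renames listed above; `KNOWN_PARAM_FIXES` and `fix_known_params` are hypothetical names, and the check complements, rather than replaces, verifying parameters with `get_tool_info`.

```python
import fnmatch

# Wrong -> right parameter names, taken from the quick-reference table above.
KNOWN_PARAM_FIXES = {
    'Reactome_map_uniprot_to_pathways': {'uniprot_id': 'id'},
    'ensembl_get_xrefs': {'gene_id': 'id'},
    'OpenTargets_*': {'ensemblID': 'ensemblId'},
}

def fix_known_params(tool_name, params):
    """Return a copy of params with known-wrong names renamed."""
    fixed = dict(params)
    for pattern, renames in KNOWN_PARAM_FIXES.items():
        # fnmatchcase lets 'OpenTargets_*' cover every OpenTargets tool
        if fnmatch.fnmatchcase(tool_name, pattern):
            for wrong, right in renames.items():
                if wrong in fixed and right not in fixed:
                    fixed[right] = fixed.pop(wrong)
    return fixed
```

Tools that are not in the mapping pass through unchanged, so the helper is safe to apply to every call.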

When NOT to Use This Skill

  • Simple protein lookup → Use `UniProt_get_entry_by_accession` directly
  • Drug information only → Use drug-focused tools
  • Disease-centric query → Use disease-intelligence-gatherer skill
  • Sequence retrieval → Use sequence-retrieval skill
  • Structure download → Use protein-structure-retrieval skill
Use this skill for comprehensive, multi-angle target analysis with guaranteed data completeness.