tooluniverse-target-research

Comprehensive Target Intelligence Gatherer

Gather complete target intelligence by exploring 9 parallel research paths. Supports targets identified by gene symbol, UniProt accession, Ensembl ID, or gene name.

KEY PRINCIPLES:

Report-first approach - Create report file FIRST, then populate progressively
Tool parameter verification - Verify params via `get_tool_info` before calling unfamiliar tools
Evidence grading - Grade all claims by evidence strength (T1-T4)
Citation requirements - Every fact must have inline source attribution
Mandatory completeness - All sections must exist with data minimums or explicit "No data" notes
Disambiguation first - Resolve all identifiers before research
Negative results documented - "No drugs found" is data; empty sections are failures
Collision-aware literature search - Detect and filter naming collisions
English-first queries - Always use English terms in tool calls, even if the user writes in another language. Translate gene names, disease names, and search terms to English. Only try original-language terms as a fallback if English returns no results. Respond in the user's language

Phase 0: Tool Parameter Verification (CRITICAL)

BEFORE calling ANY tool for the first time, verify its parameters:

python

# Always check tool params to prevent silent failures
tool_info = tu.tools.get_tool_info(tool_name="Reactome_map_uniprot_to_pathways")
# Reveals: takes `id`, not `uniprot_id`

Known Parameter Corrections (Updated)

Tool	WRONG Parameter	CORRECT Parameter
`Reactome_map_uniprot_to_pathways`	`uniprot_id`	`id`
`ensembl_get_xrefs`	`gene_id`	`id`
`GTEx_get_median_gene_expression`	`gencode_id` only	`gencode_id` + `operation="median"`
`OpenTargets_*`	`ensemblID`	`ensemblId` (camelCase)
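
The corrections in the table can be applied mechanically before a call. The sketch below is one possible way to do that; `apply_known_corrections`, `PARAM_RENAMES`, and `REQUIRED_DEFAULTS` are hypothetical names introduced here for illustration, and only the tool and parameter names come from the table.

```python
import fnmatch

# Renames copied from the table: (tool name pattern, wrong name, correct name).
PARAM_RENAMES = [
    ("Reactome_map_uniprot_to_pathways", "uniprot_id", "id"),
    ("ensembl_get_xrefs", "gene_id", "id"),
    ("OpenTargets_*", "ensemblID", "ensemblId"),
]
# Parameters the table says must accompany the call, with the value to add.
REQUIRED_DEFAULTS = {
    "GTEx_get_median_gene_expression": {"operation": "median"},
}


def apply_known_corrections(tool_name, params):
    """Return a corrected copy of params; the caller's dict is left untouched."""
    fixed = dict(params)
    for pattern, wrong, right in PARAM_RENAMES:
        if fnmatch.fnmatchcase(tool_name, pattern) and wrong in fixed:
            # Keep an already-correct value if both spellings were supplied.
            fixed.setdefault(right, fixed[wrong])
            del fixed[wrong]
    for key, value in REQUIRED_DEFAULTS.get(tool_name, {}).items():
        fixed.setdefault(key, value)
    return fixed
```

This does not replace checking `get_tool_info` for unfamiliar tools; it only covers the cases already listed in the table.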

GTEx Versioned ID Fallback (CRITICAL)

GTEx often requires versioned Ensembl IDs. If `ENSG00000123456` returns empty:

python

# Step 1: Get gene info with version
gene_info = tu.tools.ensembl_lookup_gene(id=ensembl_id, species="human")
version = gene_info.get('version', 1)

# Step 2: Try versioned ID
versioned_id = f"{ensembl_id}.{version}"  # e.g., "ENSG00000123456.12"
result = tu.tools.GTEx_get_median_gene_expression(
    gencode_id=versioned_id,
    operation="median"
)

---

When to Use This Skill

Apply when users:

Ask about a drug target, protein, or gene
Need target validation or assessment
Request druggability analysis
Want comprehensive target profiling
Ask "what do we know about [target]?"
Need target-disease associations
Request safety profile for a target

Critical Workflow Requirements

1. Report-First Approach (MANDATORY)

DO NOT show the search process or tool outputs to the user. Instead:

Create the report file FIRST - Before any data collection:
- File name: `[TARGET]_target_report.md`
- Initialize with all 14 section headers
- Add placeholder: `[Researching...]` in each section
Progressively update the report - As you gather data:
- Update each section immediately after retrieving data
- Replace `[Researching...]` with actual content
- Include "No data returned" when tools return empty results
Methodology in appendix only - If the user requests methodology details, create a separate `[TARGET]_methods_appendix.md` file
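
The report-first steps above can be sketched as two small helpers. This is a minimal illustration, not the skill's actual implementation: `init_report` and `fill_section` are hypothetical names, the `## N. Title` heading layout is an assumption, and the section titles are passed in by the caller because this section does not list the 14 titles.

```python
from pathlib import Path

PLACEHOLDER = "[Researching...]"


def init_report(target, section_titles, out_dir="."):
    """Create [TARGET]_target_report.md with every header and a placeholder."""
    lines = [f"# {target} Target Report", ""]
    for number, title in enumerate(section_titles, start=1):
        lines += [f"## {number}. {title}", "", PLACEHOLDER, ""]
    path = Path(out_dir) / f"{target}_target_report.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def fill_section(path, number, content):
    """Replace one section's placeholder; empty content becomes an explicit note."""
    body = content.strip() or "No data returned"
    text = path.read_text(encoding="utf-8")
    head, marker, tail = text.partition(f"## {number}. ")
    # Only look inside this section, up to the next second-level heading.
    section, sep, rest = tail.partition("\n## ")
    if not marker or PLACEHOLDER not in section:
        raise ValueError(f"section {number} is missing or already filled")
    section = section.replace(PLACEHOLDER, body, 1)
    path.write_text(head + marker + section + sep + rest, encoding="utf-8")
```

Calling `fill_section` right after each tool returns keeps the file current, and an empty result still leaves a visible "No data returned" note rather than a blank section.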

2. Evidence Grading System (MANDATORY)

CRITICAL: Grade every claim by evidence strength.

Evidence Tiers

Tier	Symbol	Criteria	Examples
T1	★★★	Direct mechanistic evidence, human genetic proof	CRISPR KO, patient mutations, crystal structure with mechanism
T2	★★☆	Functional studies, model organism validation	siRNA phenotype, mouse KO, biochemical assay
T3	★☆☆	Association, screen hits, computational	GWAS hit, DepMap essentiality, expression correlation
T4	☆☆☆	Mention, review, text-mined, predicted	Review article, database annotation, computational prediction

Required Evidence Grading Locations

Evidence grades MUST appear in:

Executive Summary - Key disease claims graded
Section 8.2 Disease Associations - Every disease link graded with source type
Section 11 Literature - Key papers table with evidence tier
Section 13 Recommendations - Scorecard items reference evidence quality

Per-Section Evidence Summary

markdown

---
**Evidence Quality for this Section**: Strong
- Mechanistic (T1): 12 papers
- Functional (T2): 8 papers
- Association (T3): 15 papers
- Mention (T4): 23 papers
**Data Gaps**: No CRISPR data; mouse KO phenotypes limited
---
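
A summary block like the one above can be generated from per-tier paper counts. The sketch below assumes the caller supplies the overall quality label, since this section does not define when a section counts as "Strong"; `render_evidence_summary` and `TIER_LABELS` are hypothetical names.

```python
# Tier labels as they appear in the example summary block.
TIER_LABELS = {"T1": "Mechanistic", "T2": "Functional", "T3": "Association", "T4": "Mention"}


def render_evidence_summary(tier_counts, quality, data_gaps):
    """Render the per-section evidence summary as Markdown text."""
    lines = ["---", f"**Evidence Quality for this Section**: {quality}"]
    for tier, label in TIER_LABELS.items():
        # Tiers with no papers are still listed, with a count of 0.
        lines.append(f"- {label} ({tier}): {tier_counts.get(tier, 0)} papers")
    gaps = "; ".join(data_gaps) if data_gaps else "None identified"
    lines += [f"**Data Gaps**: {gaps}", "---"]
    return "\n".join(lines)
```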

3. Citation Requirements (MANDATORY)

Every piece of information MUST include its source:

markdown

EGFR mutations cause lung adenocarcinoma [★★★: PMID:15118125, activating mutations 
in patients]. *Source: ClinVar, CIViC*
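
The citation format in the example combines the tier symbol, a PMID, a short basis, and the source databases. A small formatter can keep that shape consistent; `format_claim` and `TIER_STARS` are hypothetical names, and the layout is inferred from the single example above rather than from a formal specification.

```python
# Tier symbols from the Evidence Tiers table.
TIER_STARS = {"T1": "★★★", "T2": "★★☆", "T3": "★☆☆", "T4": "☆☆☆"}


def format_claim(statement, tier, pmid, basis, sources):
    """Format one graded, cited claim; refuses claims without a tier or source."""
    if tier not in TIER_STARS:
        raise ValueError(f"unknown evidence tier: {tier!r}")
    if not sources:
        raise ValueError("every claim needs at least one source")
    grade = f"[{TIER_STARS[tier]}: PMID:{pmid}, {basis}]"
    return f"{statement} {grade}. *Source: {', '.join(sources)}*"
```

Raising on a missing tier or source makes an ungraded or unsourced claim fail loudly instead of slipping into the report.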

Core Strategy: 9 Research Paths

Execute 9 research paths (Path 0 is always first):

Target Query (e.g., "EGFR" or "P00533")
│
├─ IDENTIFIER RESOLUTION (always first)
│   └─ Check if GPCR → GPCRdb_get_protein
│
├─ PATH 0: Open Targets Foundation (ALWAYS FIRST - fills gaps in all other paths)
│
├─ PATH 1: Core Identity (names, IDs, sequence, organism)
│   └─ InterProScan_scan_sequence for novel domain prediction (NEW)
├─ PATH 2: Structure & Domains (3D structure, domains, binding sites)
│   └─ If GPCR: GPCRdb_get_structures (active/inactive states)
├─ PATH 3: Function & Pathways (GO terms, pathways, biological role)
├─ PATH 4: Protein Interactions (PPI network, complexes)
├─ PATH 5: Expression Profile (tissue expression, single-cell)
├─ PATH 6: Variants & Disease (mutations, clinical significance)
│   └─ DisGeNET_search_gene for curated gene-disease associations
├─ PATH 7: Drug Interactions (known drugs, druggability, safety)
│   ├─ Pharos_get_target for TDL classification (Tclin/Tchem/Tbio/Tdark)
│   ├─ BindingDB_get_ligands_by_uniprot for known ligands (NEW)
│   ├─ PubChem_search_assays_by_target_gene for HTS data (NEW)
│   ├─ If GPCR: GPCRdb_get_ligands (curated agonists/antagonists)
│   └─ DepMap_get_gene_dependencies for target essentiality
└─ PATH 8: Literature & Research (publications, trends)
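
One way to realize "Path 0 first, then the remaining paths in parallel" is a thread pool that isolates failures per path. This is a sketch under assumptions: `run_research_paths` and `_safe_call` are hypothetical names, each path is passed as a zero-argument callable, and the underlying tool client is assumed to be safe to call from several threads.

```python
from concurrent.futures import ThreadPoolExecutor


def _safe_call(fn):
    """Run one path and capture any exception instead of propagating it."""
    try:
        return {"status": "ok", "data": fn()}
    except Exception as exc:
        return {"status": "failed", "error": f"{type(exc).__name__}: {exc}"}


def run_research_paths(foundation, paths, max_workers=8):
    """Run the foundation path (Path 0) first, then the other paths concurrently."""
    results = {"path_0": _safe_call(foundation)}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(_safe_call, fn) for name, fn in paths.items()}
        # Collect in submission order so the report sections stay ordered.
        for name, future in futures.items():
            results[name] = future.result()
    return results
```

A failed path shows up as a `failed` entry that can be written into the report as a documented gap, so one unavailable database does not abort the whole run.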

Identifier Resolution (Phase 1)

CRITICAL: Resolve ALL identifiers before any research path.

python

def resolve_target_ids(tu, query):
    """
    Resolve target query to ALL needed identifiers.
    Returns dict with: query, uniprot, ensembl, ensembl_versioned, symbol,
    entrez, chembl_target, hgnc, full_name, synonyms
    """
    ids = {
        'query': query, 
        'uniprot': None, 
        'ensembl': None, 
        'ensembl_versioned': None,  # For GTEx
        'symbol': None,
        'entrez': None,
        'chembl_target': None,
        'hgnc': None,
        'full_name': None,
        'synonyms': []
    }
    
    # [Resolution logic based on input type]
    # ... (see current implementation)
    
    # CRITICAL: Get versioned Ensembl ID for GTEx
    if ids['ensembl']:
        gene_info = tu.tools.ensembl_lookup_gene(id=ids['ensembl'], species="human")
        if gene_info:
            if gene_info.get('version'):
                ids['ensembl_versioned'] = f"{ids['ensembl']}.{gene_info['version']}"

            # Also get the full gene name for literature collision detection
            # (kept inside the guard so an empty lookup cannot raise AttributeError)
            ids['full_name'] = (gene_info.get('description') or '').split(' [')[0]
    
    # Get UniProt alternative names for synonyms
    if ids['uniprot']:
        alt_names = tu.tools.UniProt_get_alternative_names_by_accession(accession=ids['uniprot'])
        if alt_names:
            ids['synonyms'].extend(alt_names)
    
    return ids
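
The resolution logic elided above has to start by deciding what kind of identifier the query is. The sketch below shows one plausible first step, not the skill's actual code: `classify_query` is a hypothetical helper, the UniProt pattern follows UniProt's published accession format, and the Ensembl pattern covers human gene IDs only.

```python
import re

# UniProt accession format as published by UniProt.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Human Ensembl gene ID, optionally with a ".version" suffix.
ENSEMBL_GENE_RE = re.compile(r"^(ENSG\d{11})(?:\.(\d+))?$")


def classify_query(query):
    """Guess the identifier type of a target query from its shape alone."""
    q = query.strip()
    match = ENSEMBL_GENE_RE.match(q.upper())
    if match:
        return {"type": "ensembl", "id": match.group(1), "version": match.group(2)}
    if UNIPROT_RE.match(q.upper()):
        return {"type": "uniprot", "id": q.upper()}
    if " " in q:
        return {"type": "gene_name", "id": q}
    # Symbols are returned unchanged because some contain lowercase letters.
    return {"type": "symbol", "id": q}
```

Shape alone is ambiguous: the gene symbol `P2RY12` also fits the accession pattern, so a `uniprot` guess should be confirmed by an actual lookup and retried as a symbol if the lookup fails.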

GPCR Target Detection (NEW)

~35% of approved drugs target GPCRs. After identifier resolution, check if target is a GPCR:

python

def check_gpcr_target(tu, ids):
    """
    Check if target is a GPCR and retrieve specialized data.
    Call after identifier resolution.
    """
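    symbol = ids.get('symbol') or ''  # 'symbol' may be None if resolution failed
    if not symbol:
        return {'is_gpcr': False}

    # Build GPCRdb entry name (entry names follow UniProt and do not always match
    # the gene symbol, so an empty result is not proof that the target is not a GPCR)
    entry_name = f"{symbol.lower()}_human"

    gpcr_info = tu.tools.GPCRdb_get_protein(
        operation="get_protein",
        protein=entry_name
    )

    if gpcr_info and gpcr_info.get('status') == 'success':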
        # Target is a GPCR - get specialized data
        
        # Get structures with receptor state
        structures = tu.tools.GPCRdb_get_structures(
            operation="get_structures",
            protein=entry_name
        )
        
        # Get known ligands (critical for binder projects)
        ligands = tu.tools.GPCRdb_get_ligands(
            operation="get_ligands",
            protein=entry_name
        )
        
        # Get mutation data
        mutations = tu.tools.GPCRdb_get_mutations(
            operation="get_mutations",
            protein=entry_name
        )
        
        return {
            'is_gpcr': True,
            'gpcr_family': gpcr_info['data'].get('family'),
            'gpcr_class': gpcr_info['data'].get('receptor_class'),
            'structures': structures.get('data', {}).get('structures', []),
            'ligands': ligands.get('data', {}).get('ligands', []),
            'mutations': mutations.get('data', {}).get('mutations', []),
            'ballesteros_numbering': True  # GPCRdb provides this
        }
    
    return {'is_gpcr': False}

GPCRdb Report Section (add to Section 2 for GPCR targets):

markdown

2.x GPCR-Specific Data (GPCRdb)

Receptor Class: Class A (Rhodopsin-like)
GPCR Family: Adrenoceptors

Structures by State:

PDB ID	State	Resolution	Ligand	Year
3SN6	Active	3.2Å	Agonist (BI-167107)	2011
2RH1	Inactive	2.4Å	Antagonist (carazolol)	2007

Known Ligands: 45 agonists, 32 antagonists, 8 allosteric modulators
Key Binding Site Residues (Ballesteros-Weinstein): 3.32, 5.42, 6.48, 7.39

Collision Detection for Literature Search

Before literature search, detect naming collisions:

python

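from collections import Counter

def detect_collisions(tu, symbol, full_name):
    """
    Detect if gene symbol has naming collisions in literature.
    Returns a negative filter string if collisions are found, else "".
    """
    # Search by symbol in title
    results = tu.tools.PubMed_search_articles(
        query=f'"{symbol}"[Title]',
        limit=20
    )
    articles = (results or {}).get('articles', [])

    # Terms that signal biology/protein/gene context, plus words from the gene's own name
    bio_terms = ['protein', 'gene', 'cell', 'expression', 'mutation', 'kinase', 'receptor']
    name_words = [w for w in (full_name or '').lower().split() if len(w) > 3]

    # Titles without any biological context are treated as off-topic
    # e.g., "JAK" might collide with "Just Another Kinase" jokes
    # e.g., "WDR7" might collide with other WDR family members in certain contexts
    off_topic_titles = []
    for paper in articles:
        title = paper.get('title', '').lower()
        if not any(term in title for term in bio_terms + name_words):
            off_topic_titles.append(title)

    # Only build a filter if >20% of titles are off-topic
    off_topic_terms = []
    if articles and len(off_topic_titles) / len(articles) > 0.2:
        # Heuristic (an assumption, not a validated method): words that recur across
        # off-topic titles are candidate collision terms; review them before use
        generic = {'study', 'analysis', 'patients', 'using', 'based', 'review',
                   'between', 'effect', 'effects', symbol.lower()}
        counts = Counter(w.strip('.,:;()[]"\'') for t in off_topic_titles for w in t.split())
        off_topic_terms = [w for w, n in counts.most_common()
                           if n >= 2 and len(w) > 4 and w not in generic][:3]

    # Build negative filter
    collision_filter = ""
    if off_topic_terms:
        collision_filter = " NOT " + " NOT ".join(f'"{t}"[Title]' for t in off_topic_terms)

    return collision_filter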
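
The filter string returned by `detect_collisions` is meant to be appended to the literature query. The sketch below shows one way to compose the final PubMed query from the resolved names; `build_literature_query` is a hypothetical helper, and the `[Title/Abstract]` field tag is a choice made here for illustration.

```python
def build_literature_query(symbol, full_name=None, synonyms=(), collision_filter=""):
    """Combine symbol, full name, and synonyms with OR, then append the NOT filter."""
    names = [symbol] + ([full_name] if full_name else []) + list(synonyms)
    seen, clauses = set(), []
    for name in names:
        key = name.strip().lower()
        # Skip blanks and case-insensitive duplicates.
        if key and key not in seen:
            seen.add(key)
            clauses.append(f'"{name.strip()}"[Title/Abstract]')
    return f"({' OR '.join(clauses)}){collision_filter}"
```

Wrapping the OR clauses in parentheses keeps the trailing `NOT` terms applied to the whole name group rather than only to the last synonym.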

PATH 0: Open Targets Foundation (ALWAYS FIRST)

Objective: Populate baseline data for Sections 5, 6, 8, 9, 10, 11 before specialized queries.

CRITICAL: Open Targets provides the most comprehensive aggregated data. Query ALL these endpoints:

Endpoint	Section	Data Type
`OpenTargets_get_diseases_phenotypes_by_target_ensemblId`	8	Diseases/phenotypes
`OpenTargets_get_target_tractability_by_ensemblId`	9	Druggability assessment
`OpenTargets_get_target_safety_profile_by_ensemblId`	10	Safety liabilities
`OpenTargets_get_target_interactions_by_ensemblId`	6	PPI network
`OpenTargets_get_target_gene_ontology_by_ensemblId`	5	GO annotations
`OpenTargets_get_publications_by_target_ensemblId`	11	Literature
`OpenTargets_get_biological_mouse_models_by_ensemblId`	8/10	Mouse KO phenotypes
`OpenTargets_get_chemical_probes_by_target_ensemblId`	9	Chemical probes
`OpenTargets_get_associated_drugs_by_target_ensemblId`	9	Known drugs

Path 0 Implementation

python

def path_0_open_targets(tu, ids):
    """
    Open Targets foundation data - fills gaps for sections 5, 6, 8, 9, 10, 11.
    ALWAYS run this first.
    """
    ensembl_id = ids['ensembl']
    if not ensembl_id:
        return {'status': 'skipped', 'reason': 'No Ensembl ID'}
    
    results = {}
    
    # 1. Diseases & Phenotypes (Section 8)
    diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['diseases'] = diseases if diseases else {'note': 'No disease associations returned'}
    
    # 2. Tractability (Section 9)
    tractability = tu.tools.OpenTargets_get_target_tractability_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['tractability'] = tractability if tractability else {'note': 'No tractability data returned'}
    
    # 3. Safety Profile (Section 10)
    safety = tu.tools.OpenTargets_get_target_safety_profile_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['safety'] = safety if safety else {'note': 'No safety liabilities identified'}
    
    # 4. Interactions (Section 6)
    interactions = tu.tools.OpenTargets_get_target_interactions_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['interactions'] = interactions if interactions else {'note': 'No interactions returned'}
    
    # 5. GO Annotations (Section 5)
    go_terms = tu.tools.OpenTargets_get_target_gene_ontology_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['go_terms'] = go_terms if go_terms else {'note': 'No GO annotations returned'}
    
    # 6. Publications (Section 11)
    publications = tu.tools.OpenTargets_get_publications_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['publications'] = publications if publications else {'note': 'No publications returned'}
    
    # 7. Mouse Models (Section 8/10)
    mouse_models = tu.tools.OpenTargets_get_biological_mouse_models_by_ensemblId(
        ensemblId=ensembl_id
    )
    results['mouse_models'] = mouse_models if mouse_models else {'note': 'No mouse model data returned'}
    
    # 8. Chemical Probes (Section 9)
    probes = tu.tools.OpenTargets_get_chemical_probes_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['chemical_probes'] = probes if probes else {'note': 'No chemical probes available'}
    
    # 9. Associated Drugs (Section 9)
    drugs = tu.tools.OpenTargets_get_associated_drugs_by_target_ensemblId(
        ensemblId=ensembl_id
    )
    results['drugs'] = drugs if drugs else {'note': 'No approved/trial drugs found'}
    
    return results
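
The nine calls above share one shape, so the same behavior can be driven from a table, with a failure in one endpoint recorded as a note instead of aborting the rest. This is a sketch, not the skill's implementation: `query_endpoints` and `OPEN_TARGETS_ENDPOINTS` are hypothetical names, and only three of the nine endpoints are listed to keep the example short.

```python
# result key -> (tool name, note to record when the tool returns nothing)
OPEN_TARGETS_ENDPOINTS = {
    "diseases": ("OpenTargets_get_diseases_phenotypes_by_target_ensemblId",
                 "No disease associations returned"),
    "tractability": ("OpenTargets_get_target_tractability_by_ensemblId",
                     "No tractability data returned"),
    "safety": ("OpenTargets_get_target_safety_profile_by_ensemblId",
               "No safety liabilities identified"),
}


def query_endpoints(tools, ensembl_id, endpoints):
    """Call each endpoint with ensemblId and record empty or failed calls as notes."""
    results = {}
    for key, (tool_name, empty_note) in endpoints.items():
        try:
            data = getattr(tools, tool_name)(ensemblId=ensembl_id)
        except Exception as exc:
            # A missing tool or a network error becomes a documented gap.
            results[key] = {"note": f"{tool_name} failed: {exc}"}
            continue
        results[key] = data if data else {"note": empty_note}
    return results
```

With a ToolUniverse client this would be called as `query_endpoints(tu.tools, ids['ensembl'], OPEN_TARGETS_ENDPOINTS)`.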

Negative Results Are Data

CRITICAL: Always document when a query returns empty:

markdown

9.3 Chemical Probes

Status: No validated chemical probes available for this target.
Source: OpenTargets_get_chemical_probes_by_target_ensemblId returned empty

Implication: Tool compound development would be needed for chemical biology studies.

---
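
Because empty sections count as failures, a final check over the report text can catch sections that were never filled. The sketch below assumes the report uses Markdown `#` headings and the `[Researching...]` placeholder; `find_incomplete_sections` is a hypothetical helper.

```python
import re


def find_incomplete_sections(report_text, placeholder="[Researching...]"):
    """Return headings whose body is empty or still holds the placeholder."""
    # With a capturing group, split yields [preamble, heading, body, heading, body, ...].
    parts = re.split(r"^(#{1,6} .+)$", report_text, flags=re.MULTILINE)
    headings, bodies = parts[1::2], parts[2::2]
    levels = [len(h) - len(h.lstrip("#")) for h in headings]
    incomplete = []
    for i, (heading, body) in enumerate(zip(headings, bodies)):
        text = body.strip()
        next_level = levels[i + 1] if i + 1 < len(levels) else 0
        # A heading followed directly by a deeper heading only groups subsections.
        is_container = not text and next_level > levels[i]
        if placeholder in text or (not text and not is_container):
            incomplete.append(heading.lstrip("# ").strip())
    return incomplete
```

A section that documents a negative result, such as the chemical probes example above, passes the check because its body is not empty.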

PATH 2: Structure & Domains (Enhanced)

Objective: Robust structure coverage using a 3-step chain.

3-Step Structure Search Chain

Do NOT rely solely on PDB text search. Use this chain:

python

def path_structure_robust(tu, ids):
    """
    Robust structure search using 3-step chain.
    """
    structures = {'pdb': [], 'alphafold': None, 'domains': [], 'method_notes': []}
    
    # STEP 1: UniProt PDB Cross-References (most reliable)
    if ids['uniprot']:
        entry = tu.tools.UniProt_get_entry_by_accession(accession=ids['uniprot']) or {}
        pdb_xrefs = [x for x in entry.get('uniProtKBCrossReferences', [])
                    if x.get('database') == 'PDB']
        for xref in pdb_xrefs:
            pdb_id = xref.get('id')
            # Get details for each PDB
            pdb_info = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)
            if pdb_info:
                structures['pdb'].append(pdb_info)
        structures['method_notes'].append(f"Step 1: {len(pdb_xrefs)} PDB cross-refs from UniProt")

    # STEP 2: Sequence-based PDB Search (catches missing annotations)
    if ids['uniprot'] and len(structures['pdb']) < 5:
        sequence = tu.tools.UniProt_get_sequence_by_accession(accession=ids['uniprot'])
        if sequence and len(sequence) < 1000:  # Reasonable length for search
            similar = tu.tools.PDB_search_similar_structures(
                sequence=sequence[:500],  # Use first 500 AA if long
                identity_cutoff=0.7
            )
            known_ids = {s.get('pdb_id') for s in structures['pdb']}
            for hit in (similar or [])[:10]:  # Top 10 similar
                if hit.get('pdb_id') not in known_ids:
                    structures['pdb'].append(hit)
                    known_ids.add(hit.get('pdb_id'))
            # Record the step only when the search actually ran
            structures['method_notes'].append("Step 2: Sequence search (identity ≥70%)")
        else:
            structures['method_notes'].append("Step 2: Skipped (no sequence, or sequence ≥1000 AA)")

    # STEP 3: Domain-based Search (for multi-domain proteins)
    if ids['uniprot']:
        domains = tu.tools.InterPro_get_protein_domains(uniprot_accession=ids['uniprot'])
        structures['domains'] = domains if domains else []

        # For proteins with few structures, search PDB by domain name
        if len(structures['pdb']) < 3 and domains:
            for domain in domains[:3]:  # Top 3 domains
                domain_name = domain.get('name', '')
                if not domain_name:
                    continue
                domain_hits = tu.tools.PDB_search_by_keyword(query=domain_name, limit=5)
                if domain_hits:
                    # Keep the hits so they reach the report, not only the method note
                    structures.setdefault('domain_hits', {})[domain_name] = domain_hits
                    structures['method_notes'].append(f"Step 3: Domain '{domain_name}' search")

    # AlphaFold (always check when a UniProt accession is available)
    alphafold = None
    if ids['uniprot']:
        alphafold = tu.tools.alphafold_get_prediction(uniprot_accession=ids['uniprot'])
    structures['alphafold'] = alphafold if alphafold else {'note': 'No AlphaFold prediction'}
    
    # IMPORTANT: Document limitations
    if not structures['pdb']:
        structures['limitation'] = "No direct PDB hit does NOT mean no structure exists. Check: (1) structures under different UniProt entries, (2) homolog structures, (3) domain-only structures."
    
    return structures

Structure Section Output Format

markdown

4.1 Experimental Structures (PDB)

Total PDB Entries: 23 structures (Source: UniProt cross-references)
Search Method: 3-step chain (UniProt xrefs → sequence search → domain search)

PDB ID	Resolution	Method	Ligand	Coverage	Year
1M17	2.6Å	X-ray	Erlotinib	672-998	2002
2ITY	3.4Å	X-ray	Gefitinib	696-1022	2007

Note: "No direct PDB hit" ≠ "no structure exists". Check homologs and domain structures.

---

PATH 5: Expression Profile (Enhanced)

GTEx with Versioned ID Fallback

python

def path_expression(tu, ids):
    """
    Expression data with GTEx versioned ID fallback.
    """
    results = {'gtex': None, 'hpa': None, 'failed_tools': []}
    
    # GTEx with fallback
    ensembl_id = ids['ensembl']
    versioned_id = ids.get('ensembl_versioned')
    
    # Try unversioned first
    gtex_result = tu.tools.GTEx_get_median_gene_expression(
        gencode_id=ensembl_id,
        operation="median"
    )
    
    # Fallback to versioned if empty (also covers a missing or None 'data' field)
    if not gtex_result or not gtex_result.get('data'):
        if versioned_id:
            gtex_result = tu.tools.GTEx_get_median_gene_expression(
                gencode_id=versioned_id,
                operation="median"
            )
            if gtex_result and gtex_result.get('data'):
                results['gtex'] = gtex_result
                results['gtex_note'] = f"Used versioned ID: {versioned_id}"
        
        if not results.get('gtex'):
            results['failed_tools'].append({
                'tool': 'GTEx_get_median_gene_expression',
                'tried': [ensembl_id, versioned_id],
                'fallback': 'See HPA data below'
            })
    else:
        results['gtex'] = gtex_result
    
    # HPA (always query as backup)
    hpa_result = tu.tools.HPA_get_rna_expression_by_source(ensembl_id=ensembl_id)
    results['hpa'] = hpa_result if hpa_result else {'note': 'No HPA RNA data'}
    
    return results
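
The fallback above depends on having both an unversioned and a versioned Ensembl ID at hand. As a minimal sketch, assuming human gene IDs of the form `ENSG` plus 11 digits with an optional `.N` version suffix, the ordered list of IDs to try could be built as follows. The helper name `gencode_id_candidates` is hypothetical and not part of the tool set, and the version number in the usage note is illustrative only.

```python
import re

# Assumption: human Ensembl gene IDs look like ENSG + 11 digits, optionally ".<version>"
_ENSG_PATTERN = re.compile(r"^(ENSG\d{11})(?:\.(\d+))?$")


def gencode_id_candidates(ensembl_id, versioned_id=None):
    """Return the IDs to try against GTEx: unversioned form first, no duplicates."""
    candidates = []
    for raw in (ensembl_id, versioned_id):
        if not raw:
            continue
        match = _ENSG_PATTERN.match(raw.strip())
        if not match:
            continue  # skip anything that is not a human Ensembl gene ID
        base, version = match.groups()
        versioned = f"{base}.{version}" if version else None
        for candidate in (base, versioned):
            if candidate and candidate not in candidates:
                candidates.append(candidate)
    return candidates
```

A caller could loop over `gencode_id_candidates("ENSG00000146648", "ENSG00000146648.17")` and stop at the first ID for which GTEx returns non-empty data, recording every ID that was tried.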

python

def path_expression(tu, ids):
    """
    带GTEx版本化ID备选方案的表达数据。
    """
    results = {'gtex': None, 'hpa': None, 'failed_tools': []}
    
    # 带备选方案的GTEx
    ensembl_id = ids['ensembl']
    versioned_id = ids.get('ensembl_versioned')
    
    # 先尝试未版本化ID
    gtex_result = tu.tools.GTEx_get_median_gene_expression(
        gencode_id=ensembl_id,
        operation="median"
    )
    
    # 如果返回空结果，备选使用版本化ID
    if not gtex_result or not gtex_result.get('data'):  # 同时覆盖'data'字段缺失或为None的情况
        if versioned_id:
            gtex_result = tu.tools.GTEx_get_median_gene_expression(
                gencode_id=versioned_id,
                operation="median"
            )
            if gtex_result and gtex_result.get('data'):
                results['gtex'] = gtex_result
                results['gtex_note'] = f"使用版本化ID：{versioned_id}"
        
        if not results.get('gtex'):
            results['failed_tools'].append({
                'tool': 'GTEx_get_median_gene_expression',
                'tried': [ensembl_id, versioned_id],
                'fallback': '参见下方HPA数据'
            })
    else:
        results['gtex'] = gtex_result
    
    # HPA（始终作为备份查询）
    hpa_result = tu.tools.HPA_get_rna_expression_by_source(ensembl_id=ensembl_id)
    results['hpa'] = hpa_result if hpa_result else {'note': '无HPA RNA数据'}
    
    return results

Human Protein Atlas - Extended Expression (NEW)

人类蛋白质图谱 - 扩展表达数据（新增）

HPA provides comprehensive protein expression data including tissue-level, cell-level, and cell line expression.

python

def get_hpa_comprehensive_expression(tu, gene_symbol):
    """
    Get comprehensive expression data from Human Protein Atlas.
    
    Provides:
    - Tissue expression (protein and RNA)
    - Subcellular localization
    - Cell line expression comparison
    - Tissue specificity
    """
    
    # 1. Search for gene to get IDs
    gene_info = tu.tools.HPA_search_genes_by_query(search_query=gene_symbol)
    
    if not gene_info:
        return {'error': f'Gene {gene_symbol} not found in HPA'}
    
    # 2. Get tissue expression with specificity
    tissue_search = tu.tools.HPA_generic_search(
        search_query=gene_symbol,
        columns="g,gs,rnat,rnatsm,scml,scal",  # Gene, synonyms, tissue specificity, subcellular
        format="json"
    )
    
    # 3. Compare expression in cancer cell lines vs normal tissue
    cell_lines = ['a549', 'mcf7', 'hela', 'hepg2', 'pc3']
    cell_line_expression = {}
    
    for cell_line in cell_lines:
        try:
            expr = tu.tools.HPA_get_comparative_expression_by_gene_and_cellline(
                gene_name=gene_symbol,
                cell_line=cell_line
            )
            cell_line_expression[cell_line] = expr
        except Exception:  # skip cell lines the tool cannot resolve; do not swallow KeyboardInterrupt
            continue
    
    return {
        'gene_info': gene_info,
        'tissue_data': tissue_search,
        'cell_line_expression': cell_line_expression,
        'source': 'Human Protein Atlas'
    }

HPA Expression Output for Report:

HPA提供全面的蛋白质表达数据，包括组织水平、细胞水平和细胞系表达。

python

def get_hpa_comprehensive_expression(tu, gene_symbol):
    """
    从人类蛋白质图谱获取全面的表达数据。
    
    提供：
    - 组织表达（蛋白质和RNA）
    - 亚细胞定位
    - 细胞系表达比较
    - 组织特异性
    """
    
    # 1. 搜索基因以获取ID
    gene_info = tu.tools.HPA_search_genes_by_query(search_query=gene_symbol)
    
    if not gene_info:
        return {'error': f'在HPA中未找到基因{gene_symbol}'}
    
    # 2. 获取带特异性的组织表达数据
    tissue_search = tu.tools.HPA_generic_search(
        search_query=gene_symbol,
        columns="g,gs,rnat,rnatsm,scml,scal",  # 基因、同义词、组织特异性、亚细胞定位
        format="json"
    )
    
    # 3. 比较癌细胞系与正常组织的表达
    cell_lines = ['a549', 'mcf7', 'hela', 'hepg2', 'pc3']
    cell_line_expression = {}
    
    for cell_line in cell_lines:
        try:
            expr = tu.tools.HPA_get_comparative_expression_by_gene_and_cellline(
                gene_name=gene_symbol,
                cell_line=cell_line
            )
            cell_line_expression[cell_line] = expr
        except Exception:  # 跳过工具无法解析的细胞系；不要吞掉KeyboardInterrupt
            continue
    
    return {
        'gene_info': gene_info,
        'tissue_data': tissue_search,
        'cell_line_expression': cell_line_expression,
        'source': '人类蛋白质图谱'
    }

报告中的HPA表达输出:

Tissue Expression Profile (Human Protein Atlas)

组织表达谱（人类蛋白质图谱）

Tissue	Protein Level	RNA nTPM	Specificity
Brain	High	45.2	Enriched
Liver	Medium	23.1	Enhanced
Kidney	Low	8.4	Not detected

Subcellular Localization: Cytoplasm, Plasma membrane

组织	蛋白质水平	RNA nTPM	特异性
脑	高	45.2	富集
肝	中	23.1	增强
肾	低	8.4	未检测到

亚细胞定位：细胞质、质膜

Cancer Cell Line Expression

癌细胞系表达

Cell Line	Cancer Type	Expression	vs Normal
A549	Lung	High	Elevated
MCF7	Breast	Medium	Similar
HeLa	Cervical	High	Elevated

Source: Human Protein Atlas via `HPA_search_genes_by_query`, `HPA_get_comparative_expression_by_gene_and_cellline`


**Why HPA for Target Research**:
- **Drug target validation** - Confirm expression in target tissue
- **Safety assessment** - Expression in essential organs
- **Biomarker potential** - Tissue-specific expression
- **Cell line selection** - Choose appropriate models

---

细胞系	癌症类型	表达水平	与正常组织对比
A549	肺癌	高	上调
MCF7	乳腺癌	中	相似
HeLa	宫颈癌	高	上调

来源：人类蛋白质图谱，通过 `HPA_search_genes_by_query`、`HPA_get_comparative_expression_by_gene_and_cellline` 获取


**为何针对靶点研究使用HPA**:
- **药物靶点验证** - 确认靶点在目标组织中的表达
- **安全性评估** - 在重要器官中的表达情况
- **生物标志物潜力** - 组织特异性表达
- **细胞系选择** - 选择合适的模型

---

PATH 6: Variants & Disease (Enhanced)

路径6：变异与疾病（增强版）

6.1 ClinVar SNV vs CNV Separation

6.1 ClinVar SNV与CNV分离

8.3 Clinical Variants (ClinVar)

8.3临床变异（ClinVar）

Single Nucleotide Variants (SNVs)

单核苷酸变异（SNVs）

Variant	Clinical Significance	Condition	Review Status	PMID
p.L858R	Pathogenic	Lung cancer	4 stars	15118125
p.T790M	Pathogenic	Drug resistance	4 stars	15737014

Total Pathogenic SNVs: 47

变异	临床意义	病症	评审状态	PMID
p.L858R	致病性	肺癌	4星	15118125
p.T790M	致病性	耐药性	4星	15737014

致病性SNV总数：47个

Copy Number Variants (CNVs) - Reported Separately

拷贝数变异（CNVs）- 单独报告

Type	Region	Clinical Significance	Frequency
Amplification	7p11.2	Pathogenic	Common in cancer

Note: CNV data separated as it represents different mutation mechanism
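
The SNV/CNV split in this template can be sketched as a small partitioning step. This is an illustration under stated assumptions, not the skill's own implementation: the record field `variant_type` is hypothetical (check the actual ClinVar tool output via `get_tool_info`), and the type strings follow ClinVar's variation-type vocabulary as assumed here.

```python
# Assumption: each record carries a 'variant_type' string (hypothetical field name)
SNV_TYPE = "single nucleotide variant"
CNV_TYPES = {"copy number gain", "copy number loss"}


def split_clinvar_variants(records):
    """Partition variant records into SNVs, CNVs, and everything else."""
    groups = {"snv": [], "cnv": [], "other": []}
    for record in records:
        variant_type = (record.get("variant_type") or "").strip().lower()
        if variant_type == SNV_TYPE:
            groups["snv"].append(record)
        elif variant_type in CNV_TYPES:
            groups["cnv"].append(record)
        else:
            groups["other"].append(record)  # indels, untyped records, etc.
    return groups
```

Keeping an explicit `other` bucket means indels and untyped records are reported rather than silently dropped, which matches the "negative results documented" principle.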

类型	区域	临床意义	频率
扩增	7p11.2	致病性	在癌症中常见

注意：CNV数据单独报告，因为其代表不同的突变机制

6.2 DisGeNET Integration (NEW)

6.2 DisGeNET整合（新增）

DisGeNET provides curated gene-disease associations with evidence scores. Requires: `DISGENET_API_KEY`

python

def get_disgenet_associations(tu, ids):
    """
    Get gene-disease associations from DisGeNET.
    Complements Open Targets with curated association scores.
    """
    symbol = ids.get('symbol')
    if not symbol:
        return {'status': 'skipped', 'reason': 'No gene symbol'}
    
    # Get all disease associations for gene
    gda = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=symbol,
        limit=50
    )
    
    if gda.get('status') != 'success':
        return {'status': 'error', 'message': 'DisGeNET query failed'}
    
    associations = gda.get('data', {}).get('associations', [])
    
    # Categorize by evidence strength
    strong = []     # score >= 0.7
    moderate = []   # 0.4 <= score < 0.7
    weak = []       # score < 0.4
    
    for assoc in associations:
        score = assoc.get('score', 0)
        disease_name = assoc.get('disease_name', '')
        umls_cui = assoc.get('disease_id', '')
        
        entry = {
            'disease': disease_name,
            'umls_cui': umls_cui,
            'score': score,
            'evidence_index': assoc.get('ei'),
            'dsi': assoc.get('dsi'),  # Disease Specificity Index
            'dpi': assoc.get('dpi')   # Disease Pleiotropy Index
        }
        
        if score >= 0.7:
            strong.append(entry)
        elif score >= 0.4:
            moderate.append(entry)
        else:
            weak.append(entry)
    
    return {
        'total_associations': len(associations),
        'strong_associations': strong,
        'moderate_associations': moderate,
        'weak_associations': weak[:10],  # Limit weak
        'disease_pleiotropy': len(associations)  # How many diseases linked
    }

DisGeNET Report Section (add to Section 8 - Disease Associations):

DisGeNET提供带证据评分的人工审编（curated）基因-疾病关联数据。需要：`DISGENET_API_KEY`

python

def get_disgenet_associations(tu, ids):
    """
    从DisGeNET获取基因-疾病关联数据。
    用curated关联分数补充Open Targets数据。
    """
    symbol = ids.get('symbol')
    if not symbol:
        return {'status': 'skipped', 'reason': '无基因符号'}
    
    # 获取基因的所有疾病关联
    gda = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=symbol,
        limit=50
    )
    
    if gda.get('status') != 'success':
        return {'status': 'error', 'message': 'DisGeNET查询失败'}
    
    associations = gda.get('data', {}).get('associations', [])
    
    # 按证据强度分类
    strong = []     # 评分≥0.7
    moderate = []   # 评分0.4-0.7  
    weak = []       # 评分<0.4
    
    for assoc in associations:
        score = assoc.get('score', 0)
        disease_name = assoc.get('disease_name', '')
        umls_cui = assoc.get('disease_id', '')
        
        entry = {
            'disease': disease_name,
            'umls_cui': umls_cui,
            'score': score,
            'evidence_index': assoc.get('ei'),
            'dsi': assoc.get('dsi'),  # 疾病特异性指数
            'dpi': assoc.get('dpi')   # 疾病多效性指数
        }
        
        if score >= 0.7:
            strong.append(entry)
        elif score >= 0.4:
            moderate.append(entry)
        else:
            weak.append(entry)
    
    return {
        'total_associations': len(associations),
        'strong_associations': strong,
        'moderate_associations': moderate,
        'weak_associations': weak[:10],  # 限制弱关联数量
        'disease_pleiotropy': len(associations)  # 关联的疾病数量
    }

DisGeNET报告章节（添加到章节8 - 疾病关联）：

8.x DisGeNET Gene-Disease Associations (NEW)

8.x DisGeNET基因-疾病关联（新增）

Total Diseases Associated: 47
Disease Pleiotropy Index: High (gene linked to many disease types)

关联疾病总数：47种
疾病多效性指数：高（该基因与多种疾病类型相关）

Strong Associations (Score ≥0.7)

强关联（评分≥0.7）

Disease	UMLS CUI	Score	Evidence Index
Non-small cell lung cancer	C0007131	0.85	0.92
Glioblastoma	C0017636	0.78	0.88

疾病	UMLS CUI	评分	证据指数
非小细胞肺癌	C0007131	0.85	0.92
胶质母细胞瘤	C0017636	0.78	0.88

Moderate Associations (Score 0.4-0.7)

中等关联（评分0.4-0.7）

Disease	UMLS CUI	Score	DSI
Breast cancer	C0006142	0.62	0.45

Note: DisGeNET score integrates curated databases, GWAS, animal models, and literature


**Evidence Tier Assignment**:
- DisGeNET Score ≥0.7 → Consider T2 evidence (multiple validated sources)
- DisGeNET Score ≥0.4 and <0.7 → Consider T3 evidence
- DisGeNET Score <0.4 → T4 evidence only
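
The tier assignment above can be expressed as a small lookup. This is a minimal sketch that uses the same cutoffs as the categorization code (0.7 and 0.4, lower bound inclusive); the function name `disgenet_score_to_tier` is hypothetical, and returning `None` for a missing score is an assumption rather than a documented rule.

```python
def disgenet_score_to_tier(score):
    """Map a DisGeNET association score (0-1) to a suggested evidence tier."""
    if score is None:
        return None  # assumption: no score means no tier can be suggested
    if score >= 0.7:
        return "T2"
    if score >= 0.4:
        return "T3"
    return "T4"
```

The result is only a suggestion: the wording above says "consider", so a T2 suggestion should still be checked against the underlying sources before it is written into the report.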

---

疾病	UMLS CUI	评分	DSI
乳腺癌	C0006142	0.62	0.45

注意：DisGeNET评分整合了curated数据库、GWAS、动物模型和文献数据


**证据层级分配**:
- DisGeNET评分≥0.7 → 视为T2证据（多个验证来源）
- DisGeNET评分0.4-0.7 → 视为T3证据
- DisGeNET评分<0.4 → 仅视为T4证据

---

PATH 7: Druggability & Target Validation (ENHANCED)

路径7：成药性与靶点验证（增强版）

7.1 Pharos/TCRD - Target Development Level (NEW)

7.1 Pharos/TCRD - 靶点开发水平（新增）

NIH's Illuminating the Druggable Genome (IDG) portal provides TDL classification for all human proteins:

python

def get_pharos_target_info(tu, ids):
    """
    Get Pharos/TCRD target development level and druggability.
    
    TDL Classification:
    - Tclin: Approved drug targets
    - Tchem: Targets with potent small molecule activities (cutoff is family-specific, e.g. <= 30 nM for kinases)
    - Tbio: Targets with biological annotations
    - Tdark: Understudied proteins
    """
    gene_symbol = ids.get('symbol')
    uniprot = ids.get('uniprot')
    
    # Try by gene symbol first
    if gene_symbol:
        result = tu.tools.Pharos_get_target(
            gene=gene_symbol
        )
    elif uniprot:
        result = tu.tools.Pharos_get_target(
            uniprot=uniprot
        )
    else:
        return {'status': 'error', 'message': 'Need gene symbol or UniProt'}
    
    if result.get('status') == 'success' and result.get('data'):
        target = result['data']
        return {
            'name': target.get('name'),
            'symbol': target.get('sym'),
            'tdl': target.get('tdl'),  # Tclin/Tchem/Tbio/Tdark
            'family': target.get('fam'),  # Kinase, GPCR, etc.
            'novelty': target.get('novelty'),
            'description': target.get('description'),
            'publications': target.get('publicationCount'),
            'interpretation': interpret_tdl(target.get('tdl'))
        }
    return None

def interpret_tdl(tdl):
    """Interpret Target Development Level for druggability."""
    interpretations = {
        'Tclin': 'Approved drug target - highest confidence for druggability',
        'Tchem': 'Small molecule active - good chemical tractability',
        'Tbio': 'Biologically characterized - may require novel modalities',
        'Tdark': 'Understudied - limited data, high novelty potential'
    }
    return interpretations.get(tdl, 'Unknown')

def search_disease_targets(tu, disease_name):
    """Find targets associated with a disease via Pharos."""
    
    result = tu.tools.Pharos_get_disease_targets(
        disease=disease_name,
        top=50
    )
    
    if result.get('status') == 'success':
        targets = result['data'].get('targets', [])
        # Group by TDL for prioritization
        by_tdl = {'Tclin': [], 'Tchem': [], 'Tbio': [], 'Tdark': []}
        for t in targets:
            tdl = t.get('tdl', 'Unknown')
            if tdl in by_tdl:
                by_tdl[tdl].append(t)
        return by_tdl
    return None
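
The grouping returned by `search_disease_targets` maps directly onto the TDL/Count/Examples table used in the report. A minimal sketch of that step is shown below; the helper name `summarize_by_tdl` is hypothetical, and it assumes each target dict exposes its symbol under `sym`, as in `get_pharos_target_info` above.

```python
TDL_ORDER = ["Tclin", "Tchem", "Tbio", "Tdark"]


def summarize_by_tdl(by_tdl, max_examples=3):
    """Turn a {TDL: [target, ...]} grouping into (tdl, count, examples) rows."""
    rows = []
    for tdl in TDL_ORDER:
        targets = by_tdl.get(tdl, [])
        # Assumption: the gene symbol is stored under 'sym'
        symbols = [t.get("sym") for t in targets if t.get("sym")]
        rows.append((tdl, len(targets), ", ".join(symbols[:max_examples])))
    return rows
```

Every TDL level yields a row even when its count is zero, so an empty level appears in the report as explicit data rather than as a missing line.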

Pharos Report Section (add to Section 9 - Druggability):

NIH的照亮可成药基因组（IDG）门户为所有人类蛋白质提供TDL分类：

python

def get_pharos_target_info(tu, ids):
    """
    获取Pharos/TCRD靶点开发水平和成药性数据。
    
    TDL分类：
    - Tclin：已获批药物靶点
    - Tchem：具有强效小分子活性的靶点（阈值因蛋白家族而异，例如激酶≤30nM）
    - Tbio：具有生物学注释的靶点
    - Tdark：研究不足的蛋白质
    """
    gene_symbol = ids.get('symbol')
    uniprot = ids.get('uniprot')
    
    # 先尝试按基因符号查询
    if gene_symbol:
        result = tu.tools.Pharos_get_target(
            gene=gene_symbol
        )
    elif uniprot:
        result = tu.tools.Pharos_get_target(
            uniprot=uniprot
        )
    else:
        return {'status': 'error', 'message': '需要基因符号或UniProt登录号'}
    
    if result.get('status') == 'success' and result.get('data'):
        target = result['data']
        return {
            'name': target.get('name'),
            'symbol': target.get('sym'),
            'tdl': target.get('tdl'),  # Tclin/Tchem/Tbio/Tdark
            'family': target.get('fam'),  # 激酶、GPCR等
            'novelty': target.get('novelty'),
            'description': target.get('description'),
            'publications': target.get('publicationCount'),
            'interpretation': interpret_tdl(target.get('tdl'))
        }
    return None

def interpret_tdl(tdl):
    """为成药性解读靶点开发水平。"""
    interpretations = {
        'Tclin': '已获批药物靶点 - 成药性置信度最高',
        'Tchem': '具有小分子活性 - 化学成药性良好',
        'Tbio': '已进行生物学表征 - 可能需要新的药物形式',
        'Tdark': '研究不足 - 数据有限，具有高新颖性潜力'
    }
    return interpretations.get(tdl, '未知')

def search_disease_targets(tu, disease_name):
    """通过Pharos查找与疾病相关的靶点。"""
    
    result = tu.tools.Pharos_get_disease_targets(
        disease=disease_name,
        top=50
    )
    
    if result.get('status') == 'success':
        targets = result['data'].get('targets', [])
        # 按TDL分组以优先排序
        by_tdl = {'Tclin': [], 'Tchem': [], 'Tbio': [], 'Tdark': []}
        for t in targets:
            tdl = t.get('tdl', 'Unknown')
            if tdl in by_tdl:
                by_tdl[tdl].append(t)
        return by_tdl
    return None

Pharos报告章节（添加到章节9 - 成药性）：

9.x Pharos/TCRD Target Classification (NEW)

9.x Pharos/TCRD靶点分类（新增）

Target Development Level: Tchem
Protein Family: Kinase
Novelty Score: 0.35 (moderately studied)
Publication Count: 12,456

TDL Interpretation: Target has validated small molecule activities with IC50 < 30nM. Good chemical starting points exist.

Disease Targets Analysis (for disease-centric queries):

TDL	Count	Examples
Tclin	12	EGFR, ALK, RET
Tchem	45	KRAS, SHP2, CDK4
Tbio	78	Novel kinases
Tdark	23	Understudied

Source: Pharos/TCRD via `Pharos_get_target`

靶点开发水平：Tchem
蛋白质家族：激酶
新颖性评分：0.35（中等研究程度）
出版物数量：12,456篇

TDL解读：该靶点具有经过验证的小分子活性，IC50 < 30nM。存在良好的化学起始点。

疾病靶点分析（针对疾病中心型查询）：

TDL	数量	示例
Tclin	12	EGFR, ALK, RET
Tchem	45	KRAS, SHP2, CDK4
Tbio	78	新型激酶
Tdark	23	研究不足的靶点

来源：Pharos/TCRD，通过 `Pharos_get_target` 获取

7.2 DepMap - Target Essentiality Validation (NEW)

7.2 DepMap - 靶点必需性验证（新增）

DepMap provides CRISPR knockout data from cancer cell lines, which can be used to validate target essentiality:

python

def assess_target_essentiality(tu, ids):
    """
    Is this target essential for cancer cell survival?
    
    Negative effect scores = gene is essential (cells die upon KO)
    """
    gene_symbol = ids.get('symbol')
    
    if not gene_symbol:
        return {'status': 'error', 'message': 'Need gene symbol'}
    
    deps = tu.tools.DepMap_get_gene_dependencies(
        gene_symbol=gene_symbol
    )
    
    if deps.get('status') == 'success':
        return {
            'gene': gene_symbol,
            'data': deps.get('data', {}),
            'interpretation': 'Negative scores indicate gene is essential for cell survival',
            'note': 'Score < -0.5 is strongly essential, < -1.0 is extremely essential'
        }
    return None

def get_cancer_type_essentiality(tu, gene_symbol, cancer_type):
    """Check if gene is essential in specific cancer type."""
    
    # Get cell lines for cancer type
    cell_lines = tu.tools.DepMap_get_cell_lines(
        cancer_type=cancer_type,
        page_size=20
    )
    
    return {
        'gene': gene_symbol,
        'cancer_type': cancer_type,
        'cell_lines': cell_lines.get('data', {}).get('cell_lines', []),
        'note': 'Query individual cell lines for dependency scores via DepMap portal'
    }
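
The score thresholds quoted in the note above (below -0.5 strongly essential, below -1.0 extremely essential) can be applied with a small helper, sketched here under the assumption that scores are plain floats. Both function names are hypothetical. Scores above -0.5 are deliberately not subdivided, because the text gives no cutoffs for weaker categories.

```python
def classify_effect_score(score):
    """Label a gene effect score using the two thresholds stated above."""
    if score < -1.0:
        return "extremely essential"
    if score < -0.5:
        return "strongly essential"
    return "below strong-dependency threshold"


def essentiality_gap(context_score, reference_score):
    """Difference between a context (e.g. one cancer type) and a reference
    (e.g. pan-cancer). Negative means more essential in the context."""
    return round(context_score - reference_score, 3)
```

For example, a lung cancer score of -0.78 against a pan-cancer score of -0.42 gives a gap of -0.36, which is the kind of differential the selectivity line in the report is meant to summarize.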

DepMap Report Section (add to Section 9 - Druggability):

来自癌细胞系的CRISPR敲除数据，用于验证靶点必需性：

python

def assess_target_essentiality(tu, ids):
    """
    该靶点对癌细胞存活是否必需？
    
    负效应评分 = 基因是必需的（敲除后细胞死亡）
    """
    gene_symbol = ids.get('symbol')
    
    if not gene_symbol:
        return {'status': 'error', 'message': '需要基因符号'}
    
    deps = tu.tools.DepMap_get_gene_dependencies(
        gene_symbol=gene_symbol
    )
    
    if deps.get('status') == 'success':
        return {
            'gene': gene_symbol,
            'data': deps.get('data', {}),
            'interpretation': '负评分表明基因对细胞存活是必需的',
            'note': '评分< -0.5表示强必需，< -1.0表示极强必需'
        }
    return None

def get_cancer_type_essentiality(tu, gene_symbol, cancer_type):
    """检查基因在特定癌症类型中是否必需。"""
    
    # 获取该癌症类型的细胞系
    cell_lines = tu.tools.DepMap_get_cell_lines(
        cancer_type=cancer_type,
        page_size=20
    )
    
    return {
        'gene': gene_symbol,
        'cancer_type': cancer_type,
        'cell_lines': cell_lines.get('data', {}).get('cell_lines', []),
        'note': '通过DepMap门户查询单个细胞系的依赖评分'
    }

DepMap报告章节（添加到章节9 - 成药性）：

9.x Target Essentiality (DepMap) (NEW)

9.x靶点必需性（DepMap）（新增）

Gene Essentiality Assessment:

Context	Effect Score	Interpretation
Pan-cancer	-0.42	Moderately essential
Lung cancer	-0.78	Strongly essential
Breast cancer	-0.21	Weakly essential

Selectivity: Differential essentiality suggests cancer-type selective target

Cell Lines Tested: 1,054 cancer cell lines from DepMap

Interpretation: Score < -0.5 indicates strong dependency. This target is more essential in lung cancer than other cancer types - suggesting lung-selective targeting may be feasible.

Source: DepMap via `DepMap_get_gene_dependencies`

基因必需性评估:

场景	效应评分	解读
泛癌症	-0.42	中等必需
肺癌	-0.78	强必需
乳腺癌	-0.21	弱必需

选择性：差异必需性表明该靶点具有癌症类型选择性

测试细胞系：来自DepMap的1,054个癌细胞系

解读：评分< -0.5表示强依赖。该靶点在肺癌中比其他癌症类型更必需，表明肺癌选择性靶向可能可行。

来源：DepMap，通过 `DepMap_get_gene_dependencies` 获取

7.3 InterProScan - Novel Domain Prediction (NEW)

7.3 InterProScan - 新结构域预测（新增）

For uncharacterized proteins, run InterProScan to predict domains and function:

python

def predict_protein_domains(tu, sequence, title="Query protein"):
    """
    Run InterProScan for de novo domain prediction.
    
    Use when:
    - Protein has no InterPro annotations
    - Novel/uncharacterized protein
    - Custom sequence analysis
    """
    
    result = tu.tools.InterProScan_scan_sequence(
        sequence=sequence,
        title=title,
        go_terms=True,
        pathways=True
    )
    
    if result.get('status') == 'success':
        data = result.get('data', {})
        
        # Job may still be running
        if data.get('job_status') == 'RUNNING':
            return {
                'job_id': data.get('job_id'),
                'status': 'running',
                'note': 'Use InterProScan_get_job_results to retrieve when ready'
            }
        
        # Parse completed results
        return {
            'domains': data.get('domains', []),
            'domain_count': data.get('domain_count', 0),
            'go_annotations': data.get('go_annotations', []),
            'pathways': data.get('pathways', []),
            'sequence_length': data.get('sequence_length')
        }
    return None

def check_interproscan_job(tu, job_id):
    """Check status and get results for InterProScan job."""
    
    status = tu.tools.InterProScan_get_job_status(job_id=job_id)
    
    if status.get('data', {}).get('is_finished'):
        results = tu.tools.InterProScan_get_job_results(job_id=job_id)
        return results.get('data', {})
    
    return status.get('data', {})
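
Because an InterProScan job may still be running when the first call returns, some form of polling is needed. The sketch below is one possible approach, not the skill's own mechanism: `wait_for_job` is a hypothetical helper that takes any zero-argument status callable, and it assumes the status dict carries the `is_finished` flag used in `check_interproscan_job` above. The `sleep` and `clock` parameters exist only so the loop can be tested without waiting.

```python
import time


def wait_for_job(check_status, interval_s=10.0, timeout_s=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll check_status() until it reports is_finished or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        status = check_status() or {}
        if status.get("is_finished"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} seconds")
        sleep(interval_s)  # wait before asking again
```

With the tools above, the callable could be `lambda: tu.tools.InterProScan_get_job_status(job_id=job_id).get('data', {})`, followed by a call to `InterProScan_get_job_results` once `wait_for_job` returns.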

When to use InterProScan:

- Novel/uncharacterized proteins (Tdark in Pharos)
- Custom sequences (e.g., protein variants)
- Proteins with outdated/sparse InterPro annotations
- Validating domain predictions

InterProScan Report Section (for novel proteins):

针对未表征的蛋白质，运行InterProScan以预测结构域和功能：

python

def predict_protein_domains(tu, sequence, title="Query protein"):
    """
    运行InterProScan进行从头结构域预测。
    
    使用场景：
    - 蛋白质无InterPro注释
    - 新型/未表征蛋白质
    - 自定义序列分析
    """
    
    result = tu.tools.InterProScan_scan_sequence(
        sequence=sequence,
        title=title,
        go_terms=True,
        pathways=True
    )
    
    if result.get('status') == 'success':
        data = result.get('data', {})
        
        # 任务可能仍在运行
        if data.get('job_status') == 'RUNNING':
            return {
                'job_id': data.get('job_id'),
                'status': 'running',
                'note': '使用InterProScan_get_job_results在任务完成后获取结果'
            }
        
        # 解析已完成的结果
        return {
            'domains': data.get('domains', []),
            'domain_count': data.get('domain_count', 0),
            'go_annotations': data.get('go_annotations', []),
            'pathways': data.get('pathways', []),
            'sequence_length': data.get('sequence_length')
        }
    return None

def check_interproscan_job(tu, job_id):
    """检查InterProScan任务状态并获取结果。"""
    
    status = tu.tools.InterProScan_get_job_status(job_id=job_id)
    
    if status.get('data', {}).get('is_finished'):
        results = tu.tools.InterProScan_get_job_results(job_id=job_id)
        return results.get('data', {})
    
    return status.get('data', {})

何时使用InterProScan:

- 新型/未表征蛋白质（Pharos中的Tdark）
- 自定义序列（例如：蛋白质变异体）
- InterPro注释过时/稀疏的蛋白质
- 验证结构域预测

InterProScan报告章节（针对新型蛋白质）：

Domain Prediction (InterProScan) (NEW)

结构域预测（InterProScan）（新增）

Used for uncharacterized protein analysis

Predicted Domains:

Domain	Database	Start-End	E-value	InterPro Entry
Protein kinase domain	Pfam	45-305	1.2e-89	IPR000719
SH2 domain	SMART	320-410	3.4e-45	IPR000980

Predicted GO Terms:

GO:0004672 protein kinase activity
GO:0005524 ATP binding

Predicted Pathways:

Reactome: Signal Transduction

Source: InterProScan via `InterProScan_scan_sequence`

用于未表征蛋白质分析

预测的结构域:

结构域	数据库	起始-终止位置	E值	InterPro条目
蛋白激酶结构域	Pfam	45-305	1.2e-89	IPR000719
SH2结构域	SMART	320-410	3.4e-45	IPR000980

预测的GO术语:

GO:0004672 蛋白激酶活性
GO:0005524 ATP结合

预测的通路:

Reactome: 信号转导

来源：InterProScan，通过 `InterProScan_scan_sequence` 获取

7.4 BindingDB - Known Ligands & Binding Data (NEW)

7.4 BindingDB - 已知配体与结合数据（新增）

BindingDB provides experimental binding affinity data (Ki, IC50, Kd) for target-ligand pairs:

python

def get_bindingdb_ligands(tu, uniprot_id, affinity_cutoff=10000):
    """
    Get ligands with measured binding affinities from BindingDB.
    
    Critical for:
    - Identifying chemical starting points
    - Understanding existing chemical matter
    - Assessing tractability with small molecules
    
    Args:
        uniprot_id: UniProt accession (e.g., P00533 for EGFR)
        affinity_cutoff: Maximum affinity in nM (lower = more potent)
    """
    
    # Get ligands by UniProt
    result = tu.tools.BindingDB_get_ligands_by_uniprot(
        uniprot=uniprot_id,
        affinity_cutoff=affinity_cutoff
    )
    
    if result:
        ligands = []
        for entry in result:
            ligands.append({
                'smiles': entry.get('smile'),
                'affinity_type': entry.get('affinity_type'),  # Ki, IC50, Kd
                'affinity_nM': entry.get('affinity'),
                'monomer_id': entry.get('monomerid'),
                'pmid': entry.get('pmid')
            })
        
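        # Sort by affinity (most potent first). Values may carry qualifiers
        # such as ">" or "<"; strip them before parsing, and sort missing or
        # unparseable values last instead of raising ValueError.
        def affinity_key(ligand):
            try:
                return float(str(ligand['affinity_nM']).lstrip('<>=~ '))
            except ValueError:
                return float('inf')
        ligands.sort(key=affinity_key)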
        
        return {
            'total_ligands': len(ligands),
            'ligands': ligands[:20],  # Top 20 most potent
            'best_affinity': ligands[0]['affinity_nM'] if ligands else None
        }
    
    return {'total_ligands': 0, 'ligands': [], 'note': 'No ligands found in BindingDB'}

def get_ligands_by_structure(tu, pdb_id, affinity_cutoff=10000):
    """Get ligands for a protein by PDB structure ID."""
    
    result = tu.tools.BindingDB_get_ligands_by_pdb(
        pdb_ids=pdb_id,
        affinity_cutoff=affinity_cutoff,
        sequence_identity=100
    )
    
    return result

def find_compound_targets(tu, smiles, similarity_cutoff=0.85):
    """Find other targets for a compound (polypharmacology)."""
    
    result = tu.tools.BindingDB_get_targets_by_compound(
        smiles=smiles,
        similarity_cutoff=similarity_cutoff
    )
    
    return result

BindingDB Report Section (add to Section 9 - Druggability):

BindingDB提供靶点-配体对的实验结合亲和力数据（Ki、IC50、Kd）：

python

def get_bindingdb_ligands(tu, uniprot_id, affinity_cutoff=10000):
    """
    从BindingDB获取具有测量结合亲和力的配体。
    
    关键用途：
    - 识别化学起始点
    - 了解现有化学物质
    - 评估小分子成药性
    
    参数：
        uniprot_id: UniProt登录号（例如：EGFR的P00533）
        affinity_cutoff: 最大亲和力（单位：nM，值越小表示活性越强）
    """
    
    # 按UniProt获取配体
    result = tu.tools.BindingDB_get_ligands_by_uniprot(
        uniprot=uniprot_id,
        affinity_cutoff=affinity_cutoff
    )
    
    if result:
        ligands = []
        for entry in result:
            ligands.append({
                'smiles': entry.get('smile'),
                'affinity_type': entry.get('affinity_type'),  # Ki、IC50、Kd
                'affinity_nM': entry.get('affinity'),
                'monomer_id': entry.get('monomerid'),
                'pmid': entry.get('pmid')
            })
        
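        # 按亲和力排序（活性最强的在前）。数值可能带有">"或"<"等限定符；
        # 解析前先去除限定符，缺失或无法解析的值排在最后，而不是抛出ValueError。
        def affinity_key(ligand):
            try:
                return float(str(ligand['affinity_nM']).lstrip('<>=~ '))
            except ValueError:
                return float('inf')
        ligands.sort(key=affinity_key)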
        
        return {
            'total_ligands': len(ligands),
            'ligands': ligands[:20],  # 前20个活性最强的配体
            'best_affinity': ligands[0]['affinity_nM'] if ligands else None
        }
    
    return {'total_ligands': 0, 'ligands': [], 'note': '在BindingDB中未找到配体'}

def get_ligands_by_structure(tu, pdb_id, affinity_cutoff=10000):
    """按PDB结构ID获取蛋白质的配体。"""
    
    result = tu.tools.BindingDB_get_ligands_by_pdb(
        pdb_ids=pdb_id,
        affinity_cutoff=affinity_cutoff,
        sequence_identity=100
    )
    
    return result

def find_compound_targets(tu, smiles, similarity_cutoff=0.85):
    """找到化合物的其他靶点（多药理学）。"""
    
    result = tu.tools.BindingDB_get_targets_by_compound(
        smiles=smiles,
        similarity_cutoff=similarity_cutoff
    )
    
    return result

BindingDB报告章节（添加到章节9 - 成药性）：

Known Ligands (BindingDB) (NEW)

已知配体（BindingDB）（新增）

Total Ligands with Binding Data: 156
Best Reported Affinity: 0.3 nM (Ki)

具有结合数据的配体总数：156个
最佳报告亲和力：0.3 nM（Ki）

Most Potent Ligands

活性最强的配体

SMILES	Affinity Type	Value (nM)	Source PMID
CC(=O)Nc1ccc(cc1)c2...	Ki	0.3	15737014
CN(C)C/C=C/C(=O)Nc1...	IC50	0.8	15896103
COc1cc2ncnc(Nc3ccc...	Kd	2.1	16460808

Chemical Tractability Assessment:

✅ Tchem-level target: Multiple ligands with <30nM affinity
✅ Diverse chemotypes: Multiple scaffolds identified
✅ Published literature: Ligands have PMID references

Source: BindingDB via `BindingDB_get_ligands_by_uniprot`


**Affinity Interpretation for Druggability**:
| Affinity Range | Interpretation | Drug Development Potential |
|----------------|----------------|---------------------------|
| <1 nM | Ultra-potent | Clinical compound likely |
| 1-10 nM | Highly potent | Drug-like |
| 10-100 nM | Potent | Good starting point |
| 100-1000 nM | Moderate | Needs optimization |
| >1000 nM | Weak | Early hit only |
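
The interpretation table above can be applied programmatically. The following is a minimal sketch with a hypothetical function name; because the table's ranges share their endpoints (1, 10, 100, 1000 nM), it assumes each boundary value belongs to the weaker band, except 1000 nM, which stays in "Moderate" since "Weak" is defined as >1000 nM.

```python
def interpret_affinity(affinity_nm):
    """Return (interpretation, drug development potential) for an affinity in nM."""
    if affinity_nm is None:
        return None  # no measurement, nothing to interpret
    if affinity_nm < 1:
        return ("Ultra-potent", "Clinical compound likely")
    if affinity_nm < 10:
        return ("Highly potent", "Drug-like")
    if affinity_nm < 100:
        return ("Potent", "Good starting point")
    if affinity_nm <= 1000:
        return ("Moderate", "Needs optimization")
    return ("Weak", "Early hit only")
```

Ki, IC50, and Kd values are not directly interchangeable, so the affinity type should be reported next to the interpretation rather than replaced by it.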

SMILES	亲和力类型	值（nM）	来源PMID
CC(=O)Nc1ccc(cc1)c2...	Ki	0.3	15737014
CN(C)C/C=C/C(=O)Nc1...	IC50	0.8	15896103
COc1cc2ncnc(Nc3ccc...	Kd	2.1	16460808

化学成药性评估:

✅ Tchem级靶点：多个配体亲和力<30nM
✅ 多样的化学类型：识别到多个骨架
✅ 已发表文献：配体具有PMID参考文献

来源：BindingDB，通过 `BindingDB_get_ligands_by_uniprot` 获取


**亲和力成药性解读**:
| 亲和力范围 | 解读 | 药物开发潜力 |
|----------------|----------------|---------------------------|
| <1 nM | 超活性 | 可能为临床化合物 |
| 1-10 nM | 高活性 | 类药物 |
| 10-100 nM | 活性良好 | 良好的起始点 |
| 100-1000 nM | 中等活性 | 需要优化 |
| >1000 nM | 弱活性 | 仅为早期命中 |

7.5 PubChem BioAssay - Screening Data (NEW)

7.5 PubChem生物分析 - 筛选数据（新增）

PubChem BioAssay provides high-throughput screening (HTS) data and dose-response curves:

python

def get_pubchem_assays_for_target(tu, gene_symbol):
    """
    Get bioassays targeting a gene from PubChem.
    
    Provides:
    - HTS screening results
    - Dose-response data (IC50/EC50)
    - Active compound counts
    """
    
    # Search assays by target gene
    assays = tu.tools.PubChem_search_assays_by_target_gene(
        gene_symbol=gene_symbol
    )
    
    assay_info = []
    if assays.get('data', {}).get('aids'):
        for aid in assays['data']['aids'][:10]:  # Top 10 assays
            # Get assay details
            summary = tu.tools.PubChem_get_assay_summary(aid=aid)
            targets = tu.tools.PubChem_get_assay_targets(aid=aid)
            
            assay_info.append({
                'aid': aid,
                'summary': summary.get('data', {}),
                'targets': targets.get('data', {})
            })
    
    return {
        'total_assays': len(assays.get('data', {}).get('aids', [])),
        'assay_details': assay_info
    }

def get_active_compounds_from_assay(tu, aid):
    """Get active compounds from a specific bioassay."""
    
    actives = tu.tools.PubChem_get_assay_active_compounds(aid=aid)
    
    return {
        'aid': aid,
        'active_cids': actives.get('data', {}).get('cids', []),
        'count': len(actives.get('data', {}).get('cids', []))
    }
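
When active compounds are totaled across several assays, the same CID can appear in more than one assay, so summing the per-assay counts would overstate the total. A minimal sketch of a deduplicated count is shown below; `count_unique_actives` is a hypothetical helper that consumes the dicts returned by `get_active_compounds_from_assay` above.

```python
def count_unique_actives(per_assay_results):
    """Count distinct active CIDs across a list of per-assay result dicts."""
    unique_cids = set()
    for result in per_assay_results:
        # Each dict is expected to carry 'active_cids' as produced above
        unique_cids.update(result.get("active_cids", []))
    return len(unique_cids)
```

Reporting both the per-assay counts and the deduplicated total makes it clear how much overlap exists between screening and confirmatory assays.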

PubChem BioAssay Report Section:

PubChem生物分析提供高通量筛选（HTS）数据和剂量反应曲线：

python

def get_pubchem_assays_for_target(tu, gene_symbol):
    """
    从PubChem获取针对基因的生物分析数据。
    
    提供：
    - HTS筛选结果
    - 剂量反应数据（IC50/EC50）
    - 活性化合物数量
    """
    
    # 按靶点基因搜索分析
    assays = tu.tools.PubChem_search_assays_by_target_gene(
        gene_symbol=gene_symbol
    )
    
    assay_info = []
    if assays.get('data', {}).get('aids'):
        for aid in assays['data']['aids'][:10]:  # 前10个分析
            # 获取分析详情
            summary = tu.tools.PubChem_get_assay_summary(aid=aid)
            targets = tu.tools.PubChem_get_assay_targets(aid=aid)
            
            assay_info.append({
                'aid': aid,
                'summary': summary.get('data', {}),
                'targets': targets.get('data', {})
            })
    
    return {
        'total_assays': len(assays.get('data', {}).get('aids', [])),
        'assay_details': assay_info
    }

def get_active_compounds_from_assay(tu, aid):
    """从特定生物分析中获取活性化合物。"""
    
    actives = tu.tools.PubChem_get_assay_active_compounds(aid=aid)
    
    return {
        'aid': aid,
        'active_cids': actives.get('data', {}).get('cids', []),
        'count': len(actives.get('data', {}).get('cids', []))
    }

PubChem生物分析报告章节:

markdown

undefined

PubChem BioAssay Data (NEW)

Assays Targeting This Gene: 45

| AID | Assay Type | Active Compounds | Target Info |
|-----|------------|------------------|-------------|
| 1053104 | Dose-response | 12 | EGFR kinase |
| 504526 | HTS | 234 | EGFR binding |
| 651564 | Confirmatory | 8 | EGFR cellular |

Total Active Compounds Across Assays: ~500

Source: PubChem via `PubChem_search_assays_by_target_gene`, `PubChem_get_assay_active_compounds`

---

PATH 8: Literature & Research (Collision-Aware)

Collision-Aware Query Strategy

python

def path_literature_collision_aware(tu, ids):
    """
    Literature search with collision detection and filtering.
    """
    symbol = ids['symbol']
    full_name = ids.get('full_name', '')
    uniprot = ids['uniprot']
    synonyms = ids.get('synonyms', [])
    
    # Step 1: Detect collisions
    collision_filter = detect_collisions(tu, symbol, full_name)
    
    # Step 2: Build high-precision seed queries
    seed_queries = [
        f'"{symbol}"[Title] AND (protein OR gene OR expression)',  # Symbol in title
        f'"{full_name}"[Title]' if full_name else None,  # Full name in title
        f'"UniProt:{uniprot}"' if uniprot else None,  # UniProt accession
    ]
    seed_queries = [q for q in seed_queries if q]
    
    # Add key synonyms
    for syn in synonyms[:3]:
        seed_queries.append(f'"{syn}"[Title]')
    
    # Step 3: Execute seed queries and collect PMIDs
    seed_pmids = set()
    for query in seed_queries:
        if collision_filter:
            query = f"({query}){collision_filter}"
        results = tu.tools.PubMed_search_articles(query=query, limit=30)
        for article in results.get('articles', []):
            pmid = article.get('pmid')
            if pmid:  # Skip records that carry no PMID
                seed_pmids.add(pmid)
    
    # Step 4: Expand via citation network (for sparse targets)
    if len(seed_pmids) < 30:
        expanded_pmids = set()
        for pmid in list(seed_pmids)[:10]:  # Up to 10 seeds (set order is arbitrary)
            # Get related articles
            related = tu.tools.PubMed_get_related(pmid=pmid, limit=20)
            for r in related.get('articles', []):
                expanded_pmids.add(r.get('pmid'))
            
            # Get citing articles
            citing = tu.tools.EuropePMC_get_citations(pmid=pmid, limit=20)
            for c in citing.get('citations', []):
                expanded_pmids.add(c.get('pmid'))
        
        # Drop the None entry left by records without a PMID
        seed_pmids.update(expanded_pmids - {None})
    
    # Step 5: Classify papers by evidence tier
    papers_by_tier = {'T1': [], 'T2': [], 'T3': [], 'T4': []}
    # ... classification logic based on title/abstract keywords
    
    return {
        'total_papers': len(seed_pmids),
        'collision_filter_applied': collision_filter if collision_filter else 'None needed',
        'seed_queries': seed_queries,
        'papers_by_tier': papers_by_tier
    }

Retry Logic & Fallback Chains

Retry Policy

For each critical tool, implement retry with exponential backoff:

python

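import time

def call_with_retry(tu, tool_name, params, max_retries=3):
    """
    Call a tool, retrying with exponential backoff on exceptions
    and on results that report an error.
    """
    last_error = 'empty result'
    for attempt in range(max_retries):
        try:
            result = getattr(tu.tools, tool_name)(**params)
            if result and not result.get('error'):
                return result
            if result:
                last_error = result.get('error')
        except Exception as e:
            last_error = str(e)
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, ...
    # Never return None silently: surface the failure to the caller
    return {'error': last_error, 'tool': tool_name, 'attempts': max_retries}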
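
To make the cost of the retry policy concrete, the waits produced by `2 ** attempt` can be listed directly. This is a small sketch; `backoff_schedule` is a hypothetical helper introduced only for illustration, and it assumes, as in the function above, that no sleep follows the final attempt.

```python
def backoff_schedule(max_retries=3, base=2):
    """Seconds slept after each failed attempt except the last one."""
    return [base ** attempt for attempt in range(max_retries - 1)]


print(backoff_schedule())         # waits for the default max_retries=3
print(sum(backoff_schedule(5)))   # total seconds waited with max_retries=5
```

With the default of three attempts the worst case adds 3 seconds of waiting per tool, which matters when nine research paths each call several tools.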

Fallback Chains (CRITICAL)

| Primary Tool | Fallback 1 | Fallback 2 | Failure Action |
|--------------|------------|------------|----------------|
| `ChEMBL_get_target_activities` | `GtoPdb_get_target_ligands` | `OpenTargets drugs` | Note in report |
| `intact_get_interactions` | `STRING_get_protein_interactions` | `OpenTargets interactions` | Note in report |
| `GO_get_annotations_for_gene` | `OpenTargets GO` | `MyGene GO` | Note in report |
| `GTEx_get_median_gene_expression` | `HPA_get_rna_expression` | Note as unavailable | Document in report |
| `gnomad_get_gene_constraints` | `OpenTargets constraint` | - | Note in report |
| `DGIdb_get_drug_gene_interactions` | `OpenTargets drugs` | `GtoPdb` | Note in report |
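
One way to walk a fallback chain is sketched below. It is an illustration under stated assumptions, not part of ToolUniverse: `FALLBACK_CHAINS` and `call_with_fallbacks` are hypothetical names, only table entries that are full tool names are included, and the stub client at the bottom stands in for a real `tu` object so the example runs on its own.

```python
from types import SimpleNamespace

# Fallback order copied from the table above. Entries that are not full
# tool names there (for example "OpenTargets drugs") are left out.
FALLBACK_CHAINS = {
    "ChEMBL_get_target_activities": ["GtoPdb_get_target_ligands"],
    "intact_get_interactions": ["STRING_get_protein_interactions"],
    "GTEx_get_median_gene_expression": ["HPA_get_rna_expression"],
}


def call_with_fallbacks(tu, primary, params_by_tool):
    """Try the primary tool, then each fallback; record every failure."""
    failures = []
    for tool_name in [primary] + FALLBACK_CHAINS.get(primary, []):
        try:
            tool = getattr(tu.tools, tool_name)
            result = tool(**params_by_tool.get(tool_name, {}))
        except Exception as exc:
            failures.append({"tool": tool_name, "reason": str(exc)})
            continue
        if result and not result.get("error"):
            return {"source": tool_name, "data": result, "failures": failures}
        reason = (result or {}).get("error", "empty result")
        failures.append({"tool": tool_name, "reason": reason})
    return {"source": None, "data": None, "failures": failures}


def _always_times_out(**kwargs):
    raise TimeoutError("API timeout")


# Stub client: the primary tool fails, the fallback returns made-up data.
stub_tu = SimpleNamespace(tools=SimpleNamespace(
    GTEx_get_median_gene_expression=_always_times_out,
    HPA_get_rna_expression=lambda **kwargs: {"data": [{"tissue": "Liver"}]},
))

outcome = call_with_fallbacks(stub_tu, "GTEx_get_median_gene_expression", {})
print(outcome["source"])
print(outcome["failures"])
```

Returning the `failures` list alongside the data keeps the information needed for the report note, so a successful fallback does not hide the fact that the primary tool failed.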

Failure Surfacing Rule

NEVER silently skip failed tools. Always document:

7.1 Tissue Expression

GTEx Data: Unavailable (API timeout after 3 attempts)

Fallback Data (HPA):

| Tissue | Expression Level | Specificity |
|--------|------------------|-------------|
| Liver | High | Enhanced |
| Kidney | Medium | - |

Note: For complete GTEx data, query directly at gtexportal.org

---
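
The failure note shown above can be generated rather than typed by hand, which helps keep the wording uniform across sections. A minimal sketch follows; `render_failure_block` and its parameters are hypothetical, and the fallback table is passed in as already-rendered markdown.

```python
def render_failure_block(source, reason, fallback_name=None,
                         fallback_table=None, direct_url=None):
    """Build the report text for a failed tool (hypothetical helper)."""
    lines = [f"{source} Data: Unavailable ({reason})", ""]
    if fallback_name and fallback_table:
        lines += [f"Fallback Data ({fallback_name}):", "", fallback_table, ""]
    else:
        # An explicit statement is required; an empty section is a failure.
        lines += ["Fallback Data: None available", ""]
    if direct_url:
        lines.append(
            f"Note: For complete {source} data, query directly at {direct_url}"
        )
    return "\n".join(lines).rstrip()


hpa_table = "\n".join([
    "| Tissue | Expression Level | Specificity |",
    "|---|---|---|",
    "| Liver | High | Enhanced |",
])
print(render_failure_block(
    "GTEx", "API timeout after 3 attempts", "HPA", hpa_table, "gtexportal.org"
))
```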

Per-Section Data Minimums & Completeness Audit

Minimum Data Requirements (Enforced)

| Section | Minimum Data | If Not Met |
|---------|--------------|------------|
| 6. PPIs | ≥20 interactors | Document which tools failed + why |
| 7. Expression | Top 10 tissues with TPM + HPA RNA summary | Note "limited data" with specific gaps |
| 8. Disease | Top 10 OT diseases + gnomAD constraints + ClinVar summary | Separate SNV/CNV; note if constraint unavailable |
| 9. Druggability | OT tractability + probes + drugs + DGIdb + GtoPdb fallback | "No drugs/probes" is valid data |
| 11. Literature | Total count + 5-year trend + 3-5 key papers with evidence tiers | Note if sparse (<50 papers) |
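
The countable minimums in the table above lend themselves to an automated check. The sketch below is one possible shape for it: the count keys (`interactors`, `tissues_with_tpm`, and so on) and the function name are assumptions made for illustration, and only the numeric thresholds are taken from the table.

```python
# Thresholds from the table above; the count keys are hypothetical names.
MINIMUMS = {
    "6. PPIs": ("interactors", 20),
    "7. Expression": ("tissues_with_tpm", 10),
    "8. Disease": ("ot_diseases", 10),
    "11. Literature": ("key_papers", 3),
}
SPARSE_LITERATURE_THRESHOLD = 50  # "Note if sparse (<50 papers)"


def audit_minimums(counts):
    """Return one gap record per section whose count is below its minimum."""
    gaps = []
    for section, (key, minimum) in MINIMUMS.items():
        actual = counts.get(key, 0)
        if actual < minimum:
            gaps.append({
                "section": section,
                "expected": f">={minimum} {key}",
                "actual": actual,
            })
    total = counts.get("total_papers", 0)
    if total < SPARSE_LITERATURE_THRESHOLD:
        gaps.append({
            "section": "11. Literature",
            "expected": f">={SPARSE_LITERATURE_THRESHOLD} total_papers",
            "actual": total,
        })
    return gaps


counts = {"interactors": 8, "tissues_with_tpm": 10, "ot_diseases": 12,
          "key_papers": 4, "total_papers": 120}
for gap in audit_minimums(counts):
    print(gap)
```

Non-numeric requirements such as the HPA RNA summary or the ClinVar summary still need a manual check; this sketch only covers the counts.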

Post-Run Completeness Audit

Before finalizing the report, run this checklist:

Completeness Audit (REQUIRED)

Data Minimums Check

Negative Results Documented

Empty tool results noted explicitly (not left blank)
Failed tools with fallbacks documented
"No data" sections have implications noted

Evidence Quality

T1-T4 grades in Executive Summary disease claims
T1-T4 grades in Disease Associations table
Key papers table has evidence tiers
Per-section evidence summaries included

Source Attribution

Every data point has source tool/database cited
Section-end source summaries present

Data Gap Table (Required if minimums not met)

15. Data Gaps & Limitations

| Section | Expected Data | Actual | Reason | Alternative Source |
|---------|---------------|--------|--------|--------------------|
| 6. PPIs | ≥20 interactors | 8 | Novel target, limited studies | Literature review needed |
| 7. Expression | GTEx TPM | None | Versioned ID not recognized | See HPA data |
| 9. Probes | Chemical probes | None | No validated probes exist | Consider tool compound dev |

Recommendations for Data Gaps:

For PPIs: Query BioGRID with broader parameters; check yeast-2-hybrid studies
For Expression: Query GEO directly for tissue-specific datasets

---
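
A data gap table like the one above can be rendered from a list of gap records. This is a sketch under assumptions: `render_gap_table` is a hypothetical helper, each record is a dict keyed by the column names, and missing cells are shown as `-`.

```python
GAP_COLUMNS = ["Section", "Expected Data", "Actual", "Reason",
               "Alternative Source"]


def render_gap_table(gaps):
    """Render gap records as a markdown pipe table (hypothetical helper)."""
    header = "| " + " | ".join(GAP_COLUMNS) + " |"
    divider = "|" + "|".join("---" for _ in GAP_COLUMNS) + "|"
    rows = []
    for gap in gaps:
        # Escape pipes so free-text reasons cannot break the table layout.
        cells = [str(gap.get(col, "-")).replace("|", "\\|")
                 for col in GAP_COLUMNS]
        rows.append("| " + " | ".join(cells) + " |")
    return "\n".join([header, divider] + rows)


print(render_gap_table([{
    "Section": "6. PPIs",
    "Expected Data": "≥20 interactors",
    "Actual": 8,
    "Reason": "Novel target, limited studies",
    "Alternative Source": "Literature review needed",
}]))
```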

Report Template (Initial File)

File: `[TARGET]_target_report.md`

Target Intelligence Report: [TARGET NAME]

Generated: [Date] | Query: [Original query] | Status: In Progress

1. Executive Summary

[Researching...]

2. Target Identifiers

[Researching...]

3. Basic Information

4. Structural Biology

4.1 Experimental Structures (PDB)

[Researching...]

5.1 Gene Ontology Annotations

[Researching...]

5.2 Pathway Involvement

[Researching...]

6. Protein-Protein Interactions

[Researching...]

7. Expression Profile

7.1 Tissue Expression (GTEx/HPA)

[Researching...]

7.2 Tissue Specificity

[Researching...]

8. Genetic Variation & Disease

8.1 Constraint Scores

[Researching...]

9.3 Chemical Probes

[Researching...]

11. Literature & Research Landscape

11.1 Publication Metrics

[Researching...]

11.2 Research Trend

[Researching...]

13.3 Challenges & Risks

[Researching...]

13.4 Recommendations

[Researching...]

14. Data Sources & Methodology

[Will be populated as research progresses...]

15. Data Gaps & Limitations

[To be populated post-audit...]

---
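
The report-first step can be scripted so the file exists before any tool is called. The sketch below makes several assumptions: `create_initial_report` is a hypothetical helper, the `#` and `##` heading markers are a guess at the template's original markdown (the heading levels are not visible here), and the section list is supplied by the caller rather than hard-coded.

```python
import tempfile
from datetime import date
from pathlib import Path


def create_initial_report(target, query, sections, out_dir="."):
    """Write the placeholder report and return its path (hypothetical helper)."""
    lines = [
        f"# Target Intelligence Report: {target}",
        "",
        f"Generated: {date.today().isoformat()} | Query: {query} "
        "| Status: In Progress",
        "",
    ]
    for title in sections:
        # Every section starts with the template's placeholder text.
        lines += [f"## {title}", "", "[Researching...]", ""]
    path = Path(out_dir) / f"{target}_target_report.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


# Usage: a scratch directory keeps the example free of side effects.
report_path = create_initial_report(
    "EGFR",
    "EGFR",
    ["1. Executive Summary", "2. Target Identifiers"],
    out_dir=tempfile.mkdtemp(),
)
print(report_path.name)
```

Each research path can then replace its own `[Researching...]` placeholder as results arrive, so a partial run still leaves a readable report.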

Quick Reference: Tool Parameters

| Tool | Parameter | Notes |
|------|-----------|-------|
| `Reactome_map_uniprot_to_pathways` | `id` | NOT `uniprot_id` |
| `ensembl_get_xrefs` | `id` | NOT `gene_id` |
| `GTEx_get_median_gene_expression` | `gencode_id`, `operation` | Try versioned ID if empty |
| `OpenTargets_*` | `ensemblId` | camelCase, not `ensemblID` |
| `STRING_get_protein_interactions` | `protein_ids`, `species` | List format for IDs |
| `intact_get_interactions` | `identifier` | UniProt accession |
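
The corrections in the table above can be encoded once and applied before every call. The sketch below is illustrative only: `PARAM_CORRECTIONS` and `normalize_params` are hypothetical names, `OpenTargets_example_tool` is a placeholder rather than a real tool name, and checking with `get_tool_info` remains the authoritative step.

```python
# Wrong-name -> right-name pairs taken from the table above.
PARAM_CORRECTIONS = {
    "Reactome_map_uniprot_to_pathways": {"uniprot_id": "id"},
    "ensembl_get_xrefs": {"gene_id": "id"},
}
# Applies to every tool whose name starts with "OpenTargets_".
OPENTARGETS_CORRECTIONS = {"ensemblID": "ensemblId"}


def normalize_params(tool_name, params):
    """Rename known-wrong parameter names; leave all others untouched."""
    fixes = dict(PARAM_CORRECTIONS.get(tool_name, {}))
    if tool_name.startswith("OpenTargets_"):
        fixes.update(OPENTARGETS_CORRECTIONS)
    return {fixes.get(name, name): value for name, value in params.items()}


print(normalize_params("Reactome_map_uniprot_to_pathways",
                       {"uniprot_id": "P00533"}))
print(normalize_params("OpenTargets_example_tool",  # placeholder tool name
                       {"ensemblID": "ENSG_PLACEHOLDER"}))
```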

When NOT to Use This Skill

Simple protein lookup → Use `UniProt_get_entry_by_accession` directly
Drug information only → Use drug-focused tools
Disease-centric query → Use disease-intelligence-gatherer skill
Sequence retrieval → Use sequence-retrieval skill
Structure download → Use protein-structure-retrieval skill

Use this skill for comprehensive, multi-angle target analysis with guaranteed data completeness.
