tooluniverse-protein-structure-retrieval
Protein Structure Data Retrieval
Retrieve protein structures with proper disambiguation, quality assessment, and comprehensive metadata.
IMPORTANT: Always use English terms in tool calls (protein names, organism names), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.
Workflow Overview
Phase 0: Clarify (if needed)
↓
Phase 1: Disambiguate Protein Identity
↓
Phase 2: Retrieve Structures (Internal)
↓
Phase 3: Report Structure Profile
Phase 0: Clarification (When Needed)
Ask the user ONLY if:
- Protein name matches multiple genes/families (e.g., "kinase" → which kinase?)
- Organism not specified for conserved proteins
- Intent unclear: need experimental structure vs AlphaFold prediction?
Skip clarification for:
- Specific PDB IDs (4-character codes)
- UniProt accessions
- Unambiguous protein names with organism
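The skip rules above depend on recognizing what kind of identifier the user supplied. A minimal sketch of that check follows; `classify_query` is a hypothetical helper, not part of ToolUniverse. It assumes PDB IDs are a digit followed by three alphanumeric characters and uses the accession pattern published by UniProt.

```python
import re

# Classic PDB IDs: one digit followed by three alphanumeric characters.
PDB_ID_RE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")

# UniProt accession pattern (6 or 10 characters).
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def classify_query(query: str) -> str:
    """Return 'pdb_id', 'uniprot', or 'protein_name' for a raw user query."""
    text = query.strip()
    if PDB_ID_RE.match(text):
        return "pdb_id"
    if UNIPROT_RE.match(text.upper()):
        return "uniprot"
    # Anything else is treated as a free-text name and may need clarification.
    return "protein_name"
```

Only the `protein_name` outcome needs to go through the clarification questions; the other two can proceed directly to Phase 1.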
Phase 1: Protein Disambiguation
1.1 Resolve Protein Identity
```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Strategy depends on input type.
# The user_provided_* names are placeholders for values parsed from the request.
if user_provided_pdb_id:
    # Direct structure retrieval
    pdb_id = user_provided_pdb_id.upper()
elif user_provided_uniprot:
    # Get UniProt info, then search structures
    uniprot_id = user_provided_uniprot
    # Can also get AlphaFold structure
    af_structure = tu.tools.alphafold_get_structure_by_uniprot(
        uniprot_id=uniprot_id
    )
elif user_provided_protein_name:
    # Search by name
    protein_name = user_provided_protein_name
    result = tu.tools.search_structures_by_protein_name(
        protein_name=protein_name
    )
```
1.2 Identity Resolution Checklist
- Protein name/gene identified
- Organism confirmed
- UniProt accession (if available)
- Isoform/variant specified (if relevant)
1.3 Handle Naming Collisions
Common ambiguous terms:
| Term | Ambiguity | Resolution |
|---|---|---|
| "kinase" | Hundreds of kinases | Ask which kinase (EGFR, CDK2, etc.) |
| "receptor" | Many receptor types | Specify receptor family |
| "protease" | Multiple families | Ask serine/cysteine/metallo/etc. |
| "hemoglobin" | Clear | Proceed (α/β chain specified if needed) |
| "insulin" | Clear | Proceed |
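The table above can be expressed as a simple lookup. The sketch below is illustrative only: `AMBIGUOUS_TERMS` and `clarification_question` are hypothetical names, and the match is deliberately exact so that a qualified query such as "EGFR kinase" is not flagged.

```python
from typing import Optional

# Bare family names that need a follow-up question, taken from the table above.
AMBIGUOUS_TERMS = {
    "kinase": "Which kinase do you mean (for example EGFR or CDK2)?",
    "receptor": "Which receptor family are you interested in?",
    "protease": "Which protease class: serine, cysteine, metallo, or another?",
}


def clarification_question(protein_name: str) -> Optional[str]:
    """Return a follow-up question for a bare family name, else None."""
    return AMBIGUOUS_TERMS.get(protein_name.strip().lower())
```

A `None` result means the name can be passed on without asking the user anything.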
Phase 2: Data Retrieval (Internal)
Retrieve all data silently. Do NOT narrate the search process.
2.1 Search Structures
```python
# Search by protein name
result = tu.tools.search_structures_by_protein_name(
    protein_name=protein_name
)

# Filter results by quality
high_res = [
    entry for entry in result["data"]
    if entry.get("resolution") and entry["resolution"] < 2.5
]
```
2.2 Get Structure Details
For each relevant structure:
```python
pdb_id = "4INS"

# Basic metadata
metadata = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)

# Experimental details
exp_details = tu.tools.get_protein_experimental_details_by_pdb_id(
    pdb_id=pdb_id
)

# Resolution (if X-ray)
resolution = tu.tools.get_protein_resolution_by_pdb_id(pdb_id=pdb_id)

# Bound ligands
ligands = tu.tools.get_protein_ligands_by_pdb_id(pdb_id=pdb_id)

# Similar structures
similar = tu.tools.get_similar_structures_by_pdb_id(
    pdb_id=pdb_id,
    cutoff=2.0
)
```
2.3 PDBe Additional Data
```python
# Entry summary
summary = tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id)

# Molecular entities
molecules = tu.tools.pdbe_get_molecules(pdb_id=pdb_id)

# Binding sites
binding_sites = tu.tools.pdbe_get_binding_sites(pdb_id=pdb_id)
```
2.4 AlphaFold Predictions
```python
# When no experimental structure exists, or for comparison
if uniprot_id:
    af_structure = tu.tools.alphafold_get_structure_by_uniprot(
        uniprot_id=uniprot_id
    )
```
Fallback Chains
| Primary | Fallback | Notes |
|---|---|---|
| RCSB search | PDBe search | Regional availability |
| get_protein_metadata | pdbe_get_entry_summary | Alternative source |
| Experimental structure | AlphaFold prediction | No experimental structure |
| get_protein_ligands | pdbe_get_binding_sites | Ligand info unavailable |
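Each row of the table follows the same pattern: try the primary source, and move to the fallback if it fails or returns nothing. A minimal sketch of that pattern is shown below. `first_successful` is a hypothetical helper, and treating any exception or empty result as a failure is an assumption about how the tools signal errors.

```python
from typing import Any, Callable, Optional, Sequence


def first_successful(sources: Sequence[Callable[[], Any]]) -> Optional[Any]:
    """Call each source in order and return the first non-empty result."""
    for fetch in sources:
        try:
            result = fetch()
        except Exception:
            # Assumption: a raised exception means this source is unavailable.
            continue
        if result:
            return result
    return None
```

For the metadata row, for example, the two sources could be passed as `lambda: tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)` followed by `lambda: tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id)`.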
Phase 3: Report Structure Profile
Output Structure
Present as a Structure Profile Report. Hide search process.
Protein Structure Profile: [Protein Name]
Search Summary
- Query: [protein name/PDB ID]
- Organism: [species]
- Structures Found: [N] experimental, [M] AlphaFold
Best Available Structure
[PDB ID]: [Title]
| Attribute | Value |
|---|---|
| PDB ID | [pdb_id] |
| UniProt | [uniprot_id] |
| Organism | [species] |
| Method | X-ray / Cryo-EM / NMR |
| Resolution | [X.XX] Å |
| Release Date | [date] |
Quality Assessment: ●●●● Excellent / ●●●○ High / ●●○○ Good / ●○○○ Moderate / ○○○○ Low
Experimental Details
| Parameter | Value |
|---|---|
| Method | [X-ray crystallography] |
| Resolution | [1.9 Å] |
| R-factor | [0.18] |
| R-free | [0.21] |
| Space Group | [P 21 21 21] |
Structure Composition
| Component | Count | Details |
|---|---|---|
| Chains | [N] | [A (enzyme), B (inhibitor)] |
| Residues | [N] | [coverage %] |
| Ligands | [N] | [list ligand names] |
| Waters | [N] | |
| Metals | [N] | [Zn, Mg, etc.] |
Bound Ligands
| Ligand ID | Name | Type | Binding Site |
|---|---|---|---|
| [ATP] | Adenosine triphosphate | Substrate | Active site |
| [MG] | Magnesium ion | Cofactor | Catalytic |
Binding Site Details
For drug discovery applications:
Site 1: Active Site
- Location: Chain A, residues 45-89
- Key residues: Asp45, Glu67, His89
- Pocket volume: [X] ų
- Druggability: High/Medium/Low
Alternative Structures
Ranked by quality and relevance:
| Rank | PDB ID | Resolution | Method | Ligands | Notes |
|---|---|---|---|---|---|
| 1 | [4INS] | 1.9 Å | X-ray | Zn | Best resolution |
| 2 | [3I40] | 2.1 Å | X-ray | Zn, phenol | With inhibitor |
| 3 | [1TRZ] | 2.3 Å | X-ray | None | Porcine |
AlphaFold Prediction
AF-[UniProt]-F1
| Attribute | Value |
|---|---|
| UniProt | [uniprot_id] |
| Model Version | [v4] |
| Confidence (pLDDT) | [average score] |
Confidence Distribution:
- Very High (>90): [X]% of residues
- High (70-90): [X]% of residues
- Low (50-70): [X]% of residues
- Very Low (<50): [X]% of residues
Use Cases:
- ✓ Overall fold reliable
- ✓ Core domain structure
- ⚠ Loop regions uncertain
- ✗ Not suitable for binding site analysis
Structure Comparison
| Property | [PDB_1] | [PDB_2] | AlphaFold |
|---|---|---|---|
| Resolution | 1.9 Å | 2.5 Å | N/A (predicted) |
| Completeness | 98% | 85% | 100% |
| Ligands | Yes | No | No |
| Confidence | Experimental | Experimental | High (85 avg) |
Download Links
Coordinate Files
| Format | PDB ID | Link |
|---|---|---|
| PDB | [4INS] | [link] |
| mmCIF | [4INS] | [link] |
| AlphaFold | [UniProt] | [link] |
Database Links
- RCSB PDB: https://www.rcsb.org/structure/[pdb_id]
- PDBe: https://www.ebi.ac.uk/pdbe/entry/pdb/[pdb_id]
- AlphaFold: https://alphafold.ebi.ac.uk/entry/[uniprot_id]
Retrieved: [date]
---
Quality Assessment Tiers
Experimental Structures
| Tier | Symbol | Criteria |
|---|---|---|
| Excellent | ●●●● | X-ray <1.5Å, complete, R-free <0.22 |
| High | ●●●○ | X-ray 1.5-2.0Å OR Cryo-EM <3.0Å |
| Good | ●●○○ | X-ray 2.0-3.0Å OR Cryo-EM 3.0-4.0Å |
| Moderate | ●○○○ | X-ray 3.0-4.0Å OR NMR ensemble |
| Low | ○○○○ | >4.0Å, incomplete, or problematic |
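The tiers above can be assigned programmatically. The sketch below is one possible reading of the table, not a definitive rule set: `quality_tier` is a hypothetical helper, method strings are matched loosely, boundary values go to the better tier where a range is stated, and anything worse than 4.0 Å is treated as Low. The "incomplete or problematic" criterion for Low is left to manual judgment.

```python
from typing import Optional


def quality_tier(method: str, resolution: Optional[float] = None,
                 r_free: Optional[float] = None, complete: bool = True) -> str:
    """Map an experimental method and resolution (in Å) to a quality tier."""
    kind = method.strip().lower()
    if "nmr" in kind:
        # NMR ensembles carry no resolution value.
        return "●○○○ Moderate"
    if resolution is None or resolution > 4.0:
        return "○○○○ Low"
    if kind.startswith("x-ray"):
        if resolution < 1.5 and complete and r_free is not None and r_free < 0.22:
            return "●●●● Excellent"
        if resolution < 2.0:
            return "●●●○ High"
        if resolution <= 3.0:
            return "●●○○ Good"
        return "●○○○ Moderate"
    if kind in ("cryo-em", "electron microscopy"):
        return "●●●○ High" if resolution < 3.0 else "●●○○ Good"
    # Unknown methods are not covered by the table; default to the lowest tier.
    return "○○○○ Low"
```

Without an R-free value, a sub-1.5 Å X-ray structure is reported as High rather than Excellent, since the Excellent row requires R-free below 0.22.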
Resolution Guide
| Resolution | Use Case |
|---|---|
| <1.5 Å | Atomic detail, H-bond analysis |
| 1.5-2.0 Å | Drug design, mechanism studies |
| 2.0-2.5 Å | Structure-based design |
| 2.5-3.5 Å | Overall architecture, fold |
| >3.5 Å | Domain arrangement only |
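As a rough illustration, the guide above can be turned into a lookup for the report text. `resolution_use_case` is a hypothetical helper. The table does not say which band a shared boundary belongs to, so this sketch assigns 1.5, 2.0, and 2.5 Å to the coarser band and 3.5 Å to the "overall architecture" band.

```python
def resolution_use_case(resolution: float) -> str:
    """Return the suggested use case for a resolution given in Å."""
    if resolution < 1.5:
        return "Atomic detail, H-bond analysis"
    if resolution < 2.0:
        return "Drug design, mechanism studies"
    if resolution < 2.5:
        return "Structure-based design"
    if resolution <= 3.5:
        return "Overall architecture, fold"
    return "Domain arrangement only"
```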
AlphaFold Confidence
| pLDDT Score | Interpretation |
|---|---|
| >90 | Very high confidence, experimental-like |
| 70-90 | Good backbone confidence |
| 50-70 | Uncertain, flexible regions |
| <50 | Low confidence, likely disordered |
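The confidence distribution in the report can be computed from per-residue pLDDT scores using the bands above. This is a sketch under assumptions: the input is a plain list of scores (how they are extracted from the AlphaFold tool response is not shown here), and `plddt_band` and `confidence_distribution` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Sequence

BANDS = ("Very High (>90)", "High (70-90)", "Low (50-70)", "Very Low (<50)")


def plddt_band(score: float) -> str:
    """Assign a single pLDDT score to one of the four confidence bands."""
    if score > 90:
        return BANDS[0]
    if score >= 70:
        return BANDS[1]
    if score >= 50:
        return BANDS[2]
    return BANDS[3]


def confidence_distribution(scores: Sequence[float]) -> Dict[str, float]:
    """Return the percentage of residues in each band, rounded to 0.1."""
    counts = Counter(plddt_band(score) for score in scores)
    total = len(scores)
    return {
        band: (round(100 * counts[band] / total, 1) if total else 0.0)
        for band in BANDS
    }
```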
Completeness Checklist
Every structure report MUST include:
For Specific PDB ID (Required)
- PDB ID and title
- Experimental method
- Resolution (or N/A for NMR)
- Organism
- Quality assessment
- Download links
For Protein Name Search (Required)
- Search summary with result count
- Top structures with quality ranking
- Best structure recommendation
- AlphaFold alternative (if no experimental structure)
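Ranking the top structures can be as simple as sorting by resolution. The sketch below assumes each search result is a dict with a `resolution` key, as in the filtering example in section 2.1; the `pdb_id` key and the `rank_structures` helper are hypothetical. Entries without a resolution (for example NMR) are placed last.

```python
from typing import Any, Dict, List, Tuple


def rank_structures(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort entries by ascending resolution, with missing values last."""
    def sort_key(entry: Dict[str, Any]) -> Tuple[bool, float]:
        resolution = entry.get("resolution")
        # False sorts before True, so entries with a resolution come first.
        return (resolution is None, resolution if resolution is not None else 0.0)

    return sorted(entries, key=sort_key)
```

The first element of the returned list is then the candidate for the best structure recommendation, subject to the other criteria such as bound ligands or organism.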
Always Include
- Ligand information (or "No ligands bound")
- Data sources with links
- Retrieval date
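The data source links and retrieval date can be generated from the identifiers alone. The sketch below uses the URL patterns listed under Database Links; `source_links` is a hypothetical helper, and the upper-case and lower-case normalization of the identifiers is an assumption.

```python
from datetime import date
from typing import Dict, Optional


def source_links(pdb_id: str, uniprot_id: Optional[str] = None) -> Dict[str, str]:
    """Build database links and a retrieval date for a structure report."""
    links = {
        "RCSB PDB": f"https://www.rcsb.org/structure/{pdb_id.upper()}",
        "PDBe": f"https://www.ebi.ac.uk/pdbe/entry/pdb/{pdb_id.lower()}",
    }
    if uniprot_id:
        links["AlphaFold"] = (
            f"https://alphafold.ebi.ac.uk/entry/{uniprot_id.upper()}"
        )
    # ISO date (YYYY-MM-DD) for the "Retrieved" line of the report.
    links["Retrieved"] = date.today().isoformat()
    return links
```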
Common Use Cases
Drug Discovery Target
User: "Get structure for EGFR kinase with inhibitor"
→ Filter for ligand-bound structures, emphasize binding site
Model Building
User: "Find best template for homology modeling of protein X"
→ High-resolution structures, note sequence coverage
Structure Comparison
User: "Compare available SARS-CoV-2 main protease structures"
→ All structures with systematic comparison table
AlphaFold When No Experimental Structure Exists
User: "Structure of protein with UniProt P12345"
→ Check PDB first, then AlphaFold, note confidence
Error Handling
| Error | Response |
|---|---|
| "PDB ID not found" | Verify 4-character format, check if obsoleted |
| "No structures for protein" | Offer AlphaFold prediction, suggest similar proteins |
| "Download failed" | Retry once, provide alternative link |
| "Resolution unavailable" | Likely NMR/model, note in assessment |
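The "Retry once" response for a failed download can be sketched as a small wrapper. `retry_once` is a hypothetical helper, and it assumes a failed call raises an exception. A `None` result signals that both attempts failed and the alternative link should be offered instead.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_once(action: Callable[[], T]) -> Optional[T]:
    """Run action, retrying a single time on failure; None if both fail."""
    for _attempt in range(2):
        try:
            return action()
        except Exception:
            # Assumption: any exception counts as a failed attempt.
            continue
    return None
```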
Tool Reference
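RCSB PDB (Experimental Structures)
| Tool | Purpose |
|---|---|
| search_structures_by_protein_name | Name-based search |
| get_protein_metadata_by_pdb_id | Basic info |
| get_protein_experimental_details_by_pdb_id | Method details |
| get_protein_resolution_by_pdb_id | Quality metric |
| get_protein_ligands_by_pdb_id | Bound molecules |
| | Coordinate files |
| get_similar_structures_by_pdb_id | Homologs |
PDBe (European PDB)
| Tool | Purpose |
|---|---|
| pdbe_get_entry_summary | Overview |
| pdbe_get_molecules | Molecular entities |
| | Experimental data |
| pdbe_get_binding_sites | Ligand pockets |
AlphaFold (Predictions)
| Tool | Purpose |
|---|---|
| alphafold_get_structure_by_uniprot | Get prediction |
| | Search predictions |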