tooluniverse-protein-structure-retrieval

Protein Structure Data Retrieval

Retrieve protein structures with proper disambiguation, quality assessment, and comprehensive metadata.
IMPORTANT: Always use English terms in tool calls (protein names, organism names), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.

Workflow Overview

Phase 0: Clarify (if needed)
Phase 1: Disambiguate Protein Identity
Phase 2: Retrieve Structures (Internal)
Phase 3: Report Structure Profile

Phase 0: Clarification (When Needed)

Ask the user ONLY if:
  • Protein name matches multiple genes/families (e.g., "kinase" → which kinase?)
  • Organism not specified for conserved proteins
  • Intent unclear: need experimental structure vs AlphaFold prediction?
Skip clarification for:
  • Specific PDB IDs (4-character codes)
  • UniProt accessions
  • Unambiguous protein names with organism
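
The skip rules above depend on recognizing what kind of identifier the user supplied. A minimal sketch of that check follows, assuming the classic 4-character PDB ID format and the accession pattern published in the UniProt help pages; `classify_query` and its return labels are hypothetical names, not part of ToolUniverse.

```python
import re

# Classic 4-character PDB ID: a digit 1-9 followed by three alphanumerics.
PDB_ID_RE = re.compile(r"^[1-9][A-Za-z0-9]{3}$")

# UniProt accession pattern (6 or 10 characters).
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def classify_query(query: str) -> str:
    """Return 'pdb_id', 'uniprot', or 'protein_name' (hypothetical labels)."""
    text = query.strip()
    if PDB_ID_RE.match(text):
        return "pdb_id"
    if UNIPROT_RE.match(text.upper()):
        return "uniprot"
    # Anything else is treated as a free-text name and may need clarification.
    return "protein_name"
```

Names such as "EGFR" are four characters long but do not start with a digit, so they fall through to `protein_name` rather than being mistaken for a PDB ID.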

Phase 1: Protein Disambiguation

1.1 Resolve Protein Identity

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Strategy depends on input type.
# The user_provided_* names are placeholders for whatever the user supplied.
if user_provided_pdb_id:
    # Direct structure retrieval
    pdb_id = user_provided_pdb_id.upper()
elif user_provided_uniprot:
    # Get UniProt info, then search structures
    uniprot_id = user_provided_uniprot
    # Can also get AlphaFold structure
    af_structure = tu.tools.alphafold_get_structure_by_uniprot(
        uniprot_id=uniprot_id
    )
elif user_provided_protein_name:
    # Search by name
    protein_name = user_provided_protein_name
    result = tu.tools.search_structures_by_protein_name(
        protein_name=protein_name
    )
```

1.2 Identity Resolution Checklist

  • Protein name/gene identified
  • Organism confirmed
  • UniProt accession (if available)
  • Isoform/variant specified (if relevant)

1.3 Handle Naming Collisions

Common ambiguous terms:

| Term | Ambiguity | Resolution |
| --- | --- | --- |
| "kinase" | Hundreds of kinases | Ask which kinase (EGFR, CDK2, etc.) |
| "receptor" | Many receptor types | Specify receptor family |
| "protease" | Multiple families | Ask serine/cysteine/metallo/etc. |
| "hemoglobin" | Clear | Proceed (α/β chain specified if needed) |
| "insulin" | Clear | Proceed |
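
The ambiguous terms listed above can be checked with a simple lookup before any search is run. This is only an illustrative sketch: `AMBIGUOUS_TERMS` and `clarification_needed` are hypothetical names, and the question wording is paraphrased from the listed resolutions.

```python
# Bare family terms that should trigger a clarifying question.
AMBIGUOUS_TERMS = {
    "kinase": "Which kinase (EGFR, CDK2, etc.)?",
    "receptor": "Which receptor family?",
    "protease": "Which protease class (serine, cysteine, metallo, etc.)?",
}


def clarification_needed(protein_name: str):
    """Return a clarifying question for a bare family term, else None."""
    return AMBIGUOUS_TERMS.get(protein_name.strip().lower())
```

A query such as "EGFR kinase" is not a bare family term, so the lookup returns `None` and the workflow proceeds without asking.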

Phase 2: Data Retrieval (Internal)

Retrieve all data silently. Do NOT narrate the search process.

2.1 Search Structures

```python
# Search by protein name
result = tu.tools.search_structures_by_protein_name(
    protein_name=protein_name
)

# Filter results by quality
high_res = [
    entry for entry in result["data"]
    if entry.get("resolution") and entry["resolution"] < 2.5
]
```
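
The quality filter can be extended to rank the surviving entries, best resolution first. The sketch below runs on hand-written sample data; the entry shape (a dict with `pdb_id` and `resolution` keys) is an assumption carried over from the filter in this section, and the IDs are placeholders rather than real entries.

```python
def rank_by_resolution(entries, cutoff=2.5):
    """Keep entries with a known resolution below `cutoff`, lowest value first."""
    kept = [
        e for e in entries
        if e.get("resolution") is not None and e["resolution"] < cutoff
    ]
    return sorted(kept, key=lambda e: e["resolution"])


# Placeholder entries, not real PDB records.
sample = [
    {"pdb_id": "AAAA", "resolution": 2.3},
    {"pdb_id": "BBBB", "resolution": None},  # e.g. an entry without a resolution
    {"pdb_id": "CCCC", "resolution": 1.9},
    {"pdb_id": "DDDD", "resolution": 3.1},
]
ranked = rank_by_resolution(sample)
```

Entries without a resolution value are dropped here; they would need separate handling if NMR structures should appear in the ranking.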

2.2 Get Structure Details

For each relevant structure:

```python
pdb_id = "4INS"

# Basic metadata
metadata = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)

# Experimental details
exp_details = tu.tools.get_protein_experimental_details_by_pdb_id(
    pdb_id=pdb_id
)

# Resolution (if X-ray)
resolution = tu.tools.get_protein_resolution_by_pdb_id(pdb_id=pdb_id)

# Bound ligands
ligands = tu.tools.get_protein_ligands_by_pdb_id(pdb_id=pdb_id)

# Similar structures
similar = tu.tools.get_similar_structures_by_pdb_id(
    pdb_id=pdb_id,
    cutoff=2.0
)
```

2.3 PDBe Additional Data

```python
# Entry summary
summary = tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id)

# Molecular entities
molecules = tu.tools.pdbe_get_molecules(pdb_id=pdb_id)

# Binding sites
binding_sites = tu.tools.pdbe_get_binding_sites(pdb_id=pdb_id)
```

2.4 AlphaFold Predictions

```python
# When no experimental structure exists, or for comparison
if uniprot_id:
    af_structure = tu.tools.alphafold_get_structure_by_uniprot(
        uniprot_id=uniprot_id
    )
```

Fallback Chains

| Primary | Fallback | Notes |
| --- | --- | --- |
| RCSB search | PDBe search | Regional availability |
| `get_protein_metadata_by_pdb_id` | `pdbe_get_entry_summary` | Alternative source |
| Experimental structure | AlphaFold prediction | No experimental structure |
| `get_protein_ligands_by_pdb_id` | `pdbe_get_binding_sites` | Ligand info unavailable |
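
One way to express such a fallback chain in code is a small helper that tries each source in order and returns the first usable result. This is a sketch under assumptions: `first_successful` is a hypothetical helper, and the two stub functions stand in for real tool calls.

```python
def first_successful(*attempts):
    """Try (label, zero-argument callable) pairs in order.

    Returns (label, result) for the first call that neither raises nor
    returns an empty value; raises LookupError if every attempt fails.
    """
    errors = []
    for label, fn in attempts:
        try:
            result = fn()
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{label}: {exc}")
            continue
        if result:
            return label, result
        errors.append(f"{label}: empty result")
    raise LookupError("; ".join(errors))


# Stubs standing in for a primary and a fallback data source.
def primary_stub():
    raise TimeoutError("primary unavailable")


def fallback_stub():
    return {"title": "example entry"}


source, data = first_successful(
    ("primary", primary_stub),
    ("fallback", fallback_stub),
)
```

Returning the label alongside the data makes it easy to cite the source that actually answered in the final report.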

Phase 3: Report Structure Profile

Output Structure

Present as a Structure Profile Report. Hide search process.

Protein Structure Profile: [Protein Name]

Search Summary
  • Query: [protein name/PDB ID]
  • Organism: [species]
  • Structures Found: [N] experimental, [M] AlphaFold

Best Available Structure

[PDB ID]: [Title]

| Attribute | Value |
| --- | --- |
| PDB ID | [pdb_id] |
| UniProt | [uniprot_id] |
| Organism | [species] |
| Method | X-ray / Cryo-EM / NMR |
| Resolution | [X.XX] Å |
| Release Date | [date] |

Quality Assessment: ●●● High / ●●○ Medium / ●○○ Low

Experimental Details

| Parameter | Value |
| --- | --- |
| Method | [X-ray crystallography] |
| Resolution | [1.9 Å] |
| R-factor | [0.18] |
| R-free | [0.21] |
| Space Group | [P 21 21 21] |

Structure Composition

| Component | Count | Details |
| --- | --- | --- |
| Chains | [N] | [A (enzyme), B (inhibitor)] |
| Residues | [N] | [coverage %] |
| Ligands | [N] | [list ligand names] |
| Waters | [N] | |
| Metals | [N] | [Zn, Mg, etc.] |

Bound Ligands

| Ligand ID | Name | Type | Binding Site |
| --- | --- | --- | --- |
| [ATP] | Adenosine triphosphate | Substrate | Active site |
| [MG] | Magnesium ion | Cofactor | Catalytic |

Binding Site Details

For drug discovery applications:
Site 1: Active Site
  • Location: Chain A, residues 45-89
  • Key residues: Asp45, Glu67, His89
  • Pocket volume: [X] ų
  • Druggability: High/Medium/Low

Alternative Structures

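Ranked by quality and relevance:

| Rank | PDB ID | Resolution | Method | Ligands | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | [4INS] | 1.9 Å | X-ray | Zn | Best resolution |
| 2 | [3I40] | 2.1 Å | X-ray | Zn, phenol | With inhibitor |
| 3 | [1TRZ] | 2.3 Å | X-ray | None | Porcine |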

AlphaFold Prediction

AF-[UniProt]-F1

| Attribute | Value |
| --- | --- |
| UniProt | [uniprot_id] |
| Model Version | [v4] |
| Confidence (pLDDT) | [average score] |

Confidence Distribution:
  • Very High (>90): [X]% of residues
  • High (70-90): [X]% of residues
  • Low (50-70): [X]% of residues
  • Very Low (<50): [X]% of residues
Use Cases:
  • ✓ Overall fold reliable
  • ✓ Core domain structure
  • ⚠ Loop regions uncertain
  • ✗ Not suitable for binding site analysis

Structure Comparison

| Property | [PDB_1] | [PDB_2] | AlphaFold |
| --- | --- | --- | --- |
| Resolution | 1.9 Å | 2.5 Å | N/A (predicted) |
| Completeness | 98% | 85% | 100% |
| Ligands | Yes | No | No |
| Confidence | Experimental | Experimental | High (85 avg) |

Download Links

Coordinate Files

| Format | PDB ID | Link |
| --- | --- | --- |
| PDB | [4INS] | [link] |
| mmCIF | [4INS] | [link] |
| AlphaFold | [UniProt] | [link] |
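
The `[link]` placeholders can be filled with direct file URLs. The sketch below assumes the URL patterns used by the RCSB file server and the AlphaFold database file server at the time of writing; both patterns, the default model version, and the `coordinate_links` helper itself are assumptions and should be verified before use.

```python
def coordinate_links(pdb_id=None, uniprot_id=None, af_version=4):
    """Build download URLs for the coordinate-file table (assumed URL patterns)."""
    links = {}
    if pdb_id:
        pid = pdb_id.upper()
        # Assumed RCSB file-server pattern.
        links["PDB"] = f"https://files.rcsb.org/download/{pid}.pdb"
        links["mmCIF"] = f"https://files.rcsb.org/download/{pid}.cif"
    if uniprot_id:
        # Assumed AlphaFold DB pattern; the version suffix changes between releases.
        links["AlphaFold"] = (
            "https://alphafold.ebi.ac.uk/files/"
            f"AF-{uniprot_id}-F1-model_v{af_version}.pdb"
        )
    return links


links = coordinate_links(pdb_id="4ins", uniprot_id="P12345")
```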

Database Links

Quality Assessment Tiers

Experimental Structures

| Tier | Symbol | Criteria |
| --- | --- | --- |
| Excellent | ●●●● | X-ray <1.5 Å, complete, R-free <0.22 |
| High | ●●●○ | X-ray <2.0 Å OR Cryo-EM <3.0 Å |
| Good | ●●○○ | X-ray 2.0-3.0 Å OR Cryo-EM 3.0-4.0 Å |
| Moderate | ●○○○ | X-ray >3.0 Å OR NMR ensemble |
| Low | ○○○○ | >4.0 Å, incomplete, or problematic |
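
The tier criteria can be turned into a small classifier. The listed criteria leave some boundaries open, so the sketch below makes explicit assumptions: incompleteness always yields Low, NMR is always Moderate, an X-ray structure at exactly 3.0 Å counts as Good, and an unknown method or missing resolution falls to Low. `quality_tier` is a hypothetical helper.

```python
def quality_tier(method, resolution=None, r_free=None, complete=True):
    """Map method and resolution (in Å) to a tier name; boundary handling is assumed."""
    m = method.lower()
    if not complete:
        return "Low"
    if m == "nmr":
        return "Moderate"
    if resolution is None or resolution > 4.0:
        return "Low"
    if m == "x-ray":
        if resolution < 1.5 and r_free is not None and r_free < 0.22:
            return "Excellent"
        if resolution < 2.0:
            return "High"
        if resolution <= 3.0:
            return "Good"
        return "Moderate"  # X-ray between 3.0 and 4.0 Å
    if m == "cryo-em":
        return "High" if resolution < 3.0 else "Good"
    return "Low"  # unknown method: be conservative
```

A sub-1.5 Å X-ray structure without a reported R-free is rated High rather than Excellent, since the Excellent criteria cannot be confirmed.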

Resolution Guide

| Resolution | Use Case |
| --- | --- |
| <1.5 Å | Atomic detail, H-bond analysis |
| 1.5-2.0 Å | Drug design, mechanism studies |
| 2.0-2.5 Å | Structure-based design |
| 2.5-3.5 Å | Overall architecture, fold |
| >3.5 Å | Domain arrangement only |

AlphaFold Confidence

| pLDDT Score | Interpretation |
| --- | --- |
| >90 | Very high confidence, experimental-like |
| 70-90 | Good backbone confidence |
| 50-70 | Low confidence, flexible or uncertain regions |
| <50 | Very low confidence, likely disordered |
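
The confidence distribution in the report can be computed from per-residue pLDDT scores using these bands. A minimal sketch, assuming the scores are already available as a plain list of numbers; `plddt_distribution` and the band keys are hypothetical names, and scores of exactly 90, 70, or 50 are assigned to the band that begins at that value.

```python
def plddt_distribution(scores):
    """Return the percentage of residues in each pLDDT confidence band."""
    if not scores:
        raise ValueError("no pLDDT scores supplied")
    counts = {"very_high": 0, "high": 0, "low": 0, "very_low": 0}
    for s in scores:
        if s > 90:
            counts["very_high"] += 1
        elif s >= 70:
            counts["high"] += 1
        elif s >= 50:
            counts["low"] += 1
        else:
            counts["very_low"] += 1
    total = len(scores)
    return {band: 100.0 * n / total for band, n in counts.items()}


dist = plddt_distribution([95, 92, 80, 60, 40, 30, 75, 91])
```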

Completeness Checklist

Every structure report MUST include:

For Specific PDB ID (Required)

  • PDB ID and title
  • Experimental method
  • Resolution (or N/A for NMR)
  • Organism
  • Quality assessment
  • Download links

For Protein Name Search (Required)

  • Search summary with result count
  • Top structures with quality ranking
  • Best structure recommendation
  • AlphaFold alternative (if no experimental structure)

Always Include

  • Ligand information (or "No ligands bound")
  • Data sources with links
  • Retrieval date
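
The checklist lends itself to a mechanical check before a report is sent. The sketch below assumes the report is assembled as a dict; the key names, the `missing_fields` helper, and the sample values are hypothetical illustrations of the required items for a specific PDB ID plus the always-included items.

```python
# Hypothetical field names mirroring the checklist items.
REQUIRED_FOR_PDB_ID = [
    "pdb_id", "title", "method", "resolution",
    "organism", "quality", "download_links",
]
ALWAYS_REQUIRED = ["ligands", "data_sources", "retrieval_date"]


def missing_fields(report, required):
    """List required keys that are absent or empty in the report dict."""
    return [
        key for key in required
        if key not in report or report[key] in (None, "", [])
    ]


# Placeholder report; "N/A" and "No ligands bound" count as filled in.
draft = {
    "pdb_id": "1ABC",
    "title": "example title",
    "method": "NMR",
    "resolution": "N/A",
    "organism": "example organism",
    "quality": "Moderate",
    "download_links": ["example link"],
    "ligands": "No ligands bound",
    "data_sources": ["RCSB PDB"],
}
gaps = missing_fields(draft, REQUIRED_FOR_PDB_ID + ALWAYS_REQUIRED)
```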

Common Use Cases

Drug Discovery Target

User: "Get structure for EGFR kinase with inhibitor" → Filter for ligand-bound structures, emphasize binding site

Model Building

User: "Find best template for homology modeling of protein X" → High-resolution structures, note sequence coverage

Structure Comparison

User: "Compare available SARS-CoV-2 main protease structures" → All structures with systematic comparison table

AlphaFold When No Experimental

User: "Structure of protein with UniProt P12345" → Check PDB first, then AlphaFold, note confidence

Error Handling

| Error | Response |
| --- | --- |
| "PDB ID not found" | Verify 4-character format, check if obsoleted |
| "No structures for protein" | Offer AlphaFold prediction, suggest similar proteins |
| "Download failed" | Retry once, provide alternative link |
| "Resolution unavailable" | Likely NMR/model, note in assessment |
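
The "retry once, provide alternative link" response can be sketched as a wrapper around whatever download function is in use. `fetch_with_retry` is a hypothetical helper, and the `flaky` stub below simulates a download that fails on its first call; no real network access is involved.

```python
def fetch_with_retry(fetch, url, alternative_url, retries=1):
    """Call fetch(url) up to 1 + retries times; on failure, report the alternative link."""
    last_error = None
    for _ in range(1 + retries):
        try:
            return {"ok": True, "data": fetch(url)}
        except Exception as exc:  # remember the failure and try again
            last_error = exc
    return {"ok": False, "error": str(last_error), "alternative": alternative_url}


# Stub download that fails once, then succeeds.
calls = []


def flaky(url):
    calls.append(url)
    if len(calls) < 2:
        raise IOError("temporary failure")
    return b"ATOM"


outcome = fetch_with_retry(flaky, "primary-link", "alternative-link")
```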

Tool Reference

RCSB PDB (Experimental Structures)

| Tool | Purpose |
| --- | --- |
| `search_structures_by_protein_name` | Name-based search |
| `get_protein_metadata_by_pdb_id` | Basic info |
| `get_protein_experimental_details_by_pdb_id` | Method details |
| `get_protein_resolution_by_pdb_id` | Quality metric |
| `get_protein_ligands_by_pdb_id` | Bound molecules |
| `download_pdb_structure_file` | Coordinate files |
| `get_similar_structures_by_pdb_id` | Homologs |

PDBe (European PDB)

| Tool | Purpose |
| --- | --- |
| `pdbe_get_entry_summary` | Overview |
| `pdbe_get_molecules` | Molecular entities |
| `pdbe_get_experiment_info` | Experimental data |
| `pdbe_get_binding_sites` | Ligand pockets |

AlphaFold (Predictions)

| Tool | Purpose |
| --- | --- |
| `alphafold_get_structure_by_uniprot` | Get prediction |
| `alphafold_search_structures` | Search predictions |