pdb-database
PDB Database
Overview
RCSB PDB is the worldwide repository for 3D structural data of biological macromolecules. Use it to search for structures, retrieve coordinates and metadata, and perform sequence and structure similarity searches across 200,000+ experimentally determined structures and computed models.
When to Use This Skill
This skill should be used when:
- Searching for protein or nucleic acid 3D structures by text, sequence, or structural similarity
- Downloading coordinate files in PDB, mmCIF, or BinaryCIF formats
- Retrieving structural metadata, experimental methods, or quality metrics
- Performing batch operations across multiple structures
- Integrating PDB data into computational workflows for drug discovery, protein engineering, or structural biology research
Core Capabilities
1. Searching for Structures
Find PDB entries using various search criteria:
Text Search: Search by protein name, keywords, or descriptions
```python
from rcsbapi.search import TextQuery

query = TextQuery("hemoglobin")
results = list(query())
print(f"Found {len(results)} structures")
```

Attribute Search: Query specific properties (organism, resolution, method, etc.)

```python
from rcsbapi.search import AttributeQuery
from rcsbapi.search.attrs import rcsb_entity_source_organism

# Find human protein structures
query = AttributeQuery(
    attribute=rcsb_entity_source_organism.scientific_name,
    operator="exact_match",
    value="Homo sapiens"
)
results = list(query())
```
Sequence Similarity: Find structures similar to a given sequence

```python
from rcsbapi.search import SequenceQuery

query = SequenceQuery(
    value="MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIPFIETSAKTRQGVDDAFYTLVREIRKHKEKMSKDGKKKKKKSKTKCVIM",
    evalue_cutoff=0.1,
    identity_cutoff=0.9
)
results = list(query())
```

Structure Similarity: Find structures with similar 3D geometry

```python
from rcsbapi.search import StructSimilarityQuery

query = StructSimilarityQuery(
    structure_search_type="entry",
    entry_id="4HHB"  # Hemoglobin
)
results = list(query())
```

Combining Queries: Use logical operators to build complex searches

```python
from rcsbapi.search import AttributeQuery
# Both attribute groups are needed: one for the organism, one for the resolution
from rcsbapi.search.attrs import rcsb_entity_source_organism, rcsb_entry_info

# High-resolution human proteins
query1 = AttributeQuery(
    attribute=rcsb_entity_source_organism.scientific_name,
    operator="exact_match",
    value="Homo sapiens"
)
query2 = AttributeQuery(
    attribute=rcsb_entry_info.resolution_combined,
    operator="less",
    value=2.0
)
combined_query = query1 & query2  # AND operation
results = list(combined_query())
```
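The `&` operator above joins two attribute conditions with a logical AND. As a rough sketch of what such a combined query might look like as a raw request, the snippet below builds the JSON payload by hand. It assumes the Search API v2 request layout (`terminal` nodes grouped by a `group` node), which should be checked against the API documentation. The helper names `attribute_node` and `and_group` are hypothetical and are not part of `rcsb-api`.

```python
import json


def attribute_node(attribute, operator, value):
    """Build one attribute condition (a "terminal" node) as a plain dict."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def and_group(*nodes):
    """Combine conditions so that every one of them must match."""
    return {"type": "group", "logical_operator": "and", "nodes": list(nodes)}


# Same two conditions as the combined query above, written as plain data.
payload = {
    "query": and_group(
        attribute_node("rcsb_entity_source_organism.scientific_name", "exact_match", "Homo sapiens"),
        attribute_node("rcsb_entry_info.resolution_combined", "less", 2.0),
    ),
    "return_type": "entry",
}

print(json.dumps(payload, indent=2))
```

Seeing the payload as plain data can help when debugging a query that returns unexpected results, since each condition is visible as its own node.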
2. Retrieving Structure Data
Access detailed information about specific PDB entries:
Basic Entry Information:
```python
from rcsbapi.data import Schema, fetch

# Get entry-level data
entry_data = fetch("4HHB", schema=Schema.ENTRY)
print(entry_data["struct"]["title"])
print(entry_data["exptl"][0]["method"])
```

Polymer Entity Information:

```python
from rcsbapi.data import Schema, fetch

# Get protein/nucleic acid information
entity_data = fetch("4HHB_1", schema=Schema.POLYMER_ENTITY)
print(entity_data["entity_poly"]["pdbx_seq_one_letter_code"])
```

Using GraphQL for Flexible Queries:

```python
from rcsbapi.data import fetch

# Custom GraphQL query
query = """
{
  entry(entry_id: "4HHB") {
    struct {
      title
    }
    exptl {
      method
    }
    rcsb_entry_info {
      resolution_combined
      deposited_atom_count
    }
  }
}
"""
data = fetch(query_type="graphql", query=query)
```
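A GraphQL document like the one above can also be assembled for several entries at once and sent without the client library. The sketch below uses only the standard library. It assumes that the Data API serves GraphQL at `https://data.rcsb.org/graphql` and offers an `entries(entry_ids: [...])` root field; both assumptions should be checked against the Data API documentation. `build_entries_query` and `run_graphql` are hypothetical helper names.

```python
import json
import urllib.request

GRAPHQL_URL = "https://data.rcsb.org/graphql"  # assumed endpoint


def build_entries_query(pdb_ids):
    """Return a GraphQL document requesting title and method for several entries."""
    # JSON string syntax is also valid GraphQL string syntax, e.g. ["4HHB", "1MBN"].
    id_list = json.dumps([pdb_id.upper() for pdb_id in pdb_ids])
    return (
        "{ entries(entry_ids: " + id_list + ") "
        "{ rcsb_id struct { title } exptl { method } } }"
    )


def run_graphql(query, url=GRAPHQL_URL, timeout=30):
    """POST the query as JSON and return the decoded response (needs network access)."""
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the query text is built here; run_graphql(query) would send it.
query = build_entries_query(["4hhb", "1MBN"])
print(query)
```

Requesting only the fields that are needed keeps responses small, which matters most when many entries are queried together.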
3. Downloading Structure Files
Retrieve coordinate files in various formats:
Download Methods:
- PDB format (legacy text format): https://files.rcsb.org/download/{PDB_ID}.pdb
- mmCIF format (modern standard): https://files.rcsb.org/download/{PDB_ID}.cif
- BinaryCIF (compressed binary): Use ModelServer API for efficient access
- Biological assembly (for assembly 1): https://files.rcsb.org/download/{PDB_ID}.pdb1
Example Download:
```python
import requests

pdb_id = "4HHB"

# Download PDB format
pdb_url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
response = requests.get(pdb_url)
response.raise_for_status()  # stop here instead of saving an error page
with open(f"{pdb_id}.pdb", "w") as f:
    f.write(response.text)

# Download mmCIF format
cif_url = f"https://files.rcsb.org/download/{pdb_id}.cif"
response = requests.get(cif_url)
response.raise_for_status()
with open(f"{pdb_id}.cif", "w") as f:
    f.write(response.text)
```
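The URL patterns listed in this section can be wrapped in a small helper so that the format is chosen by name. This is a minimal sketch using only the standard library; `download_url` and `download` are hypothetical helper names, and the suffix table simply mirrors the patterns shown above.

```python
import urllib.request

DOWNLOAD_BASE = "https://files.rcsb.org/download"

# File suffixes taken from the URL patterns listed in this section.
SUFFIXES = {"pdb": ".pdb", "cif": ".cif", "assembly1": ".pdb1"}


def download_url(pdb_id, file_format="cif"):
    """Build the download URL for one entry in the requested format."""
    if file_format not in SUFFIXES:
        raise ValueError(f"Unsupported format: {file_format!r}")
    return f"{DOWNLOAD_BASE}/{pdb_id.upper()}{SUFFIXES[file_format]}"


def download(pdb_id, file_format="cif", timeout=30):
    """Fetch the file and save it in the working directory (needs network access)."""
    url = download_url(pdb_id, file_format)
    filename = url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()  # urlopen raises HTTPError for 4xx/5xx responses
    with open(filename, "wb") as handle:
        handle.write(data)  # written as bytes, so no text decoding is involved
    return filename


# Only the URL is built here; download("4HHB", "pdb") would fetch the file.
print(download_url("4hhb", "pdb"))
```

Rejecting unknown format names early avoids a failed request for a URL that could never exist.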
4. Working with Structure Data
Common operations with retrieved structures:
Parse and Analyze Coordinates:
Use BioPython or other structural biology libraries to work with downloaded files:
```python
from Bio.PDB import PDBParser

parser = PDBParser(QUIET=True)  # QUIET=True suppresses parser warnings
structure = parser.get_structure("protein", "4HHB.pdb")

# Iterate through atoms
for model in structure:
    for chain in model:
        for residue in chain:
            for atom in residue:
                print(atom.get_coord())
```
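Where BioPython is not available, coordinates can be read directly from the fixed-width `ATOM` records of a legacy PDB file. The sketch below assumes the standard column layout (atom name in columns 13-16, x/y/z in columns 31-54) and computes the centroid of the alpha carbons. The sample records are made-up values for illustration, not data from 4HHB, and `sample_atom`, `ca_coordinates`, and `centroid` are hypothetical helper names.

```python
def sample_atom(serial, name, x, y, z):
    """Format one made-up ATOM record in the fixed-width layout (illustration only)."""
    prefix = f"{'ATOM':<6s}{serial:5d} {name:<4s} ALA A{1:4d}".ljust(30)
    return f"{prefix}{x:8.3f}{y:8.3f}{z:8.3f}"


def ca_coordinates(pdb_text):
    """Return (x, y, z) tuples for every alpha carbon found in ATOM records."""
    coords = []
    for line in pdb_text.splitlines():
        # HETATM records are skipped, which also avoids calcium ions named "CA".
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def centroid(coords):
    """Average each axis over all coordinates."""
    if not coords:
        raise ValueError("no coordinates given")
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))


sample = "\n".join([
    sample_atom(1, " N  ", 9.0, 9.0, 9.0),  # not an alpha carbon, ignored
    sample_atom(2, " CA ", 1.0, 2.0, 3.0),
    sample_atom(3, " CA ", 3.0, 4.0, 5.0),
])

print(centroid(ca_coordinates(sample)))
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in PDB records can touch when values are wide.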
Extract Metadata:

```python
from rcsbapi.data import fetch, Schema

# Get experimental details
data = fetch("4HHB", schema=Schema.ENTRY)
resolution = data.get("rcsb_entry_info", {}).get("resolution_combined")
method = data.get("exptl", [{}])[0].get("method")
deposition_date = data.get("rcsb_accession_info", {}).get("deposit_date")
print(f"Resolution: {resolution} Å")
print(f"Method: {method}")
print(f"Deposited: {deposition_date}")
```
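Chained `.get()` calls with default values can be folded into one lookup helper that tolerates missing keys and empty lists. The sketch below is one possible shape for it. `dig` and `first_value` are hypothetical names, the sample record holds placeholder values rather than real data, and the handling of `resolution_combined` as a list is an assumption about the response shape that should be verified.

```python
def dig(data, *path, default=None):
    """Follow dict keys and list indexes in order; return default if any step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


def first_value(value):
    """Return the first element of a non-empty list, or the value itself otherwise."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


# Made-up record shaped like the fields used above (placeholder values only).
sample_entry = {
    "exptl": [{"method": "X-RAY DIFFRACTION"}],
    "rcsb_entry_info": {"resolution_combined": [2.0]},
}

print(dig(sample_entry, "exptl", 0, "method"))
print(first_value(dig(sample_entry, "rcsb_entry_info", "resolution_combined")))
print(dig(sample_entry, "rcsb_accession_info", "deposit_date", default="unknown"))
```

A single helper keeps the fallback behavior in one place, so a missing section yields a default value instead of an exception.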
5. Batch Operations
Process multiple structures efficiently:
```python
from rcsbapi.data import fetch, Schema

pdb_ids = ["4HHB", "1MBN", "1GZX"]  # Hemoglobin, myoglobin, etc.
results = {}
for pdb_id in pdb_ids:
    try:
        data = fetch(pdb_id, schema=Schema.ENTRY)
        results[pdb_id] = {
            "title": data["struct"]["title"],
            "resolution": data.get("rcsb_entry_info", {}).get("resolution_combined"),
            "organism": data.get("rcsb_entity_source_organism", [{}])[0].get("scientific_name")
        }
    except Exception as e:
        print(f"Error fetching {pdb_id}: {e}")

# Display results
for pdb_id, info in results.items():
    print(f"\n{pdb_id}: {info['title']}")
    print(f"  Resolution: {info['resolution']} Å")
    print(f"  Organism: {info['organism']}")
```
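For longer ID lists it may help to split the work into fixed-size batches, pause between them, and keep failures separate from results. The sketch below shows one way to do that. `chunked` and `process_in_batches` are hypothetical helpers, the default batch size and pause are arbitrary placeholders rather than documented limits, and the demonstration uses a stand-in handler so that it runs offline.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(pdb_ids, handler, batch_size=50, pause_seconds=1.0):
    """Call handler(pdb_id) for each ID, collecting results and errors separately."""
    results, errors = {}, {}
    batches = list(chunked(pdb_ids, batch_size))
    for index, batch in enumerate(batches):
        for pdb_id in batch:
            try:
                results[pdb_id] = handler(pdb_id)
            except Exception as exc:  # keep going if one entry fails
                errors[pdb_id] = str(exc)
        if index < len(batches) - 1:
            time.sleep(pause_seconds)  # pause between batches, not after the last one
    return results, errors


# Offline demonstration: str.lower stands in for a function that fetches entry data.
results, errors = process_in_batches(
    ["4HHB", "1MBN", "1GZX"], handler=str.lower, batch_size=2, pause_seconds=0
)
print(results)
print(errors)
```

Returning errors as a separate mapping makes it straightforward to retry only the entries that failed.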
Python Package Installation
Install the official RCSB PDB Python API client:
```bash
# Current recommended package
uv pip install rcsb-api

# For legacy code (deprecated, use rcsb-api instead)
uv pip install rcsbsearchapi
```

The `rcsb-api` package provides unified access to both Search and Data APIs through the `rcsbapi.search` and `rcsbapi.data` modules.
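After installation, a quick check can confirm which of the two packages is present in the active environment. This is a small sketch built on the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name in ("rcsb-api", "rcsbsearchapi"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```

Checking the distribution name rather than importing the module avoids side effects from import-time code.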
Common Use Cases
Drug Discovery
- Search for structures of drug targets
- Analyze ligand binding sites
- Compare protein-ligand complexes
- Identify similar binding pockets
Protein Engineering
- Find homologous structures for modeling
- Analyze sequence-structure relationships
- Compare mutant structures
- Study protein stability and dynamics
Structural Biology Research
- Download structures for computational analysis
- Build structure-based alignments
- Analyze structural features (secondary structure, domains)
- Compare experimental methods and quality metrics
Education and Visualization
- Retrieve structures for teaching
- Generate molecular visualizations
- Explore structure-function relationships
- Study evolutionary conservation
Key Concepts
PDB ID: Unique 4-character identifier (e.g., "4HHB") for each structure entry. AlphaFold and ModelArchive entries start with "AF_" or "MA_" prefixes.
mmCIF/PDBx: Modern file format that uses key-value structure, replacing legacy PDB format for large structures.
Biological Assembly: The functional form of a macromolecule, which may contain multiple copies of chains from the asymmetric unit.
Resolution: Measure of detail in crystallographic structures (lower values = higher detail). Typical range: 1.5-3.5 Å for high-quality structures.
Entity: A unique molecular component in a structure (protein chain, DNA, ligand, etc.).
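The identifier conventions described under PDB ID can be turned into a simple check that separates experimental entries from computed models. The sketch below is illustrative only. The regular expression (one digit followed by three alphanumeric characters) is an assumed pattern rather than an official rule, `classify_id` is a hypothetical helper, and the computed-model identifier in the example list is a made-up illustration of the prefix convention.

```python
import re

# Assumed pattern: one digit followed by three alphanumeric characters.
EXPERIMENTAL_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")
COMPUTED_PREFIXES = ("AF_", "MA_")


def classify_id(identifier):
    """Label an identifier as 'experimental', 'computed', or 'unknown'."""
    text = identifier.strip().upper()
    if text.startswith(COMPUTED_PREFIXES):
        return "computed"
    if EXPERIMENTAL_ID.match(text):
        return "experimental"
    return "unknown"


for example in ["4HHB", "1mbn", "AF_EXAMPLE", "hemoglobin"]:
    print(example, "->", classify_id(example))
```

A check like this can catch a search term passed by mistake where an identifier was expected, before any request is sent.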
Resources
This skill includes reference documentation in the `references/` directory:
references/api_reference.md
Comprehensive API documentation covering:
- Detailed API endpoint specifications
- Advanced query patterns and examples
- Data schema reference
- Rate limiting and best practices
- Troubleshooting common issues
Use this reference when you need in-depth information about API capabilities, complex query construction, or detailed data schema information.
Additional Resources
- RCSB PDB Website: https://www.rcsb.org
- PDB-101 Educational Portal: https://pdb101.rcsb.org
- API Documentation: https://www.rcsb.org/docs/programmatic-access/web-apis-overview
- Python Package Docs: https://rcsbapi.readthedocs.io/
- Data API Documentation: https://data.rcsb.org/
- GitHub Repository: https://github.com/rcsb/py-rcsb-api