alphafold-database
AlphaFold Database
Overview
AlphaFold DB is a public repository of AI-predicted 3D protein structures for over 200 million proteins, maintained by DeepMind and EMBL-EBI. Access structure predictions with confidence metrics, download coordinate files, retrieve bulk datasets, and integrate predictions into computational workflows.
When to Use This Skill
This skill should be used when working with AI-predicted protein structures in scenarios such as:
- Retrieving protein structure predictions by UniProt ID or protein name
- Downloading PDB/mmCIF coordinate files for structural analysis
- Analyzing prediction confidence metrics (pLDDT, PAE) to assess reliability
- Accessing bulk proteome datasets via Google Cloud Platform
- Comparing predicted structures with experimental data
- Performing structure-based drug discovery or protein engineering
- Building structural models for proteins lacking experimental structures
- Integrating AlphaFold predictions into computational pipelines
Core Capabilities
1. Searching and Retrieving Predictions
**Using Biopython (Recommended):**
The Biopython library provides the simplest interface for retrieving AlphaFold structures:
```python
import os

from Bio.PDB import alphafold_db

# Get all predictions for a UniProt accession
predictions = list(alphafold_db.get_predictions("P00520"))

# Download structure file (mmCIF format); make sure the target directory exists
os.makedirs("./structures", exist_ok=True)
for prediction in predictions:
    cif_file = alphafold_db.download_cif_for(prediction, directory="./structures")
    print(f"Downloaded: {cif_file}")

# Get Structure objects directly
structures = list(alphafold_db.get_structural_models_for("P00520"))
```
**Direct API Access:**
Query predictions using REST endpoints:
```python
import requests

# Get prediction metadata for a UniProt accession
uniprot_id = "P00520"
api_url = f"https://alphafold.ebi.ac.uk/api/prediction/{uniprot_id}"
response = requests.get(api_url, timeout=60)
response.raise_for_status()
prediction_data = response.json()  # a list with one record per prediction

# Extract AlphaFold ID
alphafold_id = prediction_data[0]['entryId']
print(f"AlphaFold ID: {alphafold_id}")
```
**Using UniProt to Find Accessions:**
Search UniProt to find protein accessions first:
```python
import requests

def get_uniprot_ids(query, organism_id=None, size=25):
    """Search UniProtKB and return matching accession IDs"""
    if organism_id is not None:
        query = f"({query}) AND (organism_id:{organism_id})"
    params = {"query": query, "fields": "accession", "format": "tsv", "size": size}
    # UniProtKB REST search endpoint
    response = requests.get("https://rest.uniprot.org/uniprotkb/search", params=params, timeout=60)
    response.raise_for_status()
    # TSV output: a header line ("Entry"), then one accession per line
    return response.text.splitlines()[1:]

# Example: find human UniProt accessions for a protein name
protein_ids = get_uniprot_ids("hemoglobin", organism_id=9606)
```
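Before sending identifiers to AlphaFold DB it can help to confirm that they are well-formed UniProt accessions. The sketch below uses the accession pattern published in the UniProt documentation; the helper name `is_uniprot_accession` is hypothetical, and the check only tests the format, not whether an entry or a prediction exists.

```python
import re

# Accession format as documented by UniProt (6- or 10-character accessions)
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(candidate: str) -> bool:
    """Return True if the string is syntactically a UniProt accession."""
    return UNIPROT_ACCESSION.fullmatch(candidate.strip()) is not None
```

For example, `[p for p in protein_ids if is_uniprot_accession(p)]` drops anything that is not an accession before you query AlphaFold DB.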
2. Downloading Structure Files
AlphaFold provides multiple file formats for each prediction:
File Types Available:
- Model coordinates (`model_v4.cif`): Atomic coordinates in mmCIF/PDBx format
- Confidence scores (`confidence_v4.json`): Per-residue pLDDT scores (0-100)
- Predicted Aligned Error (`predicted_aligned_error_v4.json`): PAE matrix for residue-pair confidence
Download URLs:
```python
import requests

alphafold_id = "AF-P00520-F1"
version = "v4"

# Model coordinates (mmCIF)
model_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-model_{version}.cif"
response = requests.get(model_url, timeout=60)
response.raise_for_status()
with open(f"{alphafold_id}.cif", "w") as f:
    f.write(response.text)

# Confidence scores (JSON)
confidence_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-confidence_{version}.json"
response = requests.get(confidence_url, timeout=60)
response.raise_for_status()
confidence_data = response.json()

# Predicted Aligned Error (JSON)
pae_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-predicted_aligned_error_{version}.json"
response = requests.get(pae_url, timeout=60)
response.raise_for_status()
pae_data = response.json()
```
**PDB Format (Alternative):**
```python
# Download as PDB format instead of mmCIF
pdb_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-model_{version}.pdb"
response = requests.get(pdb_url, timeout=60)
response.raise_for_status()
with open(f"{alphafold_id}.pdb", "wb") as f:
    f.write(response.content)
```
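Hardcoding `v4` in file URLs breaks when the database is re-versioned. A more defensive pattern is to take the download links from the prediction API record and fall back to the naming scheme above only when a link is missing. This sketch assumes the records expose `cifUrl`, `pdbUrl` and `paeDocUrl` (Biopython's `download_cif_for` reads `cifUrl`); verify the other key names against a live response. `resolve_file_urls` is a hypothetical helper.

```python
def resolve_file_urls(prediction: dict, version: str = "v4") -> dict:
    """Return download URLs for one AlphaFold DB prediction record.

    Prefers links stored in the API record; falls back to the
    AF-...-model_<version>.cif style naming scheme when a link is absent.
    """
    entry_id = prediction["entryId"]
    base = f"https://alphafold.ebi.ac.uk/files/{entry_id}"
    fallback = {
        "cif": f"{base}-model_{version}.cif",
        "pdb": f"{base}-model_{version}.pdb",
        "pae": f"{base}-predicted_aligned_error_{version}.json",
    }
    # Assumed key names in the API response for each file type
    api_keys = {"cif": "cifUrl", "pdb": "pdbUrl", "pae": "paeDocUrl"}
    return {kind: prediction.get(api_keys[kind]) or url for kind, url in fallback.items()}
```

Usage: `urls = resolve_file_urls(prediction_data[0])`, then download `urls["cif"]`, `urls["pdb"]` or `urls["pae"]` as shown above.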
3. Working with Confidence Metrics
AlphaFold predictions include confidence estimates critical for interpretation:
pLDDT (per-residue confidence):
```python
import requests

# Load confidence scores
alphafold_id = "AF-P00520-F1"
confidence_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-confidence_v4.json"
response = requests.get(confidence_url, timeout=60)
response.raise_for_status()
confidence = response.json()

# Extract pLDDT scores (one value per residue)
plddt_scores = confidence['confidenceScore']

# Interpret confidence levels
# pLDDT > 90: Very high confidence
# pLDDT 70-90: High confidence
# pLDDT 50-70: Low confidence
# pLDDT < 50: Very low confidence
high_confidence_residues = [i for i, score in enumerate(plddt_scores) if score > 90]
print(f"High confidence residues: {len(high_confidence_residues)}/{len(plddt_scores)}")
```
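The four pLDDT bands listed above can be turned into a per-protein summary. This is a sketch: the function names are hypothetical, and boundary values are assigned by an assumption the text does not pin down (exactly 90 counts as high, exactly 70 as high, exactly 50 as low).

```python
from collections import Counter

BANDS = ("very high", "high", "low", "very low")


def plddt_band(score: float) -> str:
    """Map one pLDDT value (0-100) to the bands used in this guide."""
    if score > 90:
        return "very high"
    if score >= 70:
        return "high"
    if score >= 50:
        return "low"
    return "very low"


def plddt_band_counts(scores) -> dict:
    """Count residues per pLDDT band, listing every band even when empty."""
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts.get(band, 0) for band in BANDS}
```

For example, `plddt_band_counts(plddt_scores)` gives one dictionary per protein that can be compared across predictions.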
**PAE (Predicted Aligned Error):**
PAE indicates confidence in relative domain positions:
```python
import numpy as np
import matplotlib.pyplot as plt
import requests

alphafold_id = "AF-P00520-F1"

# Load PAE matrix (the v4 file is a one-element list holding the matrix)
pae_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-predicted_aligned_error_v4.json"
response = requests.get(pae_url, timeout=60)
response.raise_for_status()
pae = response.json()[0]

# Visualize PAE matrix
pae_matrix = np.array(pae['predicted_aligned_error'])
plt.figure(figsize=(10, 8))
plt.imshow(pae_matrix, cmap='viridis_r', vmin=0, vmax=30)
plt.colorbar(label='PAE (Å)')
plt.title(f'Predicted Aligned Error: {alphafold_id}')
plt.xlabel('Scored residue')
plt.ylabel('Aligned residue')
plt.savefig(f'{alphafold_id}_pae.png', dpi=300, bbox_inches='tight')

# Low PAE values (<5 Å) indicate confident relative positioning
# High PAE values (>15 Å) suggest uncertain domain arrangements
```
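To put a number on "uncertain domain arrangements", one common approach is to average the PAE over the off-diagonal blocks linking two residue ranges. The sketch assumes you already know the domain boundaries (any ranges you pass in are your own; none are given in this guide) and uses 1-based, inclusive residue numbering; `mean_interdomain_pae` is a hypothetical helper. PAE is not symmetric, so both blocks are averaged.

```python
import numpy as np


def mean_interdomain_pae(pae_matrix, domain_a, domain_b) -> float:
    """Average PAE between two residue ranges (1-based, inclusive).

    Rows are aligned residues and columns are scored residues, so both
    off-diagonal blocks (a->b and b->a) are included in the mean.
    """
    pae = np.asarray(pae_matrix, dtype=float)
    a = slice(domain_a[0] - 1, domain_a[1])
    b = slice(domain_b[0] - 1, domain_b[1])
    blocks = np.concatenate([pae[a, b].ravel(), pae[b, a].ravel()])
    return float(blocks.mean())
```

A value well below 5 Å suggests the two domains are placed confidently relative to each other; values above about 15 Å suggest their arrangement should not be trusted.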
4. Bulk Data Access via Google Cloud
For large-scale analyses, use Google Cloud datasets:
Google Cloud Storage:
```bash
# Install gsutil
uv pip install gsutil

# List available data
gsutil ls gs://public-datasets-deepmind-alphafold-v4/

# Download entire proteomes (by taxonomy ID); quote the wildcard so gsutil, not the shell, expands it
gsutil -m cp "gs://public-datasets-deepmind-alphafold-v4/proteomes/proteome-tax_id-9606-*.tar" .

# Download specific files
gsutil cp gs://public-datasets-deepmind-alphafold-v4/accession_ids.csv .
```
**BigQuery Metadata Access:**
```python
from google.cloud import bigquery

# Initialize client (uses your default Google Cloud credentials and project)
client = bigquery.Client()

# Query metadata
query = """
SELECT
  entryId,
  uniprotAccession,
  organismScientificName,
  globalMetricValue,
  fractionPlddtVeryHigh
FROM `bigquery-public-data.deepmind_alphafold.metadata`
WHERE organismScientificName = 'Homo sapiens'
  AND fractionPlddtVeryHigh > 0.8
LIMIT 100
"""
results = client.query(query).to_dataframe()
print(f"Found {len(results)} high-confidence human proteins")
```
**Download by Species:**
```python
import os
import subprocess

def download_proteome(taxonomy_id, output_dir="./proteomes"):
    """Download all AlphaFold predictions for a species"""
    os.makedirs(output_dir, exist_ok=True)
    pattern = f"gs://public-datasets-deepmind-alphafold-v4/proteomes/proteome-tax_id-{taxonomy_id}-*_v4.tar"
    # gsutil expands the gs:// wildcard itself, so no shell is needed
    subprocess.run(["gsutil", "-m", "cp", pattern, output_dir], check=True)

# Download E. coli proteome (tax ID: 83333)
download_proteome(83333)

# Download human proteome (tax ID: 9606)
download_proteome(9606)
```
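Each proteome shard is a plain tar archive holding many per-entry files. The sketch below extracts only the model files; it assumes members are named like `AF-<accession>-F<n>-model_v4.cif.gz` (compressed mmCIF), so list a shard with `tar -tf` first and adjust `suffix` if the names differ. `extract_models` is a hypothetical helper.

```python
import shutil
import tarfile
from pathlib import Path


def extract_models(tar_path, output_dir, suffix="-model_v4.cif.gz"):
    """Extract only archive members whose names end with `suffix`.

    Returns the paths of the extracted files.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(tar_path) as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(suffix):
                # Keep only the base name so a member path cannot escape output_dir
                target = output_dir / Path(member.name).name
                with archive.extractfile(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                extracted.append(target)
    return extracted
```

Streaming members one at a time keeps memory use flat even for large shards such as the human proteome.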
5. Parsing and Analyzing Structures
Work with downloaded AlphaFold structures using BioPython:
```python
from Bio.PDB import MMCIFParser
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Parse mmCIF file
parser = MMCIFParser(QUIET=True)
structure = parser.get_structure("protein", "AF-P00520-F1-model_v4.cif")

# Extract coordinates
coords = []
for model in structure:
    for chain in model:
        for residue in chain:
            if 'CA' in residue:  # Alpha carbons only
                coords.append(residue['CA'].get_coord())
coords = np.array(coords)
print(f"Structure has {len(coords)} residues")

# Calculate pairwise CA-CA distances
distance_matrix = squareform(pdist(coords))

# Identify contacts (< 8 Å); each pair appears twice in the symmetric matrix
contacts = np.where((distance_matrix > 0) & (distance_matrix < 8))
print(f"Number of contacts: {len(contacts[0]) // 2}")
```
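The count above includes trivially close pairs such as sequence neighbours (consecutive Cα atoms sit about 3.8 Å apart). If you only want contacts between residues that are far apart in sequence, a minimal variant filters on sequence separation. The 6-residue default is an illustrative assumption, and separation is measured by position in the Cα list, which matches sequence separation only when no residues are missing; `long_range_contacts` is a hypothetical helper.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def long_range_contacts(coords, cutoff=8.0, min_separation=6):
    """Return (i, j) index pairs, i < j, with CA-CA distance < cutoff (Å)
    and at least `min_separation` positions apart in the coordinate list."""
    distance_matrix = squareform(pdist(np.asarray(coords, dtype=float)))
    i, j = np.where(distance_matrix < cutoff)
    keep = (j - i) >= min_separation  # also drops self-pairs and duplicates
    return list(zip(i[keep].tolist(), j[keep].tolist()))
```

Usage: `pairs = long_range_contacts(coords)` with the `coords` array built above.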
**Extract B-factors (pLDDT values):**
AlphaFold stores pLDDT scores in the B-factor column:
```python
from Bio.PDB import MMCIFParser

parser = MMCIFParser(QUIET=True)
structure = parser.get_structure("protein", "AF-P00520-F1-model_v4.cif")

# Extract pLDDT from B-factors (one value per residue, read from the CA atom)
plddt_scores = []
for model in structure:
    for chain in model:
        for residue in chain:
            if 'CA' in residue:
                plddt_scores.append(residue['CA'].get_bfactor())

# Identify high-confidence residues (1-based positions along the chain)
high_conf_regions = [(i, score) for i, score in enumerate(plddt_scores, 1) if score > 90]
print(f"High confidence residues: {len(high_conf_regions)}")
```
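The snippet above yields individual residues; for trimming a model (for example before docking) contiguous segments are often handier. A minimal sketch, assuming 1-based numbering that follows the order of `plddt_scores` and a default threshold of 70 (the lower bound of the high-confidence band); `confident_segments` is a hypothetical helper.

```python
def confident_segments(plddt_scores, threshold=70.0, min_length=1):
    """Return (start, end) residue ranges, 1-based and inclusive, in which
    every pLDDT value is >= threshold and the run has >= min_length residues."""
    segments = []
    start = None
    for position, score in enumerate(plddt_scores, 1):
        if score >= threshold and start is None:
            start = position
        elif score < threshold and start is not None:
            if position - start >= min_length:
                segments.append((start, position - 1))
            start = None
    # Close a segment that runs to the last residue
    if start is not None and len(plddt_scores) - start + 1 >= min_length:
        segments.append((start, len(plddt_scores)))
    return segments
```

For example, `confident_segments(plddt_scores, threshold=90, min_length=10)` lists stretches of at least ten very-high-confidence residues.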
6. Batch Processing Multiple Proteins
Process multiple predictions efficiently:
```python
import os

from Bio.PDB import alphafold_db
import numpy as np
import pandas as pd
import requests

uniprot_ids = ["P00520", "P12931", "P04637"]  # Multiple proteins
os.makedirs("./batch_structures", exist_ok=True)
results = []
for uniprot_id in uniprot_ids:
    try:
        # Get prediction
        predictions = list(alphafold_db.get_predictions(uniprot_id))
        if predictions:
            pred = predictions[0]
            # Download structure
            cif_file = alphafold_db.download_cif_for(pred, directory="./batch_structures")
            # Get confidence data
            alphafold_id = pred['entryId']
            conf_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-confidence_v4.json"
            response = requests.get(conf_url, timeout=60)
            response.raise_for_status()
            conf_data = response.json()
            # Calculate statistics
            plddt_scores = conf_data['confidenceScore']
            avg_plddt = np.mean(plddt_scores)
            high_conf_fraction = sum(1 for s in plddt_scores if s > 90) / len(plddt_scores)
            results.append({
                'uniprot_id': uniprot_id,
                'alphafold_id': alphafold_id,
                'avg_plddt': avg_plddt,
                'high_conf_fraction': high_conf_fraction,
                'length': len(plddt_scores)
            })
    except Exception as e:
        print(f"Error processing {uniprot_id}: {e}")

# Create summary DataFrame
df = pd.DataFrame(results)
print(df)
```
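When looping over many accessions, transient network errors are common, and the API reference lists rate limiting among its topics. One way to cope is a small retry wrapper with exponential backoff; the attempt count and delays are illustrative assumptions, not documented AlphaFold DB limits, and `fetch_with_retries` is a hypothetical helper. The `sleep` argument is injectable so the logic can be tested without waiting.

```python
import time


def fetch_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting base_delay * 2**k between tries.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Inside the batch loop, the confidence download could become `conf_data = fetch_with_retries(lambda: requests.get(conf_url, timeout=60).json())`.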
Installation and Setup
Python Libraries
```bash
# Install Biopython for structure access
uv pip install biopython

# Install requests for API access
uv pip install requests

# For visualization and analysis
uv pip install numpy matplotlib pandas scipy

# For Google Cloud access (optional); db-dtypes is needed by BigQuery's to_dataframe()
uv pip install google-cloud-bigquery db-dtypes gsutil
```
3D-Beacons API Alternative
AlphaFold can also be accessed via the 3D-Beacons federated API:
```python
import requests

# Query via 3D-Beacons
uniprot_id = "P00520"
url = f"https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/{uniprot_id}.json"
response = requests.get(url, timeout=60)
response.raise_for_status()
data = response.json()

# Filter for AlphaFold structures (the provider is stored in each entry's "summary")
af_structures = [s for s in data['structures'] if s['summary']['provider'] == 'AlphaFold DB']
```
Common Use Cases
Structural Proteomics
- Download complete proteome predictions for analysis
- Identify high-confidence structural regions across proteins
- Compare predicted structures with experimental data
- Build structural models for protein families
Drug Discovery
- Retrieve target protein structures for docking studies
- Analyze binding site conformations
- Identify druggable pockets in predicted structures
- Compare structures across homologs
Protein Engineering
- Identify stable/unstable regions using pLDDT
- Design mutations in high-confidence regions
- Analyze domain architectures using PAE
- Model protein variants and mutations
Evolutionary Studies
- Compare ortholog structures across species
- Analyze conservation of structural features
- Study domain evolution patterns
- Identify functionally important regions
Key Concepts
**UniProt Accession:** Primary identifier for proteins (e.g., "P00520"). Required for querying AlphaFold DB.
**AlphaFold ID:** Internal identifier of the form `AF-[UniProt accession]-F[fragment number]` (e.g., "AF-P00520-F1").
**pLDDT (predicted Local Distance Difference Test):** Per-residue confidence metric (0-100). Higher values indicate more confident predictions.
**PAE (Predicted Aligned Error):** Matrix indicating confidence in the relative positions of residue pairs. Low values (<5 Å) suggest confident relative positioning.
**Database Version:** Current version is v4. File URLs include the version suffix (e.g., `model_v4.cif`).
**Fragment Number:** Large proteins may be split into fragments. The fragment number appears in the AlphaFold ID (e.g., F1, F2).
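The identifier scheme can be handled with a small parser and builder. This sketch assumes the `AF-[UniProt accession]-F[fragment number]` pattern described above and accepts any uppercase alphanumeric accession rather than validating it against UniProt's rules; `parse_alphafold_id` and `make_alphafold_id` are hypothetical helpers.

```python
import re

AF_ID = re.compile(r"AF-([A-Z0-9]+)-F(\d+)")


def parse_alphafold_id(alphafold_id: str) -> tuple:
    """Split an ID like 'AF-P00520-F1' into (accession, fragment number)."""
    match = AF_ID.fullmatch(alphafold_id)
    if match is None:
        raise ValueError(f"not an AlphaFold ID: {alphafold_id!r}")
    return match.group(1), int(match.group(2))


def make_alphafold_id(accession: str, fragment: int = 1) -> str:
    """Build the AlphaFold ID for an accession and fragment number."""
    return f"AF-{accession}-F{fragment}"
```

Usage: `make_alphafold_id("P00520")` gives the ID used in the file URLs throughout this guide.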
Confidence Interpretation Guidelines
pLDDT Thresholds:
- >90: Very high confidence - suitable for detailed analysis
- 70-90: High confidence - generally reliable backbone structure
- 50-70: Low confidence - use with caution, flexible regions
- <50: Very low confidence - likely disordered or unreliable
PAE Guidelines:
- <5 Å: Confident relative positioning of domains
- 5-10 Å: Moderate confidence in arrangement
- >15 Å: Uncertain relative positions, domains may be mobile
Resources
references/api_reference.md
Comprehensive API documentation covering:
- Complete REST API endpoint specifications
- File format details and data schemas
- Google Cloud dataset structure and access patterns
- Advanced query examples and batch processing strategies
- Rate limiting, caching, and best practices
- Troubleshooting common issues
Consult this reference for detailed API information, bulk download strategies, or when working with large-scale datasets.
Important Notes
Data Usage and Attribution
- AlphaFold DB is freely available under CC-BY-4.0 license
- Cite: Jumper et al. (2021) Nature and Varadi et al. (2022) Nucleic Acids Research
- Predictions are computational models, not experimental structures
- Always assess confidence metrics before downstream analysis
Version Management
- Current database version: v4 (as of 2024-2025)
- File URLs include the version suffix (e.g., `_v4.cif`)
- Check for database updates regularly
- Older versions may be deprecated over time
Data Quality Considerations
- High pLDDT doesn't guarantee functional accuracy
- Low confidence regions may be disordered in vivo
- PAE indicates relative domain confidence, not absolute positioning
- Predictions lack ligands, post-translational modifications, and cofactors
- Multi-chain complexes are not predicted (single chains only)
Performance Tips
- Use Biopython for simple single-protein access
- Use Google Cloud for bulk downloads (much faster than individual files)
- Cache downloaded files locally to avoid repeated downloads
- BigQuery free tier: 1 TB processed data per month
- Consider network bandwidth for large-scale downloads
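Caching downloads locally can be as simple as keying files by URL. In this sketch the cache directory layout and the helper name `cached_download` are hypothetical; the `fetch` argument (any callable returning bytes, e.g. `lambda u: requests.get(u, timeout=60).content`) is passed in so the caching logic can be exercised without network access.

```python
import hashlib
from pathlib import Path


def cached_download(url, fetch, cache_dir="./af_cache"):
    """Return the bytes at `url`, calling fetch(url) only on a cache miss.

    Files are named by a short hash of the URL plus the original file
    name, so different versions of the same file do not collide.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    path = cache_dir / f"{digest}-{url.rsplit('/', 1)[-1]}"
    if not path.exists():
        data = fetch(url)
        # Write to a temporary name first so an interrupted download
        # never leaves a truncated file that looks like a cache hit
        tmp = path.with_name(path.name + ".part")
        tmp.write_bytes(data)
        tmp.replace(path)
    return path.read_bytes()
```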
Additional Resources
- AlphaFold DB Website: https://alphafold.ebi.ac.uk/
- API Documentation: https://alphafold.ebi.ac.uk/api-docs
- Google Cloud Dataset: https://cloud.google.com/blog/products/ai-machine-learning/alphafold-protein-structure-database
- 3D-Beacons API: https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/
- AlphaFold Papers:
- Nature (2021): https://doi.org/10.1038/s41586-021-03819-2
- Nucleic Acids Research (2024): https://doi.org/10.1093/nar/gkad1011
- Biopython Documentation: https://biopython.org/docs/dev/api/Bio.PDB.alphafold_db.html
- GitHub Repository: https://github.com/google-deepmind/alphafold