alphafold-database

AlphaFold Database

Programmatic access to DeepMind's AlphaFold Protein Structure Database (200M+ predicted structures).

Quick Reference

```python
# Fetch structure via Biopython
from Bio.PDB import alphafold_db

predictions = list(alphafold_db.get_predictions("P00520"))
alphafold_db.download_cif_for(predictions[0], directory="./output")

# Direct API call
import requests

resp = requests.get("https://alphafold.ebi.ac.uk/api/prediction/P00520")
entry_id = resp.json()[0]['entryId']  # AF-P00520-F1

# Download structure file (URL patterns are listed under File Types)
```
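The quick reference stops at the download step. A minimal sketch of that step follows, assuming the `{id}-model_v4.cif` pattern and base URL listed under File Types. The helper names `model_url` and `download_model` are hypothetical, and the sketch uses the standard library's `urllib` rather than `requests` so it runs without extra dependencies.

```python
import shutil
import urllib.request
from pathlib import Path

# Base URL as given in the File Types section.
BASE_URL = "https://alphafold.ebi.ac.uk/files/"


def model_url(entry_id: str, version: int = 4) -> str:
    """Build the mmCIF URL for an AlphaFold ID such as AF-P00520-F1.

    Hypothetical helper; the version suffix is an assumption based on
    the v4 file names this document lists.
    """
    return f"{BASE_URL}{entry_id}-model_v{version}.cif"


def download_model(entry_id: str, out_dir: str = "./output") -> Path:
    """Download the mmCIF file and return its local path (needs network)."""
    target = Path(out_dir) / f"{entry_id}-model_v4.cif"
    target.parent.mkdir(parents=True, exist_ok=True)
    # urlopen raises urllib.error.HTTPError if the file does not exist.
    with urllib.request.urlopen(model_url(entry_id)) as resp, open(target, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    return target
```

Calling `download_model("AF-P00520-F1")` would write the file under `./output`, provided the server still serves that version suffix.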

When to Use

  • Obtain 3D coordinates for proteins without experimental structures
  • Assess prediction quality via pLDDT and PAE metrics
  • Download structure files (mmCIF, PDB) for visualization or docking
  • Retrieve proteome-scale datasets for computational analysis

Key Concepts

| Term | Description |
|------|-------------|
| UniProt Accession | Protein identifier (e.g., `P00520`) used to query |
| AlphaFold ID | Format: `AF-{UniProt}-F{fragment}` (e.g., `AF-P00520-F1`) |
| pLDDT | Per-residue confidence (0-100); >90 = reliable, <50 = disordered |
| PAE | Predicted Aligned Error; <5 Å = high confidence domain positions |

See `references/confidence-scores.md` for detailed interpretation guidance.
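The ID format and pLDDT thresholds above can be sketched in code. This is an illustration only: `parse_alphafold_id` and `plddt_band` are hypothetical helpers, the regular expression assumes a plain accession without isoform suffixes, and the intermediate bands (70 and 50 cut-offs, with boundary values assigned to the higher band) are an assumption based on the commonly used AlphaFold confidence categories rather than anything stated in this document.

```python
import re

# Assumed ID shape: AF-<accession>-F<fragment number>, e.g. AF-P00520-F1.
AF_ID = re.compile(r"^AF-(?P<uniprot>[A-Z0-9]+)-F(?P<fragment>\d+)$")


def parse_alphafold_id(af_id):
    """Split an AlphaFold ID into (UniProt accession, fragment number)."""
    match = AF_ID.match(af_id)
    if match is None:
        raise ValueError(f"Not an AlphaFold ID: {af_id!r}")
    return match["uniprot"], int(match["fragment"])


def plddt_band(score):
    """Map a per-residue pLDDT score (0-100) to a confidence band."""
    if score > 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"  # often corresponds to disordered regions
```

For example, `parse_alphafold_id("AF-P00520-F1")` yields the accession and fragment number separately, which is handy when building API queries from a list of IDs.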

File Types

| File | URL Pattern | Contents |
|------|-------------|----------|
| Coordinates | `{id}-model_v4.cif` | Atomic positions (mmCIF) |
| Confidence | `{id}-confidence_v4.json` | Per-residue pLDDT array |
| PAE Matrix | `{id}-predicted_aligned_error_v4.json` | Inter-residue error |

Base URL: `https://alphafold.ebi.ac.uk/files/`
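As an illustration of working with the PAE matrix file, the sketch below averages the predicted error between two residue ranges. The JSON layout it expects (a one-element list whose dict holds a square `predicted_aligned_error` matrix) is an assumption and should be checked against a downloaded file; `load_pae`, `pae_matrix`, and `mean_pae` are hypothetical helper names.

```python
import json


def pae_matrix(doc):
    """Extract the square PAE matrix from the parsed JSON document.

    Assumed layout: [{"predicted_aligned_error": [[...], ...], ...}]
    """
    return doc[0]["predicted_aligned_error"]


def load_pae(path):
    """Read a downloaded PAE JSON file and return its matrix."""
    with open(path) as fh:
        return pae_matrix(json.load(fh))


def mean_pae(matrix, rows, cols):
    """Average PAE (in Å) over two 1-based residue ranges."""
    values = [matrix[i - 1][j - 1] for i in rows for j in cols]
    return sum(values) / len(values)
```

A call such as `mean_pae(matrix, range(1, 101), range(150, 251))` would summarize how confidently residues 1-100 are placed relative to residues 150-250; values under the 5 Å threshold from Key Concepts suggest a reliable relative arrangement.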

Core Operations

Fetch Structure Metadata

```python
import requests

uniprot_id = "P00520"  # UniProt accession to look up
resp = requests.get(f"https://alphafold.ebi.ac.uk/api/prediction/{uniprot_id}")
resp.raise_for_status()  # fail early on HTTP errors (e.g., unknown accession)
metadata = resp.json()[0]  # first prediction entry for this accession
af_id = metadata['entryId']  # e.g., AF-P00520-F1
```

Download All Files

Use `scripts/alphafold_utils.py`:

```python
from scripts.alphafold_utils import download_alphafold_files

paths = download_alphafold_files("AF-P04637-F1", output_dir="./data")
```

Analyze Confidence

```python
from scripts.alphafold_utils import get_plddt_scores

stats = get_plddt_scores("AF-P04637-F1")
print(f"Average pLDDT: {stats['mean']:.1f}")
```
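The internals of `get_plddt_scores` are not shown here, so the following is not its implementation. It is a stand-alone sketch of how similar statistics could be computed from an already-parsed confidence JSON file, assuming the per-residue scores live under a `confidenceScore` key (an assumption to verify against a real file). `summarize_plddt` is a hypothetical name.

```python
from statistics import mean


def summarize_plddt(confidence):
    """Summarize per-residue pLDDT scores from a parsed confidence JSON.

    Assumes the scores are stored as a list under "confidenceScore".
    """
    scores = confidence["confidenceScore"]
    n = len(scores)
    return {
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
        # Fractions use the thresholds from Key Concepts.
        "frac_above_90": sum(s > 90 for s in scores) / n,
        "frac_below_50": sum(s < 50 for s in scores) / n,
    }


# Synthetic four-residue example, not real AlphaFold data.
example = {"residueNumber": [1, 2, 3, 4], "confidenceScore": [95.0, 80.0, 45.0, 92.0]}
stats = summarize_plddt(example)
print(f"Average pLDDT: {stats['mean']:.1f}")  # Average pLDDT: 78.0
```

Reporting the fraction of residues below 50 alongside the mean helps flag proteins whose average looks acceptable but which contain long disordered stretches.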

Bulk Proteome Access

```bash
# Google Cloud Storage
gsutil ls gs://public-datasets-deepmind-alphafold-v4/
gsutil -m cp "gs://public-datasets-deepmind-alphafold-v4/proteomes/proteome-tax_id-9606-*.tar" ./
```

See `references/bulk-access.md` for BigQuery queries and batch processing.
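Once a proteome archive has been copied locally, its contents can be inspected without unpacking everything. The sketch below uses Python's standard `tarfile` module; the member-name suffix `-model_v4.cif.gz` is an assumption about how files inside the archive are named, and `list_model_members` is a hypothetical helper.

```python
import tarfile


def list_model_members(tar_path, suffix="-model_v4.cif.gz"):
    """Return the sorted names of coordinate files inside a proteome tar.

    The default suffix is an assumption; adjust it after inspecting
    the archive with tar.getnames().
    """
    with tarfile.open(tar_path) as tar:
        return sorted(name for name in tar.getnames() if name.endswith(suffix))
```

Listing members first makes it possible to extract only the structures of interest with `tar.extract(name, path=...)` instead of unpacking the whole proteome.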

Caveats

  • Predictions, not experiments: Verify critical findings experimentally
  • Confidence matters: Always check pLDDT before using regions
  • Single chains only: No multimers or complexes
  • No ligands: Missing cofactors, ions, PTMs
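One way to act on the pLDDT caveat is to restrict downstream analysis to contiguous well-predicted stretches. The sketch below is a hypothetical helper, not part of `scripts/alphafold_utils.py`; the default threshold of 70 and the `min_length` parameter are illustrative choices, not values prescribed by this document.

```python
def confident_segments(scores, threshold=70.0, min_length=1):
    """Return 1-based inclusive (start, end) ranges where pLDDT >= threshold.

    Segments shorter than min_length residues are discarded.
    """
    segments = []
    start = None  # first residue of the segment currently being tracked
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    # Close a segment that runs to the end of the chain.
    if start is not None and len(scores) - start + 1 >= min_length:
        segments.append((start, len(scores)))
    return segments
```

With scores `[40, 85, 90, 95, 30, 75, 20]` and `min_length=2`, only residues 2-4 are returned; the isolated confident residue at position 6 is dropped as too short to be useful.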

Setup

```bash
pip install biopython requests numpy matplotlib pandas scipy

# Optional: pip install google-cloud-bigquery gsutil
```

Links
