alphafold-database-fetch-and-analyze
AlphaFold Database: Fetch and Analyze
Prerequisites
- uv: Read the uv skill and follow its Setup instructions to ensure uv is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://alphafold.ebi.ac.uk/, then (2) create the file recording the notification text and timestamp.
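The User Notification step can be sketched in Python as a check-then-record helper. This is a minimal sketch, not part of the skill: the function name `ensure_license_notification`, the wording of `NOTICE`, and the layout of the recorded file are all assumptions made for illustration. Only the file name LICENSE_NOTIFICATION.txt and the URL come from the text above.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical notice wording; the skill does not prescribe exact text.
NOTICE = (
    "Please review the AlphaFold Database terms of use at "
    "https://alphafold.ebi.ac.uk/ before using downloaded data."
)


def ensure_license_notification(skill_dir: Path) -> bool:
    """Notify once per skill directory.

    Returns True if the notice was shown now, False if a record already existed.
    """
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False

    # (1) Prominently notify the user.
    print("=" * 72)
    print(NOTICE)
    print("=" * 72)

    # (2) Record the notification text and a UTC timestamp.
    timestamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"{timestamp}\n{NOTICE}\n", encoding="utf-8")
    return True
```

Calling the helper a second time with the same directory prints nothing and returns False, which keeps the notice from being repeated on every run.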
Overview
Downloads AlphaFold predicted structures (mmCIF) and Predicted Aligned Error
(PAE) matrices from the AlphaFold Database for a given UniProt ID, then performs
automated heuristic analysis on structural confidence (pLDDT), intrinsically
disordered regions, rigid domain boundaries, and inter-domain flexibility.
Do NOT use when:
- The user only has a protein name, gene name, or amino acid sequence (no UniProt ID) — ask them to look up the ID on UniProt.
- The user wants to search for structural homologs (use Foldseek).
- The user wants to run AlphaFold predictions on a custom sequence.
- The user needs experimental PDB structures (use RCSB PDB).
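Before invoking the scripts, it can help to check that the user's input has the shape of a UniProt accession rather than a gene name or sequence. The sketch below uses the accession pattern published in the UniProt help pages; the function name `looks_like_uniprot_accession` is hypothetical. A format match does not guarantee that the accession exists or has an AlphaFold Database entry.

```python
import re

# Accession pattern as published in the UniProt help pages
# (6-character and 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_accession(text: str) -> bool:
    """Return True if the whole string has the format of a UniProt accession."""
    return UNIPROT_ACCESSION.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    for candidate in ["P00520", "A0A024R161", "TP53", "MEEPQSDPSV"]:
        print(candidate, looks_like_uniprot_accession(candidate))
```

If the check fails, the guidance above applies: ask the user to look up the ID on UniProt instead of guessing one.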
Core Rules
- Use the Wrapper: ALWAYS execute the provided helper scripts to query the database rather than accessing the database directly. The scripts automatically enforce the required rate limit gracefully.
- Do not attempt to calculate domain boundaries or assess structural disorder yourself; always rely on the output provided by the script.
- If this skill is used, state that in the output.
Utility Scripts
1. Fetch Structure Files
Downloads the .cif structure file, _predicted_aligned_error.json, and API
metadata JSON (-metadata.json) for a UniProt ID. Handles fragment fallback for
very large proteins.
Examples:
bash
uv run scripts/fetch_structure.py P00520 -o /path/to/output/
uv run scripts/fetch_structure.py P04637 -o /path/to/custom_results/
Always specify -o with an absolute path or a path relative to the user's
project root, never a path relative to the skill directory.
2. Analyze pLDDT Confidence
Reads pLDDT confidence metrics from a saved AFDB metadata JSON file (produced by
fetch_structure.py) and prints a heuristic confidence assessment (structured,
disordered, mixed).
Example:
bash
uv run scripts/analyze_plddt.py ./data/AF-P00520-F1-metadata.json
3. Analyze PAE / Domain Boundaries
Reads a downloaded PAE JSON file and detects rigid domain boundaries using a
sliding-window PAE heuristic.
Example:
bash
uv run scripts/analyze_pae.py ./data/AF-P00520-F1-predicted_aligned_error_v6.json
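The three scripts are typically run in sequence: fetch first, then the two analyses on the downloaded files. The following Python sketch assembles those commands with an absolute output path. The helper names `build_commands` and `run_pipeline` are hypothetical, and the file-name pattern (`AF-<ID>-F1-metadata.json`, `AF-<ID>-F1-predicted_aligned_error_v6.json`) is inferred from the examples above; when fragment fallback or an isoform is used, the actual names may differ, so check what the fetch script reports.

```python
import subprocess
from pathlib import Path


def build_commands(
    uniprot_id: str,
    out_dir: str,
    fragment: int = 1,
    pae_version: str = "v6",
) -> list[list[str]]:
    """Build the fetch and analysis commands for one UniProt ID.

    File names are assumed from the examples in this document.
    """
    out = Path(out_dir).resolve()  # always hand the scripts an absolute path
    stem = f"AF-{uniprot_id}-F{fragment}"
    pae_name = f"{stem}-predicted_aligned_error_{pae_version}.json"
    return [
        ["uv", "run", "scripts/fetch_structure.py", uniprot_id, "-o", str(out)],
        ["uv", "run", "scripts/analyze_plddt.py", str(out / f"{stem}-metadata.json")],
        ["uv", "run", "scripts/analyze_pae.py", str(out / pae_name)],
    ]


def run_pipeline(uniprot_id: str, out_dir: str, skill_dir: str) -> list[str]:
    """Run each command from the skill directory and collect its stdout."""
    outputs = []
    for cmd in build_commands(uniprot_id, out_dir):
        result = subprocess.run(
            cmd, cwd=skill_dir, check=True, capture_output=True, text=True
        )
        outputs.append(result.stdout)
    return outputs
```

Resolving the output directory up front matters because the scripts are launched with the skill directory as the working directory, so a relative path would otherwise end up inside the skill directory.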
Interpreting the Output
The script prints analysis to stdout. Read it carefully and synthesize the
results for the user:
- Isoform / Large Protein Warning (MANDATORY): Check the script output for any [!] WARNING lines. If the script reports that no canonical entry was found and an isoform was used, or if the protein is very large (>2700 AAs), you MUST prominently relay this warning to the user. Do not omit this warning.
- Synthesize the Structural Analysis: Combine the "pLDDT Conclusion" and the "PAE Structural Conclusion" into a single, cohesive overall summary. Describe the protein's overall folding confidence, the presence of disordered regions, and its rigid domain layout.
- Highlight the supporting metrics:
  - Overall Global pLDDT and the breakdown of fraction confidence (especially Very Low vs. Very High).
  - Domain Boundary Analysis (number of distinct global domains and their specific residue ranges).
- Explicit Disorder Warning: If the analysis concludes that the protein is highly intrinsically disordered (e.g., high fraction of <50 pLDDT or lack of rigid domains), issue a separate, prominent warning. Advise the user against proceeding with whole-protein downstream structural analysis (like Foldseek or docking). If small ordered domains exist amidst the disorder, advise the user to restrict any future analysis strictly to those specific residue boundaries.
- Remind the user that per-residue pLDDT is embedded in the B-factor column of the downloaded mmCIF file.
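To make sure no mandatory warning is missed, the captured script output can be scanned for the `[!] WARNING` marker before writing the summary. This is a minimal sketch: the function name `extract_warnings` is hypothetical, and the sample text below is invented for illustration, not real script output.

```python
def extract_warnings(stdout_text: str) -> list[str]:
    """Return every line of script output that carries the [!] WARNING marker."""
    marker = "[!] WARNING"
    return [line.strip() for line in stdout_text.splitlines() if marker in line]


if __name__ == "__main__":
    # Hypothetical output, for illustration only.
    sample = (
        "Fetching entry...\n"
        "[!] WARNING: no canonical entry found; an isoform was used\n"
        "Done.\n"
    )
    for warning in extract_warnings(sample):
        print(warning)
```

Any line returned by the helper should be relayed to the user prominently, ahead of the structural summary.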