foldseek-structural-search
Prerequisites
- `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://search.foldseek.com/search and https://github.com/steineggerlab/foldseek, then (2) create the file recording the notification text and timestamp.
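The notification step can be sketched in Python as follows. This is a minimal sketch, not part of the skill's own scripts: the function name `ensure_license_notification`, the wording of `NOTICE`, and the file layout (timestamp on the first line, notice text on the second) are assumptions made for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical notice wording; only the two URLs come from the skill text.
NOTICE = (
    "Please review the terms of use at https://search.foldseek.com/search "
    "and https://github.com/steineggerlab/foldseek before using the results."
)


def ensure_license_notification(skill_dir: Path, notice: str = NOTICE) -> bool:
    """Notify the user once per skill directory.

    Returns True if the notice was shown now, False if it was already recorded.
    """
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # notification was already given and recorded

    # (1) Notify the user first ...
    print(notice)

    # (2) ... then record the notification text and a UTC timestamp.
    timestamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"{timestamp}\n{notice}\n", encoding="utf-8")
    return True
```

Calling `ensure_license_notification(Path("."))` from the skill directory prints the notice on first use and does nothing on later runs.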
Goal
Submit a user-provided 3D protein structure file (`.cif`, `.mmcif`, or `.pdb`)
to the Foldseek web server API to find structurally similar proteins. Report the
top structural hits, interpret key alignment metrics, summarize the inferred
protein functions, save the Markdown-formatted table to a `.md` file, and save
the full detailed results to a local JSON file.
Core Rules
- File Requirement: This tool absolutely cannot search by sequence, name, or accession ID. It strictly requires a `.pdb`, `.cif`, or `.mmcif` file path.
- Strict Validation: Never bypass the input validation or the database allowlist check.
- Do Not Parse the JSON: Rely entirely on the generated `.md` file for your immediate summary. The JSON is saved purely for subsequent, specialized tool use.
- No Raw Parsing: Do not attempt to parse or read the raw 3D coordinates yourself; always pass the file to the script.
- Notification: If this skill is used, ensure this is mentioned in the output.
Instructions
- Strict Input Validation: Verify that the user has explicitly provided a valid path to a `.cif`, `.mmcif`, or `.pdb` file in their workspace.
  - If the user provided a protein name, an amino acid sequence, or an accession ID (e.g., a UniProt ID) but NO downloaded structure file, halt immediately. Do not run the script.
  - Inform the user that Foldseek requires a physical 3D coordinate file, and suggest downloading the structure first (e.g., using the AlphaFold fetch tool).
- Database Validation: Check if the user requested specific databases to search.
  - Allowed List: `afdb50`, `afdb-swissprot`, `pdb100`, `BFVD`, `mgnify_esm30`, `cath50`, `gmgcl_id`, `bfmd`, `afdb-proteome`.
  - If the user requests a database NOT on this list, halt immediately. Do not run the script. Inform the user that the database is unsupported and provide them with the allowed list.
- Generate File Names: Generate descriptive output file names for both the JSON data and the Markdown table based on the input file (e.g., `proteinA_foldseek_results.json` and `proteinA_foldseek_results.md`).
- Execute the Python script based on the user's request, redirecting the standard output into your generated `.md` file:
  - Default (No databases specified): `uv run scripts/search.py <path-to-file> -o <generated-filename.json> > <generated-filename.md>`
  - Custom (Valid databases specified): `uv run scripts/search.py <path-to-file> -o <generated-filename.json> --databases <db1,db2,db3> > <generated-filename.md>`
- The script will query the databases, save the full JSON payload, and write a Markdown-formatted table to your specified `.md` file.
- Read the Results: Open and read the newly generated `.md` file carefully to view the Markdown table.
- Interpret the Metrics: Summarize for the user the top 3 to 5 structural matches that have meaningful annotations. When reporting, assess the match quality using these specific fields:
  - Prob (Probability): Values approaching 1.0 (100%) indicate very high confidence that the hit is a true structural homologue.
  - Q-Cov (Query Coverage): High percentages mean the match covers the majority of the query protein's overall shape, rather than just a small local motif.
  - E-value & Seq Identity: Use these to provide additional evolutionary context.
- Perform Functional Analysis: Analyze the text descriptions embedded within the `Target ID` column for the reported matches.
  - Explicitly report the specific protein names/functions of the top structural homologues.
  - Provide a synthesized overview summarizing the entire variety of different functions, domains, or protein families found across the whole list of homologues (e.g., "Most hits are portal proteins, but there is also a distinct cluster of viral capsid matches...").
- Explicitly inform the user of both newly created files (`.json` and `.md`) and their locations so they can be seamlessly used in subsequent analysis steps.
* If the API returns an error or the file is missing, inform the user clearly and ask them to verify the file path.
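One way to run the search with its standard output redirected to the `.md` file, while surfacing failures to the user, is sketched below. It assumes the script signals errors through a non-zero exit code and a message on standard error, which the skill text does not confirm; the function name `run_to_markdown` is hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_to_markdown(cmd: list[str], md_path: Path) -> bool:
    """Run cmd with stdout redirected to md_path. Return True on success."""
    try:
        with md_path.open("w", encoding="utf-8") as out:
            result = subprocess.run(
                cmd, stdout=out, stderr=subprocess.PIPE, text=True
            )
    except FileNotFoundError:
        # The executable itself (e.g. uv) is not on PATH.
        print(f"Could not start {cmd[0]!r}; is it installed and on PATH?", file=sys.stderr)
        return False

    if result.returncode != 0:  # assumption: non-zero exit means the search failed
        print(f"Search failed (exit code {result.returncode}).", file=sys.stderr)
        if result.stderr.strip():
            print(result.stderr.strip(), file=sys.stderr)
        print("Please verify the structure file path and try again.", file=sys.stderr)
        return False
    return True
```

Passing the command as an argument list and opening the output file in Python avoids shell quoting problems with paths that contain spaces; the effect is the same as the `>` redirection in the commands above.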