string-database


STRING Database Skill


This skill allows you to query the STRING database programmatically using a bundled Python CLI wrapper.

Prerequisites


  1. `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
  2. User Notification: If `LICENSE_NOTIFICATION.txt` does not already exist in this skill directory, then (1) prominently notify the user to check the terms of access at https://string-db.org/cgi/access, and (2) create the file, recording the notification text and a timestamp.
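A minimal Python sketch of the one-time notification step follows. It assumes the check runs in a script that knows the skill directory. The function name `ensure_license_notification` and the notice wording are illustrative and are not part of the skill. Printing stands in for however the notice is actually shown to the user.

```python
from datetime import datetime, timezone
from pathlib import Path

# Illustrative notice text; the skill only requires pointing the user at the terms URL.
NOTICE = (
    "Before using the STRING database, please review its terms of access at "
    "https://string-db.org/cgi/access"
)


def ensure_license_notification(skill_dir: Path) -> bool:
    """Notify the user once and record it; return True if the notice was shown now."""
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # The user was already notified in an earlier session.
    print(f"*** {NOTICE} ***")  # Prominent notice (stand-in for the real UI).
    timestamp = datetime.now(timezone.utc).isoformat()
    # Record both the notification text and when it was shown.
    marker.write_text(f"{NOTICE}\nNotified at: {timestamp}\n", encoding="utf-8")
    return True
```

Calling `ensure_license_notification(Path("."))` from the skill directory shows the notice on the first run and does nothing afterwards.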

Core Rules


  1. MANDATORY: Ask for Species First: The STRING API requires NCBI Taxon IDs. You MUST NOT guess or assume a species. If the user does not explicitly state a species or Taxon ID, you MUST stop and ask: "Which species are you interested in? I need the NCBI Taxon ID to proceed." Even for well-known proteins like TP53, BRCA1, or MDM2 that are commonly associated with human studies, you MUST still ask; do not default to human.
  2. Never print output to stdout: The `--output <file.tsv>` option is required. Never read large outputs into context; instead, use `jq`, Python, or file operations (`grep`, `head`) to process them.
  3. Map identifiers first: If you only have common gene names (e.g., 'TP53'), first map them to STRING IDs with the `map` command, because queries that use STRING IDs get much faster server responses.
  4. Notification: If this skill is used, mention that in the output.
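Rule 1 can be expressed as a guard that refuses to fall back to a default species. This is a hypothetical helper: `require_taxon_id` and `SpeciesRequired` are not part of the bundled CLI. It also assumes that any species name the user gave has already been converted to its numeric NCBI Taxon ID.

```python
from typing import Optional, Union

# The exact question Rule 1 prescribes.
SPECIES_QUESTION = (
    "Which species are you interested in? "
    "I need the NCBI Taxon ID to proceed."
)


class SpeciesRequired(Exception):
    """Raised when the user has not explicitly given an NCBI Taxon ID."""


def require_taxon_id(user_value: Optional[Union[str, int]]) -> int:
    """Return the explicitly supplied Taxon ID; never fall back to a default."""
    text = "" if user_value is None else str(user_value).strip()
    if not text:
        # No species given: stop and ask instead of assuming human (9606).
        raise SpeciesRequired(SPECIES_QUESTION)
    if not (text.isascii() and text.isdigit()) or int(text) == 0:
        # Assumption: names must be resolved to numeric IDs before this check.
        raise SpeciesRequired(f"'{text}' is not a numeric NCBI Taxon ID. {SPECIES_QUESTION}")
    return int(text)
```

A caller catches `SpeciesRequired` and relays its message to the user instead of proceeding.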

Tool Execution


The CLI is at `scripts/string_cli.py` and should be run using `uv run`:

```bash
uv run scripts/string_cli.py <command> [options] --output /tmp/out.tsv
```
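If the CLI is driven from Python rather than a shell, a thin wrapper might look like the following sketch. `build_cli_command` and `run_cli` are hypothetical helpers. The per-command options are deliberately left to the caller, because they are documented in the reference files and are not assumed here.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Path of the bundled CLI, relative to the skill directory.
CLI_SCRIPT = "scripts/string_cli.py"


def build_cli_command(command: str, options: Sequence[str], output: str) -> List[str]:
    """Assemble `uv run scripts/string_cli.py <command> [options] --output <file>`."""
    if not output:
        # Results must always go to a file, never to stdout.
        raise ValueError("--output is required")
    return ["uv", "run", CLI_SCRIPT, command, *options, "--output", output]


def run_cli(command: str, options: Sequence[str], output: str, skill_dir: str = ".") -> Path:
    """Run the CLI from the skill directory and return the path of the results file."""
    argv = build_cli_command(command, options, output)
    # check=True raises CalledProcessError on a non-zero exit status;
    # capture_output keeps console text out of the conversation, while
    # the raised error's .stderr attribute still holds diagnostics.
    subprocess.run(argv, cwd=skill_dir, check=True, capture_output=True, text=True)
    # A relative output path is resolved against skill_dir, where the CLI ran.
    return Path(skill_dir) / output
```

For example, call `run_cli("map", opts, "/tmp/map.tsv")`, where `opts` holds whatever flags the Mapping Identifiers reference lists for `map`.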
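Following Core Rule 2, the output file can be inspected without reading all of it into context. This sketch streams the TSV and returns only the header, the row count, a short preview, and (optionally) the most frequent values of one column. `summarize_tsv` is a hypothetical helper. Because the output columns depend on the command, no column names are assumed.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Any, Dict, Optional


def summarize_tsv(path: Path, preview_rows: int = 5,
                  count_column: Optional[str] = None) -> Dict[str, Any]:
    """Stream a TSV results file and return a compact summary of it."""
    preview = []
    counts: Counter = Counter()
    n_rows = 0
    with open(path, newline="", encoding="utf-8") as fh:
        # QUOTE_NONE: treat quote characters as ordinary text in tab-separated data.
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        columns = list(reader.fieldnames or [])
        for row in reader:  # one row at a time; the file is never fully loaded
            n_rows += 1
            if len(preview) < preview_rows:
                preview.append(dict(row))
            if count_column is not None:
                counts[row.get(count_column)] += 1
    return {
        "columns": columns,
        "rows": n_rows,
        "preview": preview,
        "top_values": counts.most_common(5),
    }
```

For example, `summarize_tsv(Path("/tmp/out.tsv"))` reports the columns and size of the results without echoing every row.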

Feature Domains (Progressive Disclosure)


Read the following reference files based on the user's request:
  • Mapping Identifiers - Map common protein names to STRING IDs.
  • Interactions & Network - Find interacting proteins, network topologies, mediators, homology, and visual network images.
  • Enrichment & Functional Annotations - Analyze pathway enrichment (GO, KEGG, Pfam), PPI significance, or find all proteins associated with a specific term (e.g., melanoma).
  • Values/Ranks Enrichment - Submit full experimental datasets (e.g., logFC, p-values) for rank-based enrichment analysis using the async background API.
To begin, read the reference file most appropriate to the current task to discover the correct CLI command.