human-protein-atlas-database

Human Protein Atlas (HPA) Database Integration

This skill provides semi-quantitative protein expression and spatial localization data from the Human Protein Atlas (HPA). RNA-seq (e.g., GTEx) indicates whether a gene is being transcribed; HPA confirms whether the protein product is actually present, where it sits within the cell (e.g., nucleus vs. cytoplasm), and its concentration in systemic blood circulation. The tissue-level data are based on immunohistochemistry (IHC) across normal human tissues and cancer types.

Prerequisites

  1. `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
  2. User Notification: If `LICENSE_NOTIFICATION.txt` does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://www.proteinatlas.org/about/licence, and (2) create the file, recording the notification text and a timestamp.
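
The notification step above can be sketched in Python as follows. This is a minimal illustration, not part of the skill's scripts: the function name `ensure_licence_notification`, the notice wording, and the file layout (notice text followed by a timestamp line) are all assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path

LICENCE_URL = "https://www.proteinatlas.org/about/licence"
NOTICE_FILE = "LICENSE_NOTIFICATION.txt"


def ensure_licence_notification(skill_dir: Path) -> bool:
    """Notify once per skill directory; return True if a notice was shown."""
    marker = skill_dir / NOTICE_FILE
    if marker.exists():
        return False  # the user was already notified earlier

    # Hypothetical wording; the source only requires a prominent notice.
    notice = f"NOTICE: Please check the HPA licence terms at {LICENCE_URL}"
    print(notice)  # (1) notify the user

    # (2) record the notification text and when it was shown
    timestamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"{notice}\nNotified at: {timestamp}\n", encoding="utf-8")
    return True
```

Calling the function a second time for the same directory returns `False` and prints nothing, because the marker file already exists.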

When to Use

Use this skill when you need to:
  • Map a gene symbol to its Ensembl ID for HPA queries.
  • Retrieve the semi-quantitative protein abundance in normal human tissues and cancer types based on IHC staining (High, Medium, Low, or Not Detected).
  • Find the specific organelles or subcellular structures where a protein has been localized (e.g., nucleoplasm, mitochondria).
  • Check the consistency/agreement between RNA-seq consensus and protein expression levels.
  • Search for genes based on specific protein expression criteria (e.g., "elevated in amygdala" or "secreted proteins").
Do NOT use when you need to:
  • Query eQTLs, pQTLs, or any variant-level associations. HPA provides wild-type expression data and contains no QTL information.
  • Query gene expression in non-human species. HPA is strictly for human proteins.
  • Retrieve purely quantitative RNA expression without interest in the protein product (consider using the GTEx skill instead).

Command Selection Guide

Pick the right command on the first try by matching the user's request to the correct subcommand below.
  • Map a gene symbol to Ensembl ID: `resolve-ensembl-id`
  • Get tissue protein expression levels: `get-tissue-expression`
  • Get subcellular location of a protein: `get-subcellular-location`
  • Get the full HPA metadata entry for a gene: `get-atlas-entry`
  • Search HPA for genes matching specific criteria: `search-hpa`

Quick Start

Map the ERBB2 gene symbol to its Ensembl ID

uv run scripts/hpa_cli.py resolve-ensembl-id ERBB2 --output /tmp/erbb2_id.json

Get subcellular location by Ensembl ID

uv run scripts/hpa_cli.py get-subcellular-location ENSG00000141736 --output /tmp/erbb2_location.json

All subcommands write JSON to disk. Always save output in the `/tmp/` directory. If `--output` is not specified, the default output file is `/tmp/hpa_output.json`.
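
When the CLI is driven from Python, the output convention above can be enforced in one place. The sketch below only assembles the argument list; `build_hpa_command` is a hypothetical helper, not something shipped with the skill, and it assumes the script path and `--output` flag shown in the Quick Start commands.

```python
from typing import List, Optional, Sequence

CLI_SCRIPT = "scripts/hpa_cli.py"
DEFAULT_OUTPUT = "/tmp/hpa_output.json"  # default stated in this document


def build_hpa_command(
    subcommand: str,
    args: Sequence[str] = (),
    output: Optional[str] = None,
) -> List[str]:
    """Assemble the argv for one CLI call, always writing under /tmp/."""
    out = output or DEFAULT_OUTPUT
    if not out.startswith("/tmp/"):
        raise ValueError(f"output must be saved under /tmp/, got {out!r}")
    return ["uv", "run", CLI_SCRIPT, subcommand, *args, "--output", out]


# The resulting list can be passed to subprocess.run(cmd, check=True).
cmd = build_hpa_command("resolve-ensembl-id", ["ERBB2"], "/tmp/erbb2_id.json")
```

Passing the command as a list rather than a shell string avoids quoting problems with arguments that contain spaces, such as tissue names.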

Commands

1. `resolve-ensembl-id`: Gene Symbol → Ensembl ID

Maps a common gene symbol (e.g., "TP53", "ERBB2") to its Ensembl gene ID. HPA endpoints are strictly Ensembl-based.
bash
uv run scripts/hpa_cli.py resolve-ensembl-id TP53 --output /tmp/tp53_id.json
Arguments:
  • `gene_symbol` (positional): The standard gene symbol (e.g., "TP53").
  • `--output`: Output file path (default: `/tmp/hpa_output.json`).
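
Because the other subcommands take an Ensembl gene ID rather than a symbol, a quick format check can catch a symbol passed by mistake before any query is made. The sketch below assumes the usual form of human Ensembl gene IDs (`ENSG` followed by 11 digits, no version suffix); the CLI is not documented to enforce this pattern, and `looks_like_ensembl_gene_id` is a hypothetical helper.

```python
import re

# Assumed pattern: unversioned human Ensembl gene ID, e.g. ENSG00000141736.
ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}")


def looks_like_ensembl_gene_id(value: str) -> bool:
    """Return True if the value has the shape of a human Ensembl gene ID."""
    return ENSEMBL_GENE_ID.fullmatch(value.strip()) is not None
```

If the check fails for something like `ERBB2`, run `resolve-ensembl-id` first and use the ID it returns.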

2. `get-tissue-expression`: Get Tissue Protein Levels

Returns a list of tissues and their corresponding protein expression levels (High, Medium, Low, or Not Detected) based on IHC staining.
bash
uv run scripts/hpa_cli.py get-tissue-expression ENSG00000130234 \
  --tissues "duodenum,thyroid gland" --output /tmp/tissue_expr.json
Arguments:
  • `ensembl_id` (positional): The Ensembl Gene ID.
  • `--tissues`: Comma-separated list of tissues to filter by (optional, defaults to all available tissues).
  • `--output`: Output file path (default: `/tmp/hpa_output.json`).
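
Two small helpers illustrate working with this subcommand from Python: one builds the comma-separated `--tissues` value, the other ranks results by the four IHC levels. The record shape used here (dicts with `tissue` and `level` keys) is purely hypothetical, since the actual output schema is not described in this document; adapt the key names after inspecting a real output file.

```python
from typing import Dict, Iterable, List

# Ordering of the semi-quantitative IHC levels named above.
LEVEL_RANK = {"not detected": 0, "low": 1, "medium": 2, "high": 3}


def tissues_argument(tissues: Iterable[str]) -> str:
    """Join tissue names into the comma-separated form used by --tissues."""
    return ",".join(t.strip() for t in tissues if t.strip())


def sort_by_level(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort hypothetical {'tissue', 'level'} records from High down to Not detected."""
    return sorted(
        records,
        key=lambda r: LEVEL_RANK[r["level"].strip().lower()],
        reverse=True,
    )
```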

3. `get-subcellular-location`: Get Subcellular Location

Retrieves the specific organelles or cellular structures where the protein has been localized.
bash
uv run scripts/hpa_cli.py get-subcellular-location ENSG00000141736 \
  --output /tmp/subcellular.json
Arguments:
  • `ensembl_id` (positional): The Ensembl Gene ID.
  • `--output`: Output file path (default: `/tmp/hpa_output.json`).

4. `get-atlas-entry`: Get Full HPA Entry

Fetches the full metadata for a gene, including IHC scores, RNA-seq consensus, and subcellular location.
bash
uv run scripts/hpa_cli.py get-atlas-entry ENSG00000254647 \
  --output /tmp/ins_entry.json
Arguments:
  • `ensembl_id` (positional): The Ensembl Gene ID.
  • `--format`: Format of the returned entry (default: `json`).
  • `--output`: Output file path (default: `/tmp/hpa_output.json`).

5. `search-hpa`: Search by Attribute

Finds genes that match specific criteria (e.g., "elevated in amygdala").
bash
uv run scripts/hpa_cli.py search-hpa \
  --query "brain_category_rna:amygdala" \
  --output /tmp/search_results.json
Arguments:
  • `--query`: The search query string. Refer to references/search-api.md for details.
  • `--output`: Output file path (default: `/tmp/hpa_output.json`).

Core Rules

  • Use the Wrapper: ALWAYS query the database through the provided helper scripts rather than accessing it directly. The scripts automatically enforce fair use and implement retry logic.
  • Notification: Whenever this skill is used, say so in the output.

Common Errors

  • If no results are returned, check that the query is specific enough, starting with the API reference in references/search-api.md.
  • If you still cannot find results, search the web for example HPA queries and use them to construct a better query.
  • The output is usually large. Process it with jq or a short Python script rather than printing it: never write results to stdout or cat the output file.
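
As one way to follow the last point, the sketch below inspects an output file and returns only a compact summary (top-level type, size, and a few key names) instead of the full contents. It makes no assumption about the HPA output schema beyond it being valid JSON; `summarize_json` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_json(path: str, max_keys: int = 10) -> Dict[str, Any]:
    """Describe the shape of a JSON file without dumping its contents."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))

    if isinstance(data, list):
        first = data[0] if data else None
        # Only report key names of the first item, capped at max_keys.
        keys = sorted(first)[:max_keys] if isinstance(first, dict) else []
        return {"type": "list", "length": len(data), "first_item_keys": keys}

    if isinstance(data, dict):
        return {"type": "dict", "length": len(data), "keys": sorted(data)[:max_keys]}

    # Scalars (string, number, bool, null) have no meaningful length here.
    return {"type": type(data).__name__, "length": None}
```

Once the key names are known, a targeted extraction (for example with jq, or by indexing into the loaded data) can pull out just the fields of interest.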