Human Protein Atlas (HPA) Database Integration
This skill provides semi-quantitative protein expression and spatial
localisation data from the Human Protein Atlas (HPA). While RNA-seq (e.g., GTEx)
tells us whether a gene is transcribed, HPA indicates whether the protein
product is actually present, where it is located within the cell (e.g., nucleus
vs. cytoplasm), and its concentration in blood. Tissue and cancer expression
levels are based on immunohistochemistry (IHC) across normal human tissues and
cancer types.
Prerequisites
- : Read the skill and follow its Setup instructions to ensure
is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in
this skill directory, (1) prominently notify the user to check the terms
at https://www.proteinatlas.org/about/licence, then (2) create the file,
recording the notification text and timestamp.
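The notification step is a check-then-create pattern: notify once, then leave a marker file so later runs stay silent. A minimal bash sketch, assuming the skill directory is passed as an argument; the function name `ensure_license_notification`, the notice wording, and the file layout (text line plus a "Notified at:" UTC timestamp) are hypothetical choices, not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: notify the user about the HPA licence terms once,
# then record the notification text and a timestamp in the marker file.
ensure_license_notification() {
  local skill_dir="$1"
  local marker="$skill_dir/LICENSE_NOTIFICATION.txt"
  local text="Please review the Human Protein Atlas licence terms at https://www.proteinatlas.org/about/licence"

  # Already notified: do nothing.
  if [[ -e "$marker" ]]; then
    return 0
  fi

  # (1) Prominently notify the user (on stderr, so it never mixes with JSON output).
  printf '*** NOTICE: %s ***\n' "$text" >&2

  # (2) Record the notification text and a UTC timestamp.
  printf '%s\nNotified at: %s\n' "$text" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$marker"
}
```

Calling `ensure_license_notification "$(pwd)"` from the skill directory before the first query is enough; repeated calls are no-ops once the file exists.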
When to Use
Use this skill when you need to:
- Map a gene symbol to its Ensembl ID for HPA queries.
- Retrieve the semi-quantitative protein abundance in normal human tissues and
cancer types based on IHC staining (High, Medium, Low, or Not Detected).
- Find the specific organelles or subcellular structures where a protein has
been localized (e.g., nucleoplasm, mitochondria).
- Check the agreement between the RNA-seq consensus and protein expression
levels.
- Search for genes based on specific protein expression criteria (e.g.,
"elevated in amygdala" or "secreted proteins").
Do NOT use when you need to:
- Query eQTLs, pQTLs, or any variant-level associations. HPA provides
expression data only and contains no genotype or QTL information.
- Query gene expression in non-human species. HPA is strictly for human
proteins.
- Retrieve purely quantitative RNA expression without interest in the protein
product (consider using the GTEx skill instead).
Command Selection Guide
Pick the right command on the first try. Match the user's input to the
correct subcommand below.
- Map a gene symbol to Ensembl ID: `resolve-ensembl-id`
- Get tissue protein expression levels: `get-tissue-expression`
- Get subcellular location of a protein: `get-subcellular-location`
- Get the full HPA metadata entry for a gene: `get-atlas-entry`
- Search HPA for genes matching specific criteria: `search-hpa`
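The guide is effectively a lookup table from task to subcommand. As a rough bash sketch, it can be encoded as a `case` statement; the function name `hpa_subcommand_for` and the task keywords are hypothetical, while the subcommand names are the ones this skill documents.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: map a task keyword to the hpa_cli.py subcommand
# from the Command Selection Guide. The keywords are illustrative only.
hpa_subcommand_for() {
  case "$1" in
    symbol-to-id) echo "resolve-ensembl-id" ;;
    tissue)       echo "get-tissue-expression" ;;
    subcellular)  echo "get-subcellular-location" ;;
    full-entry)   echo "get-atlas-entry" ;;
    search)       echo "search-hpa" ;;
    *)            echo "unknown task: $1" >&2; return 1 ;;
  esac
}

# Example:
# uv run scripts/hpa_cli.py "$(hpa_subcommand_for subcellular)" ENSG00000141736 \
#   --output /tmp/erbb2_location.json
```

Failing loudly on an unknown keyword, rather than guessing a default, matches the goal of picking the right command on the first try.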
Quick Start
```bash
# Map the ERBB2 gene symbol to its Ensembl ID
uv run scripts/hpa_cli.py resolve-ensembl-id ERBB2 --output /tmp/erbb2_id.json
# Get subcellular location by Ensembl ID
uv run scripts/hpa_cli.py get-subcellular-location ENSG00000141736 --output /tmp/erbb2_location.json
```
All subcommands write JSON to disk. Always save output in the directory.
The default output file is if `--output` is not specified.
Commands
1. `resolve-ensembl-id`: Gene Symbol → Ensembl ID
Maps a common gene symbol (e.g., "TP53", "ERBB2") to its Ensembl gene ID. HPA
endpoints are keyed by Ensembl gene ID, so run this first when you only have a
symbol.
```bash
uv run scripts/hpa_cli.py resolve-ensembl-id TP53 --output /tmp/tp53_id.json
```
Arguments:
- (positional): The standard gene symbol (e.g., "TP53").
- `--output`: Output file path (default: ).
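Because the downstream subcommands take Ensembl gene IDs, a quick sanity check before calling them can catch a symbol passed by mistake. A bash sketch, assuming HPA expects unversioned IDs as in the examples here; it relies on the standard human Ensembl gene ID format ("ENSG" followed by 11 digits), and the function name `normalize_ensembl_gene_id` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop an optional version suffix (e.g. ".17") and check
# that the result looks like a human Ensembl gene ID: "ENSG" + 11 digits.
normalize_ensembl_gene_id() {
  local id="${1%%.*}"   # ENSG00000141510.17 -> ENSG00000141510
  if [[ "$id" =~ ^ENSG[0-9]{11}$ ]]; then
    echo "$id"
  else
    echo "not a human Ensembl gene ID: $1 (run resolve-ensembl-id first)" >&2
    return 1
  fi
}
```

A gene symbol such as `TP53` fails the check, which is the cue to resolve it with `resolve-ensembl-id` before querying.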
2. `get-tissue-expression`: Get Tissue Protein Levels
Returns a list of tissues and their corresponding protein expression levels
(High, Medium, Low, or Not Detected) based on IHC staining.
```bash
uv run scripts/hpa_cli.py get-tissue-expression ENSG00000130234 \
  --tissues "duodenum,thyroid gland" --output /tmp/tissue_expr.json
```
Arguments:
- (positional): The Ensembl Gene ID.
- `--tissues`: Comma-separated list of tissues to filter by (optional;
defaults to all available tissues).
- `--output`: Output file path (default: ).
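Tissue names can contain spaces ("thyroid gland"), so building the `--tissues` value by hand is error-prone. A small bash sketch that joins a list of names with commas and no extra spaces, matching the example above; the function name `join_tissues` is hypothetical, and the names must still match HPA's own tissue names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join tissue names (which may contain spaces) into the
# comma-separated string passed to --tissues.
join_tissues() {
  local IFS=,   # "$*" joins the arguments with the first character of IFS
  echo "$*"
}

# Example:
# uv run scripts/hpa_cli.py get-tissue-expression ENSG00000130234 \
#   --tissues "$(join_tissues duodenum "thyroid gland")" --output /tmp/tissue_expr.json
```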
3. `get-subcellular-location`: Get Subcellular Location
Retrieves the specific organelles or cellular structures where the protein has
been localized.
```bash
uv run scripts/hpa_cli.py get-subcellular-location ENSG00000141736 \
  --output /tmp/subcellular.json
```
Arguments:
- (positional): The Ensembl Gene ID.
- `--output`: Output file path.
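When several genes are needed, one output file per gene keeps each result small and easy to parse. A bash sketch that loops over Ensembl IDs and still routes every call through the helper script, which the Core Rules say handles fair use and retries; the wrapper `hpa`, the function `fetch_subcellular_batch`, and the `<ID>_subcellular.json` naming scheme are hypothetical.

```bash
#!/usr/bin/env bash
# Thin wrapper so every request goes through the skill's helper script.
hpa() {
  uv run scripts/hpa_cli.py "$@"
}

# Hypothetical batch helper: one get-subcellular-location call per gene,
# each written to its own JSON file under out_dir.
fetch_subcellular_batch() {
  local out_dir="$1"; shift
  mkdir -p "$out_dir"
  local id
  for id in "$@"; do
    hpa get-subcellular-location "$id" --output "$out_dir/${id}_subcellular.json" \
      || echo "failed: $id" >&2   # keep going; report failures on stderr
  done
}

# Example (ERBB2 and INS, IDs from this skill's examples):
# fetch_subcellular_batch /tmp/hpa_subcellular ENSG00000141736 ENSG00000254647
```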
4. `get-atlas-entry`: Get Full HPA Entry
Fetches the full metadata for a gene, including IHC scores, RNA-seq consensus,
and subcellular location.
```bash
uv run scripts/hpa_cli.py get-atlas-entry ENSG00000254647 \
  --output /tmp/ins_entry.json
```
Arguments:
- (positional): The Ensembl Gene ID.
- : Format of the returned entry, e.g., json (default: ).
- `--output`: Output file path.
5. `search-hpa`: Search by Attribute
Allows filtering for genes based on specific criteria (e.g., "elevated in
amygdala").
```bash
uv run scripts/hpa_cli.py search-hpa \
  --query "brain_category_rna:amygdala" \
  --output /tmp/search_results.json
```
Arguments:
- `--query`: The search query string. Refer to references/search-api.md for
details.
- `--output`: Output file path.
Core Rules
- Use the Wrapper: ALWAYS execute the provided helper scripts to query the
database rather than accessing the database directly. The scripts
automatically enforce fair use and implement retry logic.
- Notification: Whenever this skill is used, state in the output that the
results come from the Human Protein Atlas via this skill.
Common Errors
- If no results are returned, check that the query is specific and correctly
formed, starting with the API reference in references/search-api.md.
- If you still cannot find results, search the web for example HPA queries and
use them to construct a better query.
- The output is usually large. Use jq or a short Python script to process the
results. Never print the full results to stdout or cat the output file.
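Since the output files are large and their exact JSON layout is not described here, a safe first step is to inspect only the top-level shape instead of dumping the contents. A schema-agnostic bash/jq sketch; the function name `summarize_json` is hypothetical. An empty search shows up as "array of 0 items" if the result is an array.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a one-line summary of a JSON file's top level
# (array length or object keys) so large outputs are never dumped in full.
summarize_json() {
  jq -r '
    if type == "array" then "array of \(length) items"
    elif type == "object" then "object with keys: \(keys | join(", "))"
    else type
    end' "$1"
}

# Example:
# summarize_json /tmp/search_results.json
```

From the summary you can then write a targeted jq filter that extracts only the fields you need.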