clinvar-database


ClinVar Database

Prerequisites

  1. `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
  2. User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://www.ncbi.nlm.nih.gov/clinvar/, and (2) create the file, recording the notification text and timestamp.
  3. `.env` file: Make sure the `.env` file exists in your home directory. Create one if it does not exist.
  4. `NCBI_API_KEY` (optional): Raises the NCBI rate limit from 3 to 10 requests/second. The skill works without it, but a key is recommended if the user plans many queries or encounters a 429 error. The user can obtain one for free by registering at https://www.ncbi.nlm.nih.gov/account/settings/. If the variable is missing from `.env`, do NOT ask the user to paste it into the chat (this would leak the key into the agent's context). Instead, give the user this command, substituting `ENV_FILE` with the resolved literal path to the `.env` file:

    ```bash
    printf "Enter NCBI API key (typing hidden): " && read -s key && echo && echo "NCBI_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
    ```

    The scripts load credentials automatically via `dotenv`. NEVER read, print, or inspect the `.env` file or its variables (e.g. no `cat`, `grep`, `echo`, `printenv`, or `os.environ.get` on keys). Credentials must stay out of the agent's context. See the API Key section for more details.

Overview

ClinVar is the primary consensus record for clinical classifications of human genomic variants. It provides the "clinical ground truth" for pathogenicity labels (Pathogenic, Likely Pathogenic, Benign, Uncertain Significance/VUS), based on assertions submitted by laboratories worldwide.

When to Use

Use when you need to:
  • Find the current clinical significance and star rating (review status) for a specific variant.
  • Fetch clinician notes, assertion criteria, or rationales for previous clinical laboratory classifications.
  • Retrieve the preferred condition name and associated HPO terms for a specific variant.
  • Find a list of variant controls (e.g., "Find all Pathogenic variants in the HBB gene within 50bp of a signal").
  • Check for conflicting interpretations for a given variant and identify the organizations submitting each classification.
Do NOT use when you need to:
  • Find specific allele frequencies in global populations (use gnomAD).
  • Describe the normal biological role of a protein and typical inheritance patterns (use OMIM).
  • Predict mechanistic effects of novel mutations, like frameshifts or exon skipping (use AlphaGenome).
  • Find recommended surveillance schedules for patients with a pathogenic variant (use GeneReviews).
  • Generate or view 3D structural models of affected proteins (use PDB / AlphaFold).

Quick Start

ClinVar queries run through a Python wrapper script that handles NCBI's strict rate limiting and the XML/JSON parsing.

Example: Search for BRCA1 variants

```bash
uv run scripts/clinvar_api.py search --query "BRCA1[gene]" --output results.json
```

Core Rules

  • Retmax Constraint: The `search` command defaults to `--retmax 0`, which fetches all matching results. For any "List all" or gene-wide request, leave `--retmax` at its default or set it at least as high as `total_count`; a lower cap silently truncates the result set.
  • Use the Wrapper: Prefer the wrapper script for standard queries. It handles rate limiting, retries, and the complex XML parsing for you. If the script's parsed output does not contain the specific fields you need, you may modify the script or query the NCBI E-utilities API directly, but be aware that the raw XML schemas are complex and vary between record types.
  • Rate Limits: If the rate limit is hit, the script raises a clear error. Follow the prerequisite instructions above to help the user add `NCBI_API_KEY` to the `.env` file.
  • Notification: If this skill is used, ensure this is mentioned in the output.

Utility Scripts

1. `count`: Count Matching Variants

Purpose: Check how many variants match a query without fetching IDs. Use it to decide whether a full `search` is warranted.

Arguments:
  • `--query`: (Required) NCBI Entrez search query string.
  • `--output`: (Required) Output JSON file path.

Example:

```bash
uv run scripts/clinvar_api.py count \
  --query "TP53[gene] AND \"uncertain significance\"[clinsig]" \
  --output count.json
```

Output: `{"total_count": <int>}`

2. `search`: Search Variants

Purpose: Identify variants based on genomic location, gene symbols, or clinical attributes using NCBI Entrez search syntax. The `search` command automatically paginates through all matching results to ensure complete, deterministic retrieval.

```bash
# Fetch ALL matching variants (default behavior)
uv run scripts/clinvar_api.py search \
  --query "BRCA1[gene]" --output results.json

# Search by Chromosome and Position Range
uv run scripts/clinvar_api.py search \
  --query "11[chr] AND 5225000:5226000[chrpos]" --output results.json

# Combine terms using Entrez syntax
uv run scripts/clinvar_api.py search \
  --query "HBB[gene] AND pathogenic[clinsig]" --output results.json

# Cap results at 50
uv run scripts/clinvar_api.py search \
  --query "TP53[gene]" --retmax 50 --output results.json
```

Arguments:
  • `--query`: (Required) NCBI Entrez search query string.
  • `--retmax`: Maximum total number of variant IDs to return. Default is 0, which means "fetch all matching results." Set to a positive integer to cap the result set.
  • `--page_size`: Number of IDs to fetch per API request (default: 500, max: 10000 per NCBI limits).
  • `--output`: (Required) Output JSON file path.

Output: A JSON object containing:
  • `total_count`: Total number of matching variants in ClinVar.
  • `fetched_count`: Number of IDs actually retrieved.
  • `variant_ids`: List of ClinVar Variation ID strings.
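
A minimal sketch of the completeness check that `total_count` and `fetched_count` make possible, assuming the search output contains exactly the three keys listed above. The function name `check_search_complete` and the sample payload are hypothetical and are not part of `clinvar_api.py`.

```python
import json


def check_search_complete(search_output: dict) -> bool:
    """Return True when every matching variant ID was retrieved."""
    total = search_output["total_count"]
    fetched = search_output["fetched_count"]
    ids = search_output["variant_ids"]
    # fetched_count should agree with total_count and with the list length.
    return fetched == total and len(ids) == fetched


# Hypothetical payload shaped like the documented search output.
sample = json.loads(
    '{"total_count": 3, "fetched_count": 3, "variant_ids": ["101", "102", "103"]}'
)
print(check_search_complete(sample))
```

In practice the dictionary would come from `json.load` on the file passed to `--output`. A `False` result suggests that a `--retmax` cap was applied or that retrieval stopped early.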

3. `summary`: Get Interpretation Summary

Purpose: Retrieve top-line clinical significance labels, star ratings (review status), and basic phenotype data for rapid variant screening.

```bash
# Get summary for one or more Variation IDs
uv run scripts/clinvar_api.py summary \
  --variant_ids 12345 67890 --output summary.json
```

Arguments:
  • `--variant_ids`: (Required) One or more ClinVar Variation IDs.
  • `--output`: (Required) Output JSON file path.

Output: A JSON list of summary objects, each containing:
  • `variant_id`, `title`, `clinical_significance`, `review_status`, `last_evaluated`, `phenotypes`
  • `genes`: list of `{gene_id, symbol, strand}`
  • `variation_type`: e.g., single nucleotide variant, Deletion, Insertion
  • `molecular_consequences`: list of strings (e.g., `["missense variant", "nonsense"]`)

4. `evidence`: Get Clinical Evidence

Purpose: Fetch the full clinical record for a single variant, including free-text clinician rationales, assertion methods, and specific submitter notes.

```bash
# Get full evidence for a single Variation ID
uv run scripts/clinvar_api.py evidence \
  --variant_id 12345 --output evidence.json
```

Arguments:
  • `--variant_id`: (Required) A single ClinVar Variation ID.
  • `--output`: (Required) Output JSON file path.

Output: A JSON object containing:
  • `variant_id`
  • `allele_info`: `{chromosome, position_start, position_stop, reference_allele, alternate_allele, cytogenetic_band, dbsnp_rsid}` (GRCh38 preferred)
  • `conditions`: list of `{name, medgen_cui, omim_id, orphanet_id, hpo_terms}`
  • `functional_consequences`: list of `{value, sequence_ontology_id}`
  • `structural_variant_details`: `{outer_start, inner_start, inner_stop, outer_stop, copy_number}` (present only for CNVs, otherwise null)
  • `citation_references`: list of PubMed IDs cited in the global "Citations" section
  • `submissions`: list of per-submitter records, each containing:
    • `submitter_name`, `classification`, `curator_notes`, `assertion_criteria`
    • `date_last_evaluated`: when the submitter last reviewed the classification
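
To check for conflicting interpretations, the `submissions` list can be grouped by classification. The following is a rough sketch, assuming each submission carries the `submitter_name` and `classification` keys described above. The helper names and the sample record are hypothetical. It only reports distinct labels and does not implement ClinVar's own rules for what counts as a conflict.

```python
from collections import defaultdict


def submitters_by_classification(evidence: dict) -> dict:
    """Map each lower-cased classification label to the submitters that used it."""
    groups = defaultdict(list)
    for sub in evidence.get("submissions", []):
        label = (sub.get("classification") or "not provided").strip().lower()
        groups[label].append(sub.get("submitter_name") or "unknown submitter")
    return dict(groups)


def has_multiple_labels(evidence: dict) -> bool:
    """Crude screen: True when submitters used more than one distinct label."""
    return len(submitters_by_classification(evidence)) > 1


# Hypothetical record shaped like the documented evidence output.
example = {
    "variant_id": "12345",
    "submissions": [
        {"submitter_name": "Lab A", "classification": "Pathogenic"},
        {"submitter_name": "Lab B", "classification": "Uncertain significance"},
        {"submitter_name": "Lab C", "classification": "pathogenic"},
    ],
}
print(submitters_by_classification(example))
print(has_multiple_labels(example))
```

Labels are lower-cased before grouping so that "Pathogenic" and "pathogenic" are not counted as different classifications.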

Typical Workflows

Count-First Workflow (Recommended)

For large or unknown result sets, use `count` first to decide whether to proceed, then `search` (which auto-paginates and returns `total_count` / `fetched_count`), then `summary` to screen.

```bash
# Step 1: Gauge size (optional; search also returns total_count)
uv run scripts/clinvar_api.py count \
  --query "HBB[gene] AND pathogenic[clinsig]" --output count.json

# Step 2: Fetch all variant IDs (auto-paginates)
uv run scripts/clinvar_api.py search \
  --query "HBB[gene] AND pathogenic[clinsig]" --output ids.json

# Step 3: Get summaries (extract variant_ids from search output)
uv run scripts/clinvar_api.py summary \
  --variant_ids 12345 67890 --output summary.json
```
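
Step 3 requires pulling `variant_ids` out of the search output. Below is one possible sketch of that hand-off, which splits the IDs into batches and builds the matching `summary` command lines. The batch size of 100 and the `summary_<n>.json` file names are assumptions chosen for illustration; the document does not state a limit on `--variant_ids`.

```python
def summary_commands(search_output: dict, batch_size: int = 100) -> list[list[str]]:
    """Build one `summary` argument list per batch of variant IDs."""
    ids = [str(v) for v in search_output["variant_ids"]]
    commands = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        commands.append([
            "uv", "run", "scripts/clinvar_api.py", "summary",
            "--variant_ids", *batch,
            # Hypothetical output naming: one file per batch.
            "--output", f"summary_{start // batch_size}.json",
        ])
    return commands


# Hypothetical search output with five IDs, batched two at a time.
demo = {"total_count": 5, "fetched_count": 5, "variant_ids": ["1", "2", "3", "4", "5"]}
for cmd in summary_commands(demo, batch_size=2):
    print(" ".join(cmd))
```

Each inner list can be passed to `subprocess.run` as-is, which avoids shell quoting issues.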

Deep Dive: search → evidence

When you need the full clinical picture for a specific variant, including submitter rationales, PubMed citations, ontology-linked conditions, and allele coordinates, use `evidence`.

```bash
uv run scripts/clinvar_api.py evidence \
  --variant_id 12345 --output evidence.json
```

Workflow: Robust Variant Discovery (Triangulation)

ClinVar metadata is inconsistent. To fulfill "List all" requests, do not rely on a single filter. Perform the following in a single turn and merge results:
  1. Search by exact label (e.g., `"3 prime UTR variant"[molecular_consequence]`).
  2. Search by HGVS nomenclature pattern (e.g., `c.*`).
  3. Search by genomic coordinate range (using `[chrpos]`).
This "triangulation" helps ensure that variants with missing labels, such as structural variants, are not overlooked.
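
The merge step can be sketched as an order-preserving union of the `variant_ids` lists from the three searches. This assumes each search output follows the documented shape; `merge_variant_ids` and the sample payloads are hypothetical.

```python
def merge_variant_ids(*search_outputs: dict) -> list[str]:
    """Union the variant_ids of several search outputs, keeping first-seen order."""
    seen = set()
    merged = []
    for output in search_outputs:
        for vid in output.get("variant_ids", []):
            vid = str(vid)
            if vid not in seen:
                seen.add(vid)
                merged.append(vid)
    return merged


# Hypothetical results of the label, HGVS, and coordinate searches.
by_label = {"variant_ids": ["11", "12"]}
by_hgvs = {"variant_ids": ["12", "13"]}
by_coords = {"variant_ids": ["13", "14", "11"]}
print(merge_variant_ids(by_label, by_hgvs, by_coords))
```

An ID found only by the coordinate search is a candidate for the missing-label case and is worth inspecting with `summary` or `evidence`.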

Verifying Coding vs. Non-Coding Status via HGVS

`molecular_consequences` alone can be ambiguous (e.g., `splice donor variant` appears in both coding and non-coding contexts). Always cross-check the `title` field for HGVS patterns:
  • `c.-…`: 5' UTR (non-coding)
  • `c.*…`: 3' UTR (non-coding)
  • `c.123+N` / `c.123-N`: intronic (non-coding)
  • `p.Trp146Arg` etc.: protein effect (coding)
A variant with UTR/intronic HGVS and no `p.` annotation is non-coding, even with splicing labels. Conversely, any `p.` annotation indicates a coding effect.
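
The cross-check above can be approximated with a few regular expressions over the `title` string. This is a heuristic sketch, not a full HGVS parser. The patterns, the function name `classify_title`, and the example titles (which use a placeholder transcript and gene) are assumptions made for illustration.

```python
import re

# Heuristic patterns for the HGVS cues listed above.
PROTEIN = re.compile(r"\bp\.\S")           # any p. annotation
UTR5 = re.compile(r"\bc\.-\d")             # c.-N
UTR3 = re.compile(r"\bc\.\*\d")            # c.*N
INTRONIC = re.compile(r"\bc\.\d+[+-]\d")   # c.123+N or c.123-N


def classify_title(title: str) -> str:
    """Classify a variant title as coding or non-coding from its HGVS cues."""
    # A p. annotation takes precedence: it indicates a coding effect.
    if PROTEIN.search(title):
        return "coding"
    if UTR5.search(title):
        return "non-coding (5' UTR)"
    if UTR3.search(title):
        return "non-coding (3' UTR)"
    if INTRONIC.search(title):
        return "non-coding (intronic)"
    return "undetermined"


# Placeholder titles; real ClinVar titles carry actual transcript and gene names.
for t in [
    "NM_000000.0(GENE):c.20A>T (p.Glu7Val)",
    "NM_000000.0(GENE):c.-78A>G",
    "NM_000000.0(GENE):c.*110T>C",
    "NM_000000.0(GENE):c.92+1G>A",
]:
    print(t, "->", classify_title(t))
```

Titles without a `c.` or `p.` expression (for example, genomic-only names for large deletions) fall through to "undetermined" and need manual review.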

ClinVar Metadata Reference

  • 3' UTR
    • Search String: `"3 prime UTR variant"[mol_consequence]`
    • HGVS: `c.*`
  • 5' UTR
    • Search String: `"5 prime UTR variant"[mol_consequence]`
    • HGVS: `c.-`
  • To find "high-confidence" variants or expert-reviewed consensus, use the `review_status` filter. This is the most efficient way to distinguish between single-laboratory assertions and panel-reviewed ground truth.

When to Use Which Fields

  • Quick pathogenicity label: use `summary` → `clinical_significance`
  • Gene symbol and strand: use `summary` → `genes`
  • Variant type (SNV, del, etc.): use `summary` → `variation_type`
  • Protein-level effect: use `summary` → `molecular_consequences`
  • Genomic coordinates (GRCh38): use `evidence` → `allele_info`
  • Linked conditions (ontology): use `evidence` → `conditions`
  • SO functional consequence: use `evidence` → `functional_consequences`
  • CNV breakpoints/copy number: use `evidence` → `structural_variant_details`
  • PubMed references: use `evidence` → `citation_references`
  • Date of last lab review: use both, `summary` → `last_evaluated` and `evidence` → `submissions[].date_last_evaluated`
  • Clinician rationales: use `evidence` → `submissions[].curator_notes`

Retrieving Genomic Coordinates (Default HG38/GRCh38)

To get precise genomic coordinates in the format `<chrom>:<pos>:<ref>><alt>` (e.g., `chr5:70951945:G>A`), you must use the `evidence` command, as these details are not available in the `summary` output.

You MUST always include genomic coordinates in the format `<chrom>:<pos>:<ref>><alt>` when listing or presenting variants, even if not explicitly requested by the user. Because the `summary` output does not contain coordinates, use the `evidence` command or the dbSNP fallback to retrieve them.

  1. Fetch Evidence: Use `uv run scripts/clinvar_api.py evidence --variant_id <ID> --output evidence.json`.
  2. Extract VCF Attributes: The `evidence` command parses the XML. From the `SequenceLocation` element with `Assembly="GRCh38"`, extract:
    • Chromosome: `Chr`
    • Position: `positionVCF` (or `start`)
    • Ref: `referenceAlleleVCF` (or `referenceAllele`)
    • Alt: `alternateAlleleVCF` (or `alternateAllele`)

Fallback for Imprecise Coordinates (Gene Range): ClinVar often returns the full gene range for non-coding variants. If the extracted coordinates correspond to the gene range instead of a specific position, use the `dbsnp-database` skill to resolve the precise coordinates using the `dbsnp_rsid` or HGVS title:
  1. Check for `dbsnp_rsid` in the `evidence` output.
  2. Run `uv run scripts/dbsnp_cli.py resolve-rsid {rsid}` to get precise GRCh38 coordinates.
  3. Format as `<chrom>:<pos>:<ref>><alt>` using the SPDI or HGVS data from dbSNP.
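
A small sketch of the formatting step, assuming the `allele_info` keys documented for the `evidence` output (`chromosome`, `position_start`, `reference_allele`, `alternate_allele`). Adding a `chr` prefix when it is absent is an assumption based on the `chr5:70951945:G>A` example; the function name is hypothetical.

```python
from typing import Optional


def format_coordinate(allele_info: dict) -> Optional[str]:
    """Return '<chrom>:<pos>:<ref>><alt>' or None when a component is missing."""
    chrom = allele_info.get("chromosome")
    pos = allele_info.get("position_start")
    ref = allele_info.get("reference_allele")
    alt = allele_info.get("alternate_allele")
    # Any missing piece means the caller should try the dbSNP fallback.
    if not all([chrom, pos, ref, alt]):
        return None
    chrom = str(chrom)
    if not chrom.lower().startswith("chr"):
        chrom = f"chr{chrom}"  # assumed convention, matching the example format
    return f"{chrom}:{pos}:{ref}>{alt}"


# Hypothetical allele_info mirroring the example coordinate in the text.
info = {
    "chromosome": "5",
    "position_start": 70951945,
    "position_stop": 70951945,
    "reference_allele": "G",
    "alternate_allele": "A",
}
print(format_coordinate(info))  # chr5:70951945:G>A
```

A `None` result is the signal to fall back to `dbsnp_rsid` resolution instead of presenting a partial coordinate.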

Structural Variant Note

The `structural_variant_details` field is only populated for copy number variants (CNVs). For standard SNVs and small indels this field will be `null`. Use the `allele_info` fields (`position_start`, `position_stop`, `reference_allele`, `alternate_allele`) instead.

CNV / Large Deletion Note

Large copy-number variants (CNVs) frequently have empty `molecular_consequences`. If a variant title mentions "del" and its coordinates overlap your target region, it is relevant regardless of missing labels.

Obtaining and Using an API Key

To raise the rate limit to 10 requests per second, obtain an NCBI API key and add it to the `.env` file. You can obtain a key by following the instructions in the NCBI ClinVar API docs. Once you have a key, follow the prerequisite instructions to add it to the `.env` file. The scripts load it automatically, so commands run unchanged:

```bash
uv run scripts/clinvar_api.py search --query "BRCA1[gene]" --output results.json
```

If a `RateLimitError` is encountered, follow the prerequisite instructions to help the user add `NCBI_API_KEY` to the `.env` file, and provide the NCBI ClinVar API docs URL for instructions on how to obtain one.

Best Practices

  • Always use `uv run` to execute `python`.
  • If `jq` is unavailable, pivot immediately to Python one-liners for processing JSON (e.g., `uv run python3 -c "import json; ..."`).
  • Use `count` before `search` to understand the result set size.
  • The `search` command fetches all results by default and includes `total_count` and `fetched_count` in the output; always verify that these match to confirm complete retrieval.
  • Entrez results are unsorted. To order by date, fetch all results and sort locally by `last_evaluated`.
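
Local sorting by `last_evaluated` could look like the sketch below. The date format emitted by the wrapper is not documented here, so `date_format` defaults to an assumed `%Y-%m-%d` and should be adjusted to whatever the `summary` output actually contains. Records with missing or unparseable dates are placed last.

```python
from datetime import datetime


def sort_by_last_evaluated(summaries, date_format="%Y-%m-%d", newest_first=True):
    """Sort summary records by last_evaluated; undated records go to the end."""
    def parse(record):
        try:
            return datetime.strptime(record.get("last_evaluated") or "", date_format)
        except (ValueError, TypeError):
            return None  # missing or not in the assumed format

    parsed = [(parse(r), r) for r in summaries]
    dated = [(d, r) for d, r in parsed if d is not None]
    undated = [r for d, r in parsed if d is None]
    dated.sort(key=lambda pair: pair[0], reverse=newest_first)
    return [r for _, r in dated] + undated


# Hypothetical summary records using the assumed date format.
records = [
    {"variant_id": "1", "last_evaluated": "2020-01-15"},
    {"variant_id": "2", "last_evaluated": None},
    {"variant_id": "3", "last_evaluated": "2023-06-01"},
]
print([r["variant_id"] for r in sort_by_last_evaluated(records)])
```

Keeping undated records at the end, instead of dropping them, avoids silently losing variants from a "list all" answer.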

Common Mistakes

  • Attempting to parse the E-utilities XML yourself: Prefer the provided `clinvar_api.py` client, which handles the variable XML schemas for you.
  • Getting HTTP 429 Too Many Requests: The client raises an exception telling you to pause. Follow the prerequisite instructions to help the user add `NCBI_API_KEY` to the `.env` file, then retry.
  • Sending raw DNA sequences to the API: The API expects HGVS nomenclature, RS IDs, or proper Entrez coordinate syntax (`11[chr] AND 1234[chrpos]`), not raw ATCG strings.
  • Relying on coordinate searches for synonymous or non-coding variants: HGVS nomenclature (e.g., `CAPN3 AND "c.551C>T"`) is more reliable than coordinate searches (`[chrpos]`), as many ClinVar records for these types lack precise genomic mappings.
  • Case sensitivity in molecular consequences: ClinVar returns mixed-case strings. Always use case-insensitive matching (`.lower()`) when filtering.
  • Parsing `search` output as a bare list: `search` returns a JSON object with `total_count`, `fetched_count`, and `variant_ids`, not a bare list.
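
The case-insensitive matching mentioned above can be sketched as follows, assuming each summary record has the documented `molecular_consequences` list of strings. The helper name and sample records are hypothetical.

```python
def has_consequence(summary: dict, wanted: str) -> bool:
    """True when `wanted` matches one of the record's consequences, ignoring case."""
    target = wanted.strip().lower()
    consequences = summary.get("molecular_consequences") or []
    return any(str(c).strip().lower() == target for c in consequences)


# Hypothetical summary records with mixed-case labels.
rows = [
    {"variant_id": "1", "molecular_consequences": ["Missense variant"]},
    {"variant_id": "2", "molecular_consequences": ["3 prime UTR variant"]},
    {"variant_id": "3", "molecular_consequences": []},
]
hits = [r["variant_id"] for r in rows if has_consequence(r, "missense variant")]
print(hits)
```

Records with an empty list, such as many large CNVs, never match and should be reviewed by title and coordinates instead.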