clinvar-database

ClinVar Database

Prerequisites

-   `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
-   User Notification: If `LICENSE_NOTIFICATION.txt` does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://www.ncbi.nlm.nih.gov/clinvar/, and (2) create the file, recording the notification text and a timestamp.
-   `.env` file: Make sure the `.env` file exists in your home directory. Create one if it does not exist.
-   `NCBI_API_KEY` (optional): Raises the NCBI rate limit from 3 to 10 requests/second. The skill works without it, but a key is recommended if the user plans many queries or encounters a 429 error. The user can obtain one for free by registering at https://www.ncbi.nlm.nih.gov/account/settings/. If the variable is missing from `.env`, do NOT ask the user to paste it into the chat (this would leak the key into the agent's context). Instead, give the user this command, substituting `ENV_FILE` with the resolved literal path to the `.env` file:

    ```bash
    printf "Enter NCBI API key (typing hidden): " && read -s key && echo && echo "NCBI_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
    ```

-   The scripts load credentials automatically via `dotenv`. NEVER read, print, or inspect the `.env` file or its variables (e.g. no `cat`, `grep`, `echo`, `printenv`, or `os.environ.get` on keys). Credentials must stay out of the agent's context. See the API Key section for more details.

Overview

ClinVar is the primary consensus record for clinical classifications of human genomic variations. It provides the "clinical ground truth" for pathogenicity labels (Pathogenic, Likely Pathogenic, Benign, VUS) based on assertions from global laboratories.

When to Use

Use when you need to:

-   Find the current clinical significance and star rating (review status) for a specific variant.
-   Fetch clinician notes, assertion criteria, or rationales for previous clinical laboratory classifications.
-   Retrieve the preferred condition name and associated HPO terms for a specific variant.
-   Find a list of variant controls (e.g., "Find all Pathogenic variants in the HBB gene within 50bp of a signal").
-   Check for conflicting interpretations for a given variant and identify the organizations submitting each classification.

Do NOT use when you need to:

-   Find specific allele frequencies in global populations (use gnomAD).
-   Describe the normal biological role of a protein and typical inheritance patterns (use OMIM).
-   Predict mechanistic effects of novel mutations, like frameshifts or exon skipping (use AlphaGenome).
-   Find recommended surveillance schedules for patients with a pathogenic variant (use GeneReviews).
-   Generate or view 3D structural models of affected proteins (use PDB / AlphaFold).

Quick Start

ClinVar queries run through a Python wrapper script that handles NCBI's strict rate limiting and the XML/JSON parsing.

Example: Search for BRCA1 variants

bash

uv run scripts/clinvar_api.py search --query "BRCA1[gene]" --output results.json

Core Rules

-   Retmax Constraint: For any "List all" or gene-wide request, make sure `--retmax` does not truncate the result set. Leave it at its default of 0 (fetch all matching results, as documented under `search`), or, if you set it explicitly, set it well above the expected count (e.g., 1000) to ensure data completeness.
-   Use the Wrapper: Prefer the wrapper script for standard queries. It handles rate limiting, retries, and the complex XML parsing for you. If the script's parsed output does not contain the specific fields you need, you may modify the script or query the NCBI E-utilities API directly, but be aware that the raw XML schemas are complex and vary between record types.
-   Rate Limit: If the rate limit is hit, the script will throw a clear error. Follow the prerequisite instructions above to help the user add `NCBI_API_KEY` to the `.env` file.
-   Notification: If this skill is used, ensure this is mentioned in the output.

Utility Scripts

`count`: Count Matching Variants

Purpose: Check how many variants match a query without fetching IDs. Use it to decide whether a full `search` is warranted.

Arguments:

-   `--query`: (Required) NCBI Entrez search query string.
-   `--output`: (Required) Output JSON file path.

Example:

    uv run scripts/clinvar_api.py count \
      --query "TP53[gene] AND \"uncertain significance\"[clinsig]" \
      --output count.json

Output:

    {"total_count": <int>}

`search`: Search Variants

Purpose: Identify variants based on genomic location, gene symbols, or clinical attributes using NCBI Entrez search syntax. The search command automatically paginates through all matching results to ensure complete, deterministic retrieval.

bash

    # Fetch ALL matching variants (default behavior)
    uv run scripts/clinvar_api.py search \
      --query "BRCA1[gene]" --output results.json

    # Search by chromosome and position range
    uv run scripts/clinvar_api.py search \
      --query "11[chr] AND 5225000:5226000[chrpos]" --output results.json

    # Combine terms using Entrez syntax
    uv run scripts/clinvar_api.py search \
      --query "HBB[gene] AND pathogenic[clinsig]" --output results.json

    # Cap results at 50
    uv run scripts/clinvar_api.py search \
      --query "TP53[gene]" --retmax 50 --output results.json

*Arguments:*

-   `--query`: (Required) NCBI Entrez search query string.
-   `--retmax`: Maximum total number of variant IDs to return. **Default is 0, which means "fetch all matching results."** Set to a positive integer to cap the result set.
-   `--page_size`: Number of IDs to fetch per API request (default: 500, max: 10000 per NCBI limits).
-   `--output`: (Required) Output JSON file path.

*Output:* A JSON object containing:

-   `total_count`: Total number of matching variants in ClinVar.
-   `fetched_count`: Number of IDs actually retrieved.
-   `variant_ids`: List of ClinVar Variation ID strings.
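
Because `search` reports both `total_count` and `fetched_count`, a short check can confirm that nothing was truncated before the IDs are passed on. The sketch below assumes only the output shape listed above. The helper names `check_search_output` and `load_search_output` and the sample dictionary are illustrative and are not part of `clinvar_api.py`.

```python
import json
from pathlib import Path


def check_search_output(result: dict) -> list[str]:
    """Return the variant IDs, raising if retrieval looks incomplete.

    Assumes the documented keys: total_count, fetched_count, variant_ids.
    """
    if not isinstance(result, dict):
        raise TypeError("search output is a JSON object, not a bare list")
    total = int(result["total_count"])
    fetched = int(result["fetched_count"])
    ids = [str(v) for v in result["variant_ids"]]
    if fetched != total or len(ids) != fetched:
        raise ValueError(
            f"incomplete retrieval: fetched {fetched} of {total} "
            f"({len(ids)} IDs present)"
        )
    return ids


def load_search_output(path: str) -> list[str]:
    """Read a search output file (e.g. results.json) and validate it."""
    return check_search_output(json.loads(Path(path).read_text(encoding="utf-8")))


# Illustrative data only; real IDs come from the search command.
sample = {"total_count": 2, "fetched_count": 2, "variant_ids": ["12345", "67890"]}
print(check_search_output(sample))
```

When `--retmax` is set deliberately to cap results, a mismatch between the two counts is expected and this check should be skipped.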

`summary`: Get Interpretation Summary

Purpose: Retrieve top-line clinical significance labels, star ratings (review status), and basic phenotype data for rapid variant screening.

bash

    # Get summary for one or more Variation IDs
    uv run scripts/clinvar_api.py summary \
      --variant_ids 12345 67890 --output summary.json

*Arguments:*

-   `--variant_ids`: (Required) One or more ClinVar Variation IDs.
-   `--output`: (Required) Output JSON file path.

*Output:* A JSON list of summary objects, each containing:

-   `variant_id`, `title`, `clinical_significance`, `review_status`, `last_evaluated`, `phenotypes`
-   `genes`: list of `{gene_id, symbol, strand}`
-   `variation_type`: e.g., single nucleotide variant, Deletion, Insertion
-   `molecular_consequences`: list of strings (e.g., `["missense variant", "nonsense"]`)

`evidence`: Get Clinical Evidence

Purpose: Fetch the full clinical record for a single variant, including free-text clinician rationales, assertion methods, and specific submitter notes.

bash

    # Get full evidence for a single Variation ID
    uv run scripts/clinvar_api.py evidence \
      --variant_id 12345 --output evidence.json

*Arguments:*

-   `--variant_id`: (Required) A single ClinVar Variation ID.
-   `--output`: (Required) Output JSON file path.

*Output:* A JSON object containing:

-   `variant_id`
-   `allele_info`: `{chromosome, position_start, position_stop, reference_allele, alternate_allele, cytogenetic_band, dbsnp_rsid}` (GRCh38 preferred)
-   `conditions`: list of `{name, medgen_cui, omim_id, orphanet_id, hpo_terms}`
-   `functional_consequences`: list of `{value, sequence_ontology_id}`
-   `structural_variant_details`: `{outer_start, inner_start, inner_stop, outer_stop, copy_number}` (present only for CNVs, otherwise null)
-   `citation_references`: list of PubMed IDs cited in the global "Citations" section
-   `submissions`: list of per-submitter records, each containing:
    -   `submitter_name`, `classification`, `curator_notes`, `assertion_criteria`
    -   `date_last_evaluated`: when the submitter last reviewed the classification
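
One use of the `submissions` list is to see which organizations stand behind each classification. The sketch below groups submitters by label, assuming the field names listed above. The function names and the sample record (including the submitter names) are made up for illustration. The rule used here flags any two distinct labels as a disagreement, which may be stricter than ClinVar's own conflict status.

```python
from collections import defaultdict


def group_by_classification(evidence: dict) -> dict[str, list[str]]:
    """Map each classification label (lower-cased) to its submitters."""
    groups: dict[str, list[str]] = defaultdict(list)
    for sub in evidence.get("submissions") or []:
        label = (sub.get("classification") or "not provided").strip().lower()
        groups[label].append(sub.get("submitter_name") or "unknown submitter")
    return dict(groups)


def has_disagreement(evidence: dict) -> bool:
    """True when the submissions carry more than one distinct label."""
    return len(group_by_classification(evidence)) > 1


# Hypothetical record shaped like the documented evidence output.
sample = {
    "variant_id": "12345",
    "submissions": [
        {"submitter_name": "Lab A", "classification": "Pathogenic"},
        {"submitter_name": "Lab B", "classification": "Uncertain significance"},
        {"submitter_name": "Lab C", "classification": "pathogenic"},
    ],
}
print(group_by_classification(sample))
print(has_disagreement(sample))
```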

Typical Workflows

Count-First Workflow (Recommended)

For large or unknown result sets, use `count` first to decide whether to proceed, then `search` (which auto-paginates and returns `total_count` and `fetched_count`), then `summary` to screen.

bash

    # Step 1: Gauge size (optional; search also returns total_count)
    uv run scripts/clinvar_api.py count \
      --query "HBB[gene] AND pathogenic[clinsig]" --output count.json

    # Step 2: Fetch all variant IDs (auto-paginates)
    uv run scripts/clinvar_api.py search \
      --query "HBB[gene] AND pathogenic[clinsig]" --output ids.json

    # Step 3: Get summaries (extract variant_ids from search output)
    uv run scripts/clinvar_api.py summary \
      --variant_ids 12345 67890 --output summary.json
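
Step 3 requires moving the IDs from the `search` output into the `summary` command line. The following is a minimal sketch of that hand-off, assuming the documented `variant_ids` key. The batch size of 100 and the `summary_<n>.json` file names are assumptions chosen for illustration; the source does not state a limit on how many IDs `summary` accepts per call.

```python
def build_summary_commands(search_output: dict, batch_size: int = 100) -> list[list[str]]:
    """Turn a search output object into one summary command per batch of IDs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ids = [str(v) for v in search_output["variant_ids"]]
    commands = []
    for n, start in enumerate(range(0, len(ids), batch_size), start=1):
        batch = ids[start:start + batch_size]
        commands.append([
            "uv", "run", "scripts/clinvar_api.py", "summary",
            "--variant_ids", *batch,
            "--output", f"summary_{n}.json",  # hypothetical output name
        ])
    return commands


# Illustrative IDs only.
for cmd in build_summary_commands({"variant_ids": ["12345", "67890", "13579"]}, batch_size=2):
    print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the skill directory.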

Deep Dive: search → evidence

When you need the full clinical picture for a specific variant (including submitter rationales, PubMed citations, ontology-linked conditions, and allele coordinates), use `evidence`:

bash

uv run scripts/clinvar_api.py evidence \
  --variant_id 12345 --output evidence.json

Workflow: Robust Variant Discovery (Triangulation)

ClinVar metadata is inconsistent. To fulfill "List all" requests, do not rely on a single filter. Run all of the following searches in a single turn and merge the results:

-   Search by exact label (e.g., `"3 prime UTR variant"[molecular_consequence]`).
-   Search by HGVS nomenclature pattern (e.g., `c.*`).
-   Search by genomic coordinate range (using `[chrpos]`).

This "triangulation" ensures that structural variants with missing labels are not overlooked.
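
The merge step can be sketched as a de-duplicating union over the three `search` outputs. This assumes each output has the documented `variant_ids` key. The function name and the sample IDs are illustrative.

```python
def merge_variant_ids(*search_outputs: dict) -> list[str]:
    """Union of variant_ids across search outputs, keeping first-seen order."""
    seen: dict[str, None] = {}
    for output in search_outputs:
        for vid in output.get("variant_ids") or []:
            seen.setdefault(str(vid), None)
    return list(seen)


# Illustrative results from a label search, an HGVS search, and a [chrpos] search.
by_label = {"variant_ids": ["101", "102"]}
by_hgvs = {"variant_ids": ["102", "103"]}
by_position = {"variant_ids": ["103", "104", "101"]}
print(merge_variant_ids(by_label, by_hgvs, by_position))
```

IDs that appear in only one of the three result sets are the ones a single filter would have missed, so they are worth a closer look with `summary`.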

Verifying Coding vs. Non-Coding Status via HGVS

`molecular_consequences` alone can be ambiguous (e.g., `splice donor variant` appears in both coding and non-coding contexts). Always cross-check the `title` field for HGVS patterns:

-   `c.-…`: 5' UTR (non-coding)
-   `c.*…`: 3' UTR (non-coding)
-   `c.123+N` / `c.123-N`: intronic (non-coding)
-   `p.Trp146Arg` etc.: protein effect (coding)

A variant with UTR/intronic HGVS and no `p.` annotation is non-coding, even with splicing labels. Conversely, any `p.` annotation indicates a coding effect.
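
The rules above can be sketched as a small classifier over the `title` string. The regular expressions are a simplified approximation of HGVS written for this example, not a full parser, and the sample titles are illustrative strings rather than records fetched from ClinVar.

```python
import re

# Simplified patterns; these are assumptions, not a complete HGVS grammar.
_PROTEIN = re.compile(r"\bp\.[A-Za-z*=?(]")          # any p. annotation
_INTRONIC = re.compile(r"\bc\.[-*]?\d+[+-]\d")       # c.123+N / c.123-N
_UTR5 = re.compile(r"\bc\.-\d")                      # c.-N
_UTR3 = re.compile(r"\bc\.\*\d")                     # c.*N


def classify_title(title: str) -> str:
    """Classify a variant title as coding or a non-coding region."""
    if _PROTEIN.search(title):
        return "coding"  # any p. annotation indicates a coding effect
    if _INTRONIC.search(title):
        return "intronic (non-coding)"
    if _UTR5.search(title):
        return "5' UTR (non-coding)"
    if _UTR3.search(title):
        return "3' UTR (non-coding)"
    return "undetermined"


for example in (
    "NM_000518.5(HBB):c.*110T>C",
    "NM_000518.5(HBB):c.-78A>G",
    "NM_000518.5(HBB):c.92+1G>A",
    "NM_000546.6(TP53):c.436T>C (p.Trp146Arg)",
):
    print(example, "->", classify_title(example))
```

Titles that come back as `undetermined` carry neither a `p.` annotation nor a UTR/intronic pattern and need manual review.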

ClinVar Metadata Reference

-   3' UTR: Search String: `"3 prime UTR variant"[mol_consequence]`; HGVS: `c.*`
-   5' UTR: Search String: `"5 prime UTR variant"[mol_consequence]`; HGVS: `c.-`

To find "high-confidence" variants or expert-reviewed consensus, use the `review_status` filter. This is the most efficient way to distinguish between single-laboratory assertions and panel-reviewed ground truth.

When to Use Which Fields

-   Quick pathogenicity label: Use `summary` → `clinical_significance`
-   Gene symbol and strand: Use `summary` → `genes`
-   Variant type (SNV, del, etc.): Use `summary` → `variation_type`
-   Protein-level effect: Use `summary` → `molecular_consequences`
-   Genomic coordinates (GRCh38): Use `evidence` → `allele_info`
-   Linked conditions (ontology): Use `evidence` → `conditions`
-   SO functional consequence: Use `evidence` → `functional_consequences`
-   CNV breakpoints/copy number: Use `evidence` → `structural_variant_details`
-   PubMed references: Use `evidence` → `citation_references`
-   Date of last lab review: Use both: `summary` → `last_evaluated` and `evidence` → `submissions[].date_last_evaluated`
-   Clinician rationales: Use `evidence` → `submissions[].curator_notes`

Retrieving Genomic Coordinates (Default HG38/GRCh38)

To get precise genomic coordinates in the format `<chrom>:<pos>:<ref>><alt>` (e.g., `chr5:70951945:G>A`), you must use the `evidence` command, as these details are not available in the `summary` output.

You MUST always include genomic coordinates in the format `<chrom>:<pos>:<ref>><alt>` when listing or presenting variants, even if not explicitly requested by the user. If coordinates are missing from the summary, use the `evidence` command or the dbSNP fallback to retrieve them.

-   Fetch Evidence: Use `uv run scripts/clinvar_api.py evidence --variant_id <ID> --output evidence.json`
-   Extract VCF Attributes: The `evidence` command parses the XML. Extract the following from the `SequenceLocation` element with `Assembly="GRCh38"`:
    -   Chromosome: `Chr`
    -   Position: `positionVCF` (or `start`)
    -   Ref: `referenceAlleleVCF` (or `referenceAllele`)
    -   Alt: `alternateAlleleVCF` (or `alternateAllele`)
-   Fallback for Imprecise Coordinates (Gene Range): ClinVar often returns the full gene range for non-coding variants. If the extracted coordinates correspond to the gene range instead of a specific position, use the `dbsnp-database` skill to resolve the precise coordinates using the `dbsnp_rsid` or HGVS title:
    1.  Check for `dbsnp_rsid` in the `evidence` output.
    2.  Run `uv run scripts/dbsnp_cli.py resolve-rsid {rsid}` to get precise GRCh38 coordinates.
    3.  Format as `<chrom>:<pos>:<ref>><alt>` using the SPDI or HGVS data from dbSNP.
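
The formatting step can be sketched against the parsed `allele_info` object from the `evidence` output. The function name is illustrative, and adding a `chr` prefix when it is absent is an assumption made to match the `chr5:70951945:G>A` example. The function returns `None` when any piece is missing, which is the signal to fall back to dbSNP.

```python
from typing import Optional


def format_coordinate(allele_info: dict) -> Optional[str]:
    """Build <chrom>:<pos>:<ref>><alt> from allele_info, or None if incomplete."""
    chrom = allele_info.get("chromosome")
    pos = allele_info.get("position_start")
    ref = allele_info.get("reference_allele")
    alt = allele_info.get("alternate_allele")
    if not chrom or pos in (None, "") or not ref or not alt:
        return None  # incomplete: use the dbSNP fallback described above
    chrom = str(chrom)
    if not chrom.lower().startswith("chr"):
        chrom = "chr" + chrom  # assumed convention, matching the example format
    return f"{chrom}:{int(pos)}:{ref}>{alt}"


# Values taken from the format example in the text.
print(format_coordinate({
    "chromosome": "5",
    "position_start": 70951945,
    "reference_allele": "G",
    "alternate_allele": "A",
}))
```

This does not detect the gene-range case by itself; a `position_start` and `position_stop` that span far more than the allele length is a hint that the fallback is needed.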

Structural Variant Note

The `structural_variant_details` field is only populated for copy number variants (CNVs). For standard SNVs and small indels this field will be `null`. Use the `allele_info` fields (`position_start`, `position_stop`, `reference_allele`, `alternate_allele`) instead.

CNV / Large Deletion Note

Large copy-number variants (CNVs) frequently have empty `molecular_consequences`. If a variant title mentions "del" and coordinates overlap your target region, it is relevant regardless of missing labels.
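
The relevance rule above amounts to a title check plus an interval-overlap test. The sketch below assumes closed, 1-based intervals on the same chromosome and assembly (the caller is responsible for comparing chromosomes) and reads `position_start` and `position_stop` from `allele_info`. The function names and the sample title are hypothetical.

```python
def intervals_overlap(start_a: int, stop_a: int, start_b: int, stop_b: int) -> bool:
    """True when two closed intervals share at least one position."""
    return start_a <= stop_b and start_b <= stop_a


def is_relevant_deletion(title: str, allele_info: dict,
                         region_start: int, region_stop: int) -> bool:
    """Apply the rule: title mentions 'del' and coordinates overlap the target."""
    start = allele_info.get("position_start")
    stop = allele_info.get("position_stop")
    if start is None or stop is None:
        return False  # no usable coordinates; resolve them before deciding
    return "del" in title.lower() and intervals_overlap(
        int(start), int(stop), region_start, region_stop
    )


# Hypothetical deletion spanning the 11:5225000-5226000 window used earlier.
print(is_relevant_deletion(
    "NC_000011.10:g.5224000_5227000del",
    {"position_start": 5224000, "position_stop": 5227000},
    5225000, 5226000,
))
```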

Obtaining and Using an API Key

To increase the rate limit to 10 requests per second, you need to obtain an NCBI API key and add it to the `.env` file. You can obtain a key by following the instructions at NCBI ClinVar API docs.

Once you have a key, follow the prerequisite instructions to add it to the `.env` file.

bash

    uv run scripts/clinvar_api.py search --query "BRCA1[gene]" --output results.json

If a `RateLimitError` is encountered, follow the prerequisite instructions to help the user add `NCBI_API_KEY` to the `.env` file, providing the NCBI ClinVar API docs URL for instructions on how to obtain one.

Best Practices

-   Always use `uv run` to execute `python`.
-   If `jq` is unavailable, pivot immediately to using Python one-liners for processing JSON (e.g., `uv run python3 -c "import json; ..."`).
-   Use `count` before `search` to understand the result set size.
-   The `search` command fetches all results by default and includes `total_count` and `fetched_count` in the output. Always verify that these match to confirm complete retrieval.
-   Entrez results are unsorted. To order by date, fetch all results and sort locally by `last_evaluated`.
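
Local sorting by `last_evaluated` might look like the sketch below, which works on the list returned by `summary`. The date formats it tries are assumptions, since the source does not specify how `last_evaluated` is formatted; adjust `_DATE_FORMATS` to whatever the script actually emits. Records with a missing or unparseable date are kept and placed last.

```python
from datetime import datetime
from typing import Optional

# Assumed formats; the actual format of last_evaluated is not stated in the source.
_DATE_FORMATS = ("%Y/%m/%d %H:%M", "%Y/%m/%d", "%Y-%m-%d")


def parse_date(value: Optional[str]) -> Optional[datetime]:
    """Parse a date string with the assumed formats, or return None."""
    if not value:
        return None
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


def sort_by_last_evaluated(summaries: list[dict], newest_first: bool = True) -> list[dict]:
    """Sort summary objects by last_evaluated; undated records go last."""
    dated = [(parse_date(s.get("last_evaluated")), s) for s in summaries]
    known = [pair for pair in dated if pair[0] is not None]
    unknown = [s for d, s in dated if d is None]
    known.sort(key=lambda pair: pair[0], reverse=newest_first)
    return [s for _, s in known] + unknown


# Illustrative records only.
records = [
    {"variant_id": "1", "last_evaluated": "2020/01/15 00:00"},
    {"variant_id": "2", "last_evaluated": ""},
    {"variant_id": "3", "last_evaluated": "2023-06-01"},
]
print([r["variant_id"] for r in sort_by_last_evaluated(records)])
```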

Common Mistakes

-   Attempting to parse the E-utilities XML yourself: Always use the provided `clinvar_api.py` client, which handles the unpredictable XML schemas robustly.
-   Getting HTTP 429 Too Many Requests: The client throws an exception telling you to pause. Follow the prerequisite instructions to help the user add `NCBI_API_KEY` to the `.env` file, then retry.
-   Sending raw DNA sequences to the API: The API expects HGVS nomenclature, RS IDs, or proper Entrez coordinate syntax (`11[chr] AND 1234[chrpos]`), not raw ATCG strings.
-   Relying on coordinates for synonymous or non-coding variants: HGVS nomenclature (e.g., `CAPN3 AND "c.551C>T"`) is more reliable than coordinate searches (`[chrpos]`), as many ClinVar records for these types lack precise genomic mappings.
-   Case sensitivity in molecular consequences: ClinVar returns mixed-case strings. Always use case-insensitive matching (`.lower()`) when filtering.
-   Parsing `search` output as a bare list: `search` returns a JSON object with `total_count`, `fetched_count`, and `variant_ids`, not a bare list.
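
The case-insensitive matching advice can be sketched as a filter over the list returned by `summary`, assuming the documented `molecular_consequences` field. The function names and sample records are illustrative.

```python
def has_consequence(summary: dict, wanted: str) -> bool:
    """Case-insensitive exact match against molecular_consequences."""
    target = wanted.strip().lower()
    consequences = summary.get("molecular_consequences") or []
    return any((c or "").strip().lower() == target for c in consequences)


def filter_by_consequence(summaries: list[dict], wanted: str) -> list[dict]:
    """Keep only the summary objects that carry the wanted consequence."""
    return [s for s in summaries if has_consequence(s, wanted)]


# Illustrative records with mixed-case labels and one empty list (as for large CNVs).
records = [
    {"variant_id": "1", "molecular_consequences": ["Missense variant"]},
    {"variant_id": "2", "molecular_consequences": ["3 prime UTR variant"]},
    {"variant_id": "3", "molecular_consequences": []},
]
print([r["variant_id"] for r in filter_by_consequence(records, "missense variant")])
```

Records with an empty `molecular_consequences` list never match, so they should be reviewed separately rather than treated as irrelevant.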
