clinvar-database
ClinVar Database
Prerequisites
- `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://www.ncbi.nlm.nih.gov/clinvar/, then (2) create the file recording the notification text and timestamp.
- `.env` file: Make sure the `.env` file exists in your home directory. Create one if it does not exist.
- `NCBI_API_KEY` (optional): Raises the NCBI rate limit from 3 to 10 requests/second. The skill works without it, but a key is recommended if the user plans many queries or encounters a 429 error. The user can obtain one for free by registering at https://www.ncbi.nlm.nih.gov/account/settings/. If the variable is missing from `.env`, do NOT ask the user to paste it into the chat (this would leak the key into the agent's context). Instead, give the user this command, substituting `ENV_FILE` with the resolved literal path to the `.env` file:

  ```bash
  printf "Enter NCBI API key (typing hidden): " && read -s key && echo && echo "NCBI_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
  ```

  The scripts load credentials automatically via `dotenv`. NEVER read, print, or inspect the `.env` file or its variables (e.g. no `cat`, `grep`, `echo`, `printenv`, or `os.environ.get` on keys). Credentials must stay out of the agent's context. See the API Key section for more details.
Overview
ClinVar is the primary consensus record for clinical classifications of human
genomic variations. It provides the "clinical ground truth" for pathogenicity
labels (Pathogenic, Likely Pathogenic, Benign, VUS) based on assertions from
global laboratories.
When to Use
Use when you need to:
- Find the current clinical significance and star rating (review status) for a specific variant.
- Fetch clinician notes, assertion criteria, or rationales for previous clinical laboratory classifications.
- Retrieve the preferred condition name and associated HPO terms for a specific variant.
- Find a list of variant controls (e.g., "Find all Pathogenic variants in the HBB gene within 50bp of a signal").
- Check for conflicting interpretations for a given variant and identify the organizations submitting each classification.
Do NOT use when you need to:
- Find specific allele frequencies in global populations (use gnomAD).
- Describe the normal biological role of a protein and typical inheritance patterns (use OMIM).
- Predict mechanistic effects of novel mutations, like frameshifts or exon skipping (use AlphaGenome).
- Find recommended surveillance schedules for patients with a pathogenic variant (use GeneReviews).
- Generate or view 3D structural models of affected proteins (use PDB / AlphaFold).
Quick Start
ClinVar queries are executed via a robust Python wrapper script to handle strict
rate limiting and XML/JSON parsing.
Example: Search for BRCA1 variants
```bash
uv run scripts/clinvar_api.py search --query "BRCA1[gene]" --output results.json
```
Core Rules
- Retmax Constraint: The search command defaults to `--retmax 200`. For any "List all" or gene-wide request, you MUST explicitly set `--retmax` higher (e.g., 1000) to ensure data completeness.
- Use the Wrapper: Prefer the wrapper script for standard queries. It handles rate limiting, retries, and the complex XML parsing for you. If the script's parsed output does not contain the specific fields you need, you may modify the script or query the NCBI E-utilities API directly, but be aware that the raw XML schemas are complex and vary between record types.
- If the rate limit is hit, the script will throw a clear error. Follow the prerequisite instructions above to help the user add `NCBI_API_KEY` to the `.env` file.
- Notification: If this skill is used, ensure this is mentioned in the output.
Utility Scripts
1. `count`: Count Matching Variants
Purpose: Check how many variants match a query without fetching IDs. Use to
decide whether a full `search` is warranted.
Arguments:
- `--query`: (Required) NCBI Entrez search query string.
- `--output`: (Required) Output JSON file path.
Example: `uv run scripts/clinvar_api.py count --query "TP53[gene] AND \"uncertain significance\"[clinsig]" --output count.json`
Output: `{"total_count": <int>}`
2. `search`: Search Variants
Purpose: Identify variants based on genomic location, gene symbols, or
clinical attributes using NCBI Entrez search syntax. The search command
automatically paginates through all matching results to ensure complete,
deterministic retrieval.
```bash
# Fetch ALL matching variants (default behavior)
uv run scripts/clinvar_api.py search \
  --query "BRCA1[gene]" --output results.json

# Search by Chromosome and Position Range
uv run scripts/clinvar_api.py search \
  --query "11[chr] AND 5225000:5226000[chrpos]" --output results.json

# Combine terms using Entrez syntax
uv run scripts/clinvar_api.py search \
  --query "HBB[gene] AND pathogenic[clinsig]" --output results.json

# Cap results at 50
uv run scripts/clinvar_api.py search \
  --query "TP53[gene]" --retmax 50 --output results.json
```
*Arguments:*
- `--query`: (Required) NCBI Entrez search query string.
- `--retmax`: Maximum total number of variant IDs to return. **Default is 0,
  which means "fetch all matching results."** Set to a positive integer to cap
  the result set.
- `--page_size`: Number of IDs to fetch per API request (default: 500, max:
  10000 per NCBI limits).
- `--output`: (Required) Output JSON file path.
*Output:* A JSON object containing:
- `total_count`: Total number of matching variants in ClinVar.
- `fetched_count`: Number of IDs actually retrieved.
- `variant_ids`: List of ClinVar Variation ID strings.
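As a minimal sketch of how the `search` output might be checked for completeness, assuming the `--output` file has been loaded into a dict with the three documented keys. The helper name `check_search_complete` and the sample payload are illustrative and not part of the script:

```python
import json


def check_search_complete(result: dict) -> bool:
    """Return True when every matching ID appears to have been retrieved."""
    ids = result.get("variant_ids", [])
    return (
        result.get("total_count") == result.get("fetched_count")
        and result.get("fetched_count") == len(ids)
    )


# Hypothetical payload shaped like the documented search output.
example = json.loads(
    '{"total_count": 3, "fetched_count": 3, "variant_ids": ["11", "22", "33"]}'
)
print(check_search_complete(example))
```

A mismatch is expected when `--retmax` was set to cap the result set, so a `False` result only signals a problem for uncapped searches.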
3. `summary`: Get Interpretation Summary
Purpose: Retrieve top-line clinical significance labels, star ratings
(review status), and basic phenotype data for rapid variant screening.
```bash
# Get summary for one or more Variation IDs
uv run scripts/clinvar_api.py summary \
  --variant_ids 12345 67890 --output summary.json
```
*Arguments:*
- `--variant_ids`: (Required) One or more ClinVar Variation IDs.
- `--output`: (Required) Output JSON file path.
*Output:* A JSON list of summary objects, each containing:
- `variant_id`, `title`, `clinical_significance`, `review_status`,
  `last_evaluated`, `phenotypes`
- `genes`: list of `{gene_id, symbol, strand}`
- `variation_type`: e.g., single nucleotide variant, Deletion, Insertion
- `molecular_consequences`: list of strings (e.g., ["missense variant",
  "nonsense"])
4. `evidence`: Get Clinical Evidence
Purpose: Fetch the full clinical record for a single variant, including
free-text clinician rationales, assertion methods, and specific submitter notes.
```bash
# Get full evidence for a single Variation ID
uv run scripts/clinvar_api.py evidence \
  --variant_id 12345 --output evidence.json
```
*Arguments:*
- `--variant_id`: (Required) A single ClinVar Variation ID.
- `--output`: (Required) Output JSON file path.
*Output:* A JSON object containing:
- `variant_id`
- `allele_info`: `{chromosome, position_start, position_stop,
  reference_allele, alternate_allele, cytogenetic_band, dbsnp_rsid}` (GRCh38
  preferred)
- `conditions`: list of `{name, medgen_cui, omim_id, orphanet_id, hpo_terms}`
- `functional_consequences`: list of `{value, sequence_ontology_id}`
- `structural_variant_details`: `{outer_start, inner_start, inner_stop,
  outer_stop, copy_number}` (present only for CNVs, otherwise null)
- `citation_references`: list of PubMed IDs cited in the global "Citations"
  section
- `submissions`: list of per-submitter records, each containing:
  - `submitter_name`, `classification`, `curator_notes`,
    `assertion_criteria`
  - `date_last_evaluated`: when the submitter last reviewed the
    classification
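To illustrate how the `submissions` list could be used to see which organizations submitted each classification, here is a rough sketch. It assumes the `evidence` output has been loaded into a dict with the documented `submitter_name` and `classification` keys. The function names and sample data are hypothetical, and treating "more than one distinct label" as a conflict is a crude simplification:

```python
from collections import defaultdict


def group_submitters_by_classification(evidence: dict) -> dict:
    """Map each lower-cased classification label to its submitter names."""
    groups = defaultdict(list)
    for sub in evidence.get("submissions", []):
        label = (sub.get("classification") or "unknown").strip().lower()
        groups[label].append(sub.get("submitter_name"))
    return dict(groups)


def has_multiple_labels(evidence: dict) -> bool:
    """Crude signal: submitters used more than one distinct label."""
    return len(group_submitters_by_classification(evidence)) > 1


# Hypothetical record shaped like the documented evidence output.
sample = {
    "submissions": [
        {"submitter_name": "Lab A", "classification": "Pathogenic"},
        {"submitter_name": "Lab B", "classification": "pathogenic"},
        {"submitter_name": "Lab C", "classification": "Uncertain significance"},
    ]
}
print(group_submitters_by_classification(sample))
```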
Typical Workflows
Count-First Workflow (Recommended)
For large or unknown result sets, use `count` first to decide whether to
proceed, then `search` (which auto-paginates and returns `total_count` /
`fetched_count`), then `summary` to screen.
```bash
# Step 1: Gauge size (optional; search also returns total_count)
uv run scripts/clinvar_api.py count \
  --query "HBB[gene] AND pathogenic[clinsig]" --output count.json

# Step 2: Fetch all variant IDs (auto-paginates)
uv run scripts/clinvar_api.py search \
  --query "HBB[gene] AND pathogenic[clinsig]" --output ids.json

# Step 3: Get summaries (extract variant_ids from search output)
uv run scripts/clinvar_api.py summary \
  --variant_ids 12345 67890 --output summary.json
```
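Step 3 requires moving the `variant_ids` from the search output onto the `summary` command line. One possible way to do that is sketched below. The batch size of 100 and the `summary_<n>.json` naming are assumptions made for illustration, since the source does not state a per-call limit. Only flags documented above are used:

```python
def build_summary_commands(variant_ids, batch_size=100, out_prefix="summary"):
    """Split IDs into batches and build one argv list per summary call."""
    commands = []
    for start in range(0, len(variant_ids), batch_size):
        batch = variant_ids[start:start + batch_size]
        output = f"{out_prefix}_{start // batch_size}.json"
        commands.append(
            ["uv", "run", "scripts/clinvar_api.py", "summary",
             "--variant_ids", *batch, "--output", output]
        )
    return commands


# Hypothetical IDs standing in for the variant_ids list from ids.json.
ids = [str(n) for n in range(1, 251)]
commands = build_summary_commands(ids)
print(len(commands))
```

Each argv list can then be passed to `subprocess.run` without shell quoting concerns.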
Deep Dive: search → evidence
When you need the full clinical picture for a specific variant, including
submitter rationales, PubMed citations, ontology-linked conditions, and allele
coordinates, use `evidence`.
```bash
uv run scripts/clinvar_api.py evidence \
  --variant_id 12345 --output evidence.json
```
Workflow: Robust Variant Discovery (Triangulation)
ClinVar metadata is inconsistent. To fulfill "List all" requests, do not rely on
a single filter. Perform the following in a single turn and merge results:
- Search by exact label (e.g., `"3 prime UTR variant"[molecular_consequence]`).
- Search by HGVS nomenclature pattern (e.g., `c.*`).
- Search by genomic coordinate range (using `[chrpos]`).
This "triangulation" ensures structural variants with missing labels are not
overlooked.
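The merge step can be sketched as an order-preserving union of the `variant_ids` lists from the three searches. This assumes each search output has been loaded into a dict. The function name `merge_variant_ids` and the sample IDs are hypothetical:

```python
def merge_variant_ids(*search_results: dict) -> list:
    """Union the variant_ids of several search outputs, keeping first-seen order."""
    seen = set()
    merged = []
    for result in search_results:
        for variant_id in result.get("variant_ids", []):
            if variant_id not in seen:
                seen.add(variant_id)
                merged.append(variant_id)
    return merged


# Hypothetical outputs from the label, HGVS, and coordinate searches.
by_label = {"variant_ids": ["101", "102"]}
by_hgvs = {"variant_ids": ["102", "103"]}
by_position = {"variant_ids": ["104", "101"]}
print(merge_variant_ids(by_label, by_hgvs, by_position))
```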
Verifying Coding vs. Non-Coding Status via HGVS
`molecular_consequences` alone can be ambiguous (e.g., `splice donor variant` appears in both coding and non-coding contexts). Always cross-check the HGVS pattern in the `title` field:
- `c.-…`: 5' UTR (non-coding)
- `c.*…`: 3' UTR (non-coding)
- `c.123+N` / `c.123-N`: intronic (non-coding)
- `p.Trp146Arg` etc.: protein effect (coding)
A variant with UTR/intronic HGVS and no `p.` annotation is non-coding, even with
splicing labels. Conversely, any `p.` annotation indicates a coding effect.
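The cross-check above could be approximated with a few regular expressions over the `title` string. This is a sketch, not a full HGVS parser: the title layout (for example `NM_000518.5(HBB):c.-78A>G`), the function name `classify_title`, and the returned labels are assumptions made for illustration:

```python
import re

# Patterns mirror the HGVS prefixes in the list above.
PROTEIN = re.compile(r"\bp\.\S")               # any p. annotation
FIVE_PRIME_UTR = re.compile(r"\bc\.-\d")       # c.-N
THREE_PRIME_UTR = re.compile(r"\bc\.\*\d")     # c.*N
INTRONIC = re.compile(r"\bc\.\d+[+-]\d")       # c.123+N or c.123-N


def classify_title(title: str) -> str:
    """Classify a variant title as coding or a non-coding category."""
    if PROTEIN.search(title):
        return "coding"
    if FIVE_PRIME_UTR.search(title):
        return "5' UTR (non-coding)"
    if THREE_PRIME_UTR.search(title):
        return "3' UTR (non-coding)"
    if INTRONIC.search(title):
        return "intronic (non-coding)"
    return "unclassified"


# Illustrative title strings only.
print(classify_title("NM_000518.5(HBB):c.92+1G>A"))
```

The `p.` check runs first so that any protein annotation wins, matching the rule that a `p.` annotation indicates a coding effect.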
ClinVar Metadata Reference
- 3' UTR
  - Search String: `"3 prime UTR variant"[mol_consequence]`
  - HGVS: `c.*`
- 5' UTR
  - Search String: `"5 prime UTR variant"[mol_consequence]`
  - HGVS: `c.-`
- To find "high-confidence" variants or expert-reviewed consensus, use the `review_status` filter. This is the most efficient way to distinguish between single-laboratory assertions and panel-reviewed ground truth.
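Search strings like the ones above can be assembled programmatically. The sketch below quotes multi-word values and joins terms with `AND`, using only field tags that already appear in this document. The helper names `field_term` and `and_query` are hypothetical:

```python
def field_term(value: str, field: str) -> str:
    """Build one Entrez term, quoting values that contain whitespace."""
    text = f'"{value}"' if " " in value else value
    return f"{text}[{field}]"


def and_query(*terms: str) -> str:
    """Combine terms with the Entrez AND operator."""
    return " AND ".join(terms)


query = and_query(
    field_term("HBB", "gene"),
    field_term("3 prime UTR variant", "mol_consequence"),
)
print(query)
```

The resulting string is intended to be passed as the `--query` argument of `count` or `search`.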
When to Use Which Fields
- Quick pathogenicity label: use `summary` → `clinical_significance`
- Gene symbol and strand: use `summary` → `genes`
- Variant type (SNV, del, etc.): use `summary` → `variation_type`
- Protein-level effect: use `summary` → `molecular_consequences`
- Genomic coordinates (GRCh38): use `evidence` → `allele_info`
- Linked conditions (ontology): use `evidence` → `conditions`
- SO functional consequence: use `evidence` → `functional_consequences`
- CNV breakpoints/copy number: use `evidence` → `structural_variant_details`
- PubMed references: use `evidence` → `citation_references`
- Date of last lab review: use both `summary` and `evidence` → `last_evaluated`
- Clinician rationales: use `evidence` → `submissions[].curator_notes`
Retrieving Genomic Coordinates (Default HG38/GRCh38)
To get precise genomic coordinates in the format `<chrom>:<pos>:<ref>><alt>`
(e.g., `chr5:70951945:G>A`), you must use the `evidence` command, as these
details are not available in the `summary` output.
You MUST always include genomic coordinates in the format
`<chrom>:<pos>:<ref>><alt>` when listing or presenting variants, even if not
explicitly requested by the user. If coordinates are missing from the summary,
use the `evidence` command or dbSNP fallback to retrieve them.
- Fetch Evidence: Use `uv run scripts/clinvar_api.py evidence --variant_id <ID> --output evidence.json`.
- Extract VCF Attributes: The `evidence` command parses the XML. Extract:
  - Chromosome: `Chr`
  - Position: `positionVCF` (or `start`)
  - Ref: `referenceAlleleVCF` (or `referenceAllele`)
  - Alt: `alternateAlleleVCF` (or `alternateAllele`) from the `SequenceLocation` element with `Assembly="GRCh38"`
Fallback for Imprecise Coordinates (Gene Range): ClinVar often returns the
full gene range for non-coding variants. If the extracted coordinates correspond
to the gene range instead of a specific position, use the `dbsnp-database` skill
to resolve the precise coordinates using the `dbsnp_rsid` or HGVS title:
1. Check for `dbsnp_rsid` in the `evidence` output.
2. Run `uv run scripts/dbsnp_cli.py resolve-rsid {rsid}` to get precise GRCh38 coordinates.
3. Format as `<chrom>:<pos>:<ref>><alt>` using the SPDI or HGVS data from dbSNP.
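Once the `evidence` output is available, the required string can be built from `allele_info`. The sketch below assumes the documented keys (`chromosome`, `position_start`, `reference_allele`, `alternate_allele`) and assumes the chromosome may arrive without a `chr` prefix. The function name `format_variant` is hypothetical:

```python
from typing import Optional


def format_variant(allele_info: dict) -> Optional[str]:
    """Return '<chrom>:<pos>:<ref>><alt>', or None if any part is missing."""
    chrom = allele_info.get("chromosome")
    pos = allele_info.get("position_start")
    ref = allele_info.get("reference_allele")
    alt = allele_info.get("alternate_allele")
    if not all([chrom, pos, ref, alt]):
        return None  # caller should fall back to the dbSNP lookup
    chrom = str(chrom)
    if not chrom.lower().startswith("chr"):
        chrom = f"chr{chrom}"
    return f"{chrom}:{pos}:{ref}>{alt}"


# Values taken from the example coordinate in the text above.
info = {
    "chromosome": "5",
    "position_start": 70951945,
    "reference_allele": "G",
    "alternate_allele": "A",
}
print(format_variant(info))
```

A `None` result is the cue to use the dbSNP fallback. This sketch does not detect the case where the coordinates span the whole gene.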
Structural Variant Note
The `structural_variant_details` field is only populated for copy number
variants (CNVs). For standard SNVs and small indels this field will be `null`.
Use the `allele_info` fields (`position_start`, `position_stop`,
`reference_allele`, `alternate_allele`) instead.
CNV / Large Deletion Note
Large copy-number variants (CNVs) frequently have empty
`molecular_consequences`. If a variant title mentions "del" and coordinates
overlap your target region, it is relevant regardless of missing labels.
Obtaining and Using an API Key
To increase the rate limit to 10 requests per second, you need to obtain an NCBI
API key and add it to the `.env` file. You can obtain a key by following the
instructions at NCBI ClinVar API docs.
Once you have a key, follow the prerequisite instructions to add it to the
`.env` file.
```bash
uv run scripts/clinvar_api.py search --query "BRCA1[gene]" --output results.json
```
If a `RateLimitError` is encountered, follow the prerequisite instructions to
help the user add `NCBI_API_KEY` to the `.env` file, providing the
NCBI ClinVar API docs URL for instructions on how to obtain one.
Best Practices
- Always use `uv run` to execute `python`.
- If `jq` is unavailable, pivot immediately to using Python one-liners for processing JSON (e.g., `uv run python3 -c "import json; ..."`).
- Use `count` before `search` to understand the result set size.
- The `search` command fetches all results by default and includes `total_count` and `fetched_count` in the output; always verify these match to confirm complete retrieval.
- Entrez results are unsorted. To order by date, fetch all results and
  sort locally by `last_evaluated`.
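Local sorting by `last_evaluated` might look like the sketch below. The date formats tried here are assumptions, because the source does not specify how the script serializes dates. Records whose date is missing or unparseable are placed last, and the function names are hypothetical:

```python
from datetime import datetime

# Assumed candidate formats; adjust to whatever the summary output contains.
_FORMATS = ("%Y-%m-%d", "%Y/%m/%d %H:%M", "%Y/%m/%d")


def parse_date(value):
    """Return a datetime for the first matching format, else None."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            continue
    return None


def sort_by_last_evaluated(summaries, newest_first=True):
    """Sort summary objects by date, keeping undated records at the end."""
    dated = [(parse_date(s.get("last_evaluated")), s) for s in summaries]
    known = [pair for pair in dated if pair[0] is not None]
    unknown = [s for d, s in dated if d is None]
    known.sort(key=lambda pair: pair[0], reverse=newest_first)
    return [s for _, s in known] + unknown


# Hypothetical summary objects.
records = [
    {"variant_id": "1", "last_evaluated": "2020-01-05"},
    {"variant_id": "2", "last_evaluated": None},
    {"variant_id": "3", "last_evaluated": "2023/06/01 00:00"},
]
print([r["variant_id"] for r in sort_by_last_evaluated(records)])
```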
Common Mistakes
- Attempting to parse the E-utilities XML yourself: Always use the
  provided `clinvar_api.py` client, which handles the unpredictable XML schemas robustly.
- Getting HTTP 429 Too Many Requests: The client throws an exception
  telling you to pause. Follow the prerequisite instructions to help the user
  add `NCBI_API_KEY` to the `.env` file, then retry.
- Sending raw DNA sequences to the API: The API expects HGVS
  nomenclature, RS IDs, or proper Entrez coordinate syntax (`11[chr] AND 1234[chrpos]`), not raw ATCG strings.
- For synonymous or non-coding variants: HGVS nomenclature (e.g., `CAPN3 AND "c.551C>T"`) is more reliable than coordinate searches (`[chrpos]`), as many ClinVar records for these types lack precise genomic mappings.
- Case sensitivity in molecular consequences: ClinVar returns mixed-case
  strings. Always use case-insensitive matching (`.lower()`) when filtering.
- Parsing `search` output as a bare list: `search` returns a JSON object with `total_count`, `fetched_count`, and `variant_ids`, not a bare list.
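The case-sensitivity point can be illustrated with a small filter over summary objects. This is a sketch under the assumption that each summary carries the documented `molecular_consequences` list. The function name `filter_by_consequence` and the sample records are hypothetical:

```python
def filter_by_consequence(summaries, wanted: str):
    """Keep summaries whose molecular_consequences contain `wanted`, ignoring case."""
    target = wanted.lower()
    matches = []
    for summary in summaries:
        consequences = summary.get("molecular_consequences") or []
        if any((c or "").lower() == target for c in consequences):
            matches.append(summary)
    return matches


# Hypothetical summary objects with mixed-case labels.
summaries = [
    {"variant_id": "1", "molecular_consequences": ["Missense variant"]},
    {"variant_id": "2", "molecular_consequences": []},
    {"variant_id": "3"},
    {"variant_id": "4", "molecular_consequences": ["nonsense", "MISSENSE VARIANT"]},
]
print([s["variant_id"] for s in filter_by_consequence(summaries, "missense variant")])
```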