clinvar-database
ClinVar Database
Overview
ClinVar is NCBI's freely accessible archive of reports on relationships between human genetic variants and phenotypes, with supporting evidence. The database aggregates information about genomic variation and its relationship to human health, providing standardized variant classifications used in clinical genetics and research.
When to Use This Skill
This skill should be used when:
- Searching for variants by gene, condition, or clinical significance
- Interpreting clinical significance classifications (pathogenic, benign, VUS)
- Accessing ClinVar data programmatically via E-utilities API
- Downloading and processing bulk data from FTP
- Understanding review status and star ratings
- Resolving conflicting variant interpretations
- Annotating variant call sets with clinical significance
Core Capabilities
1. Search and Query ClinVar
Web Interface Queries
Search ClinVar using the web interface at https://www.ncbi.nlm.nih.gov/clinvar/
Common search patterns:
- By gene: `BRCA1[gene]`
- By clinical significance: `pathogenic[CLNSIG]`
- By condition: `breast cancer[disorder]`
- By variant: `NM_000059.3:c.1310_1313del[variant name]`
- By chromosome: `13[chr]`
- Combined: `BRCA1[gene] AND pathogenic[CLNSIG]`
Programmatic Access via E-utilities
Access ClinVar programmatically using NCBI's E-utilities API. Refer to `references/api_reference.md` for comprehensive API documentation, including:
- `esearch` - Search for variants matching criteria
- `esummary` - Retrieve variant summaries
- `efetch` - Download full XML records
- `elink` - Find related records in other NCBI databases
Quick example using curl:

```bash
# Search for pathogenic BRCA1 variants (-G appends the URL-encoded fields as a query string)
curl -G "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" \
  --data-urlencode "db=clinvar" \
  --data-urlencode "term=BRCA1[gene] AND pathogenic[CLNSIG]" \
  --data-urlencode "retmode=json"
```
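The BRCA1 search above can also be run from Python with only the standard library. This is a minimal sketch: `db`, `term`, `retmax`, `retmode`, and `api_key` are standard esearch parameters, the JSON layout (`esearchresult` containing `count` and `idlist`) is what esearch returns for `retmode=json`, and the helper names `build_esearch_url` and `parse_esearch_ids` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def build_esearch_url(term, retmax=100, api_key=None):
    """Build an esearch URL against the clinvar database with JSON output."""
    params = {"db": "clinvar", "term": term, "retmax": retmax, "retmode": "json"}
    if api_key:
        # An API key raises the allowed rate from 3 to 10 requests/second
        params["api_key"] = api_key
    # urlencode escapes spaces and the [field] brackets in the query
    return EUTILS_BASE + "esearch.fcgi?" + urllib.parse.urlencode(params)

def parse_esearch_ids(payload):
    """Return (total hit count, list of ClinVar UIDs) from an esearch JSON body."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]
```

For a live query, pass the URL to `urllib.request.urlopen` and hand the response body (`response.read()`) to `parse_esearch_ids`; the returned UIDs can then be passed to esummary or efetch.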
**Best practices:**
- Test queries on the web interface before automating
- Use API keys to increase rate limits from 3 to 10 requests/second
- Implement exponential backoff for rate limit errors
- Set `Entrez.email` when using Biopython
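The rate-limit and backoff advice can be combined in one small wrapper. This is a minimal sketch, not an NCBI-provided client: `RateLimitedClient`, its parameters, and the caller-supplied `fetch(url)` function are hypothetical, and it assumes that exceeding the limit surfaces as `urllib.error.HTTPError` with status 429. `sleep` and `clock` are injectable so the timing logic can be tested without real waiting.

```python
import time
import urllib.error

class RateLimitedClient:
    """Call fetch(url) no faster than max_per_second, retrying HTTP 429 with backoff."""

    def __init__(self, fetch, max_per_second=3, max_retries=5, base_delay=0.5,
                 sleep=time.sleep, clock=time.monotonic):
        self.fetch = fetch
        self.min_interval = 1.0 / max_per_second  # use 10 when sending an API key
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.sleep = sleep
        self.clock = clock
        self._last_call = None

    def _throttle(self):
        # Wait until at least min_interval has passed since the previous request
        if self._last_call is not None:
            wait = self.min_interval - (self.clock() - self._last_call)
            if wait > 0:
                self.sleep(wait)
        self._last_call = self.clock()

    def get(self, url):
        for attempt in range(self.max_retries + 1):
            self._throttle()
            try:
                return self.fetch(url)
            except urllib.error.HTTPError as err:
                # Only rate-limit responses are retried; anything else propagates
                if err.code != 429 or attempt == self.max_retries:
                    raise
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                self.sleep(self.base_delay * (2 ** attempt))
```

Retrying only on 429 keeps genuine errors, such as a 400 for a malformed query, from being retried. For real requests, pass a `fetch` that calls `urllib.request.urlopen` and returns the body, and set `max_per_second=10` when your requests include an API key.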
2. Interpret Clinical Significance
Understanding Classifications
ClinVar uses standardized terminology for variant classifications. Refer to `references/clinical_significance.md` for detailed interpretation guidelines.

Key germline classification terms (ACMG/AMP):
- Pathogenic (P) - Variant causes disease (~99% probability)
- Likely Pathogenic (LP) - Variant likely causes disease (~90% probability)
- Uncertain Significance (VUS) - Insufficient evidence to classify
- Likely Benign (LB) - Variant likely does not cause disease
- Benign (B) - Variant does not cause disease
Review status (star ratings):
- ★★★★ Practice guideline - Highest confidence
- ★★★ Expert panel review (e.g., ClinGen) - High confidence
- ★★ Multiple submitters, no conflicts - Moderate confidence
- ★ Single submitter with criteria - Standard weight
- ☆ No assertion criteria - Low confidence
Critical considerations:
- Always check review status - prefer ★★★ or ★★★★ ratings
- Conflicting interpretations require manual evaluation
- Classifications may change as new evidence emerges
- VUS (uncertain significance) variants lack sufficient evidence for clinical use
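The star ratings correspond to the review-status strings that the data files actually contain (`ReviewStatus` in the XML and `variant_summary.txt.gz`, `CLNREVSTAT` in the VCF, where spaces are written as underscores). The mapping below is a sketch based on ClinVar's published review-status terms; the exact list, including the newer "conflicting classifications" and "no classification provided" wording kept alongside older variants, is an assumption to verify against the current ClinVar documentation. VCF parsers may split `CLNREVSTAT` on its commas, so join the pieces back with `,` before normalizing.

```python
# Review-status strings mapped to star ratings (assumed list; wording has changed over time)
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, single submitter": 1,
    "criteria provided, conflicting classifications": 1,
    "criteria provided, conflicting interpretations": 1,  # older wording
    "no assertion criteria provided": 0,
    "no classification provided": 0,
    "no assertion provided": 0,  # older wording
}

def review_stars(status):
    """Return 0-4 stars for a review status, or None if the string is not recognized.

    Accepts the spaced form (XML, tab-delimited) and the underscore form (VCF CLNREVSTAT).
    """
    normalized = status.replace("_", " ").strip().lower()
    return REVIEW_STATUS_STARS.get(normalized)

def is_high_confidence(status, min_stars=3):
    """True for expert-panel or practice-guideline records with the default threshold."""
    stars = review_stars(status)
    return stars is not None and stars >= min_stars
```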
3. Download Bulk Data from FTP
Access ClinVar FTP Site
Download complete datasets from ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/ and refer to `references/data_formats.md` for comprehensive documentation on file formats and processing.

Update schedule:
- Monthly releases: First Thursday of each month (complete dataset, archived)
- Weekly updates: Every Monday (incremental updates)
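To schedule a job around the monthly release, the first Thursday of a month can be computed with the standard library. A small sketch with a hypothetical function name; the time a release actually appears on that day can vary, so a job should still confirm the expected file exists before downloading.

```python
import calendar
import datetime

def first_thursday(year, month):
    """Return the date of the first Thursday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday() counts Monday as 0, so calendar.THURSDAY == 3
    offset = (calendar.THURSDAY - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)
```

For example, `first_thursday(2024, 1)` returns `datetime.date(2024, 1, 4)`.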
Available Formats
XML files (most comprehensive):
- VCV (Variation) files: `xml/clinvar_variation/` - Variant-centric aggregation
- RCV (Record) files: `xml/RCV/` - Variant-condition pairs
- Include full submission details, evidence, and metadata

VCF files (for genomic pipelines):
- GRCh37: `vcf_GRCh37/clinvar.vcf.gz`
- GRCh38: `vcf_GRCh38/clinvar.vcf.gz`
- Limitations: Excludes variants >10kb and complex structural variants

Tab-delimited files (for quick analysis):
- `tab_delimited/variant_summary.txt.gz` - Summary of all variants
- `tab_delimited/var_citations.txt.gz` - PubMed citations
- `tab_delimited/cross_references.txt.gz` - Database cross-references
Example download:

```bash
# Download latest monthly XML release
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/ClinVarVariationRelease_00-latest.xml.gz

# Download VCF for GRCh38
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
```
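Multi-GB downloads are worth checking before processing. This sketch assumes a companion checksum file named after the data file plus `.md5` (for example `clinvar.vcf.gz.md5`) is published in the same FTP directory and that its first whitespace-separated token is the hex digest; confirm both against the directory listing. The function names are hypothetical.

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest in 1 MiB chunks instead of reading it whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_md5(data_path, md5_path):
    """Compare a file's digest with the first token of its .md5 companion file."""
    with open(md5_path) as handle:
        expected = handle.read().split()[0].lower()
    return md5sum(data_path) == expected
```

After downloading both files, `verify_md5('clinvar.vcf.gz', 'clinvar.vcf.gz.md5')` should return `True`; `False` means the download should be repeated.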
4. Process and Analyze ClinVar Data
Working with XML Files
Process XML files to extract variant details, classifications, and evidence.
Python example with xml.etree:
```python
import gzip
import xml.etree.ElementTree as ET

# Stream the multi-GB release record by record instead of loading it whole
with gzip.open('ClinVarVariationRelease_00-latest.xml.gz', 'rt') as f:
    for event, elem in ET.iterparse(f, events=('end',)):
        if elem.tag == 'VariationArchive':
            variation_id = elem.attrib.get('VariationID')
            # Extract clinical significance, review status, etc.
            elem.clear()  # Free memory held by the finished record
```
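Filling in the "extract" step requires element paths that match the release's XML schema. The sketch below assumes the VCV layout used in current releases, where the aggregate germline classification sits at `ClassifiedRecord/Classifications/GermlineClassification` with `ReviewStatus` and `Description` children; older releases used a different layout (`InterpretedRecord/Interpretations`), so inspect one record from your download before relying on these paths. It also clears the root element after each record so finished records do not accumulate. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed location of the aggregate classification in current VCV XML
CLASSIFICATION_PATH = "ClassifiedRecord/Classifications/GermlineClassification"

def summarize_variation(elem):
    """Pull ID, name, review status, and classification from one VariationArchive."""
    classification = elem.find(CLASSIFICATION_PATH)
    review = description = None
    if classification is not None:
        review = classification.findtext("ReviewStatus")
        description = classification.findtext("Description")
    return {
        "variation_id": elem.get("VariationID"),
        "name": elem.get("VariationName"),
        "review_status": review,
        "classification": description,
    }

def iter_variations(stream):
    """Yield one summary dict per VariationArchive from an open XML file object."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's start
    for event, elem in context:
        if event == "end" and elem.tag == "VariationArchive":
            yield summarize_variation(elem)
            root.clear()  # drop finished records still attached to the root
```

Use it with the same file object the loop above opens: inside the `gzip.open(..., 'rt')` block, iterate `for record in iter_variations(f):`.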
Working with VCF Files
Annotate variant calls or filter by clinical significance using bcftools or Python.
Using bcftools:
```bash
# Filter pathogenic variants (~ is a regular-expression match)
bcftools view -i 'INFO/CLNSIG~"Pathogenic"' clinvar.vcf.gz

# Extract specific genes (matches BRCA1, BRCA2, and any other symbol containing "BRCA")
bcftools view -i 'INFO/GENEINFO~"BRCA"' clinvar.vcf.gz

# Annotate your VCF with ClinVar; the annotation file must be bgzipped and indexed,
# and chromosome names must match (ClinVar uses "1", not "chr1")
bcftools annotate -a clinvar.vcf.gz -c INFO your_variants.vcf
```
**Using PyVCF in Python:**
```python
import vcf  # PyVCF

vcf_reader = vcf.Reader(filename='clinvar.vcf.gz')
for record in vcf_reader:
    clnsig = record.INFO.get('CLNSIG', [])  # CLNSIG is a list (Number=.)
    # Substring match also catches combined values such as "Pathogenic/Likely_pathogenic"
    if any('Pathogenic' in sig for sig in clnsig):
        gene = record.INFO.get('GENEINFO', '')  # a single string, e.g. "BRCA1:672"
        print(f"{record.CHROM}:{record.POS} {gene} - {clnsig}")
```
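If PyVCF is not available in your environment, simple CLNSIG/GENEINFO filters only need the standard library. A minimal sketch, assuming the standard VCF column order (INFO is the 8th column) and ClinVar's `GENEINFO` format of `SYMBOL:GeneID` pairs separated by `|`; the helper names are hypothetical, and INFO values are returned exactly as stored, without decoding.

```python
import gzip

def parse_info(info_field):
    """Split a VCF INFO column ('KEY=VAL;FLAG;...') into a dict; flags map to True."""
    info = {}
    for entry in info_field.split(";"):
        if entry:
            key, sep, value = entry.partition("=")
            info[key] = value if sep else True
    return info

def parse_geneinfo(value):
    """Split GENEINFO ('BRCA1:672|NBR2:10230') into (symbol, gene_id) pairs."""
    pairs = []
    for item in value.split("|"):
        if item:
            symbol, _, gene_id = item.partition(":")
            pairs.append((symbol, gene_id))
    return pairs

def iter_pathogenic(path):
    """Yield (chrom, pos, genes, clnsig) for records whose CLNSIG contains 'Pathogenic'."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            fields = line.rstrip("\n").split("\t")
            info = parse_info(fields[7])
            clnsig = info.get("CLNSIG")
            if isinstance(clnsig, str) and "Pathogenic" in clnsig:
                geneinfo = info.get("GENEINFO")
                genes = parse_geneinfo(geneinfo) if isinstance(geneinfo, str) else []
                yield fields[0], int(fields[1]), genes, clnsig
```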
Working with Tab-Delimited Files
Use pandas or command-line tools for rapid filtering and analysis.
Using pandas:
```python
import pandas as pd

# Load variant summary (the file has one row per variant per genome assembly)
df = pd.read_csv('variant_summary.txt.gz', sep='\t', compression='gzip', low_memory=False)

# Keep a single assembly so each variant is counted once
df = df[df['Assembly'] == 'GRCh38']

# Filter pathogenic variants in a specific gene
pathogenic_brca = df[
    (df['GeneSymbol'] == 'BRCA1') &
    (df['ClinicalSignificance'].str.contains('Pathogenic', na=False))
]

# Count variants by clinical significance
sig_counts = df['ClinicalSignificance'].value_counts()
```
**Using command-line tools:**
```bash
# Extract pathogenic variants for a specific gene (GRCh38 rows only)
# Columns: 1=#AlleleID, 5=GeneSymbol, 7=ClinicalSignificance,
#          13=PhenotypeIDS, 14=PhenotypeList, 17=Assembly
zcat variant_summary.txt.gz |
  awk -F'\t' '$5=="TP53" && $7~"Pathogenic" && $17=="GRCh38"' |
  cut -f1,5,7,13,14
```
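Positional column numbers are easy to get wrong and break silently if the file layout changes, so selecting columns by header name is more robust. A standard-library sketch, assuming the `variant_summary.txt.gz` header names used above (`GeneSymbol`, `ClinicalSignificance`, `Assembly`) and a first header field written with a leading `#`, as in `#AlleleID`; the function name is hypothetical. `QUOTE_NONE` treats any quote characters inside fields as literal text.

```python
import csv

def filter_variant_summary(stream, gene, significance="Pathogenic", assembly="GRCh38"):
    """Yield rows (as dicts) for one gene whose ClinicalSignificance contains `significance`."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.lstrip("#") for name in next(reader)]  # '#AlleleID' -> 'AlleleID'
    for values in reader:
        row = dict(zip(header, values))
        if (row.get("GeneSymbol") == gene
                and row.get("Assembly") == assembly  # each variant appears once per assembly
                and significance in row.get("ClinicalSignificance", "")):
            yield row
```

Open the real file with `gzip.open('variant_summary.txt.gz', 'rt', newline='')` and pass the handle as `stream`.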
5. Handle Conflicting Interpretations
When multiple submitters provide different classifications for the same variant, ClinVar reports "Conflicting classifications of pathogenicity" (labeled "Conflicting interpretations of pathogenicity" in older releases).
Resolution strategy:
- Check review status (star rating) - higher ratings carry more weight
- Examine evidence and assertion criteria from each submitter
- Consider submission dates - newer submissions may reflect updated evidence
- Review population frequency data (e.g., gnomAD) for context
- Consult expert panel classifications (★★★) when available
- For clinical use, always defer to a genetics professional
Search query to exclude conflicts:
`TP53[gene] AND pathogenic[CLNSIG] NOT conflicting[RVSTAT]`
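A useful first triage step is to group submitted classifications into broad tiers: disagreement between pathogenic/likely pathogenic, uncertain significance, and likely benign/benign counts as a conflict, whereas Pathogenic vs. Likely pathogenic is aggregated as "Pathogenic/Likely pathogenic". The sketch below applies only that grouping to a list of submitted classification strings; the tier labels and function names are hypothetical, and ClinVar's own aggregation has further rules (for example for risk factors, drug response, and submissions without assertion criteria).

```python
# Broad tiers used to decide whether submissions disagree
TIERS = {
    "pathogenic": "P/LP",
    "likely pathogenic": "P/LP",
    "uncertain significance": "VUS",
    "likely benign": "B/LB",
    "benign": "B/LB",
}

def classification_tiers(classifications):
    """Map submitted classifications to tiers, ignoring terms outside the five above."""
    tiers = set()
    for label in classifications:
        tier = TIERS.get(label.replace("_", " ").strip().lower())
        if tier:
            tiers.add(tier)
    return tiers

def has_conflict(classifications):
    """True when the submissions fall into more than one broad tier."""
    return len(classification_tiers(classifications)) > 1
```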
6. Track Classification Updates
Variant classifications may change over time as new evidence emerges.
Why classifications change:
- New functional studies or clinical data
- Updated population frequency information
- Revised ACMG/AMP guidelines
- Segregation data from additional families
Best practices:
- Document ClinVar version and access date for reproducibility
- Re-check classifications periodically for critical variants
- Subscribe to ClinVar mailing list for major updates
- Use monthly archived releases for stable datasets
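Periodic re-checking comes down to comparing two snapshots keyed by a stable identifier. A sketch assuming each snapshot is a dict mapping VariationID to its classification string, for example built from two monthly releases; the function name and output shape are illustrative.

```python
def diff_classifications(old, new):
    """Compare two {variation_id: classification} snapshots from different releases."""
    changed = {
        vid: (old[vid], new[vid])
        for vid in old.keys() & new.keys()
        if old[vid] != new[vid]
    }
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    return {"changed": changed, "added": added, "removed": removed}
```

To watch only critical variants, filter both dicts to your watch list of VariationIDs before calling it.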
7. Submit Data to ClinVar
Organizations can submit variant interpretations to ClinVar.
Submission methods:
- Web submission portal: https://submit.ncbi.nlm.nih.gov/subs/clinvar/
- API submission (requires service account): see `references/api_reference.md`
- Batch submission via Excel templates
Requirements:
- Organizational account with NCBI
- Assertion criteria (preferably ACMG/AMP guidelines)
- Supporting evidence for classification
Contact: clinvar@ncbi.nlm.nih.gov for submission account setup.
Workflow Examples
Example 1: Identify High-Confidence Pathogenic Variants in a Gene
Objective: Find pathogenic variants in the CFTR gene with expert panel review.
Steps:
- Search using the web interface or E-utilities: `CFTR[gene] AND pathogenic[CLNSIG] AND (reviewed by expert panel[RVSTAT] OR practice guideline[RVSTAT])`
- Review results, noting review status (should be ★★★ or ★★★★)
- Export variant list or retrieve full records via efetch
- Cross-reference with clinical presentation if applicable
Example 2: Annotate VCF with ClinVar Classifications
Objective: Add clinical significance annotations to variant calls.
Steps:
- Download the appropriate ClinVar VCF and its index (match genome build: GRCh37 or GRCh38):
```bash
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi
```
- Annotate using bcftools:
```bash
bcftools annotate -a clinvar.vcf.gz \
  -c INFO/CLNSIG,INFO/CLNDN,INFO/CLNREVSTAT \
  -o annotated_variants.vcf \
  your_variants.vcf
```
- Filter the annotated VCF for pathogenic variants:
```bash
bcftools view -i 'INFO/CLNSIG~"Pathogenic"' annotated_variants.vcf
```
Example 3: Analyze Variants for a Specific Disease
Objective: Study all variants associated with hereditary breast cancer.
Steps:
- Search by condition: `hereditary breast cancer[disorder] OR "Breast-ovarian cancer, familial"[disorder]`
- Download results as CSV or retrieve via E-utilities
- Filter by review status to prioritize high-confidence variants
- Analyze distribution across genes (BRCA1, BRCA2, PALB2, etc.)
- Examine variants with conflicting interpretations separately
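The gene-distribution step can be sketched with the standard library over rows of `variant_summary.txt.gz` read as dicts. Assumptions: the `GeneSymbol`, `PhenotypeList`, `ReviewStatus`, and `Assembly` column names, a case-insensitive substring match on the condition name, and an optional caller-supplied set of accepted review statuses; the function name is hypothetical.

```python
from collections import Counter

def gene_distribution(rows, condition, review_statuses=None, assembly="GRCh38"):
    """Count variants per gene whose PhenotypeList mentions `condition`."""
    needle = condition.lower()
    counts = Counter()
    for row in rows:
        if row.get("Assembly") != assembly:
            continue  # each variant appears once per assembly
        if needle not in row.get("PhenotypeList", "").lower():
            continue
        if review_statuses is not None and row.get("ReviewStatus") not in review_statuses:
            continue
        counts[row.get("GeneSymbol", "")] += 1
    return counts
```

Rows can come from `csv.DictReader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)` over the decompressed file; passing `{'reviewed by expert panel', 'practice guideline'}` as `review_statuses` keeps only high-confidence records.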
Example 4: Bulk Download and Database Construction
Objective: Build a local ClinVar database for an analysis pipeline.
Steps:
- Download a monthly release for reproducibility:
```bash
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/ClinVarVariationRelease_YYYY-MM.xml.gz
```
- Parse XML and load into a database (PostgreSQL, MySQL, MongoDB)
- Index by gene, position, clinical significance, review status
- Implement version tracking for updates
- Schedule monthly updates from FTP site
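A minimal version of this pipeline can be prototyped with SQLite from the Python standard library before moving to PostgreSQL or MySQL. SQLite is a swapped-in choice here, and the table layout (one row per variant per release, with a `release_tag` column for version tracking), the column names, and the record dict keys are assumptions for illustration.

```python
import sqlite3

# One row per variant per release; release_tag provides the version tracking
SCHEMA = """
CREATE TABLE IF NOT EXISTS variants (
    release_tag    TEXT NOT NULL,
    variation_id   TEXT NOT NULL,
    gene           TEXT,
    chrom          TEXT,
    position       INTEGER,
    classification TEXT,
    review_status  TEXT,
    PRIMARY KEY (release_tag, variation_id)
);
CREATE INDEX IF NOT EXISTS idx_variants_gene ON variants (gene);
CREATE INDEX IF NOT EXISTS idx_variants_position ON variants (chrom, position);
CREATE INDEX IF NOT EXISTS idx_variants_classification ON variants (classification);
CREATE INDEX IF NOT EXISTS idx_variants_review ON variants (review_status);
"""

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def load_release(conn, release_tag, records):
    """Insert parsed records (dicts with the keys used below) under one release tag."""
    rows = [
        (release_tag, r["variation_id"], r.get("gene"), r.get("chrom"),
         r.get("position"), r.get("classification"), r.get("review_status"))
        for r in records
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO variants VALUES (?, ?, ?, ?, ?, ?, ?)", rows
        )

def latest_release(conn):
    """Most recent release tag loaded (tags like '2024-01' sort chronologically)."""
    return conn.execute("SELECT MAX(release_tag) FROM variants").fetchone()[0]
```

Loading each monthly release under its own tag keeps earlier snapshots queryable, so classification changes can be found by comparing rows for the same `variation_id` across tags.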
Important Limitations and Considerations
Data Quality
- Not all submissions have equal weight - Check review status (star ratings)
- Conflicting interpretations exist - Require manual evaluation
- Historical submissions may be outdated - Newer data may be more accurate
- VUS classification is not a clinical diagnosis - Means insufficient evidence
Scope Limitations
- Not for direct clinical diagnosis - Always involve genetics professional
- Population-specific - Variant frequencies vary by ancestry
- Incomplete coverage - Not all genes or variants are well-studied
- Version dependencies - Coordinate genome build (GRCh37/GRCh38) across analyses
Technical Limitations
- VCF files exclude large variants - Variants >10kb not in VCF format
- Rate limits on API - 3 req/sec without key, 10 req/sec with API key
- File sizes - Full XML releases are multi-GB compressed files
- No real-time updates - Website updated weekly, FTP monthly/weekly
Resources
Reference Documentation
This skill includes comprehensive reference documentation:
- `references/api_reference.md` - Complete E-utilities API documentation with examples for esearch, esummary, efetch, and elink; includes rate limits, authentication, and Python/Biopython code samples
- `references/clinical_significance.md` - Detailed guide to interpreting clinical significance classifications, review status star ratings, conflict resolution, and best practices for variant interpretation
- `references/data_formats.md` - Documentation for XML, VCF, and tab-delimited file formats; FTP directory structure, processing examples, and format selection guidance
External Resources
- ClinVar home: https://www.ncbi.nlm.nih.gov/clinvar/
- ClinVar documentation: https://www.ncbi.nlm.nih.gov/clinvar/docs/
- E-utilities documentation: https://www.ncbi.nlm.nih.gov/books/NBK25501/
- ACMG variant interpretation guidelines: Richards et al., 2015 (PMID: 25741868)
- ClinGen expert panels: https://clinicalgenome.org/
Contact
For questions about ClinVar or data submission: clinvar@ncbi.nlm.nih.gov