ucsc-conservation-and-tfbs
Conservation Scores & TFBS Lookup (UCSC)
This skill provides access to evolutionary constraint scores and conserved
elements from the UCSC Genome Browser. It retrieves scores from the PHAST
package, specifically `phastCons` (identifying functional blocks) and
`phyloP` (measuring individual site constraint), calculated from multiple alignments.

Use this skill to determine whether a non-coding variant hits a site that has not
changed since a common ancestor (which can be a strong signal for pathogenicity) or
to find conservation peaks across a regulatory element.
Prerequisites
- `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://genome.ucsc.edu/conditions.html and https://genome.ucsc.edu/goldenPath/help/api.html, then (2) create the file recording the notification text and timestamp.
Core Rules
- Use the Wrapper: ALWAYS execute the provided helper scripts to query the database rather than accessing the database directly. The scripts automatically enforce the required rate limit gracefully.
- Large Output Handling: Always pass --output to redirect output to a file. Parse it separately (using jq or your own code).
- Notification: If this skill is used, ensure this is mentioned in the output.
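Because output files can be large, it may help to inspect their shape before parsing them in full. The following is a minimal Python sketch, not part of the skill itself: the names `summarize` and `summarize_file` and the depth limit are illustrative, and the code makes no assumption about the JSON schema the helper scripts produce.

```python
import json


def summarize(value, depth=2):
    """Describe the shape of a JSON value without printing its full contents."""
    if isinstance(value, dict):
        if depth == 0:
            return f"dict({len(value)} keys)"
        return {key: summarize(child, depth - 1) for key, child in value.items()}
    if isinstance(value, list):
        if depth == 0 or not value:
            return f"list({len(value)} items)"
        # Describe only the first element as a representative sample.
        return [f"list({len(value)} items), first:", summarize(value[0], depth - 1)]
    return type(value).__name__


def summarize_file(path, depth=2):
    """Load a JSON output file and return a compact structural summary."""
    with open(path, encoding="utf-8") as handle:
        return summarize(json.load(handle), depth)
```

Calling `summarize_file("/tmp/cons_output.json")` returns a small nested structure of keys and type names, which is safe to print even when the file itself is not.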
Utility Scripts
This skill includes scripts to query different types of genomic data from UCSC:
- `scripts/get_conservation.py`: For evolutionary conservation scores (phyloP, phastCons).
- `scripts/get_tfbs.py`: For transcription factor binding sites (TFBS).
- `scripts/list_tracks.py`: For listing available tracks based on search or group constraints.

Always use the `hg38` genome assembly by default, unless the user has specified
otherwise.

Fetching Conservation for Specific Variants
To get the evolutionary constraint at a single base, or at a list of specific
bases, pass each position to `--coordinates`. This is optimal for single
nucleotide variants (SNVs). `phyloP` is the best metric for individual bases.

```bash
uv run scripts/get_conservation.py --coordinates "chr1:215867804" "chr1:215867823" --output /tmp/cons_output.json
```
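When the variants come from a table rather than being typed by hand, the same command can be assembled programmatically. The sketch below is illustrative only: `conservation_argv` is a hypothetical helper, and it uses only the `--coordinates` and `--output` flags shown above.

```python
def conservation_argv(variants, output_path):
    """Build the argument list for a point query; variants are (chrom, position) pairs."""
    if not variants:
        raise ValueError("at least one variant is required")
    coordinates = [f"{chrom}:{position}" for chrom, position in variants]
    return [
        "uv", "run", "scripts/get_conservation.py",
        "--coordinates", *coordinates,
        "--output", output_path,
    ]


argv = conservation_argv(
    [("chr1", 215867804), ("chr1", 215867823)],
    "/tmp/cons_output.json",
)
# To execute it, pass argv to subprocess.run(argv, check=True) from the skill directory.
```

Building a list of arguments rather than a single shell string avoids quoting problems and keeps the wrapper script, with its rate limiting, in the loop.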
Fetching Regions and Conserved Elements
To identify "conservation peaks" across a non-coding regulatory element (like an
enhancer), for example to see whether an ISM-predicted importance peak aligns with
evolutionary history, query the whole interval. `phastCons` is best for functional
windows due to HMM smoothing. The `--conserved-elements` flag will also retrieve
predefined blocks under extreme constraint.

```bash
uv run scripts/get_conservation.py --coordinates "chr8:11748914-11749085" --conserved-elements --output /tmp/region_cons.json
```
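One way to turn per-base scores into candidate peaks is to look for contiguous runs above a cutoff. The sketch below assumes the scores have already been extracted from the output file into a plain list ordered by position; the output layout is not described here, so that extraction step is left out. The function name, the 0.8 cutoff, and the minimum run length are all illustrative choices, not values used by the skill.

```python
def find_peaks(scores, start, threshold=0.8, min_length=5):
    """Return (first_pos, last_pos) runs where every score is >= threshold.

    scores: per-base values ordered by position; start: position of scores[0].
    """
    peaks = []
    run_start = None
    for offset, score in enumerate(scores):
        if score >= threshold:
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            if offset - run_start >= min_length:
                peaks.append((start + run_start, start + offset - 1))
            run_start = None
    # Close a run that extends to the end of the region.
    if run_start is not None and len(scores) - run_start >= min_length:
        peaks.append((start + run_start, start + len(scores) - 1))
    return peaks
```

The resulting intervals can then be compared with the positions of an ISM importance peak to check whether the two overlap.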
Lineage-Specific Constraints
You can control the evolutionary depth using the `--collection` flag. The
default (`vertebrate`) uses the 100-vertebrate Multiz alignment for both
hg38 and hg19, matching the UCSC Genome Browser's default comparative genomics
tracks.

hg38 Collections
- `vertebrate` (default): UCSC 100-vertebrate Multiz alignment. phyloP: `phyloP100way`, phastCons: `phastCons100way`.
- `mammal`: Hiller Lab 470-way mammalian alignment. phyloP: `phyloP470wayBW`, phastCons: `phastCons470way`.
- `primate`: UCSC 30-primate Multiz alignment. phyloP: `phyloP30way`, phastCons: `phastCons30way`.
hg19 Collections
- `vertebrate` (default): UCSC 100-vertebrate Multiz alignment. phyloP: `phyloP100way`, phastCons: `phastCons100way`.
- `vertebrate46`: UCSC 46-vertebrate Multiz alignment (legacy). phyloP: `phyloP46wayAll`, phastCons: `phastCons46way`.
- `mammal`: 46-way placental mammal subset. phyloP: `phyloP46wayPlacental`, phastCons: `phastCons46wayPlacental`.
- `primate`: 46-way primate subset. phyloP: `phyloP46wayPrimates`, phastCons: `phastCons46wayPrimates`.

```bash
# hg38 mammal (Hiller 470-way)
uv run scripts/get_conservation.py --coordinates "chr5:1045330-1046172" --collection mammal --output /tmp/mammal_cons.json

# hg19 with legacy 46-vertebrate alignment
uv run scripts/get_conservation.py --coordinates "chr5:1045330-1046172" --genome hg19 --collection vertebrate46 --output /tmp/vert46_cons.json
```
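The genome and collection combinations listed in this section can be captured in a small lookup table, which is handy for validating a request before running a query. This is a sketch for illustration: the track names are copied from the lists in this section, while `COLLECTION_TRACKS` and `tracks_for` are hypothetical names, not part of the helper scripts.

```python
# (phyloP track, phastCons track) per genome and collection, as listed in this section.
COLLECTION_TRACKS = {
    "hg38": {
        "vertebrate": ("phyloP100way", "phastCons100way"),
        "mammal": ("phyloP470wayBW", "phastCons470way"),
        "primate": ("phyloP30way", "phastCons30way"),
    },
    "hg19": {
        "vertebrate": ("phyloP100way", "phastCons100way"),
        "vertebrate46": ("phyloP46wayAll", "phastCons46way"),
        "mammal": ("phyloP46wayPlacental", "phastCons46wayPlacental"),
        "primate": ("phyloP46wayPrimates", "phastCons46wayPrimates"),
    },
}


def tracks_for(genome="hg38", collection="vertebrate"):
    """Return the (phyloP, phastCons) track names, or raise if the pair is not listed."""
    try:
        return COLLECTION_TRACKS[genome][collection]
    except KeyError:
        valid = sorted(COLLECTION_TRACKS.get(genome, {}))
        raise ValueError(
            f"unsupported genome/collection {genome}/{collection}; "
            f"collections listed for {genome}: {valid}"
        ) from None
```

For example, `tracks_for("hg38", "vertebrate46")` raises a `ValueError`, because the legacy 46-vertebrate collection is listed only for hg19.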
Analyzing Evolutionary Acceleration
To analyze whether a specific locus is undergoing evolutionary acceleration
(i.e. evolving more rapidly than the neutral drift baseline), use `--analyze`.
This will compute scalar statistics (mean, min, max) for `phyloP` scores and
provide a heuristic boolean `is_accelerated` to simplify your evaluation.

```bash
uv run scripts/get_conservation.py --coordinates "chr5:1045330-1046172" --analyze --output /tmp/accelerated_cons.json
```
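To make the idea concrete, here is a rough Python sketch of this kind of summary. It relies on the phyloP convention that negative scores indicate faster-than-neutral change and positive scores indicate conservation. The decision rule (mean below a threshold of 0.0) is an assumption chosen for illustration; the rule actually applied by `--analyze` is not documented here and may differ. `summarize_phylop` is a hypothetical name.

```python
from statistics import mean


def summarize_phylop(scores, threshold=0.0):
    """Return mean/min/max of phyloP scores plus an illustrative acceleration flag."""
    if not scores:
        raise ValueError("scores must not be empty")
    stats = {"mean": mean(scores), "min": min(scores), "max": max(scores)}
    # Assumption: flag the locus when the mean falls below the threshold.
    # The heuristic used by --analyze itself may be different.
    stats["is_accelerated"] = stats["mean"] < threshold
    return stats
```

A mean can hide a short accelerated stretch inside an otherwise conserved region, so the minimum score is worth checking alongside the boolean.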
Fetching Transcription Factor Binding Sites (TFBS)
Use this script to identify transcription factor binding sites for a given
genomic interval. This is useful for interpreting non-coding variants that might
disrupt TF binding.

Run `scripts/get_tfbs.py` with `--coordinates` and `--tracks`. You can query
multiple tracks at once.

```bash
uv run scripts/get_tfbs.py --coordinates "chr11:1001000-1010000" --tracks encRegTfbsClustered --output /tmp/tfbs_encode.json
```

JASPAR tracks may return very large result sets. Use `--tf-filter` to keep only
items whose `TFName` field contains the given substring (case-insensitive):

```bash
uv run scripts/get_tfbs.py --coordinates "chr6:36670000-36690000" --tracks jaspar2024 --tf-filter TP53 --output /tmp/tp53_sites.json
```
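The filtering rule described above, a case-insensitive substring match on `TFName`, can be sketched in a few lines of Python. This is an illustration of the matching behavior rather than the script's own code; it assumes each item is a dictionary, and `filter_by_tf` is a hypothetical name.

```python
def filter_by_tf(items, substring):
    """Keep items whose TFName contains substring, ignoring case."""
    needle = substring.lower()
    return [
        item for item in items
        # Items without a TFName field are treated as non-matching.
        if needle in str(item.get("TFName", "")).lower()
    ]
```

Because the match is on substrings, a filter such as `TP53` would also keep any factor whose name merely contains those characters, so it is worth checking the surviving names after filtering.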
Common Verified Tracks (hg38)
- ENCODE: `encRegTfbsClustered` (TF Clusters)
- JASPAR: `jaspar2026`, `jaspar2024` (Predicted TFBS)
- ReMap: `ReMapTFs` (ChIP-seq Atlas)

> [!CAUTION]
> Tracks like `jaspar` or `ReMap` without years are often "container" tracks and will fail with a 400 error. Always use the specific subtrack name (e.g., `jaspar2026`).
Listing Available Tracks
To list available tracks (such as different versions of JASPAR, or purely to
discover what tracks exist for a particular genome assembly):

```bash
uv run scripts/list_tracks.py --search "jaspar" --output /tmp/jaspar_tracks.json
```

You can also filter by functional group:

```bash
uv run scripts/list_tracks.py --group "regulation" --output /tmp/regulation_tracks.json
```
Anti-Patterns
- DON'T query mammalian (`--collection mammal`) constraint if you are explicitly looking for deep evolutionary roots across all vertebrates. Use the default `vertebrate` collection.
- DON'T use this skill to reconstruct the ancestral state of a nucleotide (this skill provides measures of how much sites have changed, not what the ancestral nucleotide was).
- DON'T assume low conservation strictly means neutral/useless sequence; it could also reflect a high local mutation rate, which conservation scores alone cannot distinguish.
- DON'T print output to standard out, or run `cat` on output files. The output is too large. Use jq or write your own code to parse the output files.
- DON'T use hg19 unless the user has explicitly asked for it. The default should always be hg38.