encode-ccres-database
ENCODE Database Skill
This skill allows you to query the ENCODE Registry of cCREs (candidate
cis-Regulatory Elements) via the SCREEN GraphQL API. It helps identify
functional non-coding DNA elements (such as promoters, enhancers, and insulators)
by analyzing biochemical signatures (DNase, H3K4me3, H3K27ac, CTCF).
Prerequisites
- `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
- User Notification: If `LICENSE_NOTIFICATION.txt` does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://www.encodeproject.org/help/rest-api/, and (2) create the file, recording the notification text and timestamp.
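The User Notification step could be sketched as follows. This is a minimal illustration, not the skill's own implementation: the function name `ensure_license_notification`, the wording of the notice, and the layout of the file are all assumptions. Only the file name and the URL come from the text above.

```python
from datetime import datetime, timezone
from pathlib import Path

# URL taken from the prerequisite above.
TERMS_URL = "https://www.encodeproject.org/help/rest-api/"


def ensure_license_notification(skill_dir: Path) -> bool:
    """Notify once per skill directory; return True if a notice was shown."""
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # the user was already notified on an earlier run

    # Step 1: show the notice (hypothetical wording).
    message = f"NOTICE: Please check the ENCODE terms of use at {TERMS_URL}"
    print(message)

    # Step 2: record the notification text and a UTC timestamp.
    timestamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"{message}\nNotified at: {timestamp}\n", encoding="utf-8")
    return True
```

Calling the function a second time on the same directory returns `False` and leaves the existing file untouched.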
Core Rules
- Use the Wrapper: ALWAYS execute the provided helper scripts to query the database rather than accessing the database directly. The scripts automatically enforce the required rate limit gracefully.
- Parsing Output: Do NOT use `cat` to read the entire JSON output file into context, as it can be extremely large. You MUST use `jq` to efficiently parse and extract relevant fields.
- Notification: If this skill is used, ensure this is mentioned in the output.
Quick Start
```bash
# Search cCREs by coordinates
uv run scripts/screen_api.py search --chromosome chr11 \
  --start 5205263 --end 5207263 \
  --output /tmp/search.json

# Get details for a specific cCRE
uv run scripts/screen_api.py details EH38E2941922 \
  --output /tmp/details.json
```

All subcommands write JSON to disk. Always save output in a temporary location
like `/tmp/`.
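When the wrapper is driven from Python rather than from a shell, the same `search` call can be assembled as an argument list. The sketch below assumes it is run from the skill directory. The helper names `build_search_command` and `run_search` are hypothetical, while the subcommand and flags are the ones shown above.

```python
import subprocess
import tempfile
from pathlib import Path


def build_search_command(chromosome: str, start: int, end: int, output: Path) -> list[str]:
    """Return the argv for a coordinate search; nothing is executed here."""
    return [
        "uv", "run", "scripts/screen_api.py", "search",
        "--chromosome", chromosome,
        "--start", str(start),
        "--end", str(end),
        "--output", str(output),
    ]


def run_search(chromosome: str, start: int, end: int) -> Path:
    """Run the search and return the path of the JSON file it wrote."""
    # Save into the system temporary directory, as recommended above.
    output = Path(tempfile.gettempdir()) / "search.json"
    subprocess.run(build_search_command(chromosome, start, end, output), check=True)
    return output
```

Passing a list to `subprocess.run` avoids shell quoting issues, and `check=True` raises an error if the script exits with a non-zero status.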
Identifying High-Confidence ("Type A") Biosamples
Biosamples in ENCODE are often categorized by their data completeness. "Type
A" (or high-confidence) biosamples are those that have experimental data for
all four core epigenetic markers: DNase, H3K4me3, H3K27ac, and CTCF.
The `biosamples` and `details` commands automatically enrich their output with
an `is_type_a` boolean flag for each biosample.

Example: Finding high-confidence cell types

```bash
uv run scripts/screen_api.py biosamples --output /tmp/biosamples.json

# Use jq to filter for Type A biosamples
jq '.data.ccREBiosampleQuery.biosamples[] | select(.is_type_a == true) | .displayname' /tmp/biosamples.json
```
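The Type A definition amounts to a subset check over the four core assays. The sketch below illustrates that definition and mirrors the `jq` filter in Python. How the script itself computes `is_type_a` is not shown here, so `has_all_core_assays` is an assumption about the logic, and `type_a_names` is a hypothetical helper that works on an already loaded payload.

```python
# The four core assays named in the Type A definition above.
CORE_ASSAYS = frozenset({"DNase", "H3K4me3", "H3K27ac", "CTCF"})


def has_all_core_assays(available_assays) -> bool:
    """True when every core assay is present (extra assays are allowed)."""
    return CORE_ASSAYS.issubset(available_assays)


def type_a_names(payload: dict) -> list:
    """Python counterpart of the jq filter: display names of Type A biosamples."""
    biosamples = payload["data"]["ccREBiosampleQuery"]["biosamples"]
    return [b["displayname"] for b in biosamples if b.get("is_type_a") is True]
```

A biosample with only DNase and CTCF data fails the check, while one with all four assays plus additional data passes.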
Parsing Output (CRITICAL)
Do NOT use `cat` to read the entire JSON output file into context, as it
can be extremely large. Instead, you MUST use `jq` to efficiently parse and
extract the relevant fields from the JSON file saved by the script. If `jq` is
not available on the system, write your own Python filtering code (e.g.,
`python3 -c "import json..."`) to extract the necessary data.

For a complete reference of the JSON structure returned by each command (so you
know which fields to query with `jq`), read
`references/json_output_structure.md`.
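If `jq` is unavailable, the Python fallback might look like the sketch below. The helper `extract_fields` and its parameters are hypothetical. The file is still parsed in memory by Python, but only the requested fields of a capped number of records are returned, so the full document never has to be printed into context.

```python
import json


def extract_fields(json_path, list_path, fields, limit=20):
    """Return at most `limit` records, each trimmed to the requested fields."""
    with open(json_path, encoding="utf-8") as handle:
        node = json.load(handle)
    # Walk down to the list of records, e.g. data -> ccREBiosampleQuery -> biosamples.
    for key in list_path:
        node = node[key]
    # Missing fields come back as None instead of raising KeyError.
    return [{field: item.get(field) for field in fields} for item in node[:limit]]
```

For the biosamples output shown earlier, `list_path` would be `["data", "ccREBiosampleQuery", "biosamples"]` and `fields` could be `["displayname", "is_type_a"]`.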
Available Commands
- `search`: Search cCREs by coordinates, accessions, or epigenetic signals.

  ```bash
  uv run scripts/screen_api.py search \
    --chromosome chr11 --start 5205263 --end 5207263 \
    --output /tmp/search.json
  ```

- `nearby-genes`: Find nearby genes for given cCRE accessions.

  ```bash
  uv run scripts/screen_api.py nearby-genes \
    EH38E1516972 --output /tmp/nearby.json
  ```

- `details`: Get detailed information and biosample-specific max Z-scores for a specific cCRE.

  ```bash
  uv run scripts/screen_api.py details EH38E2941922 \
    --output /tmp/details.json
  ```

- `biosamples`: Get biosample metadata for an assembly.

  ```bash
  uv run scripts/screen_api.py biosamples \
    --output /tmp/biosamples.json
  ```

- `orthologs`: Get orthologous cCREs in another assembly.

  ```bash
  uv run scripts/screen_api.py orthologs EH38E2941922 \
    --output /tmp/orthologs.json
  ```

- `linked-genes`: Find linked genes via methods like HiC or eQTLs.

  ```bash
  uv run scripts/screen_api.py linked-genes \
    EH38E1516972 --output /tmp/linked.json
  ```

- `gene-expression`: Get gene expression (TPM) across all biosamples for a named gene. Internally resolves the gene symbol to an Ensembl gene ID, then queries per-biosample RNA-seq quantifications.

  ```bash
  uv run scripts/screen_api.py gene-expression GAPDH \
    --output /tmp/gene_expr.json
  ```

- `entex`: Get ENTEx data for a cCRE or genomic region.

  ```bash
  uv run scripts/screen_api.py entex \
    --accession EH38E1310345 \
    --output /tmp/entex.json
  ```

  ```bash
  uv run scripts/screen_api.py entex \
    --region chr1:1000068:1000409 \
    --output /tmp/entex.json
  ```

- `gwas`: Query genome-wide association studies, SNPs, or enrichment data.

  ```bash
  uv run scripts/screen_api.py gwas studies \
    --output /tmp/gwas.json
  ```

  ```bash
  uv run scripts/screen_api.py gwas snps --study \
    Ahola-Olli_AV-27989323-Eotaxin_levels \
    --output /tmp/gwas_snps.json
  ```

You can supply the `--assembly mm10` or `--assembly grch38` flag to explicitly
request a specific assembly for most commands. By default, the script targets
`grch38` but will automatically fall back to `mm10` if no results are found or
if the query fails.
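The default assembly behavior described above can be pictured as a try-in-order loop. This is a sketch of the described behavior only, not the script's code: `query_with_fallback` and the `run_query` callable are hypothetical, and catching every exception is a simplification.

```python
from typing import Callable, Sequence, Tuple


def query_with_fallback(
    run_query: Callable[[str], Sequence],
    assemblies: Tuple[str, ...] = ("grch38", "mm10"),
):
    """Return (assembly, results) for the first assembly that yields results."""
    for assembly in assemblies:
        try:
            results = run_query(assembly)
        except Exception:
            # The query failed for this assembly: move on to the next one.
            continue
        if results:
            return assembly, list(results)
        # An empty result also triggers the fallback.
    return None, []
```

Passing `--assembly` explicitly corresponds to calling the function with a single-element tuple, which disables the fallback.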
ENCODE Portal REST API (Direct Access)
For accessing raw experiments, ChIP-seq peaks, or other datasets that are not
represented as cCREs in SCREEN, use the `scripts/encode_portal_api.py` script.
It allows custom queries to the ENCODE Portal REST API.

Usage

```bash
uv run scripts/encode_portal_api.py search "type=Experiment&target.label=ZNF549" --output /tmp/znf549_experiments.json
```
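The query argument in the usage example is a standard `key=value&key=value` string, so it can be built from a mapping with the standard library instead of by hand. In this sketch, `build_portal_query` is a hypothetical helper. Whether the script expects values that contain spaces or other special characters to be URL-encoded this way is an assumption.

```python
from urllib.parse import urlencode


def build_portal_query(filters: dict) -> str:
    """Join filters into a query string, preserving insertion order."""
    return urlencode(filters)


query = build_portal_query({"type": "Experiment", "target.label": "ZNF549"})

# The resulting string is passed as the single positional argument to `search`.
command = [
    "uv", "run", "scripts/encode_portal_api.py", "search", query,
    "--output", "/tmp/znf549_experiments.json",
]
```

Here `query` equals the string used in the usage example, `type=Experiment&target.label=ZNF549`.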
Data Analysis Tips
When analyzing `.bed` or `.bigBed` files downloaded from ENCODE, standard
bioinformatics tools are highly recommended for finding overlaps (e.g., between
gene promoters and peaks):

- `bedtools`: For fast mathematical operations on genomic intervals.
- `bigBedToBed`: For converting binary BigBed files to readable BED format.
- `pybedtools`: A Python wrapper for `bedtools`.

Write custom logic if these tools are not pre-installed.
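Where none of these tools is installed, the custom logic for an overlap check can stay small. The sketch below assumes tab-separated BED lines with 0-based, half-open coordinates and an optional name in the fourth column. The helper names `parse_bed` and `find_overlaps` are hypothetical, and the nested scan is only suitable for modest file sizes.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED lines into (chrom, start, end, name) tuples."""
    intervals = []
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and UCSC header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, *rest = line.split("\t")
        name = rest[0] if rest else f"{chrom}:{start}-{end}"
        intervals.append((chrom, int(start), int(end), name))
    return intervals


def find_overlaps(promoters, peaks):
    """Return (promoter_name, peak_name) pairs sharing at least one base."""
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end, name in peaks:
        peaks_by_chrom[chrom].append((start, end, name))

    hits = []
    for chrom, p_start, p_end, p_name in promoters:
        for start, end, name in peaks_by_chrom.get(chrom, []):
            # Half-open intervals overlap when each starts before the other ends.
            if start < p_end and p_start < end:
                hits.append((p_name, name))
    return hits
```

Intervals that merely touch (one ends where the other starts) do not count as overlapping under the half-open convention.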
Custom Queries (SCREEN GraphQL)
If you need to make a complex GraphQL query that the script does not support,
read `references/graphql_schema.md` for a reference of available queries,
arguments, and return fields in the SCREEN GraphQL API.