encode-ccres-database


ENCODE Database Skill


This skill allows you to query the ENCODE Registry of cCREs (candidate cis-Regulatory Elements) via the SCREEN GraphQL API. It helps identify functional non-coding DNA elements (such as promoters, enhancers, and insulators) by analyzing their biochemical signatures (DNase, H3K4me3, H3K27ac, CTCF).


Prerequisites


uv: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.

User Notification: If `LICENSE_NOTIFICATION.txt` does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://www.encodeproject.org/help/rest-api/, and then (2) create the file, recording the notification text and timestamp.
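A minimal Python sketch of this check, notify, and record step is shown below. It is an illustration rather than the skill's own implementation: the function name `ensure_license_notification`, the notice wording, and the ISO 8601 UTC timestamp format are assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path

TERMS_URL = "https://www.encodeproject.org/help/rest-api/"
# Hypothetical wording; adapt it to the message you actually show the user.
NOTICE = f"Please check the ENCODE terms of use at {TERMS_URL}."


def ensure_license_notification(skill_dir: Path) -> bool:
    """Notify the user once per skill directory.

    Returns True if a new notification was issued, False if the marker
    file already existed.
    """
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False
    # (1) Prominently notify the user.
    print(f"*** NOTICE: {NOTICE} ***")
    # (2) Record the notification text and the time it was shown (UTC).
    timestamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"{NOTICE}\nNotified at: {timestamp}\n", encoding="utf-8")
    return True
```

Call it with the skill directory, e.g. `ensure_license_notification(Path("path/to/skill"))`; the second and later calls return `False` without notifying again.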


Core Rules


Use the Wrapper: ALWAYS execute the provided helper scripts to query the database rather than accessing it directly. The scripts automatically and gracefully enforce the required rate limit.

Parsing Output: Do NOT use `cat` to read the entire JSON output file into context, as it can be extremely large. You MUST use `jq` to efficiently parse and extract the relevant fields.

Notification: If this skill is used, mention this in the output.
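The helper scripts already throttle their requests, so you never need to do this yourself. For intuition only, a minimum-interval limiter of the general kind such scripts use can be sketched as follows; the class name `MinIntervalLimiter` and any interval you pass it are hypothetical, and the actual limit is whatever the scripts enforce.

```python
import time


class MinIntervalLimiter:
    """Block so that successive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_call = None  # monotonic time of the previous call, if any

    def wait(self) -> float:
        """Sleep if needed before the next request; return the seconds slept."""
        slept = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (time.monotonic() - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_call = time.monotonic()
        return slept
```

A caller would invoke `limiter.wait()` immediately before each request; the first call never sleeps.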

使用封装脚本：始终执行提供的辅助脚本查询数据库，而非直接访问数据库。这些脚本会自动优雅地执行所需的速率限制。
输出解析：请勿使用
```
cat
```
将整个JSON输出文件读入上下文，因为文件可能极大。您必须使用
```
jq
```
来高效解析和提取相关字段。
通知要求：如果使用了此技能，请确保在输出中提及这一点。

Quick Start


```bash
# Search cCREs by coordinates
uv run scripts/screen_api.py search --chromosome chr11 \
    --start 5205263 --end 5207263 \
    --output /tmp/search.json

# Get details for a specific cCRE
uv run scripts/screen_api.py details EH38E2941922 \
    --output /tmp/details.json
```

All subcommands write JSON to disk. Always save output in a temporary location such as `/tmp/`.

Identifying High-Confidence ("Type A") Biosamples


Biosamples in ENCODE are often categorized by their data completeness. "Type A" (or high-confidence) biosamples are those that have experimental data for all four core epigenetic markers: DNase, H3K4me3, H3K27ac, and CTCF.

The `biosamples` and `details` commands automatically enrich their output with an `is_type_a` boolean flag for each biosample.
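The Type A criterion is a simple set-containment check. The sketch below assumes a hypothetical representation in which each biosample maps to the set of assay names it has data for; the real output fields are documented in `references/json_output_structure.md`, and in practice you should rely on the provided `is_type_a` flag.

```python
# The four core marks that define a "Type A" (high-confidence) biosample.
CORE_ASSAYS = frozenset({"DNase", "H3K4me3", "H3K27ac", "CTCF"})


def is_type_a(available_assays) -> bool:
    """Return True if the biosample has data for all four core assays."""
    return CORE_ASSAYS.issubset(available_assays)


# Hypothetical input: biosample name -> assays with experimental data.
example = {
    "biosample_1": {"DNase", "H3K4me3", "H3K27ac", "CTCF"},
    "biosample_2": {"DNase", "H3K27ac"},
}
type_a = [name for name, assays in example.items() if is_type_a(assays)]
# type_a == ["biosample_1"]
```

Extra assays do not disqualify a biosample; only a missing core assay does.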

Example: Finding high-confidence cell types

```bash
uv run scripts/screen_api.py biosamples --output /tmp/biosamples.json

# Use jq to filter for Type A biosamples
jq '.data.ccREBiosampleQuery.biosamples[] | select(.is_type_a == true) | .displayname' /tmp/biosamples.json
```
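If `jq` is unavailable, the same filter can be written with Python's standard library. This sketch mirrors the jq path above (`.data.ccREBiosampleQuery.biosamples[]`, `is_type_a`, `displayname`); the function name `type_a_display_names` is illustrative.

```python
import json
from pathlib import Path


def type_a_display_names(path: str) -> list:
    """Return `displayname` for every biosample whose `is_type_a` is true.

    Equivalent to:
    jq '.data.ccREBiosampleQuery.biosamples[] | select(.is_type_a == true) | .displayname'
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    biosamples = data["data"]["ccREBiosampleQuery"]["biosamples"]
    # .get() mirrors jq returning null for a missing field instead of failing.
    return [b.get("displayname") for b in biosamples if b.get("is_type_a") is True]
```

For example, `print("\n".join(type_a_display_names("/tmp/biosamples.json")))` prints one name per line, like the jq command.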


Parsing Output (CRITICAL)


Do NOT use `cat` to read the entire JSON output file into context, as it can be extremely large. Instead, you MUST use `jq` to efficiently parse and extract the relevant fields from the JSON file saved by the script. If `jq` is not available on the system, write your own Python filtering code (e.g., `python3 -c "import json..."`) to extract the necessary data.

For a complete reference of the JSON structure returned by each command (so you know which fields to query with `jq`), read `references/json_output_structure.md`.
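When you do not yet know which fields to query, a short Python helper can report the shape of the file (object keys, array lengths, value types) instead of its contents. This is an illustrative sketch: the function names, the `max_depth` cutoff, and the choice to inspect only the first element of each array are arbitrary.

```python
import json


def describe(node, path="$", max_depth=4, depth=0, out=None):
    """Collect one line per node describing its keys, length, or type, never its values."""
    if out is None:
        out = []
    if depth > max_depth:
        return out
    if isinstance(node, dict):
        out.append(f"{path}: object with keys {sorted(node)}")
        for key, value in node.items():
            describe(value, f"{path}.{key}", max_depth, depth + 1, out)
    elif isinstance(node, list):
        out.append(f"{path}: array of {len(node)} items")
        if node:  # Inspect only the first element to keep the summary short.
            describe(node[0], f"{path}[0]", max_depth, depth + 1, out)
    else:
        out.append(f"{path}: {type(node).__name__}")
    return out


def describe_file(path: str, max_depth: int = 4) -> str:
    """Summarize the structure of a saved JSON output file."""
    with open(path, encoding="utf-8") as fh:
        return "\n".join(describe(json.load(fh), max_depth=max_depth))
```

`print(describe_file("/tmp/search.json"))` then shows paths you can translate directly into jq filters.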


Available Commands


`search`: Search cCREs by coordinates, accessions, or epigenetic signals.

```bash
uv run scripts/screen_api.py search \
    --chromosome chr11 --start 5205263 --end 5207263 \
    --output /tmp/search.json
```

`nearby-genes`: Find nearby genes for the given cCRE accessions.

```bash
uv run scripts/screen_api.py nearby-genes \
    EH38E1516972 --output /tmp/nearby.json
```

`details`: Get detailed information and biosample-specific max Z-scores for a specific cCRE.

```bash
uv run scripts/screen_api.py details EH38E2941922 \
    --output /tmp/details.json
```

`biosamples`: Get biosample metadata for an assembly.

```bash
uv run scripts/screen_api.py biosamples \
    --output /tmp/biosamples.json
```

`orthologs`: Get orthologous cCREs in another assembly.

```bash
uv run scripts/screen_api.py orthologs EH38E2941922 \
    --output /tmp/orthologs.json
```

`linked-genes`: Find linked genes via methods such as Hi-C or eQTLs.

```bash
uv run scripts/screen_api.py linked-genes \
    EH38E1516972 --output /tmp/linked.json
```

`gene-expression`: Get gene expression (TPM) across all biosamples for a named gene. Internally, the command resolves the gene symbol to an Ensembl gene ID, then queries per-biosample RNA-seq quantifications.

```bash
uv run scripts/screen_api.py gene-expression GAPDH \
    --output /tmp/gene_expr.json
```
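Because the `gene-expression` output covers every biosample, it is often easier to keep only the strongest signals. The sketch below ranks records by TPM; the record shape (a `biosample` name key and a `tpm` value key) is a hypothetical assumption, so check `references/json_output_structure.md` for the real field names and pass them via the key arguments.

```python
import heapq


def top_expressing(records, n=5, name_key="biosample", tpm_key="tpm"):
    """Return the n (name, TPM) pairs with the highest TPM, highest first.

    `name_key` and `tpm_key` are placeholders for the real field names.
    Records without a numeric TPM value are skipped.
    """
    valid = (
        (r[name_key], float(r[tpm_key]))
        for r in records
        if isinstance(r.get(tpm_key), (int, float))
    )
    return heapq.nlargest(n, valid, key=lambda pair: pair[1])
```

`heapq.nlargest` avoids sorting the full list when only a handful of top biosamples are needed.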

`entex`: Get ENTEx data for a cCRE or genomic region.

```bash
# By cCRE accession
uv run scripts/screen_api.py entex \
    --accession EH38E1310345 \
    --output /tmp/entex.json

# By genomic region
uv run scripts/screen_api.py entex \
    --region chr1:1000068:1000409 \
    --output /tmp/entex.json
```
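The `entex` example passes a region as a single `chrom:start:end` string, while `search` takes separate `--chromosome`, `--start`, and `--end` flags. A small hypothetical helper can convert between the two when building commands; it only parses and formats arguments (assuming a `chr`-prefixed chromosome name) and does not check them against the API.

```python
import re
import shlex

# Matches the "chr1:1000068:1000409" form used in the `entex --region` example.
REGION_RE = re.compile(r"^(chr[0-9A-Za-z_]+):(\d+):(\d+)$")


def parse_region(region: str):
    """Split 'chrom:start:end' into (chrom, start, end), requiring start < end."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"expected chrom:start:end, got {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start >= end:
        raise ValueError(f"start must be less than end in {region!r}")
    return chrom, start, end


def search_command(region: str, output: str) -> str:
    """Build the equivalent `search` invocation as a shell-safe string."""
    chrom, start, end = parse_region(region)
    args = ["uv", "run", "scripts/screen_api.py", "search",
            "--chromosome", chrom, "--start", str(start), "--end", str(end),
            "--output", output]
    return shlex.join(args)  # Quotes any argument that needs it.
```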

`gwas`: Query genome-wide association studies, SNPs, or enrichment data.

```bash
# List studies
uv run scripts/screen_api.py gwas studies \
    --output /tmp/gwas.json

# SNPs for a specific study
uv run scripts/screen_api.py gwas snps --study \
    Ahola-Olli_AV-27989323-Eotaxin_levels \
    --output /tmp/gwas_snps.json
```

You can supply the `--assembly mm10` or `--assembly grch38` flag to explicitly request a specific assembly for most commands. By default, the script targets `grch38` but automatically falls back to `mm10` if no results are found or if the query fails.
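The default assembly selection described above amounts to a try-then-fall-back loop. The sketch below illustrates that logic and is not the script's code: `run_query` is a hypothetical callable standing in for one query against one assembly, and "no results" is approximated as any empty or `None` result.

```python
from typing import Any, Callable, Optional, Tuple

DEFAULT_ORDER = ("grch38", "mm10")


def query_with_fallback(
    run_query: Callable[[str], Any],
    assembly: Optional[str] = None,
) -> Tuple[str, Any]:
    """Return (assembly_used, result).

    An explicitly requested assembly is queried as-is, with no fallback.
    Otherwise grch38 is tried first, and mm10 is used if grch38 raises
    or returns an empty result.
    """
    if assembly is not None:
        return assembly, run_query(assembly)
    last_error = None
    for candidate in DEFAULT_ORDER:
        try:
            result = run_query(candidate)
        except Exception as exc:  # A failed query also triggers the fallback.
            last_error = exc
            continue
        if result or candidate == DEFAULT_ORDER[-1]:
            # Non-empty result, or the final fallback (returned even if empty).
            return candidate, result
    # Only reached when the final fallback raised; surface that error.
    raise last_error
```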


ENCODE Portal REST API (Direct Access)


For accessing raw experiments, ChIP-seq peaks, or other datasets that are not represented as cCREs in SCREEN, use the `scripts/encode_portal_api.py` script. It allows custom queries to the ENCODE Portal REST API.


Usage


```bash
uv run scripts/encode_portal_api.py search "type=Experiment&target.label=ZNF549" --output /tmp/znf549_experiments.json
```
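The search argument is a URL query string of `field=value` pairs joined by `&`. When building it programmatically, Python's `urllib.parse.urlencode` handles escaping (spaces become `+`). A minimal sketch; the helper name `portal_query` is hypothetical, and the fields shown come from the example above.

```python
from urllib.parse import urlencode


def portal_query(filters: dict) -> str:
    """Encode {field: value} pairs, in insertion order, as a portal search string.

    A list value repeats the field, e.g. {"type": ["Experiment", "Annotation"]}.
    """
    return urlencode(filters, doseq=True)


query = portal_query({"type": "Experiment", "target.label": "ZNF549"})
# query == "type=Experiment&target.label=ZNF549"
```

Pass the resulting string, quoted, as the argument to `uv run scripts/encode_portal_api.py search`.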

Data Analysis Tips


When analyzing `.bed` or `.bigBed` files downloaded from ENCODE, standard bioinformatics tools are highly recommended for finding overlaps (e.g., between gene promoters and peaks):

`bedtools`: For fast arithmetic on genomic intervals (e.g., intersecting or merging them).

`bigBedToBed`: For converting binary BigBed files to readable BED format.

`pybedtools`: A Python wrapper for `bedtools`.

Write custom logic if these tools are not pre-installed.
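As a fallback when `bedtools` is unavailable, a naive overlap check between two BED files can be written in plain Python. This sketch reads only the first three BED columns, uses BED's 0-based, half-open coordinates (so intervals that merely touch do not overlap), and compares every pair per chromosome, which is fine for small files but far slower than `bedtools intersect` on large ones. The function names are illustrative, and `.bigBed` files must first be converted with `bigBedToBed`.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end, record), ...]}, skipping headers."""
    intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        record = line.rstrip("\n")
        fields = record.split("\t")
        intervals[fields[0]].append((int(fields[1]), int(fields[2]), record))
    return intervals


def overlapping_pairs(a_lines, b_lines):
    """Yield (a_record, b_record) for every A/B pair that overlaps by at least 1 bp."""
    b_by_chrom = read_bed(b_lines)
    for chrom, a_intervals in read_bed(a_lines).items():
        for a_start, a_end, a_rec in a_intervals:
            for b_start, b_end, b_rec in b_by_chrom.get(chrom, []):
                # Half-open intervals overlap iff each starts before the other ends.
                if a_start < b_end and b_start < a_end:
                    yield a_rec, b_rec
```

Open file handles can be passed directly, e.g. `overlapping_pairs(open("promoters.bed"), open("peaks.bed"))`, where both file names are placeholders.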


Custom Queries (SCREEN GraphQL)


If you need to make a complex GraphQL query that the script does not support, read `references/graphql_schema.md` for a reference of the available queries, arguments, and return fields in the SCREEN GraphQL API.
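GraphQL requests are conventionally sent as a JSON body containing a `query` string and an optional `variables` object. The sketch below only builds that body; the query text, its `assembly` argument type, and the selected fields are hypothetical placeholders loosely modeled on the `ccREBiosampleQuery` path in the `biosamples` output, so confirm the real names in `references/graphql_schema.md` before use. The endpoint and transport are deliberately not shown.

```python
import json
from typing import Optional

# Hypothetical query: verify field, argument, and type names against the schema reference.
QUERY = """
query Biosamples($assembly: String!) {
  ccREBiosampleQuery(assembly: $assembly) {
    biosamples {
      displayname
    }
  }
}
"""


def graphql_payload(query: str, variables: Optional[dict] = None) -> str:
    """Serialize a GraphQL request body as JSON: {"query": ..., "variables": ...}."""
    body = {"query": query.strip()}
    if variables:
        body["variables"] = variables
    return json.dumps(body)


payload = graphql_payload(QUERY, {"assembly": "grch38"})
```

Keeping values in `variables` rather than splicing them into the query string avoids quoting mistakes.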
