nextflow-development

nf-core Pipeline Deployment

Run nf-core bioinformatics pipelines on local or public sequencing data.

Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis.

Workflow Checklist

- [ ] Step 0: Acquire data (if from GEO/SRA)
- [ ] Step 1: Environment check (MUST pass)
- [ ] Step 2: Select pipeline (confirm with user)
- [ ] Step 3: Run test profile (MUST pass)
- [ ] Step 4: Create samplesheet
- [ ] Step 5: Configure & run (confirm genome with user)
- [ ] Step 6: Verify outputs

Step 0: Acquire Data (GEO/SRA Only)

Skip this step if the user already has local FASTQ files.

For public datasets, fetch from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow.

Quick start:

bash

# 1. Get study info
python scripts/sra_geo_fetch.py info GSE110004

# 2. Download (interactive mode)
python scripts/sra_geo_fetch.py download GSE110004 -o ./fastq -i

# 3. Generate samplesheet
python scripts/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir ./fastq -o samplesheet.csv

**DECISION POINT:** After fetching study info, confirm with user:
- Which sample subset to download (if multiple data types)
- Suggested genome and pipeline

Then continue to Step 1.

---

Step 1: Environment Check

Run this check first. The pipeline will fail if the environment check does not pass.

bash

python scripts/check_environment.py

All critical checks must pass. If any fail, provide fix instructions:

Docker issues

| Problem | Fix |
| --- | --- |
| Not installed | Install from https://docs.docker.com/get-docker/ |
| Permission denied | `sudo usermod -aG docker $USER` then re-login |
| Daemon not running | `sudo systemctl start docker` |

Nextflow issues

| Problem | Fix |
| --- | --- |
| Not installed | `curl -s https://get.nextflow.io \| bash && mv nextflow ~/bin/` |
| Version < 23.04 | `nextflow self-update` |

Java issues

| Problem | Fix |
| --- | --- |
| Not installed / < 11 | `sudo apt install openjdk-11-jdk` |

Do not proceed until all checks pass. For HPC/Singularity, see references/troubleshooting.md.
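
The checks behind the tables above can be sketched in a few lines of Python. This is an illustrative sketch only, not the contents of `scripts/check_environment.py`. The helper names `parse_version`, `meets_minimum`, and `missing_tools` are hypothetical, and the minimum versions (Nextflow 23.04, Java 11) are taken from the tables.

```python
import re
import shutil

# Minimum versions taken from the fix tables above.
MINIMUMS = {"nextflow": (23, 4), "java": (11,)}


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(version), len(minimum))
    padded_version = version + (0,) * (width - len(version))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_version >= padded_minimum


def missing_tools(tools=("docker", "nextflow", "java")):
    """List the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

In practice the version text would come from the tool itself (for example the output of `nextflow -version`). Running the tools is left out here so the sketch has no side effects.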

Step 2: Select Pipeline

DECISION POINT: Confirm with user before proceeding.

| Data Type | Pipeline | Version | Goal |
| --- | --- | --- | --- |
| RNA-seq | `rnaseq` | 3.22.2 | Gene expression |
| WGS/WES | `sarek` | 3.7.1 | Variant calling |
| ATAC-seq | `atacseq` | 2.1.2 | Chromatin accessibility |

Auto-detect from data:

bash

python scripts/detect_data_type.py /path/to/data

For pipeline-specific details:

- references/pipelines/rnaseq.md
- references/pipelines/sarek.md
- references/pipelines/atacseq.md
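
The selection table can also be written as a small lookup, which is handy when the pinned version must be reused in later commands. This is a minimal sketch. The function name `select_pipeline` is hypothetical and is not part of `scripts/detect_data_type.py`. The names and versions are copied from the table.

```python
# Pipeline names and pinned versions copied from the selection table.
PIPELINES = {
    "rna-seq": ("rnaseq", "3.22.2"),
    "wgs": ("sarek", "3.7.1"),
    "wes": ("sarek", "3.7.1"),
    "atac-seq": ("atacseq", "2.1.2"),
}


def select_pipeline(data_type):
    """Return (pipeline, version) for a data type, ignoring case and padding."""
    key = data_type.strip().lower()
    if key not in PIPELINES:
        known = ", ".join(sorted(PIPELINES))
        raise ValueError(
            f"unsupported data type {data_type!r}; expected one of: {known}"
        )
    return PIPELINES[key]
```

The lookup only encodes the table. The choice still has to be confirmed with the user, as the decision point requires.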

Step 3: Run Test Profile

The test profile validates the environment with a small dataset. It MUST pass before you run real data.

bash

nextflow run nf-core/<pipeline> -r <version> -profile test,docker --outdir test_output

| Pipeline | Command |
| --- | --- |
| rnaseq | `nextflow run nf-core/rnaseq -r 3.22.2 -profile test,docker --outdir test_rnaseq` |
| sarek | `nextflow run nf-core/sarek -r 3.7.1 -profile test,docker --outdir test_sarek` |
| atacseq | `nextflow run nf-core/atacseq -r 2.1.2 -profile test,docker --outdir test_atacseq` |

Verify:

bash

ls test_output/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log

If test fails, see references/troubleshooting.md.
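
The two verification commands can be combined into one check. This is a sketch under the assumption that the MultiQC report and the log live at the paths shown above. The function name `run_succeeded` is hypothetical.

```python
from pathlib import Path

# Marker string searched for by the grep command above.
SUCCESS_MARKER = "Pipeline completed successfully"


def run_succeeded(outdir, log_path=".nextflow.log"):
    """True if the MultiQC report exists and the log has the success marker."""
    report = Path(outdir) / "multiqc" / "multiqc_report.html"
    log = Path(log_path)
    if not report.is_file() or not log.is_file():
        return False
    return SUCCESS_MARKER in log.read_text(errors="replace")
```

For example, `run_succeeded("test_output")` mirrors the `ls` and `grep` pair for the generic test command.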

Step 4: Create Samplesheet

Generate automatically

bash

python scripts/generate_samplesheet.py /path/to/data <pipeline> -o samplesheet.csv

The script:

- Discovers FASTQ/BAM/CRAM files
- Pairs R1/R2 reads
- Infers sample metadata
- Validates before writing

For sarek: the script prompts for tumor/normal status if it cannot be auto-detected.
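
To illustrate the R1/R2 pairing step, here is a minimal sketch. It is not the logic of `scripts/generate_samplesheet.py`. The file-name convention (`<sample>_R1` / `<sample>_R2` or `<sample>_1` / `<sample>_2`, with an optional numeric suffix) and the names `READ_PATTERN` and `pair_reads` are assumptions made for this example.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: <sample>_R1.fq.gz, <sample>_2.fastq.gz,
# <sample>_R1_001.fastq.gz, and similar.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R?(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_reads(filenames):
    """Group FASTQ paths by sample: {sample: {"1": path, "2": path}}."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = READ_PATTERN.match(Path(name).name)
        if match is None:
            continue  # not a recognisable FASTQ name; skipped in this sketch
        pairs[match["sample"]][match["read"]] = name
    return dict(pairs)
```

A sample whose entry lacks a `"2"` key is single-end or is missing its mate, so a caller would need to decide which before writing the samplesheet.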

Validate existing samplesheet

bash

python scripts/generate_samplesheet.py --validate samplesheet.csv <pipeline>

Samplesheet formats

rnaseq:

csv

sample,fastq_1,fastq_2,strandedness
SAMPLE1,/abs/path/R1.fq.gz,/abs/path/R2.fq.gz,auto

sarek:

csv

patient,sample,lane,fastq_1,fastq_2,status
patient1,tumor,L001,/abs/path/tumor_R1.fq.gz,/abs/path/tumor_R2.fq.gz,1
patient1,normal,L001,/abs/path/normal_R1.fq.gz,/abs/path/normal_R2.fq.gz,0

atacseq:

csv

sample,fastq_1,fastq_2,replicate
CONTROL,/abs/path/ctrl_R1.fq.gz,/abs/path/ctrl_R2.fq.gz,1
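
A header check against the three formats above might look like the following sketch. The column lists are copied from the examples. The absolute-path rule is an assumption inferred from the `/abs/path/...` placeholders, and `samplesheet_problems` is a hypothetical name, not the `--validate` implementation.

```python
import csv
import io

# Required columns, copied from the example samplesheets above.
REQUIRED_COLUMNS = {
    "rnaseq": ["sample", "fastq_1", "fastq_2", "strandedness"],
    "sarek": ["patient", "sample", "lane", "fastq_1", "fastq_2", "status"],
    "atacseq": ["sample", "fastq_1", "fastq_2", "replicate"],
}


def samplesheet_problems(csv_text, pipeline):
    """Return a list of human-readable problems; an empty list means OK."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [
        f"missing column: {column}"
        for column in REQUIRED_COLUMNS[pipeline]
        if column not in header
    ]
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        fastq_1 = row.get("fastq_1") or ""
        if "fastq_1" in header and not fastq_1.startswith("/"):
            problems.append(f"line {line_no}: fastq_1 is not an absolute path")
    return problems
```

Returning a list instead of raising on the first error lets all problems be reported to the user in one pass.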

Step 5: Configure & Run

5a. Check genome availability

bash

python scripts/manage_genomes.py check <genome>

If not installed:

python scripts/manage_genomes.py download <genome>


Common genomes: GRCh38 (human), GRCh37 (legacy), GRCm39 (mouse), R64-1-1 (yeast), BDGP6 (fly)

5b. Decision points

DECISION POINT: Confirm with user:

Genome: Which reference to use
Pipeline-specific options:
- rnaseq: aligner (star_salmon recommended, hisat2 for low memory)
- sarek: tools (haplotypecaller for germline, mutect2 for somatic)
- atacseq: read_length (50, 75, 100, or 150)

5c. Run pipeline

bash

nextflow run nf-core/<pipeline> \
    -r <version> \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --genome <genome> \
    -resume

Key flags:

- `-r`: Pin version
- `-profile docker`: Use Docker (or `singularity` for HPC)
- `--genome`: iGenomes key
- `-resume`: Continue from checkpoint

Resource limits (if needed):

bash

--max_cpus 8 --max_memory '32.GB' --max_time '24.h'
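
When the run is launched from a script, the command above can be assembled as an argument list, which avoids shell-quoting mistakes. This is a sketch. `build_run_command` and `as_shell_line` are hypothetical helpers, the flags are exactly those shown in Step 5c, and whether a given pipeline release accepts the optional resource-limit parameters is not checked here.

```python
import shlex


def build_run_command(pipeline, version, genome, samplesheet="samplesheet.csv",
                      outdir="results", profile="docker", limits=None):
    """Assemble the Step 5c nextflow command as an argument list."""
    command = [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", version,
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", genome,
        "-resume",
    ]
    # Optional resource caps, e.g. {"max_cpus": 8, "max_memory": "32.GB"}.
    for name, value in (limits or {}).items():
        command += [f"--{name}", str(value)]
    return command


def as_shell_line(command):
    """Render the argument list as a single copy-pasteable shell line."""
    return shlex.join(command)
```

The list form can be passed directly to `subprocess.run`, while `as_shell_line` is convenient for showing the user the exact command before it is executed.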

Step 6: Verify Outputs

Check completion

bash

ls results/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log

Key outputs by pipeline

rnaseq:

- `results/star_salmon/salmon.merged.gene_counts.tsv` - Gene counts
- `results/star_salmon/salmon.merged.gene_tpm.tsv` - TPM values

sarek:

- `results/variant_calling/*/` - VCF files
- `results/preprocessing/recalibrated/` - BAM files

atacseq:

- `results/macs2/narrowPeak/` - Peak calls
- `results/bwa/mergedLibrary/bigwig/` - Coverage tracks
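
The output locations listed above can be checked programmatically. This is a minimal sketch. The paths are copied from the lists, relative to the output directory, and `missing_outputs` is a hypothetical helper name.

```python
import glob
import os

# Output locations copied from the lists above, relative to --outdir.
KEY_OUTPUTS = {
    "rnaseq": [
        "star_salmon/salmon.merged.gene_counts.tsv",
        "star_salmon/salmon.merged.gene_tpm.tsv",
    ],
    "sarek": ["variant_calling/*/", "preprocessing/recalibrated/"],
    "atacseq": ["macs2/narrowPeak/", "bwa/mergedLibrary/bigwig/"],
}


def missing_outputs(pipeline, outdir="results"):
    """Return the expected output patterns that match nothing under `outdir`."""
    return [
        pattern
        for pattern in KEY_OUTPUTS[pipeline]
        if not glob.glob(os.path.join(outdir, pattern))
    ]
```

An empty list only shows that the files or directories exist. It says nothing about whether their contents are sensible, so the MultiQC report still needs to be reviewed.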

Quick Reference

For common exit codes and fixes, see references/troubleshooting.md.

Resume failed run

bash

nextflow run nf-core/<pipeline> -resume

References

- references/geo-sra-acquisition.md - Downloading public GEO/SRA data
- references/troubleshooting.md - Common issues and fixes
- references/installation.md - Environment setup
- references/pipelines/rnaseq.md - RNA-seq pipeline details
- references/pipelines/sarek.md - Variant calling details
- references/pipelines/atacseq.md - ATAC-seq details

Disclaimer

This skill is provided as a prototype example demonstrating how to integrate nf-core bioinformatics pipelines into Claude Code for automated analysis workflows. The current implementation supports three pipelines (rnaseq, sarek, and atacseq), serving as a foundation that enables the community to expand support to the full set of nf-core pipelines.

It is intended for educational and research purposes and should not be considered production-ready without appropriate validation for your specific use case. Users are responsible for ensuring their computing environment meets pipeline requirements and for verifying analysis results.

Anthropic does not guarantee the accuracy of bioinformatics outputs, and users should follow standard practices for validating computational analyses. This integration is not officially endorsed by or affiliated with the nf-core community.

Attribution

When publishing results, cite the appropriate pipeline. Citations are available in each nf-core repository's CITATIONS.md file (e.g., https://github.com/nf-core/rnaseq/blob/3.22.2/CITATIONS.md).

Licenses

nf-core pipelines: MIT License (https://nf-co.re/about)
Nextflow: Apache License, Version 2.0 (https://www.nextflow.io/about-us.html)
NCBI SRA Toolkit: Public Domain (https://github.com/ncbi/sra-tools/blob/master/LICENSE)
