nextflow-development
nf-core Pipeline Deployment
Run nf-core bioinformatics pipelines on local or public sequencing data.
Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis.
Workflow Checklist
- [ ] Step 0: Acquire data (if from GEO/SRA)
- [ ] Step 1: Environment check (MUST pass)
- [ ] Step 2: Select pipeline (confirm with user)
- [ ] Step 3: Run test profile (MUST pass)
- [ ] Step 4: Create samplesheet
- [ ] Step 5: Configure & run (confirm genome with user)
- [ ] Step 6: Verify outputs
Step 0: Acquire Data (GEO/SRA Only)
Skip this step if the user already has local FASTQ files.
For public datasets, fetch from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow.
Quick start:
```bash
# 1. Get study info
python scripts/sra_geo_fetch.py info GSE110004
# 2. Download (interactive mode)
python scripts/sra_geo_fetch.py download GSE110004 -o ./fastq -i
# 3. Generate samplesheet
python scripts/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir ./fastq -o samplesheet.csv
```
**DECISION POINT:** After fetching study info, confirm with user:
- Which sample subset to download (if multiple data types)
- Suggested genome and pipeline
Then continue to Step 1.
---
Step 1: Environment Check
Run this check first. The pipeline will fail in an environment that does not pass it.
```bash
python scripts/check_environment.py
```
All critical checks must pass. If any fail, provide fix instructions:
Docker issues
| Problem | Fix |
|---|---|
| Not installed | Install from https://docs.docker.com/get-docker/ |
| Permission denied | |
| Daemon not running | |
Nextflow issues
| Problem | Fix |
|---|---|
| Not installed | |
| Version < 23.04 | |
Java issues
| Problem | Fix |
|---|---|
| Not installed / < 11 | |
Do not proceed until all checks pass. For HPC/Singularity, see references/troubleshooting.md.
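The checks above can be sketched in a few lines of Python. This is only an illustration of the idea, not the contents of `scripts/check_environment.py`; the function names are hypothetical, and the thresholds (Nextflow 23.04, Java 11) are taken from the tables above.

```python
import re
import shutil

MIN_NEXTFLOW = (23, 4)  # from the table above: versions below 23.04 need an upgrade
MIN_JAVA = 11           # from the table above: Java below 11 needs an upgrade


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def nextflow_ok(version_output):
    """True if the reported Nextflow version is at least MIN_NEXTFLOW."""
    version = parse_version(version_output)
    return version is not None and version[:2] >= MIN_NEXTFLOW


def java_ok(version_output):
    """True if the reported Java major version is at least MIN_JAVA."""
    version = parse_version(version_output)
    if version is None:
        return False
    # Legacy version strings such as "1.8.0" mean Java 8.
    major = version[1] if version[0] == 1 else version[0]
    return major >= MIN_JAVA


def missing_tools(tools=("docker", "nextflow", "java")):
    """List the required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

In practice the version strings would come from the output of `nextflow -version` and `java -version`; a real check would also need to confirm that the Docker daemon is reachable, which this sketch does not attempt.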
Step 2: Select Pipeline
DECISION POINT: Confirm with user before proceeding.
| Data Type | Pipeline | Version | Goal |
|---|---|---|---|
| RNA-seq | rnaseq | 3.22.2 | Gene expression |
| WGS/WES | sarek | 3.7.1 | Variant calling |
| ATAC-seq | atacseq | 2.1.2 | Chromatin accessibility |
Auto-detect from data:
```bash
python scripts/detect_data_type.py /path/to/data
```
For pipeline-specific details:
- references/pipelines/rnaseq.md
- references/pipelines/sarek.md
- references/pipelines/atacseq.md
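As a rough illustration of the selection table, the mapping from data type to pipeline and pinned version can be held in a small lookup. The pipeline names follow the reference files listed above; the helper name `select_pipeline` is hypothetical and is not part of the skill's scripts.

```python
# Data type -> (pipeline, pinned version), as listed in the Step 2 table.
PIPELINES = {
    "RNA-seq": ("rnaseq", "3.22.2"),
    "WGS/WES": ("sarek", "3.7.1"),
    "ATAC-seq": ("atacseq", "2.1.2"),
}


def select_pipeline(data_type):
    """Return (pipeline, version) for a data type label such as 'RNA-seq' or 'WES'."""
    wanted = data_type.strip().lower()
    for label, choice in PIPELINES.items():
        # "WGS/WES" should match "WGS/WES", "WGS" and "WES".
        aliases = {label.lower(), *label.lower().split("/")}
        if wanted in aliases:
            return choice
    raise ValueError(f"no pipeline listed for data type {data_type!r}")
```

The result is a suggestion only; the choice still needs to be confirmed with the user before proceeding.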
Step 3: Run Test Profile
Validates the environment with a small test dataset. This MUST pass before running real data.
```bash
nextflow run nf-core/<pipeline> -r <version> -profile test,docker --outdir test_output
```
| Pipeline | Command |
|---|---|
| rnaseq | |
| sarek | |
| atacseq | |
Verify:
```bash
ls test_output/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log
```
If the test fails, see references/troubleshooting.md.
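A minimal sketch of the test step, assuming the command template and the two verification checks shown above. The per-pipeline commands are obtained by substituting the versions from the Step 2 table into the template; the helper names are hypothetical.

```python
from pathlib import Path

# Pinned versions from the Step 2 table.
VERSIONS = {"rnaseq": "3.22.2", "sarek": "3.7.1", "atacseq": "2.1.2"}
SUCCESS_MARKER = "Pipeline completed successfully"


def build_test_command(pipeline, outdir="test_output"):
    """Fill the test-profile command template for one pipeline."""
    return [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", VERSIONS[pipeline],
        "-profile", "test,docker",
        "--outdir", outdir,
    ]


def run_succeeded(outdir, log_path=".nextflow.log"):
    """Mirror the two manual checks: the MultiQC report exists and the log reports success."""
    report = Path(outdir) / "multiqc" / "multiqc_report.html"
    log = Path(log_path)
    if not report.is_file() or not log.is_file():
        return False
    return SUCCESS_MARKER in log.read_text(errors="replace")
```

The command list can be passed to `subprocess.run` from the directory where the run should happen, after which `run_succeeded("test_output")` repeats the `ls` and `grep` checks.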
Step 4: Create Samplesheet
Generate automatically
```bash
python scripts/generate_samplesheet.py /path/to/data <pipeline> -o samplesheet.csv
```
The script:
- Discovers FASTQ/BAM/CRAM files
- Pairs R1/R2 reads
- Infers sample metadata
- Validates before writing
For sarek: Script prompts for tumor/normal status if not auto-detected.
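To make the read-pairing step concrete, here is one way R1/R2 files could be grouped by sample. It is not the implementation used by `scripts/generate_samplesheet.py`; the filename convention (`<sample>_R1` / `<sample>_R2`, optionally followed by a numeric chunk such as `_001`) is an assumption chosen for the sketch.

```python
import re
from collections import defaultdict

# Assumed naming: <sample>_R1[_001].fastq.gz or <sample>_R2[_001].fq.gz
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)\.gz$"
)


def pair_reads(filenames):
    """Group FASTQ filenames by sample into {'fastq_1': ..., 'fastq_2': ...} records."""
    pairs = defaultdict(dict)
    for name in sorted(filenames):
        match = READ_PATTERN.match(name)
        if match:  # files that do not follow the naming convention are ignored
            pairs[match.group("sample")]["fastq_" + match.group("read")] = name
    return dict(pairs)
```

A sample with only `fastq_1` in its record would be treated as single-end, or flagged for the user if paired-end data was expected.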
Validate existing samplesheet
```bash
python scripts/generate_samplesheet.py --validate samplesheet.csv <pipeline>
```
Samplesheet formats
rnaseq:
```csv
sample,fastq_1,fastq_2,strandedness
SAMPLE1,/abs/path/R1.fq.gz,/abs/path/R2.fq.gz,auto
```
sarek:
```csv
patient,sample,lane,fastq_1,fastq_2,status
patient1,tumor,L001,/abs/path/tumor_R1.fq.gz,/abs/path/tumor_R2.fq.gz,1
patient1,normal,L001,/abs/path/normal_R1.fq.gz,/abs/path/normal_R2.fq.gz,0
```
atacseq:
```csv
sample,fastq_1,fastq_2,replicate
CONTROL,/abs/path/ctrl_R1.fq.gz,/abs/path/ctrl_R2.fq.gz,1
```
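The column layouts above lend themselves to a simple header check. The following is a sketch of what such a validation might look like, not the logic behind `--validate`; the absolute-path rule is an assumption drawn from the `/abs/path/...` examples, and the function name is hypothetical.

```python
import csv
import io

# Required columns per pipeline, copied from the samplesheet formats above.
REQUIRED_COLUMNS = {
    "rnaseq": ["sample", "fastq_1", "fastq_2", "strandedness"],
    "sarek": ["patient", "sample", "lane", "fastq_1", "fastq_2", "status"],
    "atacseq": ["sample", "fastq_1", "fastq_2", "replicate"],
}


def samplesheet_problems(text, pipeline):
    """Return a list of human-readable problems found in samplesheet CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS[pipeline] if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    # Data rows start on line 2; the header is line 1.
    for line_no, row in enumerate(reader, start=2):
        if not (row.get("fastq_1") or "").startswith("/"):
            problems.append(f"line {line_no}: fastq_1 is not an absolute path")
    return problems
```

An empty list means the sheet passed these two checks; it says nothing about whether the files exist or whether the metadata values are sensible.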
Step 5: Configure & Run
5a. Check genome availability
```bash
python scripts/manage_genomes.py check <genome>
```
If not installed:
```bash
python scripts/manage_genomes.py download <genome>
```
Common genomes: GRCh38 (human), GRCh37 (legacy), GRCm39 (mouse), R64-1-1 (yeast), BDGP6 (fly)
5b. Decision points
DECISION POINT: Confirm with user:
- Genome: Which reference to use
- Pipeline-specific options:
  - rnaseq: aligner (star_salmon recommended, hisat2 for low memory)
  - sarek: tools (haplotypecaller for germline, mutect2 for somatic)
  - atacseq: read_length (50, 75, 100, or 150)
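One way to keep these decisions from drifting is to check the confirmed value against the choices named above before building the run command. The table below covers only the options mentioned in this section (the pipelines themselves may accept more values), and `check_option` is a hypothetical helper.

```python
# Only the choices named in the decision list above; not an exhaustive list
# of what each pipeline accepts.
ALLOWED_OPTIONS = {
    "rnaseq": {"aligner": {"star_salmon", "hisat2"}},
    "sarek": {"tools": {"haplotypecaller", "mutect2"}},
    "atacseq": {"read_length": {50, 75, 100, 150}},
}


def check_option(pipeline, name, value):
    """Return True if `value` is one of the choices listed for this pipeline option."""
    allowed = ALLOWED_OPTIONS.get(pipeline, {}).get(name)
    if allowed is None:
        raise KeyError(f"no choices listed for {pipeline} option {name!r}")
    return value in allowed
```

A `False` result is a prompt to go back to the user, not necessarily an error, since a pipeline may support values that this section does not list.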
5c. Run pipeline
```bash
nextflow run nf-core/<pipeline> \
  -r <version> \
  -profile docker \
  --input samplesheet.csv \
  --outdir results \
  --genome <genome> \
  -resume
```
Key flags:
- `-r`: Pin version
- `-profile docker`: Use Docker (or `singularity` for HPC)
- `--genome`: iGenomes key
- `-resume`: Continue from checkpoint
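Resource limits (if needed):
```bash
--max_cpus 8 --max_memory '32.GB' --max_time '24.h'
```
Note: these `--max_*` parameters exist only in pipeline releases that still define them. Releases built on newer nf-core templates are expected to take limits through `process.resourceLimits` in a custom config instead, so check the parameter documentation for the pinned version.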
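The run command above can also be assembled programmatically, which makes it harder to forget the version pin or `-resume`. This is a sketch under the defaults shown in this section; the function name and its keyword arguments are hypothetical.

```python
import shlex


def build_run_command(pipeline, version, genome, samplesheet="samplesheet.csv",
                      outdir="results", profile="docker", resume=True):
    """Assemble the `nextflow run` argument list used in step 5c."""
    command = [
        "nextflow", "run", f"nf-core/{pipeline}",
        "-r", version,            # pin the pipeline version
        "-profile", profile,      # "docker", or "singularity" on HPC
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", genome,       # iGenomes key, e.g. "GRCh38"
    ]
    if resume:
        command.append("-resume")  # continue from the last checkpoint
    return command


def as_shell(command):
    """Render the argument list as a copy-pasteable shell line."""
    return shlex.join(command)
```

Keeping the command as a list avoids quoting problems when it is handed to `subprocess.run`, while `as_shell` gives a line that can be shown to the user for confirmation first.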
Step 6: Verify Outputs
Check completion
```bash
ls results/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log
```
Key outputs by pipeline
rnaseq:
- `results/star_salmon/salmon.merged.gene_counts.tsv` - Gene counts
- `results/star_salmon/salmon.merged.gene_tpm.tsv` - TPM values
sarek:
- `results/variant_calling/*/` - VCF files
- `results/preprocessing/recalibrated/` - BAM files
atacseq:
- `results/macs2/narrowPeak/` - Peak calls
- `results/bwa/mergedLibrary/bigwig/` - Coverage tracks
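The output locations listed above can be checked in one pass. The sketch below assumes those paths relative to the `--outdir` directory and only tests that something exists at each location; `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path

# Paths relative to the results directory, taken from the list above.
# Entries without a file extension are directories; "*" matches any caller folder.
EXPECTED_OUTPUTS = {
    "rnaseq": [
        "star_salmon/salmon.merged.gene_counts.tsv",
        "star_salmon/salmon.merged.gene_tpm.tsv",
    ],
    "sarek": ["variant_calling/*", "preprocessing/recalibrated"],
    "atacseq": ["macs2/narrowPeak", "bwa/mergedLibrary/bigwig"],
}


def missing_outputs(results_dir, pipeline):
    """Return the expected output patterns that match nothing under results_dir."""
    root = Path(results_dir)
    return [
        pattern
        for pattern in EXPECTED_OUTPUTS[pipeline]
        if not any(root.glob(pattern))
    ]
```

An empty list means every expected location is present; it does not confirm that the files are complete or that the analysis is scientifically sound, so the MultiQC report should still be reviewed.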
Quick Reference
For common exit codes and fixes, see references/troubleshooting.md.
Resume failed run
```bash
nextflow run nf-core/<pipeline> -resume
```
References
- references/geo-sra-acquisition.md - Downloading public GEO/SRA data
- references/troubleshooting.md - Common issues and fixes
- references/installation.md - Environment setup
- references/pipelines/rnaseq.md - RNA-seq pipeline details
- references/pipelines/sarek.md - Variant calling details
- references/pipelines/atacseq.md - ATAC-seq details
Disclaimer
This skill is provided as a prototype example demonstrating how to integrate nf-core bioinformatics pipelines into Claude Code for automated analysis workflows. The current implementation supports three pipelines (rnaseq, sarek, and atacseq), serving as a foundation that enables the community to expand support to the full set of nf-core pipelines.
It is intended for educational and research purposes and should not be considered production-ready without appropriate validation for your specific use case. Users are responsible for ensuring their computing environment meets pipeline requirements and for verifying analysis results.
Anthropic does not guarantee the accuracy of bioinformatics outputs, and users should follow standard practices for validating computational analyses. This integration is not officially endorsed by or affiliated with the nf-core community.
Attribution
When publishing results, cite the appropriate pipeline. Citations are available in each nf-core repository's CITATIONS.md file (e.g., https://github.com/nf-core/rnaseq/blob/3.22.2/CITATIONS.md).
Licenses
- nf-core pipelines: MIT License (https://nf-co.re/about)
- Nextflow: Apache License, Version 2.0 (https://www.nextflow.io/about-us.html)
- NCBI SRA Toolkit: Public Domain (https://github.com/ncbi/sra-tools/blob/master/LICENSE)