bio-phylogenomics
Bio Phylogenomics
Build marker gene alignments and phylogenetic trees.
Instructions
- Extract marker genes or SSU rRNA sequences.
- Align and trim sequences using project-standard workflows.
- Build maximum-likelihood (ML) trees with bootstrap support:
  - Standard accuracy: use IQ-TREE (comprehensive model selection, publication quality).
  - Fast mode: use IQ-TREE -fast (exploratory analysis, large datasets of >10K sequences).
  - Very large datasets: use VeryFastTree (>100K sequences, ultra-fast; reports SH-like local supports rather than bootstrap replicates).
- Post-process trees with ETE Toolkit:
  - Calculate tree statistics (branch lengths, distances, topology metrics).
  - Root, prune, or collapse nodes as needed.
  - Filter by bootstrap support values.
  - Add taxonomic or trait annotations.
  - Generate publication-quality visualizations.
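The size-based tool choice above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's own implementation: the binary names `iqtree2` and `VeryFastTree`, the protein model flag `-lg`, ModelFinder Plus (`-m MFP`), 1000 ultrafast bootstraps (`-B 1000`), and the default thread count are all assumptions. Check the flags against `iqtree2 -h` and `VeryFastTree -h` for your installed versions, in particular whether your IQ-TREE build accepts `-B` together with `-fast`.

```python
# Sketch: choose a tree builder by dataset size and build its argv.
# Assumed (not from the skill): binary names, flags, bootstrap count, threads.
FAST_THRESHOLD = 10_000          # >10K sequences: IQ-TREE fast mode
VERY_LARGE_THRESHOLD = 100_000   # >100K sequences: VeryFastTree


def count_sequences(lines):
    """Count FASTA records by their '>' header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def tree_command(alignment, n_seqs, prefix, threads=4):
    """Return the command (argv list) suited to an alignment of n_seqs sequences."""
    if n_seqs > VERY_LARGE_THRESHOLD:
        # VeryFastTree (FastTree-compatible options) writes Newick to stdout
        # and reports SH-like local supports instead of bootstrap replicates.
        return ["VeryFastTree", "-lg", "-threads", str(threads), alignment]
    cmd = [
        "iqtree2",
        "-s", alignment,
        "-m", "MFP",       # ModelFinder Plus: select the best-fit model
        "-B", "1000",      # ultrafast bootstrap replicates
        "-T", str(threads),
        "--prefix", prefix,
    ]
    if n_seqs > FAST_THRESHOLD:
        cmd.append("-fast")  # quicker, less thorough tree search
    return cmd
```

The function only builds the argument list; pass it to `subprocess.run`, and for VeryFastTree redirect stdout to a file under results/bio-phylogenomics/trees/. Returning argv instead of running the tool keeps the selection logic testable without the binaries installed.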
Quick Reference
| Task | Action |
|---|---|
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See |
| References | See ../bio-skills-references.md |
Input Requirements
Prerequisites:
- Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
- Marker gene set or alignments available.
Inputs:
- markers.faa (marker genes) or alignments.fasta
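A minimal input check might look like the sketch below, assuming plain FASTA input and that only the two file names above are candidates. The helper names (`find_input`, `read_fasta`, `check_records`, `main`) are hypothetical, and the checks (non-empty file, at least one record, unique IDs taken as the first word of each header, no empty sequences) are one reasonable reading of "confirm required inputs exist", not the project's definitive rules.

```python
from pathlib import Path


def find_input(candidates):
    """Return the first candidate that is an existing, non-empty file, else None."""
    for name in candidates:
        path = Path(name)
        if path.is_file() and path.stat().st_size > 0:
            return path
    return None


def read_fasta(lines):
    """Parse FASTA lines into (header, sequence) tuples.

    Sequence lines that appear before the first header are ignored.
    """
    records, header, chunks = [], None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_records(records):
    """Return a list of problems; an empty list means the input passed."""
    problems = []
    if not records:
        problems.append("no FASTA records found")
    seen = set()
    for header, seq in records:
        seq_id = header.split()[0] if header else ""
        if not seq_id:
            problems.append("record with empty header")
        elif seq_id in seen:
            problems.append(f"duplicate ID: {seq_id}")
        seen.add(seq_id)
        if not seq:
            problems.append(f"empty sequence: {seq_id or '<no id>'}")
    return problems


def main():
    """Return 0 if a usable input exists and passes the checks, else 1."""
    path = find_input(["markers.faa", "alignments.fasta"])
    if path is None:
        print("missing or empty input: markers.faa / alignments.fasta")
        return 1
    with path.open() as handle:
        problems = check_records(read_fasta(handle))
    for problem in problems:
        print(f"{path}: {problem}")
    return 1 if problems else 0
```

A pipeline entry point could end with `sys.exit(main())`, so that a failed check produces a non-zero exit status as the quality gates require.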
Output
- results/bio-phylogenomics/alignments/
- results/bio-phylogenomics/trees/
- results/bio-phylogenomics/phylo_report.md
- results/bio-phylogenomics/logs/
Quality Gates
- Alignment length and missingness meet project thresholds.
- Bootstrap support summary meets project thresholds.
- On failure: retry with alternative parameters; if the step still fails, record the failure in the report and exit with a non-zero status.
- Verify that markers.faa is non-empty and that all aligned sequences have the same length.
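The gates above leave the thresholds to the project. The sketch below shows one way to check them; the function names are hypothetical and the thresholds are supplied by the caller. It assumes that `-` and `?` mark gaps or missing data, and that support values are stored as internal-node labels in the Newick tree (as in IQ-TREE's `.treefile` after ultrafast bootstrapping). If IQ-TREE was run with both `-alrt` and `-B`, labels look like `80.5/95` and this regex picks up only the first value; VeryFastTree supports are on a 0 to 1 scale, so scale the threshold accordingly. ETE Toolkit can also read these labels as node support; the regex version avoids the dependency.

```python
import re
from statistics import median

# A support value is a number that directly follows a closing parenthesis,
# e.g. the 95 in "((A,B)95:0.05,C);".
SUPPORT_RE = re.compile(r"\)(\d+(?:\.\d+)?)")


def alignment_qc(records, min_length, max_missing, missing_chars="-?"):
    """Check equal lengths, minimum length, and overall missing-data fraction.

    Returns (problems, missing_fraction); missing_fraction is None when the
    alignment is empty or its sequences differ in length.
    """
    if not records:
        return ["alignment has no sequences"], None
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        return [f"sequences differ in length: {sorted(lengths)}"], None
    length = lengths.pop()
    problems = []
    if length < min_length:
        problems.append(f"alignment length {length} < {min_length}")
    total = length * len(records)
    missing = sum(seq.count(c) for _, seq in records for c in missing_chars)
    fraction = missing / total if total else 1.0
    if fraction > max_missing:
        problems.append(f"missing fraction {fraction:.3f} > {max_missing}")
    return problems, fraction


def support_summary(newick, min_support):
    """Summarise internal-node support values found in a Newick string."""
    values = [float(v) for v in SUPPORT_RE.findall(newick)]
    if not values:
        return {"n": 0, "median": None, "frac_at_least_min": None}
    passing = sum(1 for v in values if v >= min_support)
    return {
        "n": len(values),
        "median": median(values),
        "frac_at_least_min": passing / len(values),
    }
```

For example, the tree `((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.1)60:0.02);` with a threshold of 70 has two supported nodes, a median of 77.5, and half of its nodes at or above the threshold.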
Examples
Example 1: Expected input layout
```text
markers.faa (marker genes) or alignments.fasta
```
Troubleshooting
Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.
Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.
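For the "adjust parameters and re-run" step, and the quality-gate rule of retrying before exiting non-zero, a retry loop could be sketched as below. The fallback parameters (a fixed `LG+G4` model instead of ModelFinder), the output prefixes, and the report line format are hypothetical choices for illustration. A missing executable is mapped to exit code 127, following the shell convention.

```python
import os
import subprocess
import sys


def run_with_retries(attempts):
    """Run each argv in order until one succeeds; return (argv, returncode)."""
    if not attempts:
        raise ValueError("no commands to run")
    for argv in attempts:
        try:
            returncode = subprocess.run(argv).returncode
        except FileNotFoundError:
            returncode = 127  # executable not found, as a shell would report
        if returncode == 0:
            return argv, 0
    return argv, returncode  # last attempt and its failure code


def record_failure(report_path, argv, returncode):
    """Append a failure line to the Markdown report, creating its directory."""
    os.makedirs(os.path.dirname(report_path) or ".", exist_ok=True)
    with open(report_path, "a") as report:
        report.write(f"- Tree building failed (exit {returncode}): {' '.join(argv)}\n")


def main():
    """Hypothetical entry point: default parameters first, then a fallback."""
    out = "results/bio-phylogenomics"
    os.makedirs(f"{out}/trees", exist_ok=True)  # output prefix directory
    attempts = [
        ["iqtree2", "-s", "alignments.fasta", "-m", "MFP", "-B", "1000",
         "--prefix", f"{out}/trees/markers"],
        # Fallback: fixed model, separate prefix so files from the failed run are kept.
        ["iqtree2", "-s", "alignments.fasta", "-m", "LG+G4", "-B", "1000",
         "--prefix", f"{out}/trees/markers_fixed_model"],
    ]
    argv, returncode = run_with_retries(attempts)
    if returncode != 0:
        record_failure(f"{out}/phylo_report.md", argv, returncode)
        sys.exit(returncode)
```

Calling `main()` from a pipeline script gives the behaviour described in the quality gates: the first successful attempt wins; otherwise the failure is appended to phylo_report.md and the process exits with the tool's non-zero status.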