bio-phylogenomics

Bio Phylogenomics

Build marker gene alignments and phylogenetic trees.

Instructions

  1. Extract marker genes or SSU rRNA sequences.
  2. Align and trim sequences using project-standard workflows.
  3. Build ML trees with bootstraps:
     • Standard accuracy: Use IQ-TREE (comprehensive model selection, publication-quality)
     • Fast mode: Use IQ-TREE -fast (exploratory analysis, large datasets >10K sequences)
     • Very large datasets: Use VeryFastTree (>100K sequences, ultra-fast)
  4. Post-process trees with ETE Toolkit:
     • Calculate tree statistics (branch lengths, distances, topology metrics)
     • Root, prune, or collapse nodes as needed
     • Filter by bootstrap support values
     • Add taxonomic or trait annotations
     • Generate publication-quality visualizations
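
The tool choice in step 3 depends only on the number of sequences, so it can be sketched as a small dispatcher. This is a minimal illustration, not the project's actual workflow: the function names, the `iqtree2` executable name, the output prefix, and the choice of `-m MFP` with `-B 1000` for the standard mode are assumptions. Support flags are left off the fast branch on purpose; check the IQ-TREE documentation for which support measures can be combined with `-fast`.

```python
from pathlib import Path

FAST_THRESHOLD = 10_000         # from the instructions: >10K sequences
VERY_LARGE_THRESHOLD = 100_000  # from the instructions: >100K sequences


def count_fasta_records(path):
    """Count sequences by counting FASTA header lines (those starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def tree_command(alignment, n_seqs, out_dir="results/bio-phylogenomics/trees"):
    """Return (argv, stdout_path) for the tree builder suited to n_seqs.

    stdout_path is None when the tool writes its own output files.
    """
    prefix = str(Path(out_dir) / Path(alignment).stem)
    if n_seqs > VERY_LARGE_THRESHOLD:
        # VeryFastTree prints the Newick tree to stdout, so the caller
        # is expected to redirect stdout to the returned path.
        return ["VeryFastTree", str(alignment)], prefix + ".nwk"
    argv = ["iqtree2", "-s", str(alignment), "--prefix", prefix]
    if n_seqs > FAST_THRESHOLD:
        argv.append("-fast")  # exploratory mode for large datasets
    else:
        # Assumed standard mode: ModelFinder plus 1000 ultrafast bootstraps.
        argv += ["-m", "MFP", "-B", "1000"]
    return argv, None
```

The returned list can be passed to `subprocess.run(argv, check=True)`; when `stdout_path` is not `None`, open that file and pass it as `stdout`.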

Quick Reference

| Task | Action |
| --- | --- |
| Run workflow | Follow the steps in this skill and capture outputs. |
| Validate inputs | Confirm required inputs and reference data exist. |
| Review outputs | Inspect reports and QC gates before proceeding. |
| Tool docs | See docs/README.md. |
| References | See ../bio-skills-references.md |

Input Requirements

Prerequisites:
  • Tools available in the active environment (Pixi/conda/system). See docs/README.md for expected tools.
  • Marker gene set or alignments available.

Inputs:
  • markers.faa (marker genes) or alignments.fasta
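
Both prerequisites can be checked before any step runs. The sketch below uses only the standard library; the function names are hypothetical, and the list of tools to look for must come from docs/README.md, since this page does not enumerate them.

```python
import shutil
from pathlib import Path


def missing_tools(tools):
    """Return the tool names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def find_input(base_dir="."):
    """Return the first existing, non-empty input file, or None.

    markers.faa is tried first, then alignments.fasta.
    """
    for name in ("markers.faa", "alignments.fasta"):
        candidate = Path(base_dir) / name
        if candidate.is_file() and candidate.stat().st_size > 0:
            return candidate
    return None
```

A caller would typically stop with a non-zero exit status if `missing_tools(...)` is non-empty or `find_input()` returns `None`.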

Output

  • results/bio-phylogenomics/alignments/
  • results/bio-phylogenomics/trees/
  • results/bio-phylogenomics/phylo_report.md
  • results/bio-phylogenomics/logs/
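
Creating this layout up front keeps later steps from failing on a missing directory. A minimal sketch, assuming the function name `prepare_output_dirs` (hypothetical) and that the report file itself is written later by the workflow:

```python
from pathlib import Path


def prepare_output_dirs(root="results/bio-phylogenomics"):
    """Create the output subdirectories and return the report path."""
    root = Path(root)
    for sub in ("alignments", "trees", "logs"):
        # exist_ok=True makes the call safe to repeat on re-runs.
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root / "phylo_report.md"
```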

Quality Gates

  • Alignment length and missingness meet project thresholds.
  • Bootstrap support summary meets project thresholds.
  • On failure: retry with alternative parameters; if still failing, record in report and exit non-zero.
  • Verify markers.faa is non-empty and aligned sequences are consistent.
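
The gates above can be expressed as plain checks on an alignment and a Newick string. The sketch below is a standard-library stand-in, not the ETE Toolkit code the workflow uses for post-processing. The function names and threshold parameters are hypothetical, since the project thresholds are not stated here. It reads numeric internal-node labels as support values, so for combined labels such as `95.2/100` only the first number is captured.

```python
import re
from statistics import median


def read_fasta(path):
    """Parse a FASTA file into a {header: sequence} dict."""
    seqs, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                seqs[name] = ""
            elif name is not None:
                seqs[name] += line
    return seqs


def alignment_stats(seqs, missing_chars="-?"):
    """Return (alignment_length, missing_fraction); raise ValueError if inconsistent."""
    if not seqs:
        raise ValueError("alignment is empty")
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError(f"inconsistent sequence lengths: {sorted(lengths)}")
    length = lengths.pop()
    missing = sum(s.count(c) for s in seqs.values() for c in missing_chars)
    return length, missing / (length * len(seqs))


def support_values(newick):
    """Extract numeric labels that directly follow a closing parenthesis."""
    return [float(v) for v in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


def check_gates(seqs, newick, min_length, max_missing, min_median_support):
    """Return a list of failure messages; an empty list means every gate passed."""
    try:
        length, missing = alignment_stats(seqs)
    except ValueError as err:
        return [str(err)]
    failures = []
    if length < min_length:
        failures.append(f"alignment length {length} < {min_length}")
    if missing > max_missing:
        failures.append(f"missingness {missing:.3f} > {max_missing}")
    supports = support_values(newick)
    if not supports:
        failures.append("no support values found in tree")
    elif median(supports) < min_median_support:
        failures.append(f"median support {median(supports)} < {min_median_support}")
    return failures
```

To follow the failure rule, a caller would retry with alternative parameters when the list is non-empty and, if it is still non-empty, write the messages to phylo_report.md and call `sys.exit(1)`.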

Examples

Example 1: Expected input layout

markers.faa (marker genes) or alignments.fasta

Troubleshooting

Issue: Missing inputs or reference databases
Solution: Verify paths and permissions before running the workflow.

Issue: Low-quality results or failed QC gates
Solution: Review reports, adjust parameters, and re-run the affected step.