tooluniverse-phylogenetics

Phylogenetics and Sequence Analysis

Comprehensive phylogenetics and sequence analysis using PhyKIT, Biopython, and DendroPy. Designed for bioinformatics questions about multiple sequence alignments, phylogenetic trees, parsimony, molecular evolution, and comparative genomics.

IMPORTANT: This skill handles complex phylogenetic workflows. Most implementation details have been moved to `references/` for progressive disclosure. This document focuses on high-level decision-making and workflow orchestration.

When to Use This Skill

Apply when users:

Have FASTA alignment files and ask about parsimony informative sites, gaps, or alignment quality
Have Newick tree files and ask about treeness, tree length, evolutionary rate, or DVMC
Ask about treeness/RCV, RCV, or relative composition variability
Need to compare phylogenetic metrics between groups (fungi vs animals, etc.)
Ask about PhyKIT functions (treeness, rcv, dvmc, evo_rate, parsimony_informative, tree_length)
Have gene family data with paired alignments and trees
Need Mann-Whitney U tests or other statistical comparisons of phylogenetic metrics
Ask about bootstrap support, branch lengths, or tree topology
Need to build trees (NJ, UPGMA, parsimony) from alignments
Ask about Robinson-Foulds distance or tree comparison

BixBench Coverage: 33 questions across 8 projects (bix-4, bix-11, bix-12, bix-25, bix-35, bix-38, bix-45, bix-60)

NOT for (use other skills instead):

Multiple sequence alignment generation → Use external tools (MUSCLE, MAFFT, ClustalW)
Maximum Likelihood tree construction → Use IQ-TREE, RAxML, or PhyML
Bayesian phylogenetics → Use MrBayes or BEAST
Ancestral state reconstruction → Use separate tools

Core Principles

Data-first approach - Discover and validate all input files (alignments, trees) before any analysis
PhyKIT-compatible - Use PhyKIT functions for treeness, RCV, DVMC, parsimony, evolutionary rate (matches BixBench expected outputs)
Format-flexible - Support FASTA, PHYLIP, Nexus, Newick, and auto-detect formats
Batch processing - Process hundreds of gene alignments/trees in a single analysis
Statistical rigor - Mann-Whitney U, medians, percentiles, standard deviations with scipy.stats
Precision awareness - Match rounding to 4 decimal places (PhyKIT default) or as requested
Group comparison - Compare metrics between taxa groups (e.g., fungi vs animals)
Question-driven - Parse exactly what is asked and return the specific number/statistic

Required Python Packages

```python
# Core (MUST be installed)
import numpy as np
import pandas as pd
from scipy import stats
from Bio import AlignIO, Phylo, SeqIO
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

# PhyKIT (primary computation engine)
from phykit.services.tree.treeness import Treeness
from phykit.services.tree.total_tree_length import TotalTreeLength
from phykit.services.tree.evolutionary_rate import EvolutionaryRate
from phykit.services.tree.dvmc import DVMC
from phykit.services.tree.treeness_over_rcv import TreenessOverRCV
from phykit.services.alignment.parsimony_informative_sites import ParsimonyInformative
from phykit.services.alignment.rcv import RelativeCompositionVariability

# DendroPy (for advanced tree operations)
import dendropy

# ToolUniverse (for sequence retrieval)
from tooluniverse import ToolUniverse
```

**Installation**:
```bash
pip install phykit dendropy biopython pandas numpy scipy
```

High-Level Workflow Decision Tree

START: User question about phylogenetic data
│
├─ Q1: What type of analysis is needed?
│  │
│  ├─ ALIGNMENT ANALYSIS (FASTA/PHYLIP files)
│  │  ├─ Parsimony informative sites → phykit_parsimony_informative()
│  │  ├─ RCV score → phykit_rcv()
│  │  ├─ Gap percentage → alignment_gap_percentage()
│  │  ├─ GC content → alignment_statistics()
│  │  └─ See: references/sequence_alignment.md
│  │
│  ├─ TREE ANALYSIS (Newick files)
│  │  ├─ Treeness → phykit_treeness()
│  │  ├─ Tree length → phykit_tree_length()
│  │  ├─ Evolutionary rate → phykit_evolutionary_rate()
│  │  ├─ DVMC → phykit_dvmc()
│  │  ├─ Bootstrap support → extract_bootstrap_support()
│  │  └─ See: references/tree_building.md
│  │
│  ├─ COMBINED ANALYSIS (alignment + tree)
│  │  └─ Treeness/RCV → phykit_treeness_over_rcv()
│  │
│  ├─ TREE CONSTRUCTION (build from alignment)
│  │  ├─ Neighbor-Joining → build_nj_tree()
│  │  ├─ UPGMA → build_upgma_tree()
│  │  ├─ Parsimony → build_parsimony_tree()
│  │  └─ See: references/tree_building.md
│  │
│  ├─ GROUP COMPARISON (fungi vs animals, etc.)
│  │  ├─ Batch compute metrics per group
│  │  ├─ Mann-Whitney U test
│  │  ├─ Summary statistics (median, mean, percentiles)
│  │  └─ See: references/parsimony_analysis.md
│  │
│  └─ TREE COMPARISON
│     ├─ Robinson-Foulds distance → robinson_foulds_distance()
│     └─ Bootstrap consensus → bootstrap_analysis()
│
├─ Q2: What data format is available?
│  ├─ FASTA (.fa, .fasta, .faa, .fna)
│  ├─ PHYLIP (.phy, .phylip) - Use phylip-relaxed for long names
│  ├─ Nexus (.nex, .nexus)
│  ├─ Newick (.nwk, .newick, .tre, .tree)
│  └─ Auto-detect with load_alignment() or load_tree()
│
└─ Q3: Is this a batch analysis?
   ├─ Single gene → Run metric function once
   ├─ Multiple genes → Use batch_compute_metric()
   └─ Group comparison → Use discover_gene_files() + compare_groups()
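
The Q2 branch above amounts to a lookup from file extension to format. A minimal sketch of that lookup follows; `guess_format()` is a hypothetical helper written for illustration, not the skill's own `load_alignment()` or `load_tree()`, and the returned strings are the format names Biopython's `AlignIO` and `Phylo` parsers accept. Note that a `.nex` file may hold either an alignment or a tree, so the extension alone does not say which parser to call.

```python
from pathlib import Path

# Extension -> Biopython format name, following the Q2 branch above.
# guess_format() is a hypothetical helper, not the skill's own loader.
_FORMATS = {
    ".fa": "fasta", ".fasta": "fasta", ".faa": "fasta", ".fna": "fasta",
    ".phy": "phylip-relaxed", ".phylip": "phylip-relaxed",
    ".nex": "nexus", ".nexus": "nexus",
    ".nwk": "newick", ".newick": "newick", ".tre": "newick", ".tree": "newick",
}


def guess_format(path: str) -> str:
    """Return the format name for a file, based only on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"Unrecognized extension: {suffix!r}")
    return _FORMATS[suffix]


print(guess_format("data/fungi/gene1.fa"))
print(guess_format("gene1.TRE"))
```

Defaulting PHYLIP files to `phylip-relaxed` follows the note in the tree about long names; content sniffing would be needed for files with missing or unusual extensions.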

Quick Reference: Common Metrics

Metric	Function	Input	Description
Treeness	`phykit_treeness(tree_file)`	Newick	Internal branch length / Total branch length
RCV	`phykit_rcv(aln_file)`	FASTA/PHYLIP	Relative Composition Variability
Treeness/RCV	`phykit_treeness_over_rcv(tree, aln)`	Both	Treeness divided by RCV
Tree Length	`phykit_tree_length(tree_file)`	Newick	Sum of all branch lengths
Evolutionary Rate	`phykit_evolutionary_rate(tree_file)`	Newick	Total branch length / num terminals
DVMC	`phykit_dvmc(tree_file)`	Newick	Degree of Violation of Molecular Clock
Parsimony Sites	`phykit_parsimony_informative(aln_file)`	FASTA/PHYLIP	Sites with ≥2 chars appearing ≥2 times
Gap Percentage	`alignment_gap_percentage(aln_file)`	FASTA/PHYLIP	Percentage of gap characters

See `scripts/tree_statistics.py` for implementation.
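
To make the table's definitions concrete, here is a minimal pure-Python sketch of treeness, tree length, evolutionary rate, and the parsimony-informative site count. It is not PhyKIT's implementation: `tree_metrics()` and `parsimony_informative_sites()` are hypothetical names, the regex only handles unquoted Newick labels where every branch carries a length, and ignoring `-` and `?` when counting characters is an assumption. Use the PhyKIT functions for real answers.

```python
import re
from collections import Counter

# Matches "<optional ')'><optional label>:<branch length>" in a Newick string.
_BRANCH = re.compile(r"(\)?)([^():,;]*):([-+0-9.eE]+)")


def tree_metrics(newick: str) -> dict:
    """Treeness, tree length, and evolutionary rate for a simple Newick string."""
    internal = terminal = 0.0
    n_terminals = 0
    for closes_clade, _label, length in _BRANCH.findall(newick):
        if closes_clade:          # branch leading to an internal node
            internal += float(length)
        else:                     # branch leading to a tip
            terminal += float(length)
            n_terminals += 1
    total = internal + terminal
    return {
        "treeness": internal / total,
        "tree_length": total,
        "evolutionary_rate": total / n_terminals,
    }


def parsimony_informative_sites(seqs: list[str]) -> int:
    """Count columns where at least two characters each occur at least twice."""
    count = 0
    for column in zip(*seqs):
        # Assumption: gap and missing-data symbols are not counted as characters.
        freqs = Counter(c.upper() for c in column if c not in "-?")
        if sum(1 for n in freqs.values() if n >= 2) >= 2:
            count += 1
    return count


metrics = tree_metrics("((A:0.1,B:0.2):0.3,C:0.4);")
print({name: round(value, 4) for name, value in metrics.items()})
print(parsimony_informative_sites(["AAGT", "AAGT", "ACTT", "ACT-"]))
```

In the example tree the only internal branch has length 0.3 out of a total of 1.0, and only the second and third alignment columns have two characters that each appear twice.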

Common Analysis Patterns (BixBench)

Pattern 1: Single Metric Across Groups

Question: "What is the median DVMC for fungi vs animals?"

Workflow:

```python
# 1. Discover files
fungi_genes = discover_gene_files("data/fungi")
animal_genes = discover_gene_files("data/animals")

# 2. Compute metric
fungi_dvmc = batch_dvmc(fungi_genes)
animal_dvmc = batch_dvmc(animal_genes)

# 3. Compare
fungi_values = list(fungi_dvmc.values())
animal_values = list(animal_dvmc.values())

print(f"Fungi median DVMC: {np.median(fungi_values):.4f}")
print(f"Animal median DVMC: {np.median(animal_values):.4f}")
```

**See**: `references/parsimony_analysis.md` for full implementation

Pattern 2: Statistical Comparison

Question: "What is the Mann-Whitney U statistic comparing treeness between groups?"

Workflow:

```python
from scipy import stats

# Compute treeness for both groups
group1_treeness = batch_treeness(group1_genes)
group2_treeness = batch_treeness(group2_genes)

# Mann-Whitney U test (two-sided)
u_stat, p_value = stats.mannwhitneyu(
    list(group1_treeness.values()),
    list(group2_treeness.values()),
    alternative='two-sided'
)

print(f"U statistic: {u_stat:.0f}")
print(f"P-value: {p_value:.4e}")
```
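
Because the U statistic depends on which group is passed first, it can help to see how it is built from ranks. The sketch below is a plain-Python illustration (the helper name `mann_whitney_u()` is hypothetical) that returns both U1, for the first sample, and U2, for the second. To my understanding, recent SciPy versions report the statistic for the first sample, whereas some textbooks report `min(U1, U2)`, so check which convention a question expects.

```python
def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U1, U2) using midranks for ties; U1 corresponds to sample x."""
    pooled = sorted((value, idx) for idx, value in enumerate(x + y))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1      # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])               # rank sum of sample x
    u1 = r1 - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1


# Made-up treeness values, for illustration only.
u1, u2 = mann_whitney_u([0.8, 0.9, 0.7], [0.5, 0.6, 0.7])
print(u1, u2)
```

Swapping the two arguments swaps U1 and U2, and the two values always sum to `n1 * n2`.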

Pattern 3: Filtering + Metric

Question: "What is the treeness/RCV for alignments with <5% gaps?"

Workflow:

```python
# 1. Filter by gap percentage
valid_genes = []
for entry in gene_files:
    if 'aln_file' in entry:
        gap_pct = alignment_gap_percentage(entry['aln_file'])
        if gap_pct < 5.0:
            valid_genes.append(entry)

# 2. Compute metric on filtered set
results = batch_treeness_over_rcv(valid_genes)

# 3. Report
values = [r[0] for r in results.values()]  # treeness/rcv ratio
print(f"Median treeness/RCV: {np.median(values):.4f}")
```
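
The filter in this pattern depends on a per-alignment gap percentage. As a rough illustration of what such a function computes, the sketch below parses FASTA text and reports the share of gap characters. `read_fasta()` and `gap_percentage()` are hypothetical helpers, and counting only `-` as a gap is an assumption; the skill's `alignment_gap_percentage()` may treat other symbols differently.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {sequence_id: sequence}."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = (line[1:].split() or [""])[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def gap_percentage(seqs: dict[str, str], gap_chars: str = "-") -> float:
    """Percentage of all alignment characters that are gap characters."""
    total = sum(len(s) for s in seqs.values())
    gaps = sum(s.count(ch) for s in seqs.values() for ch in gap_chars)
    return 100.0 * gaps / total if total else 0.0


fasta = ">sp1 example\nACGT-A\n>sp2\nAC--TA\n"
pct = gap_percentage(read_fasta(fasta))
print(f"{pct:.2f}% gaps, keep={pct < 5.0}")
```

Here 3 of the 12 characters are gaps, so this toy alignment would be excluded by the `< 5.0` threshold.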

Pattern 4: Specific Gene Lookup

Question: "What is the evolutionary rate for gene X?"

Workflow:

```python
# Find gene file
gene_files = discover_gene_files("data/")
gene_entry = [g for g in gene_files if g['gene_id'] == 'X'][0]

# Compute metric
evo_rate = phykit_evolutionary_rate(gene_entry['tree_file'])

print(f"Evolutionary rate for gene X: {evo_rate:.4f}")
```

---

Choosing Methods: When to Use What

Alignment Methods

When building alignments (use external tools, not this skill):

Method	Speed	Accuracy	Use Case
ClustalW	Slow	Medium	Small datasets (<100 sequences), educational
MUSCLE	Fast	High	Medium datasets (100-1000 sequences)
MAFFT	Very Fast	Very High	Recommended - Large datasets (>1000 sequences)

For this skill: Work with pre-aligned sequences. Use `load_alignment()` to read any format.

Tree Building Methods

When to use which tree method:

Method	Speed	Accuracy	Use Case
Neighbor-Joining	Fast	Medium	Quick trees, large datasets, exploratory
UPGMA	Fast	Low	Assumes molecular clock, special cases only
Maximum Parsimony	Medium	Medium	Small datasets, discrete characters
Maximum Likelihood	Slow	High	Use external tools (IQ-TREE, RAxML) for production

Implementation in this skill:

```python
# Fast distance-based trees
tree = build_nj_tree("alignment.fa")     # Neighbor-Joining
tree = build_upgma_tree("alignment.fa")  # UPGMA

# Parsimony (for small alignments)
tree = build_parsimony_tree("alignment.fa")
```

**For production ML trees**: Use IQ-TREE or RAxML externally, then analyze with this skill.

See `references/tree_building.md` for detailed implementations.
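
Neighbor-Joining and UPGMA both start from a matrix of pairwise distances between aligned sequences. The sketch below shows that first step only, using the simple p-distance (proportion of differing sites). `p_distance()` and `distance_matrix()` are hypothetical helpers, and skipping columns where either sequence has a gap is an assumption that may not match how Biopython's `DistanceCalculator` scores gaps.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns where either has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("No comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances keyed by (name1, name2) for each unordered pair."""
    return {(m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}


# Toy aligned sequences, for illustration only.
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "AC-TTG"}
for pair, dist in distance_matrix(alignment).items():
    print(pair, round(dist, 4))
```

A tree builder would then cluster the closest pairs first; in this toy alignment A and B are the most similar pair.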

---

Batch Processing

Discovering Gene Files

```python
# Auto-discover paired alignment + tree files
gene_files = discover_gene_files("data/")

# Result: list of dicts with 'gene_id', 'aln_file', 'tree_file'
# [
#   {'gene_id': 'gene1', 'aln_file': 'gene1.fa', 'tree_file': 'gene1.nwk'},
#   {'gene_id': 'gene2', 'aln_file': 'gene2.fa', 'tree_file': 'gene2.nwk'},
#   ...
# ]
```
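
One plausible way to produce the structure shown above is to pair files that share a stem. The sketch below does that with `pathlib`; `discover_gene_files_sketch()` is a hypothetical stand-in, and the real `discover_gene_files()` may use different naming rules, recurse into subdirectories, or handle Nexus files.

```python
import tempfile
from pathlib import Path

# Extension sets are assumptions based on the formats listed in this document.
ALN_EXT = {".fa", ".fasta", ".faa", ".fna", ".phy", ".phylip"}
TREE_EXT = {".nwk", ".newick", ".tre", ".tree"}


def discover_gene_files_sketch(data_dir: str) -> list[dict]:
    """Pair alignment and tree files in one directory that share a file stem."""
    genes: dict[str, dict] = {}
    for path in sorted(Path(data_dir).iterdir()):
        suffix = path.suffix.lower()
        if suffix in ALN_EXT:
            key = "aln_file"
        elif suffix in TREE_EXT:
            key = "tree_file"
        else:
            continue
        genes.setdefault(path.stem, {"gene_id": path.stem})[key] = str(path)
    return [genes[gene_id] for gene_id in sorted(genes)]


with tempfile.TemporaryDirectory() as tmp:
    for name in ("gene1.fa", "gene1.nwk", "gene2.fa", "README.txt"):
        Path(tmp, name).touch()
    for entry in discover_gene_files_sketch(tmp):
        print(entry["gene_id"], sorted(entry))
```

In this sketch a gene with only an alignment or only a tree still gets an entry, just without the missing key, which is why downstream code such as Pattern 3 checks `'aln_file' in entry` before using it.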

Computing Metrics in Batch

```python
# Tree metrics
treeness_results = batch_treeness(gene_files)
tree_length_results = batch_tree_length(gene_files)
dvmc_results = batch_dvmc(gene_files)
evo_rate_results = batch_evolutionary_rate(gene_files)

# Alignment metrics
rcv_results = batch_rcv(gene_files)
pi_results = batch_parsimony_informative(gene_files)
gap_results = batch_gap_percentage(gene_files)

# Combined metrics
treeness_rcv_results = batch_treeness_over_rcv(gene_files)

# All return dict: {gene_id: value}
# For batch_treeness_over_rcv, Pattern 3 reads each value as r[0],
# so those values appear to be tuples with the ratio first.
```

Statistical Analysis

```python
# Summary statistics
# (named treeness_summary so it does not shadow the scipy `stats` module)
treeness_summary = summary_stats(list(treeness_results.values()))
# Returns: {'mean': ..., 'median': ..., 'std': ..., 'min': ..., 'max': ...}

# Group comparison
comparison = compare_groups(
    list(fungi_treeness.values()),
    list(animal_treeness.values()),
    group1_name="Fungi",
    group2_name="Animals"
)
# Returns: {'u_statistic': ..., 'p_value': ..., 'Fungi': {...}, 'Animals': {...}}
```

See `references/parsimony_analysis.md` for full workflow.
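
For orientation, a summary function with the return shape described above could look like the following standard-library sketch. `summary_stats_sketch()` is a hypothetical name, and using the sample standard deviation is an assumption: NumPy's `np.std` defaults to the population form (`ddof=0`), so confirm which one the real `summary_stats()` uses before reporting a standard deviation.

```python
import statistics


def summary_stats_sketch(values: list[float]) -> dict[str, float]:
    """Mean, median, standard deviation, min, and max of a list of numbers."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # Sample standard deviation (n - 1 denominator); an assumption.
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# Made-up treeness values, for illustration only.
summary = summary_stats_sketch([0.2, 0.4, 0.6, 0.8])
print({name: round(value, 4) for name, value in summary.items()})
```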

---

Answer Extraction for BixBench

Question Pattern	Extraction Method
"What is the median X?"	`np.median(values)`
"What is the maximum X?"	`np.max(values)`
"What is the difference between median X for A vs B?"	`abs(np.median(a) - np.median(b))`
"What percentage of X have Y above Z?"	`sum(v > Z for v in values) / len(values) * 100`
"What is the Mann-Whitney U statistic?"	`stats.mannwhitneyu(a, b)[0]`
"What is the p-value?"	`stats.mannwhitneyu(a, b)[1]`
"What is the X value for gene Y?"	`results[gene_id]`
"What is the fold-change in median X?"	`np.median(a) / np.median(b)`
"multiplied by 1000"	`round(value * 1000)`

Rounding Rules

PhyKIT default: 4 decimal places
Percentages: Match question format (e.g., "35%" → integer, "3.5%" → 1 decimal)
P-values: Scientific notation for very small values
U statistics: Integer (no decimals)
Always check question wording: "rounded to 3 decimal places" overrides defaults
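
As a worked illustration of the extraction and rounding rules, the sketch below applies several of them to made-up numbers using only the standard library (the table itself uses NumPy). The helper names `percent_above()` and `fold_change()` and all values are hypothetical.

```python
import statistics


def percent_above(values: list[float], threshold: float) -> float:
    """Percentage of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values) * 100


def fold_change(a: list[float], b: list[float]) -> float:
    """Ratio of the median of a to the median of b."""
    return statistics.median(a) / statistics.median(b)


# Made-up metric values, for illustration only.
fungi = [0.52, 0.61, 0.48, 0.70]
animals = [0.30, 0.35, 0.41, 0.28]

print(f"{percent_above(fungi, 0.5):.0f}%")        # integer percentage
print(f"{fold_change(fungi, animals):.4f}")       # 4 decimal places
print(round(statistics.median(fungi) * 1000))     # "multiplied by 1000"
print(f"{3.2e-7:.4e}")                            # scientific notation
```

Python's built-in `round()` rounds exact halves to the nearest even number, so a value that lands exactly on .5 may not round up; format strings such as `:.4f` follow the same rule on the underlying binary value.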

BixBench Question Coverage

Project	Questions	Metrics
bix-4	7	DVMC analysis (fungi vs animals)
bix-11	6	Treeness analysis (median, percentages, Mann-Whitney U)
bix-12	5	Parsimony informative sites (counts, percentages, ratios)
bix-25	2	Treeness/RCV with gap filtering
bix-35	4	Evolutionary rate (specific genes, comparisons)
bix-38	5	Tree length (fold-change, variance, paired ratios)
bix-45	4	RCV (Mann-Whitney U, medians, paired differences)
bix-60	1	Average treeness across multiple trees

ToolUniverse Integration

Sequence Retrieval

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Get sequences from NCBI
result = tu.tools.NCBI_get_sequence(accession="NP_000546")

# Get gene tree from Ensembl
tree_result = tu.tools.EnsemblCompara_get_gene_tree(gene="ENSG00000141510")

# Get species tree from OpenTree
tree_result = tu.tools.OpenTree_get_induced_subtree(ott_ids="770315,770319")
```

---

File Structure

tooluniverse-phylogenetics/
├── SKILL.md                           # This file (workflow orchestration)
├── QUICK_START.md                     # Quick reference
├── test_phylogenetics.py             # Comprehensive test suite
├── references/
│   ├── sequence_alignment.md         # Alignment analysis details
│   ├── tree_building.md              # Tree construction methods
│   ├── parsimony_analysis.md         # Statistical comparison workflows
│   └── troubleshooting.md            # Common issues and solutions
└── scripts/
    ├── format_alignment.py           # Alignment format conversion
    └── tree_statistics.py            # Core metric implementations

Completeness Checklist

Before returning your answer, verify:

Identified all input files (alignments and/or trees)
Detected group structure (fungi/animals/etc.) if applicable
Used correct PhyKIT function for the requested metric
Processed ALL genes in each group (not just a sample)
Applied correct statistical test if comparison requested
Used correct rounding (4 decimals default, or as specified)
Returned the specific statistic asked for (median, max, U stat, p-value, etc.)
For percentage questions, confirmed whether answer is integer or decimal
For "difference" questions, confirmed direction (A - B vs abs difference)
For Mann-Whitney U, used `alternative='two-sided'` (default in scipy)

Next Steps

For detailed alignment analysis workflows → See `references/sequence_alignment.md`
For tree construction methods → See `references/tree_building.md`
For statistical comparison examples → See `references/parsimony_analysis.md`
For common errors and solutions → See `references/troubleshooting.md`
For script implementations → See `scripts/tree_statistics.py`

Support

For issues with:

PhyKIT functions: Check PhyKIT documentation at https://jlsteenwyk.com/PhyKIT/
Biopython tree/alignment parsing: See https://biopython.org/wiki/Phylo
DendroPy operations: See https://dendropy.org/
ToolUniverse integration: Check ToolUniverse documentation

License

Same as ToolUniverse framework license.
