ETE Toolkit Skill

Overview

ETE (Environment for Tree Exploration) is a toolkit for phylogenetic and hierarchical tree analysis. Use it to manipulate trees, analyze evolutionary events, visualize results, and integrate with biological databases for phylogenomic research and clustering analysis.

Core Capabilities

1. Tree Manipulation and Analysis

Load, manipulate, and analyze hierarchical tree structures with support for:
  • Tree I/O: Read and write Newick, NHX, PhyloXML, and NeXML formats
  • Tree traversal: Navigate trees using preorder, postorder, or levelorder strategies
  • Topology modification: Prune, root, collapse nodes, resolve polytomies
  • Distance calculations: Compute branch-length and topological distances between nodes
  • Tree comparison: Calculate Robinson-Foulds distances and identify topological differences

Common patterns:

```python
from ete3 import Tree

# Load tree from file
tree = Tree("tree.nw", format=1)

# Basic statistics
print(f"Leaves: {len(tree)}")
print(f"Total nodes: {len(list(tree.traverse()))}")

# Prune to taxa of interest
taxa_to_keep = ["species1", "species2", "species3"]
tree.prune(taxa_to_keep, preserve_branch_length=True)

# Midpoint root
midpoint = tree.get_midpoint_outgroup()
tree.set_outgroup(midpoint)

# Save modified tree
tree.write(outfile="rooted_tree.nw")
```

Use `scripts/tree_operations.py` for command-line tree manipulation:

```bash
# Display tree statistics
python scripts/tree_operations.py stats tree.nw

# Convert format
python scripts/tree_operations.py convert tree.nw output.nw --in-format 0 --out-format 1

# Reroot tree
python scripts/tree_operations.py reroot tree.nw rooted.nw --midpoint

# Prune to specific taxa
python scripts/tree_operations.py prune tree.nw pruned.nw --keep-taxa "sp1,sp2,sp3"

# Show ASCII visualization
python scripts/tree_operations.py ascii tree.nw
```
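
The three traversal strategies named in this section differ only in when a node is visited relative to its children. The sketch below illustrates the orders on a toy tree of `(name, children)` tuples. It is independent of ETE and is not how ETE implements traversal; the tuple layout and function names are assumptions made for illustration.

```python
from collections import deque

# A toy tree as (name, children) tuples; leaves have an empty child list.
TREE = ("root", [("X", [("A", []), ("B", [])]), ("C", [])])

def preorder(node):
    """Yield names parent-first (top-down)."""
    name, children = node
    yield name
    for child in children:
        yield from preorder(child)

def postorder(node):
    """Yield names children-first (bottom-up)."""
    name, children = node
    for child in children:
        yield from postorder(child)
    yield name

def levelorder(node):
    """Yield names breadth-first, one depth level at a time."""
    queue = deque([node])
    while queue:
        name, children = queue.popleft()
        yield name
        queue.extend(children)

print(list(preorder(TREE)))    # ['root', 'X', 'A', 'B', 'C']
print(list(postorder(TREE)))   # ['A', 'B', 'X', 'C', 'root']
print(list(levelorder(TREE)))  # ['root', 'X', 'C', 'A', 'B']
```

Postorder suits computations that need child results before the parent (for example, collecting leaf sets), while preorder suits pushing information down from the root.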

2. Phylogenetic Analysis

Analyze gene trees with evolutionary event detection:
  • Sequence alignment integration: Link trees to multiple sequence alignments (FASTA, Phylip)
  • Species naming: Automatic or custom species extraction from gene names
  • Evolutionary events: Detect duplication and speciation events using the species overlap algorithm or tree reconciliation
  • Orthology detection: Identify orthologs and paralogs based on evolutionary events
  • Gene family analysis: Split trees by duplications, collapse lineage-specific expansions

Workflow for gene tree analysis:

```python
from ete3 import PhyloTree

# Load gene tree with alignment
tree = PhyloTree("gene_tree.nw", alignment="alignment.fasta")

# Set species naming function
def get_species(gene_name):
    return gene_name.split("_")[0]

tree.set_species_naming_function(get_species)

# Detect evolutionary events
events = tree.get_descendant_evol_events()

# Analyze events
for node in tree.traverse():
    if hasattr(node, "evoltype"):
        if node.evoltype == "D":
            print(f"Duplication at {node.name}")
        elif node.evoltype == "S":
            print(f"Speciation at {node.name}")

# Extract ortholog groups
# get_speciation_trees() returns (tree count, duplication count, tree iterator)
ntrees, ndups, ortho_groups = tree.get_speciation_trees()
for i, ortho_tree in enumerate(ortho_groups):
    ortho_tree.write(outfile=f"ortholog_group_{i}.nw")
```

**Finding orthologs and paralogs:**

```python
# Find orthologs and paralogs of a query gene.
# in_seqs and out_seqs hold leaf names, so the query is a name, not a node.
query = "species1_gene1"

orthologs = set()
paralogs = set()

for event in events:
    # The query may sit on either side of the event
    if query in event.in_seqs:
        others = event.out_seqs
    elif query in event.out_seqs:
        others = event.in_seqs
    else:
        continue

    if event.etype == "S":
        orthologs.update(others)
    elif event.etype == "D":
        paralogs.update(others)
```
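
The species overlap idea mentioned in this section can be sketched without ETE: an internal node is called a duplication when its two child partitions share at least one species, and a speciation otherwise. The code below is a minimal illustration on a binary tree of nested tuples, not ETE's implementation. The gene names, the species codes (`Hsa`, `Ptr`, `Mmu`), and the function names are hypothetical.

```python
def species_of(gene_name):
    """Species code is the prefix before the first underscore."""
    return gene_name.split("_")[0]

def classify_events(node, events=None):
    """Return (species_set, events) for a binary gene tree of nested tuples.

    A leaf is a gene-name string; an internal node is a (left, right) tuple.
    Each event is (etype, left_species, right_species), recorded bottom-up.
    """
    if events is None:
        events = []
    if isinstance(node, str):
        return {species_of(node)}, events
    left, right = node
    left_sp, _ = classify_events(left, events)
    right_sp, _ = classify_events(right, events)
    # Species overlap rule: shared species between the two sides => duplication
    etype = "D" if left_sp & right_sp else "S"
    events.append((etype, sorted(left_sp), sorted(right_sp)))
    return left_sp | right_sp, events

gene_tree = ((("Hsa_g1", "Ptr_g1"), "Mmu_g1"), ("Hsa_g2", "Mmu_g2"))
_, toy_events = classify_events(gene_tree)
for etype, left, right in toy_events:
    print(etype, left, right)
```

In this toy tree only the root is a duplication, because both of its subtrees contain `Hsa` and `Mmu`; every node below it separates disjoint species sets.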

3. NCBI Taxonomy Integration

Integrate taxonomic information from the NCBI Taxonomy database:
  • Database access: Automatic download and local caching of NCBI taxonomy (~300MB)
  • Taxid/name translation: Convert between taxonomic IDs and scientific names
  • Lineage retrieval: Get complete evolutionary lineages
  • Taxonomy trees: Build species trees connecting specified taxa
  • Tree annotation: Automatically annotate trees with taxonomic information

Building taxonomy-based trees:

```python
from ete3 import NCBITaxa

ncbi = NCBITaxa()

# Build tree from species names
species = ["Homo sapiens", "Pan troglodytes", "Mus musculus"]
name2taxid = ncbi.get_name_translator(species)
taxids = [name2taxid[sp][0] for sp in species]

# Get minimal tree connecting taxa
tree = ncbi.get_topology(taxids)

# Annotate nodes with taxonomy info
for node in tree.traverse():
    if hasattr(node, "sci_name"):
        print(f"{node.sci_name} - Rank: {node.rank} - TaxID: {node.taxid}")
```

**Annotating existing trees:**

```python
# Get taxonomy info for tree leaves
for leaf in tree:
    # extract_species_from_name is a user-supplied helper (not part of ETE)
    species = extract_species_from_name(leaf.name)
    taxid = ncbi.get_name_translator([species])[species][0]

    # Get lineage
    lineage = ncbi.get_lineage(taxid)
    ranks = ncbi.get_rank(lineage)
    names = ncbi.get_taxid_translator(lineage)

    # Add to node
    leaf.add_feature("taxid", taxid)
    leaf.add_feature("lineage", [names[t] for t in lineage])
```
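
Conceptually, a tree connecting several taxa is rooted at their last common ancestor, which can be read off root-to-tip lineages. The sketch below shows that step in plain Python. The lineages are simplified and hand-written for illustration (they are not NCBI output), and `last_common_ancestor` is a hypothetical helper, not an ETE function.

```python
def last_common_ancestor(lineages):
    """Return the deepest taxon shared by all root-to-tip lineages, or None."""
    shared = None
    for level in zip(*lineages):      # walk down one step at a time
        if len(set(level)) != 1:      # the lineages diverge here
            break
        shared = level[0]
    return shared

# Simplified, hand-written lineages for illustration only.
LINEAGES = {
    "Homo sapiens": ["root", "Mammalia", "Primates", "Homo sapiens"],
    "Pan troglodytes": ["root", "Mammalia", "Primates", "Pan troglodytes"],
    "Mus musculus": ["root", "Mammalia", "Rodentia", "Mus musculus"],
}

print(last_common_ancestor(LINEAGES.values()))  # Mammalia
print(last_common_ancestor(
    [LINEAGES["Homo sapiens"], LINEAGES["Pan troglodytes"]]
))  # Primates
```

With real data, the same comparison would run over the taxid lists returned by `get_lineage()`, translated to names only for display.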

4. Tree Visualization

Create publication-quality tree visualizations:
  • Output formats: PNG (raster), PDF, and SVG (vector) for publications
  • Layout modes: Rectangular and circular tree layouts
  • Interactive GUI: Explore trees interactively with zoom, pan, and search
  • Custom styling: NodeStyle for node appearance (colors, shapes, sizes)
  • Faces: Add graphical elements (text, images, charts, heatmaps) to nodes
  • Layout functions: Dynamic styling based on node properties

Basic visualization workflow:

```python
from ete3 import Tree, TreeStyle, NodeStyle

tree = Tree("tree.nw")

# Configure tree style
ts = TreeStyle()
ts.show_leaf_name = True
ts.show_branch_support = True
ts.scale = 50  # pixels per branch length unit

# Style nodes
for node in tree.traverse():
    nstyle = NodeStyle()

    if node.is_leaf():
        nstyle["fgcolor"] = "blue"
        nstyle["size"] = 8
    else:
        # Color by support
        if node.support > 0.9:
            nstyle["fgcolor"] = "darkgreen"
        else:
            nstyle["fgcolor"] = "red"
        nstyle["size"] = 5

    node.set_style(nstyle)

# Render to file
tree.render("tree.pdf", tree_style=ts)
tree.render("tree.png", w=800, h=600, units="px", dpi=300)
```

Use `scripts/quick_visualize.py` for rapid visualization:

```bash
# Basic visualization
python scripts/quick_visualize.py tree.nw output.pdf

# Circular layout with custom styling
python scripts/quick_visualize.py tree.nw output.pdf --mode c --color-by-support

# High-resolution PNG
python scripts/quick_visualize.py tree.nw output.png --width 1200 --height 800 --units px --dpi 300

# Custom title and styling
python scripts/quick_visualize.py tree.nw output.pdf --title "Species Phylogeny" --show-support
```

**Advanced visualization with faces:**

```python
from ete3 import Tree, TreeStyle, TextFace, CircleFace, faces

tree = Tree("tree.nw")

# Add features to nodes
for leaf in tree:
    leaf.add_feature("habitat", "marine" if "fish" in leaf.name else "land")

# Layout function
def layout(node):
    if node.is_leaf():
        # Add colored circle. Inside a layout function, add_face_to_node
        # attaches the face for this render only, so faces do not pile up
        # when the tree is rendered more than once.
        color = "blue" if node.habitat == "marine" else "green"
        circle = CircleFace(radius=5, color=color)
        faces.add_face_to_node(circle, node, column=0, position="aligned")

        # Add label
        label = TextFace(node.name, fsize=10)
        faces.add_face_to_node(label, node, column=1, position="aligned")

ts = TreeStyle()
ts.layout_fn = layout
ts.show_leaf_name = False

tree.render("annotated_tree.pdf", tree_style=ts)
```

5. Clustering Analysis

Analyze hierarchical clustering results with data integration:
  • ClusterTree: Specialized class for clustering dendrograms
  • Data matrix linking: Connect tree leaves to numerical profiles
  • Cluster metrics: Silhouette coefficient, Dunn index, inter/intra-cluster distances
  • Validation: Test cluster quality with different distance metrics
  • Heatmap visualization: Display data matrices alongside trees

Clustering workflow:

```python
from ete3 import ClusterTree

# Load tree with data matrix (tab-separated, one row per leaf)
matrix = """#Names\tSample1\tSample2\tSample3
Gene1\t1.5\t2.3\t0.8
Gene2\t0.9\t1.1\t1.8
Gene3\t2.1\t2.5\t0.5"""

tree = ClusterTree("((Gene1,Gene2),Gene3);", text_array=matrix)

# Evaluate cluster quality
for node in tree.traverse():
    if not node.is_leaf():
        # get_silhouette() returns (silhouette, intra-cluster distance,
        # inter-cluster distance); confirm against references/api_reference.md
        silhouette, intra_dist, inter_dist = node.get_silhouette()
        # get_dunn() scores a set of clusters; here, the node's children
        dunn = node.get_dunn(node.get_children())

        print(f"Cluster: {node.name}")
        print(f"  Silhouette: {silhouette:.3f}")
        print(f"  Dunn index: {dunn:.3f}")

# Visualize with heatmap
tree.show("heatmap")
```
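
For intuition about the silhouette coefficient, here is the textbook per-point definition, (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b is the mean distance to the nearest other cluster. This is a sketch under simplifying assumptions (Euclidean distance, every cluster has at least two members); ETE's own calculation may differ in how it measures intra- and inter-cluster distance. The data points and function names are made up for illustration.

```python
from math import dist

def mean_distance(point, others):
    """Average Euclidean distance from `point` to every profile in `others`."""
    return sum(dist(point, o) for o in others) / len(others)

def silhouette(point, own_cluster, other_clusters):
    """Textbook silhouette for one point: (b - a) / max(a, b)."""
    siblings = [p for p in own_cluster if p != point]
    a = mean_distance(point, siblings)                        # cohesion
    b = min(mean_distance(point, c) for c in other_clusters)  # separation
    return (b - a) / max(a, b)

cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(5.0, 0.0), (5.0, 1.0)]

score = silhouette(cluster_a[0], cluster_a, [cluster_b])
print(f"{score:.3f}")  # 0.802
```

Values near 1 indicate a point that sits well inside its cluster; values near 0 or below suggest it lies between clusters or in the wrong one.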

6. Tree Comparison

Quantify topological differences between trees:
  • Robinson-Foulds distance: Standard metric for tree comparison
  • Normalized RF: Scale-invariant distance (0.0 to 1.0)
  • Partition analysis: Identify unique and shared bipartitions
  • Consensus trees: Analyze support across multiple trees
  • Batch comparison: Compare multiple trees pairwise

Compare two trees:

```python
from ete3 import Tree

tree1 = Tree("tree1.nw")
tree2 = Tree("tree2.nw")

# Calculate RF distance. The result may carry extra trailing values
# depending on the ETE version, so keep only the first five.
rf, max_rf, common_leaves, parts_t1, parts_t2 = tree1.robinson_foulds(tree2)[:5]

print(f"RF distance: {rf}/{max_rf}")
print(f"Normalized RF: {rf/max_rf:.3f}")
print(f"Common leaves: {len(common_leaves)}")

# Find unique partitions
unique_t1 = parts_t1 - parts_t2
unique_t2 = parts_t2 - parts_t1

print(f"Unique to tree1: {len(unique_t1)}")
print(f"Unique to tree2: {len(unique_t2)}")
```

**Compare multiple trees:**

```python
import numpy as np
from ete3 import Tree

trees = [Tree(f"tree{i}.nw") for i in range(4)]

# Create distance matrix
n = len(trees)
dist_matrix = np.zeros((n, n))

for i in range(n):
    for j in range(i + 1, n):
        rf, max_rf = trees[i].robinson_foulds(trees[j])[:2]
        norm_rf = rf / max_rf if max_rf > 0 else 0
        dist_matrix[i, j] = norm_rf
        dist_matrix[j, i] = norm_rf
```
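
The Robinson-Foulds distance counts the groupings present in one tree but not the other. The sketch below shows a rooted variant on nested tuples: it collects each tree's non-trivial clades and takes the size of their symmetric difference. It assumes both trees share the same leaf set and is not ETE's implementation; `clades` and `rooted_rf` are hypothetical names.

```python
def clades(tree):
    """Return the non-trivial clades (frozensets of leaf names) of a nested-tuple tree."""
    found = set()

    def leaves_under(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves_under(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves_under(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these leaves
    return found

def rooted_rf(tree_a, tree_b):
    """Return (RF, maximum RF): symmetric-difference size and the sum of clade counts."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a ^ b), len(a) + len(b)

t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))

rf, max_rf = rooted_rf(t1, t2)
print(rf, max_rf, rf / max_rf)  # 4 4 1.0
print(rooted_rf(t1, t1))        # (0, 4)
```

Dividing by the maximum gives the normalized value in the 0.0 to 1.0 range described above: the two toy trees share no clade, so they score 1.0.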

Installation and Setup

Install ETE toolkit:

```bash
# Basic installation
uv pip install ete3

# With external dependencies for rendering (optional but recommended)
# On macOS:
brew install qt@5

# On Ubuntu/Debian:
sudo apt-get install python3-pyqt5 python3-pyqt5.qtsvg

# For full features including GUI (quoted so the shell does not expand the brackets)
uv pip install "ete3[gui]"
```

**First-time NCBI Taxonomy setup:**

The first time NCBITaxa is instantiated, it automatically downloads the NCBI taxonomy database (~300MB) to `~/.etetoolkit/taxa.sqlite`. This happens only once:

```python
from ete3 import NCBITaxa
ncbi = NCBITaxa()  # Downloads database on first run
```

Update taxonomy database:

```python
ncbi.update_taxonomy_database()  # Download latest NCBI data
```
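
Before running a longer workflow, it can help to check that the packages are discoverable. The sketch below uses only the standard library. It assumes `PyQt5` is the Qt binding used for rendering, and `missing_modules` is a hypothetical helper. Note that it only tests whether a module can be located, not whether importing it succeeds.

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]

# ete3 is the package itself; PyQt5 is assumed to back render() and show().
missing = missing_modules(["ete3", "PyQt5"])
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("ETE and Qt bindings are importable")
```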

Common Use Cases

Use Case 1: Phylogenomic Pipeline

Complete workflow from gene tree to ortholog identification:

```python
from ete3 import PhyloTree, NCBITaxa

# 1. Load gene tree with alignment
tree = PhyloTree("gene_tree.nw", alignment="alignment.fasta")

# 2. Configure species naming
tree.set_species_naming_function(lambda x: x.split("_")[0])

# 3. Detect evolutionary events
tree.get_descendant_evol_events()

# 4. Annotate with taxonomy
species_to_taxid = {}  # fill in: species code -> NCBI taxid (user-supplied)
ncbi = NCBITaxa()
for leaf in tree:
    if leaf.species in species_to_taxid:
        taxid = species_to_taxid[leaf.species]
        lineage = ncbi.get_lineage(taxid)
        leaf.add_feature("lineage", lineage)

# 5. Extract ortholog groups
# get_speciation_trees() returns (tree count, duplication count, tree iterator)
ntrees, ndups, ortho_groups = tree.get_speciation_trees()

# 6. Save and visualize
for i, ortho in enumerate(ortho_groups):
    ortho.write(outfile=f"ortho_{i}.nw")
```

Use Case 2: Tree Preprocessing and Formatting

Batch process trees for analysis:

```bash
# Convert format
python scripts/tree_operations.py convert input.nw output.nw --in-format 0 --out-format 1

# Root at midpoint
python scripts/tree_operations.py reroot input.nw rooted.nw --midpoint

# Prune to focal taxa
python scripts/tree_operations.py prune rooted.nw pruned.nw --keep-taxa taxa_list.txt

# Get statistics
python scripts/tree_operations.py stats pruned.nw
```

Use Case 3: Publication-Quality Figures

Create styled visualizations:

```python
from ete3 import Tree, TreeStyle, NodeStyle, TextFace, faces

tree = Tree("tree.nw")

# Define clade colors
clade_colors = {
    "Mammals": "red",
    "Birds": "blue",
    "Fish": "green"
}

def layout(node):
    if node.is_leaf():
        # Highlight clades
        for clade, color in clade_colors.items():
            if clade in node.name:
                nstyle = NodeStyle()
                nstyle["fgcolor"] = color
                nstyle["size"] = 8
                node.set_style(nstyle)
    else:
        # Add support values. add_face_to_node attaches the face per render,
        # so the two render() calls below do not duplicate the labels.
        if node.support > 0.95:
            support = TextFace(f"{node.support:.2f}", fsize=8)
            faces.add_face_to_node(support, node, column=0, position="branch-top")

ts = TreeStyle()
ts.layout_fn = layout
ts.show_scale = True

# Render for publication
tree.render("figure.pdf", w=200, units="mm", tree_style=ts)
tree.render("figure.svg", tree_style=ts)  # Editable vector
```

Use Case 4: Automated Tree Analysis

Process multiple trees systematically:

```python
from ete3 import Tree
import os

input_dir = "trees"
output_dir = "processed"
os.makedirs(output_dir, exist_ok=True)  # write() does not create directories

for filename in os.listdir(input_dir):
    if filename.endswith(".nw"):
        tree = Tree(os.path.join(input_dir, filename))

        # Standardize: midpoint root
        midpoint = tree.get_midpoint_outgroup()
        tree.set_outgroup(midpoint)

        # Filter low support branches. Snapshot the nodes first, because
        # delete() changes the tree while traverse() is still iterating.
        for node in list(tree.traverse()):
            if not node.is_leaf() and not node.is_root() and node.support < 0.5:
                node.delete()

        # Resolve polytomies last: nodes created here get a default support
        # of 0.0 and would otherwise be removed by the filter above.
        tree.resolve_polytomy(recursive=True)

        # Save processed tree
        output_file = os.path.join(output_dir, f"processed_{filename}")
        tree.write(outfile=output_file)
```

Reference Documentation

For comprehensive API documentation, code examples, and detailed guides, refer to the following resources in the `references/` directory:
  • `api_reference.md`: Complete API documentation for all ETE classes and methods (Tree, PhyloTree, ClusterTree, NCBITaxa), including parameters, return types, and code examples
  • `workflows.md`: Common workflow patterns organized by task (tree operations, phylogenetic analysis, tree comparison, taxonomy integration, clustering analysis)
  • `visualization.md`: Comprehensive visualization guide covering TreeStyle, NodeStyle, Faces, layout functions, and advanced visualization techniques

Load these references when detailed information is needed:
  • To use the API reference: Read `references/api_reference.md` for complete method signatures and parameters
  • To implement workflows: Read `references/workflows.md` for step-by-step workflow examples
  • To create visualizations: Read `references/visualization.md` for styling and rendering options

Troubleshooting

**Import errors:**

```bash
# If "ModuleNotFoundError: No module named 'ete3'"
uv pip install ete3

# For GUI and rendering issues
uv pip install "ete3[gui]"
```

**Rendering issues:**

If `tree.render()` or `tree.show()` fails with Qt-related errors, install system dependencies:

```bash
# macOS
brew install qt@5

# Ubuntu/Debian
sudo apt-get install python3-pyqt5 python3-pyqt5.qtsvg
```

**NCBI Taxonomy database:**

If database download fails or becomes corrupted:

```python
from ete3 import NCBITaxa
ncbi = NCBITaxa()
ncbi.update_taxonomy_database()  # Redownload database
```

**Memory issues with large trees:**

For very large trees (>10,000 leaves), prefer the iterator methods over methods that build a full list:

```python
# Memory-efficient iteration
for leaf in tree.iter_leaves():
    process(leaf)

# Instead of
for leaf in tree.get_leaves():  # Loads all into memory
    process(leaf)
```

Newick Format Reference

ETE supports multiple Newick format specifications, numbered 0-9 plus 100. The most commonly used are:
  • Format 0: Flexible, with support values (default)
  • Format 1: Flexible, with internal node names
  • Format 2: All branch lengths, leaf names, and internal support values
  • Format 5: All branch lengths and leaf names
  • Format 8: All names (leaf and internal), no branch lengths
  • Format 9: Leaf names only
  • Format 100: Topology only

Specify format when reading/writing:

```python
tree = Tree("tree.nw", format=1)
tree.write(outfile="output.nw", format=5)
```

NHX (New Hampshire eXtended) format preserves custom features:

```python
tree.write(outfile="tree.nhx", features=["habitat", "temperature", "depth"])
```
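
To make the difference between format flavors concrete, the sketch below serializes one toy tree three ways: leaf names with branch lengths, leaf names only (the shape of format 9), and bare topology (the shape of format 100). It is a hand-rolled illustration, not ETE's writer; the `(name, length, children)` tuple layout and the style names are assumptions.

```python
def to_newick(node, style="lengths"):
    """Serialize a (name, length, children) tuple tree in one of three toy styles."""
    name, length, children = node
    text = ""
    if children:
        text = "(" + ",".join(to_newick(c, style) for c in children) + ")"
    elif style != "topology":
        text = name  # only leaf names are written
    if style == "lengths" and length is not None:
        text += f":{length}"
    return text

tree = ("", None, [
    ("", 0.5, [("A", 1.0, []), ("B", 2.0, [])]),
    ("C", 3.0, []),
])

for style in ("lengths", "names", "topology"):
    print(to_newick(tree, style) + ";")
# ((A:1.0,B:2.0):0.5,C:3.0);
# ((A,B),C);
# ((,),);
```

Reading a file with the wrong format number usually fails or silently misplaces values, for example treating internal node names as support values, so it is worth inspecting the raw text first.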

Best Practices

  1. Preserve branch lengths: Use `preserve_branch_length=True` when pruning for phylogenetic analysis
  2. Cache content: Use `get_cached_content()` for repeated access to node contents on large trees
  3. Use iterators: Employ `iter_*` methods for memory-efficient processing of large trees
  4. Choose appropriate traversal: Postorder for bottom-up analysis, preorder for top-down
  5. Validate monophyly: Always check the clade type returned by `check_monophyly()` (monophyletic/paraphyletic/polyphyletic), not just the boolean result
  6. Vector formats for publication: Use PDF or SVG for publication figures (scalable, editable)
  7. Interactive testing: Use `tree.show()` to test visualizations before rendering to file
  8. PhyloTree for phylogenetics: Use the PhyloTree class for gene trees and evolutionary analysis
  9. Copy method selection: Pass "newick" to `copy()` for speed, "cpickle" for full fidelity, "deepcopy" for complex objects
  10. NCBI query caching: Store NCBI taxonomy query results to avoid repeated database access
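
One simple way to apply the query-caching advice is to memoize lookups with the standard library. In the sketch below, `slow_lookup` is a hypothetical stand-in for a database call (it returns a placeholder value, not a real taxid), and the counter only exists to show how many times the backend is actually hit.

```python
from functools import lru_cache

CALLS = {"count": 0}

def slow_lookup(name):
    """Hypothetical stand-in for a database query such as a name-to-taxid lookup."""
    CALLS["count"] += 1
    return len(name)  # placeholder result, not a real taxid

@lru_cache(maxsize=None)
def cached_lookup(name):
    """Memoize results so each distinct name reaches the backend once."""
    return slow_lookup(name)

for name in ["Homo sapiens", "Mus musculus", "Homo sapiens", "Homo sapiens"]:
    cached_lookup(name)

print(CALLS["count"])  # 2
```

`lru_cache` requires hashable arguments, so a wrapper like this should take a single name or taxid (or a tuple of them) rather than a list.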