etetoolkit
ETE Toolkit Skill
Overview
ETE (Environment for Tree Exploration) is a toolkit for phylogenetic and hierarchical tree analysis. Manipulate trees, analyze evolutionary events, visualize results, and integrate with biological databases for phylogenomic research and clustering analysis.
Core Capabilities
1. Tree Manipulation and Analysis
Load, manipulate, and analyze hierarchical tree structures with support for:
- Tree I/O: Read and write Newick, NHX, PhyloXML, and NeXML formats
- Tree traversal: Navigate trees using preorder, postorder, or levelorder strategies
- Topology modification: Prune, root, collapse nodes, resolve polytomies
- Distance calculations: Compute branch lengths and topological distances between nodes
- Tree comparison: Calculate Robinson-Foulds distances and identify topological differences
Common patterns:
```python
from ete3 import Tree

# Load tree from file
tree = Tree("tree.nw", format=1)

# Basic statistics
print(f"Leaves: {len(tree)}")
print(f"Total nodes: {len(list(tree.traverse()))}")

# Prune to taxa of interest
taxa_to_keep = ["species1", "species2", "species3"]
tree.prune(taxa_to_keep, preserve_branch_length=True)

# Midpoint root
midpoint = tree.get_midpoint_outgroup()
tree.set_outgroup(midpoint)

# Save modified tree
tree.write(outfile="rooted_tree.nw")
```
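The traversal strategies mentioned above (preorder, postorder, levelorder) differ only in the order in which nodes are visited. The following is a minimal, library-free sketch of the three orders on a hypothetical toy tree built from `(name, children)` tuples; it does not use ETE, where the strategy is normally selected through the argument of `tree.traverse()`.

```python
from collections import deque

# Hypothetical toy tree: each node is (name, [children]).
toy_tree = ("root", [("A", [("a1", []), ("a2", [])]), ("B", [])])

def preorder(node):
    """Visit a node before its children (top-down)."""
    name, children = node
    yield name
    for child in children:
        yield from preorder(child)

def postorder(node):
    """Visit children before their parent (bottom-up)."""
    name, children = node
    for child in children:
        yield from postorder(child)
    yield name

def levelorder(node):
    """Visit nodes level by level, starting at the root."""
    queue = deque([node])
    while queue:
        name, children = queue.popleft()
        yield name
        queue.extend(children)

print(list(preorder(toy_tree)))    # ['root', 'A', 'a1', 'a2', 'B']
print(list(postorder(toy_tree)))   # ['a1', 'a2', 'A', 'B', 'root']
print(list(levelorder(toy_tree)))  # ['root', 'A', 'B', 'a1', 'a2']
```

Postorder suits bottom-up computations (for example, summarizing leaves under each node), while preorder suits top-down propagation of information.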
Use `scripts/tree_operations.py` for command-line tree manipulation:
```bash
# Display tree statistics
python scripts/tree_operations.py stats tree.nw

# Convert format
python scripts/tree_operations.py convert tree.nw output.nw --in-format 0 --out-format 1

# Reroot tree
python scripts/tree_operations.py reroot tree.nw rooted.nw --midpoint

# Prune to specific taxa
python scripts/tree_operations.py prune tree.nw pruned.nw --keep-taxa "sp1,sp2,sp3"

# Show ASCII visualization
python scripts/tree_operations.py ascii tree.nw
```
2. Phylogenetic Analysis
Analyze gene trees with evolutionary event detection:
- Sequence alignment integration: Link trees to multiple sequence alignments (FASTA, Phylip)
- Species naming: Automatic or custom species extraction from gene names
- Evolutionary events: Detect duplication and speciation events using Species Overlap or tree reconciliation
- Orthology detection: Identify orthologs and paralogs based on evolutionary events
- Gene family analysis: Split trees by duplications, collapse lineage-specific expansions
Workflow for gene tree analysis:
```python
from ete3 import PhyloTree

# Load gene tree with alignment
tree = PhyloTree("gene_tree.nw", alignment="alignment.fasta")

# Set species naming function
def get_species(gene_name):
    return gene_name.split("_")[0]

tree.set_species_naming_function(get_species)

# Detect evolutionary events
events = tree.get_descendant_evol_events()

# Analyze events
for node in tree.traverse():
    if hasattr(node, "evoltype"):
        if node.evoltype == "D":
            print(f"Duplication at {node.name}")
        elif node.evoltype == "S":
            print(f"Speciation at {node.name}")

# Extract ortholog groups
# get_speciation_trees() returns (number of trees, number of duplications, tree iterator)
ntrees, ndups, ortho_groups = tree.get_speciation_trees()
for i, ortho_tree in enumerate(ortho_groups):
    ortho_tree.write(outfile=f"ortholog_group_{i}.nw")
```
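To make the Species Overlap idea concrete: an internal node is treated as a duplication when the species sets of its child branches overlap, and as a speciation otherwise. Below is a minimal, library-free sketch under that assumption, using a hypothetical nested-tuple gene tree and the same `species_gene` naming convention as the example above. It illustrates the principle only and is not ETE's implementation.

```python
def species_of(gene_name):
    # Same convention as above: the species code precedes the first "_"
    return gene_name.split("_")[0]

def classify_events(node, found=None):
    """Return (species_set, events) for a nested-tuple gene tree.

    A leaf is a gene-name string; an internal node is a tuple of children.
    Events are appended in postorder as (type, sorted species list).
    """
    if found is None:
        found = []
    if isinstance(node, str):
        return {species_of(node)}, found
    child_species = [classify_events(child, found)[0] for child in node]
    seen = set()
    overlap = False
    for species in child_species:
        if seen & species:
            overlap = True  # a species occurs under more than one child
        seen |= species
    found.append(("D" if overlap else "S", sorted(seen)))
    return seen, found

# Hypothetical gene family: one duplication followed by two speciations
gene_tree = (("human_g1", "mouse_g1"), ("human_g2", "mouse_g2"))
_, detected = classify_events(gene_tree)
for etype, species in detected:
    print(etype, species)
```

Here the two inner nodes are speciations (human and mouse appear once each), and the root is a duplication because both species occur on both sides.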
**Finding orthologs and paralogs:**
```python
# Find orthologs and paralogs of a query gene
# (event.in_seqs and event.out_seqs hold leaf names, so the query is compared by name)
query = "species1_gene1"
orthologs = set()
paralogs = set()

for event in events:
    # The query may sit on either side of the event
    if query in event.in_seqs:
        others = event.out_seqs
    elif query in event.out_seqs:
        others = event.in_seqs
    else:
        continue
    if event.etype == "S":
        orthologs.update(others)
    elif event.etype == "D":
        paralogs.update(others)
```
3. NCBI Taxonomy Integration
Integrate taxonomic information from NCBI Taxonomy database:
- Database access: Automatic download and local caching of NCBI taxonomy (~300MB)
- Taxid/name translation: Convert between taxonomic IDs and scientific names
- Lineage retrieval: Get complete evolutionary lineages
- Taxonomy trees: Build species trees connecting specified taxa
- Tree annotation: Automatically annotate trees with taxonomic information
Building taxonomy-based trees:
```python
from ete3 import NCBITaxa

ncbi = NCBITaxa()

# Build tree from species names
species = ["Homo sapiens", "Pan troglodytes", "Mus musculus"]
name2taxid = ncbi.get_name_translator(species)
taxids = [name2taxid[sp][0] for sp in species]

# Get minimal tree connecting taxa
tree = ncbi.get_topology(taxids)

# Annotate nodes with taxonomy info
for node in tree.traverse():
    if hasattr(node, "sci_name"):
        print(f"{node.sci_name} - Rank: {node.rank} - TaxID: {node.taxid}")
```
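Conceptually, a minimal tree connecting taxa is determined by where their lineages diverge. The sketch below finds the last common ancestor of several taxa from root-to-taxon lineage lists. The lineages are simplified, hand-written stand-ins for what a lineage lookup would return (names instead of taxids, most ranks omitted), and `last_common_ancestor` is a hypothetical helper, not part of ETE.

```python
# Simplified, hand-written lineages (root first); real lineages contain many more ranks.
lineages = {
    "Homo sapiens": ["root", "Mammalia", "Primates", "Hominidae", "Homo sapiens"],
    "Pan troglodytes": ["root", "Mammalia", "Primates", "Hominidae", "Pan troglodytes"],
    "Mus musculus": ["root", "Mammalia", "Rodentia", "Mus musculus"],
}

def last_common_ancestor(lineage_lists):
    """Return the deepest taxon shared by all root-to-taxon lineages."""
    lca = None
    for level in zip(*lineage_lists):
        if len(set(level)) != 1:
            break  # lineages diverge at this depth
        lca = level[0]
    return lca

print(last_common_ancestor(lineages.values()))  # Mammalia
print(last_common_ancestor([lineages["Homo sapiens"],
                            lineages["Pan troglodytes"]]))  # Hominidae
```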
**Annotating existing trees:**
```python
# Get taxonomy info for tree leaves
# (extract_species_from_name is not defined here; supply your own function
# that maps a leaf name to a species name)
for leaf in tree:
    species = extract_species_from_name(leaf.name)
    taxid = ncbi.get_name_translator([species])[species][0]

    # Get lineage
    lineage = ncbi.get_lineage(taxid)
    ranks = ncbi.get_rank(lineage)
    names = ncbi.get_taxid_translator(lineage)

    # Add to node
    leaf.add_feature("taxid", taxid)
    leaf.add_feature("lineage", [names[t] for t in lineage])
```
4. Tree Visualization
Create publication-quality tree visualizations:
- Output formats: PNG (raster), PDF, and SVG (vector) for publications
- Layout modes: Rectangular and circular tree layouts
- Interactive GUI: Explore trees interactively with zoom, pan, and search
- Custom styling: NodeStyle for node appearance (colors, shapes, sizes)
- Faces: Add graphical elements (text, images, charts, heatmaps) to nodes
- Layout functions: Dynamic styling based on node properties
Basic visualization workflow:
```python
from ete3 import Tree, TreeStyle, NodeStyle

tree = Tree("tree.nw")

# Configure tree style
ts = TreeStyle()
ts.show_leaf_name = True
ts.show_branch_support = True
ts.scale = 50  # pixels per branch length unit

# Style nodes
for node in tree.traverse():
    nstyle = NodeStyle()

    if node.is_leaf():
        nstyle["fgcolor"] = "blue"
        nstyle["size"] = 8
    else:
        # Color by support
        if node.support > 0.9:
            nstyle["fgcolor"] = "darkgreen"
        else:
            nstyle["fgcolor"] = "red"
        nstyle["size"] = 5

    node.set_style(nstyle)

# Render to file
tree.render("tree.pdf", tree_style=ts)
tree.render("tree.png", w=800, h=600, units="px", dpi=300)
```
Use `scripts/quick_visualize.py` for rapid visualization:
```bash
# Basic visualization
python scripts/quick_visualize.py tree.nw output.pdf

# Circular layout with custom styling
python scripts/quick_visualize.py tree.nw output.pdf --mode c --color-by-support

# High-resolution PNG
python scripts/quick_visualize.py tree.nw output.png --width 1200 --height 800 --units px --dpi 300

# Custom title and styling
python scripts/quick_visualize.py tree.nw output.pdf --title "Species Phylogeny" --show-support
```
**Advanced visualization with faces:**
```python
from ete3 import Tree, TreeStyle, TextFace, CircleFace

tree = Tree("tree.nw")

# Add features to nodes
for leaf in tree:
    leaf.add_feature("habitat", "marine" if "fish" in leaf.name else "land")

# Layout function
def layout(node):
    if node.is_leaf():
        # Add colored circle
        color = "blue" if node.habitat == "marine" else "green"
        circle = CircleFace(radius=5, color=color)
        node.add_face(circle, column=0, position="aligned")

        # Add label
        label = TextFace(node.name, fsize=10)
        node.add_face(label, column=1, position="aligned")

ts = TreeStyle()
ts.layout_fn = layout
ts.show_leaf_name = False

tree.render("annotated_tree.pdf", tree_style=ts)
```
5. Clustering Analysis
Analyze hierarchical clustering results with data integration:
- ClusterTree: Specialized class for clustering dendrograms
- Data matrix linking: Connect tree leaves to numerical profiles
- Cluster metrics: Silhouette coefficient, Dunn index, inter/intra-cluster distances
- Validation: Test cluster quality with different distance metrics
- Heatmap visualization: Display data matrices alongside trees
Clustering workflow:
```python
from ete3 import ClusterTree

# Load tree with data matrix
matrix = """#Names\tSample1\tSample2\tSample3
Gene1\t1.5\t2.3\t0.8
Gene2\t0.9\t1.1\t1.8
Gene3\t2.1\t2.5\t0.5"""

tree = ClusterTree("((Gene1,Gene2),Gene3);", text_array=matrix)

# Evaluate cluster quality
for node in tree.traverse():
    if not node.is_leaf():
        # The silhouette is exposed as a node property
        silhouette = node.silhouette
        # get_dunn() expects the list of clusters to compare
        dunn = node.get_dunn(node.children)

        print(f"Cluster: {node.name}")
        print(f"  Silhouette: {silhouette:.3f}")
        print(f"  Dunn index: {dunn:.3f}")

# Visualize with heatmap
tree.show("heatmap")
```
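As a reminder of what a silhouette-style score measures, here is a minimal, library-free sketch using the usual form s = (b - a) / max(a, b), where a is an item's distance to its own cluster and b its distance to a neighboring cluster. This simplified version measures distances to cluster centroids with Euclidean distance; the function names and toy profiles are hypothetical, and ETE's own computation may differ in detail.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def centroid(profiles):
    """Column-wise mean of a list of equal-length numeric profiles."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]

def simplified_silhouette(cluster, other):
    """Mean of (b - a) / max(a, b) over the items of `cluster`.

    a: distance from an item to its own cluster centroid
    b: distance from the item to the other cluster's centroid
    """
    own, far = centroid(cluster), centroid(other)
    scores = []
    for item in cluster:
        a, b = euclidean(item, own), euclidean(item, far)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy profiles: a compact cluster and a clearly separated one
tight = [[1.0, 1.0], [1.2, 0.8]]
distant = [[5.0, 5.0], [5.5, 4.5]]
score = simplified_silhouette(tight, distant)
print(f"Silhouette-style score: {score:.3f}")
```

Values close to 1 indicate items much nearer to their own cluster than to the neighbor; values near 0 or below suggest poorly separated clusters.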
6. Tree Comparison
Quantify topological differences between trees:
- Robinson-Foulds distance: Standard metric for tree comparison
- Normalized RF: Scale-invariant distance (0.0 to 1.0)
- Partition analysis: Identify unique and shared bipartitions
- Consensus trees: Analyze support across multiple trees
- Batch comparison: Compare multiple trees pairwise
Compare two trees:
```python
from ete3 import Tree

tree1 = Tree("tree1.nw")
tree2 = Tree("tree2.nw")

# Calculate RF distance
# (robinson_foulds() can return extra trailing values, so only the first five are unpacked)
rf, max_rf, common_leaves, parts_t1, parts_t2 = tree1.robinson_foulds(tree2)[:5]

print(f"RF distance: {rf}/{max_rf}")
print(f"Normalized RF: {rf/max_rf:.3f}")
print(f"Common leaves: {len(common_leaves)}")

# Find unique partitions
unique_t1 = parts_t1 - parts_t2
unique_t2 = parts_t2 - parts_t1

print(f"Unique to tree1: {len(unique_t1)}")
print(f"Unique to tree2: {len(unique_t2)}")
```
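The partition-based idea behind the Robinson-Foulds distance can be sketched without any library: collect the leaf set below every internal node of each tree, then count the sets found in only one of them. The version below works on hypothetical rooted trees written as nested tuples; it is an illustration of the principle, and its counts may differ from ETE's in how trivial partitions and unrooted trees are handled.

```python
def clades(tree):
    """Collect the leaf set under every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        leafset = frozenset().union(*(leaves(child) for child in node))
        found.add(leafset)
        return leafset

    leaves(tree)
    return found

def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(tree_a) ^ clades(tree_b))

# Hypothetical four-taxon trees with conflicting groupings
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))

print(rf_distance(t1, t1))  # 0
print(rf_distance(t1, t2))  # 4
```

The set difference `clades(t1) - clades(t2)` gives the groupings unique to the first tree, mirroring the unique-partition step shown above.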
**Compare multiple trees:**
```python
import numpy as np
from ete3 import Tree

trees = [Tree(f"tree{i}.nw") for i in range(4)]

# Create distance matrix
n = len(trees)
dist_matrix = np.zeros((n, n))

for i in range(n):
    for j in range(i + 1, n):
        # Only the first two return values (RF and maximum RF) are needed here
        rf, max_rf = trees[i].robinson_foulds(trees[j])[:2]
        norm_rf = rf / max_rf if max_rf > 0 else 0
        dist_matrix[i, j] = norm_rf
        dist_matrix[j, i] = norm_rf
```
Installation and Setup
Install ETE toolkit:
```bash
# Basic installation
uv pip install ete3

# With external dependencies for rendering (optional but recommended)
# On macOS:
brew install qt@5

# On Ubuntu/Debian:
sudo apt-get install python3-pyqt5 python3-pyqt5.qtsvg

# For full features including GUI
uv pip install ete3[gui]
```
**First-time NCBI Taxonomy setup:**
The first time NCBITaxa is instantiated, it automatically downloads the NCBI taxonomy database (~300MB) to `~/.etetoolkit/taxa.sqlite`. This happens only once:
```python
from ete3 import NCBITaxa
ncbi = NCBITaxa()  # Downloads database on first run
```
Update taxonomy database:
```python
ncbi.update_taxonomy_database()  # Download latest NCBI data
```
Common Use Cases
Use Case 1: Phylogenomic Pipeline
Complete workflow from gene tree to ortholog identification:
```python
from ete3 import PhyloTree, NCBITaxa

# 1. Load gene tree with alignment
tree = PhyloTree("gene_tree.nw", alignment="alignment.fasta")

# 2. Configure species naming
tree.set_species_naming_function(lambda x: x.split("_")[0])

# 3. Detect evolutionary events
tree.get_descendant_evol_events()

# 4. Annotate with taxonomy
# (species_to_taxid is not defined here; it is assumed to be a dict you provide,
# mapping each species code to its NCBI taxid)
ncbi = NCBITaxa()
for leaf in tree:
    if leaf.species in species_to_taxid:
        taxid = species_to_taxid[leaf.species]
        lineage = ncbi.get_lineage(taxid)
        leaf.add_feature("lineage", lineage)

# 5. Extract ortholog groups
# get_speciation_trees() returns (number of trees, number of duplications, tree iterator)
ntrees, ndups, ortho_groups = tree.get_speciation_trees()

# 6. Save and visualize
for i, ortho in enumerate(ortho_groups):
    ortho.write(outfile=f"ortho_{i}.nw")
```
Use Case 2: Tree Preprocessing and Formatting
Batch process trees for analysis:
```bash
# Convert format
python scripts/tree_operations.py convert input.nw output.nw --in-format 0 --out-format 1

# Root at midpoint
python scripts/tree_operations.py reroot input.nw rooted.nw --midpoint

# Prune to focal taxa
python scripts/tree_operations.py prune rooted.nw pruned.nw --keep-taxa taxa_list.txt

# Get statistics
python scripts/tree_operations.py stats pruned.nw
```
Use Case 3: Publication-Quality Figures
Create styled visualizations:
```python
from ete3 import Tree, TreeStyle, NodeStyle, TextFace, faces

tree = Tree("tree.nw")

# Define clade colors
clade_colors = {
    "Mammals": "red",
    "Birds": "blue",
    "Fish": "green"
}

def layout(node):
    # Highlight clades
    if node.is_leaf():
        for clade, color in clade_colors.items():
            if clade in node.name:
                nstyle = NodeStyle()
                nstyle["fgcolor"] = color
                nstyle["size"] = 8
                node.set_style(nstyle)
    else:
        # Add support values
        if node.support > 0.95:
            support = TextFace(f"{node.support:.2f}", fsize=8)
            # add_face_to_node attaches the face for this render only, so
            # rendering twice below does not duplicate the labels
            faces.add_face_to_node(support, node, column=0, position="branch-top")

ts = TreeStyle()
ts.layout_fn = layout
ts.show_scale = True

# Render for publication
tree.render("figure.pdf", w=200, units="mm", tree_style=ts)
tree.render("figure.svg", tree_style=ts)  # Editable vector
```
Use Case 4: Automated Tree Analysis
Process multiple trees systematically:
```python
from ete3 import Tree
import os

input_dir = "trees"
output_dir = "processed"
os.makedirs(output_dir, exist_ok=True)  # make sure the output directory exists

for filename in os.listdir(input_dir):
    if filename.endswith(".nw"):
        tree = Tree(os.path.join(input_dir, filename))

        # Standardize: midpoint root, resolve polytomies
        midpoint = tree.get_midpoint_outgroup()
        tree.set_outgroup(midpoint)
        tree.resolve_polytomy(recursive=True)

        # Filter low support branches
        # (iterate over a snapshot of the nodes, since delete() modifies the tree)
        for node in list(tree.traverse()):
            if hasattr(node, 'support') and node.support < 0.5:
                if not node.is_leaf() and not node.is_root():
                    node.delete()

        # Save processed tree
        output_file = os.path.join(output_dir, f"processed_{filename}")
        tree.write(outfile=output_file)
```
Reference Documentation
For comprehensive API documentation, code examples, and detailed guides, refer to the following resources in the `references/` directory:
- `api_reference.md`: Complete API documentation for all ETE classes and methods (Tree, PhyloTree, ClusterTree, NCBITaxa), including parameters, return types, and code examples
- `workflows.md`: Common workflow patterns organized by task (tree operations, phylogenetic analysis, tree comparison, taxonomy integration, clustering analysis)
- `visualization.md`: Comprehensive visualization guide covering TreeStyle, NodeStyle, Faces, layout functions, and advanced visualization techniques
Load these references when detailed information is needed:
```python
# To use API reference
# Read references/api_reference.md for complete method signatures and parameters

# To implement workflows
# Read references/workflows.md for step-by-step workflow examples

# To create visualizations
# Read references/visualization.md for styling and rendering options
```
Troubleshooting
Import errors:
```bash
# If "ModuleNotFoundError: No module named 'ete3'"
uv pip install ete3

# For GUI and rendering issues
uv pip install ete3[gui]
```
**Rendering issues:**
If `tree.render()` or `tree.show()` fails with Qt-related errors, install system dependencies:
```bash
# macOS
brew install qt@5

# Ubuntu/Debian
sudo apt-get install python3-pyqt5 python3-pyqt5.qtsvg
```
**NCBI Taxonomy database:**
If database download fails or becomes corrupted:
```python
from ete3 import NCBITaxa
ncbi = NCBITaxa()
ncbi.update_taxonomy_database()  # Redownload database
```
Memory issues with large trees:
For very large trees (>10,000 leaves), use iterators instead of list comprehensions:
```python
# Memory-efficient iteration
for leaf in tree.iter_leaves():
    process(leaf)

# Instead of
for leaf in tree.get_leaves():  # Loads all into memory
    process(leaf)
```
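The difference between the two loops above is the usual generator-versus-list trade-off. The following library-free sketch reproduces it on a hypothetical nested-tuple tree; `iter_leaves` and `get_leaves` here are stand-in functions named after the ETE methods, not the ETE implementations.

```python
from itertools import islice

# Hypothetical toy tree: a leaf is a string, an internal node is a tuple of children.
toy = (("a", "b"), ("c", ("d", "e")))

def iter_leaves(node):
    """Yield leaf names lazily, one at a time."""
    if isinstance(node, str):
        yield node
    else:
        for child in node:
            yield from iter_leaves(child)

def get_leaves(node):
    """Build the complete leaf list in memory."""
    return list(iter_leaves(node))

# The generator lets a consumer stop early without visiting the whole tree
first_two = list(islice(iter_leaves(toy), 2))
print(first_two)        # ['a', 'b']
print(get_leaves(toy))  # ['a', 'b', 'c', 'd', 'e']
```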
Newick Format Reference
ETE supports multiple Newick format specifications (0-100):
- Format 0: Flexible, with internal support values (default)
- Format 1: Flexible, with internal node names
- Format 2: All branch lengths, leaf names, and internal support values
- Format 5: Internal and leaf branch lengths, plus leaf names
- Format 8: All names (leaf and internal), no branch lengths
- Format 9: Leaf names only
- Format 100: Topology only
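To illustrate how the same tree can be written with more or less detail, the sketch below serializes a hypothetical `(name, length, children)` tuple tree either with branch lengths or as names and topology only. It is a library-free illustration of the Newick notation itself, not of ETE's writer or its numbered formats.

```python
def to_newick(node, with_lengths=True):
    """Serialize a (name, length, children) tuple tree to a Newick string."""
    def fmt(n):
        name, length, children = n
        text = name
        if children:
            text = "(" + ",".join(fmt(c) for c in children) + ")" + name
        if with_lengths and length is not None:
            text += f":{length}"
        return text
    return fmt(node) + ";"

# Hypothetical three-leaf tree; the root has no name and no branch length
toy = ("", None, [("A", 0.1, []), ("", 0.2, [("B", 0.3, []), ("C", 0.4, [])])])

print(to_newick(toy))                      # (A:0.1,(B:0.3,C:0.4):0.2);
print(to_newick(toy, with_lengths=False))  # (A,(B,C));
```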
Specify format when reading/writing:
```python
tree = Tree("tree.nw", format=1)
tree.write(outfile="output.nw", format=5)
```
NHX (New Hampshire eXtended) format preserves custom features:
```python
tree.write(outfile="tree.nhx", features=["habitat", "temperature", "depth"])
```
Best Practices
- Preserve branch lengths: Use `preserve_branch_length=True` when pruning for phylogenetic analysis
- Cache content: Use `get_cached_content()` for repeated access to node contents on large trees
- Use iterators: Employ `iter_*` methods for memory-efficient processing of large trees
- Choose appropriate traversal: Postorder for bottom-up analysis, preorder for top-down
- Validate monophyly: Always check returned clade type (monophyletic/paraphyletic/polyphyletic)
- Vector formats for publication: Use PDF or SVG for publication figures (scalable, editable)
- Interactive testing: Use `tree.show()` to test visualizations before rendering to file
- PhyloTree for phylogenetics: Use PhyloTree class for gene trees and evolutionary analysis
- Copy method selection: "newick" for speed, "cpickle" for full fidelity, "deepcopy" for complex objects
- NCBI query caching: Store NCBI taxonomy query results to avoid repeated database access