Phylogenetics
Overview
Phylogenetic analysis reconstructs the evolutionary history of biological sequences (genes, proteins, genomes) by inferring the branching pattern of descent. This skill covers the standard pipeline:
- MAFFT — Multiple sequence alignment
- IQ-TREE 2 — Maximum likelihood tree inference with model selection
- FastTree — Fast approximate maximum likelihood (for large datasets)
- ETE3 — Python library for tree manipulation and visualization
Installation:
```bash
# Conda (recommended for CLI tools)
conda install -c bioconda mafft iqtree fasttree
# Optional: TrimAl, used by the trimming step below
conda install -c bioconda trimal
pip install ete3
```
When to Use This Skill
Use phylogenetics when:
- Evolutionary relationships: Which organism/gene is most closely related to my sequence?
- Viral phylodynamics: Trace outbreak spread and estimate transmission dates
- Protein family analysis: Infer evolutionary relationships within a gene family
- Horizontal gene transfer detection: Identify genes with discordant species/gene trees
- Ancestral sequence reconstruction: Infer ancestral protein sequences
- Molecular clock analysis: Estimate divergence dates using temporal sampling
- GWAS companion: Place variants in evolutionary context (e.g., SARS-CoV-2 variants)
- Microbiology: Species phylogeny from 16S rRNA or core genome phylogeny
Standard Workflow
1. Multiple Sequence Alignment with MAFFT
```python
import subprocess
import os


def run_mafft(input_fasta: str, output_fasta: str, method: str = "auto",
              n_threads: int = 4) -> str:
    """
    Align sequences with MAFFT.

    Args:
        input_fasta: Path to unaligned FASTA file
        output_fasta: Path for aligned output
        method: 'auto' (auto-select), 'einsi' (accurate), 'linsi' (accurate, slow),
                'fftnsi' (medium), 'fftns' (fast), 'retree2' (fast)
        n_threads: Number of CPU threads

    Returns:
        Path to aligned FASTA file
    """
    methods = {
        "auto": ["mafft", "--auto"],
        "einsi": ["mafft", "--genafpair", "--maxiterate", "1000"],
        "linsi": ["mafft", "--localpair", "--maxiterate", "1000"],
        # FFT-NS-i and FFT-NS-2, spelled out with --retree/--maxiterate
        "fftnsi": ["mafft", "--retree", "2", "--maxiterate", "1000"],
        "fftns": ["mafft", "--retree", "2", "--maxiterate", "0"],
        "retree2": ["mafft", "--retree", "2"],
    }
    # Unknown method names fall back to --auto; copy so the table is not mutated
    cmd = list(methods.get(method, methods["auto"]))
    cmd += ["--thread", str(n_threads), "--inputorder", input_fasta]

    # MAFFT writes the alignment to stdout and progress messages to stderr
    with open(output_fasta, "w") as out:
        result = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"MAFFT failed:\n{result.stderr}")

    # Count aligned sequences
    with open(output_fasta) as f:
        n_seqs = sum(1 for line in f if line.startswith(">"))
    print(f"MAFFT: aligned {n_seqs} sequences → {output_fasta}")
    return output_fasta


# MAFFT method selection guide:
#   Few sequences (<200), accurate:       linsi or einsi
#   Many sequences (200-1000), moderate:  fftnsi
#   Large datasets (>1000):               fftns or auto
#   Ultra-fast (>10000):                  mafft --retree 1
```
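The selection guide can be turned into a small helper that picks a `method` value for `run_mafft` from the number of input sequences. This is a minimal sketch: the function names `count_fasta_records` and `suggest_mafft_method` are hypothetical, and the cut-offs simply restate the guide's rough thresholds, not limits enforced by MAFFT.

```python
def count_fasta_records(path: str) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def suggest_mafft_method(n_seqs: int) -> str:
    """Map a sequence count to a run_mafft method name (rule of thumb only)."""
    if n_seqs < 200:
        return "linsi"   # small input: favor accuracy
    if n_seqs < 1000:
        return "fftnsi"  # medium input: iterative refinement, moderate cost
    return "fftns"       # large input: progressive alignment only


if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, "->", suggest_mafft_method(n))
```

For inputs above roughly 10,000 sequences the guide suggests `mafft --retree 1`, which this helper does not cover because `run_mafft` has no matching method name.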
2. Trim Alignment (Optional but Recommended)
```python
import shutil
import subprocess


def trim_alignment_trimal(aligned_fasta: str, output_fasta: str,
                          method: str = "automated1") -> str:
    """
    Trim poorly aligned columns with TrimAl.

    Methods:
        - 'automated1': Automatic heuristic (recommended)
        - 'gappyout': Remove gappy columns
        - 'strict': Strict gap threshold

    If TrimAl is missing or exits with an error, the untrimmed alignment is
    copied to output_fasta so that downstream steps can still run.
    """
    cmd = ["trimal", f"-{method}", "-in", aligned_fasta, "-out", output_fasta, "-fasta"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
        failed, message = result.returncode != 0, result.stderr
    except FileNotFoundError:
        failed, message = True, "trimal executable not found on PATH"
    if failed:
        print(f"TrimAl warning: {message}")
        # Fall back to using the untrimmed alignment
        shutil.copy(aligned_fasta, output_fasta)
    return output_fasta
```
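To make the idea of removing gappy columns concrete, here is a pure-Python sketch that drops every alignment column whose gap fraction exceeds a fixed threshold. It is not TrimAl's algorithm (its `gappyout` and `automated1` modes choose thresholds from the data); the function name `drop_gappy_columns` and the 0.5 default are assumptions made for illustration.

```python
def drop_gappy_columns(seqs: dict, max_gap_fraction: float = 0.5) -> dict:
    """Return a copy of an alignment without columns that are mostly gaps.

    seqs maps sequence name -> aligned sequence (all of equal length).
    """
    names = list(seqs)
    if not names:
        return {}
    length = len(seqs[names[0]])
    if any(len(seq) != length for seq in seqs.values()):
        raise ValueError("sequences are not aligned (lengths differ)")

    # Keep a column when its share of '-' characters is within the threshold
    keep = [
        i for i in range(length)
        if sum(seqs[n][i] == "-" for n in names) / len(names) <= max_gap_fraction
    ]
    return {n: "".join(seqs[n][i] for i in keep) for n in names}


if __name__ == "__main__":
    alignment = {"a": "AC-GT", "b": "A--GT", "c": "ACTG-"}
    print(drop_gappy_columns(alignment))
```

In the example alignment only the third column (two gaps out of three sequences) exceeds the threshold and is removed.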
3. IQ-TREE 2: Maximum Likelihood Tree
```python
import os
import subprocess


def run_iqtree(aligned_fasta: str, output_prefix: str,
               model: str = "TEST", bootstrap: int = 1000,
               n_threads: int = 4, extra_args: list = None) -> dict:
    """
    Build a maximum likelihood tree with IQ-TREE 2.

    Args:
        aligned_fasta: Aligned FASTA file
        output_prefix: Prefix for output files
        model: 'TEST' for automatic model selection, or specify (e.g., 'GTR+G' for DNA,
               'LG+G4' for proteins, 'JTT+G' for proteins)
        bootstrap: Number of ultrafast bootstrap replicates (1000 recommended);
                   0 skips the bootstrap
        n_threads: Number of threads (IQ-TREE also accepts the string 'AUTO')
        extra_args: Additional IQ-TREE arguments

    Returns:
        Dict with paths to output files
    """
    cmd = [
        "iqtree2",
        "-s", aligned_fasta,
        "--prefix", output_prefix,
        "-m", model,
        "-T", str(n_threads),
        "--redo",  # Overwrite existing results
    ]
    if bootstrap:
        cmd += ["-B", str(bootstrap)]  # Ultrafast bootstrap
    if extra_args:
        cmd.extend(extra_args)

    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"IQ-TREE failed:\n{result.stderr}")

    # Print model selection result
    log_file = f"{output_prefix}.log"
    if os.path.exists(log_file):
        with open(log_file) as f:
            for line in f:
                if "Best-fit model" in line:
                    print(f"IQ-TREE: {line.strip()}")

    output_files = {
        "tree": f"{output_prefix}.treefile",
        "log": f"{output_prefix}.log",
        "iqtree": f"{output_prefix}.iqtree",  # Full report
        "model": f"{output_prefix}.model.gz",
    }
    print(f"IQ-TREE: Tree saved to {output_files['tree']}")
    return output_files


# IQ-TREE model selection guide (models that -m TEST commonly selects):
#   DNA:     TEST → GTR+G, HKY+G, TrN+G
#   Protein: TEST → LG+G4, WAG+G, JTT+G, Q.pfam+G
#   Codon:   TEST → MG+F3X4 (codon data must be declared with -st CODON)
#
# For temporal (molecular clock) analysis, add the dating options, e.g.:
#   extra_args = ["--date", "dates.txt", "--date-ci", "100"]
# --date-ci takes the number of replicates used for the confidence intervals;
# confirm the exact dating flags for your version with `iqtree2 --help`.
```
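The `--date` option expects a plain-text file that pairs each taxon name with its sampling date, one pair per line. The sketch below builds such a file from sequence names. It assumes a hypothetical naming convention in which each name ends with `|YYYY`, `|YYYY-MM` or `|YYYY-MM-DD`; the helper names `dates_from_names` and `write_dates_file` are made up for this example, so adapt the pattern to your own headers.

```python
import re

# Assumed convention: the sampling date is the last '|'-separated field
DATE_SUFFIX = re.compile(r"\|(\d{4}(?:-\d{2}(?:-\d{2})?)?)$")


def dates_from_names(names) -> dict:
    """Extract {taxon name: date string} for names that carry a date suffix."""
    dates = {}
    for name in names:
        match = DATE_SUFFIX.search(name)
        if match:
            dates[name] = match.group(1)
    return dates


def write_dates_file(dates: dict, path: str) -> str:
    """Write one 'name<TAB>date' line per taxon and return the path."""
    with open(path, "w") as out:
        for name, date in dates.items():
            out.write(f"{name}\t{date}\n")
    return path


if __name__ == "__main__":
    taxa = ["sampleA|2020-03-15", "sampleB|2021", "reference"]
    print(dates_from_names(taxa))
```

Names without a recognizable date (such as `reference` above) are left out of the file rather than given a guessed date.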
4. FastTree: Fast Approximate ML
For large datasets (>1000 sequences) where IQ-TREE is too slow:
```python
import subprocess


def run_fasttree(aligned_fasta: str, output_tree: str,
                 sequence_type: str = "nt", model: str = "gtr",
                 n_threads: int = 4) -> str:
    """
    Build a fast approximate ML tree with FastTree.

    Args:
        aligned_fasta: Aligned FASTA file
        output_tree: Path for the output Newick tree
        sequence_type: 'nt' for nucleotide or 'aa' for amino acid
        model: For nt: 'gtr' (recommended) or 'jc'; for aa: 'lg', 'wag', 'jtt'
        n_threads: Kept for a uniform interface; the plain FastTree binary is
            single-threaded (the OpenMP build, FastTreeMP, reads OMP_NUM_THREADS)
    """
    cmd = ["FastTree"]
    if sequence_type == "nt":
        cmd.append("-nt")
        if model == "gtr":
            cmd.append("-gtr")  # Without -gtr, FastTree uses Jukes-Cantor
    elif model in ("lg", "wag"):
        cmd.append(f"-{model}")  # Without a flag, FastTree uses JTT for proteins
    cmd.append(aligned_fasta)

    # FastTree writes the Newick tree to stdout and its log to stderr
    with open(output_tree, "w") as out:
        result = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"FastTree failed:\n{result.stderr}")
    print(f"FastTree: Tree saved to {output_tree}")
    return output_tree
```
5. Tree Analysis and Visualization with ETE3
```python
import itertools

from ete3 import Tree


def load_tree(tree_file: str) -> Tree:
    """Load a Newick tree file."""
    t = Tree(tree_file)
    print(f"Tree: {len(t)} leaves, {len(list(t.traverse()))} nodes")
    return t


def basic_tree_stats(t: Tree, max_leaves: int = 50) -> dict:
    """Compute basic tree statistics.

    Pairwise distances use at most max_leaves leaves to bound the cost, so the
    distance statistics are estimates on larger trees.
    """
    leaves = t.get_leaves()
    sample = leaves[:max_leaves]
    distances = [t.get_distance(l1, l2)
                 for l1, l2 in itertools.combinations(sample, 2)]
    stats = {
        "n_leaves": len(leaves),
        # len(t) counts leaves only, so internal nodes are counted explicitly
        "n_internal_nodes": sum(1 for n in t.traverse() if not n.is_leaf()),
        "total_branch_length": sum(n.dist for n in t.traverse()),
        "max_leaf_distance": max(distances) if distances else 0,
        "mean_leaf_distance": sum(distances) / len(distances) if distances else 0,
    }
    return stats


def find_mrca(t: Tree, leaf_names: list) -> Tree:
    """Find the most recent common ancestor of a set of leaves."""
    return t.get_common_ancestor(*leaf_names)


def visualize_tree(t: Tree, output_file: str = "tree.png",
                   show_branch_support: bool = True,
                   color_groups: dict = None,
                   width: int = 800) -> None:
    """
    Render phylogenetic tree to image.

    Args:
        t: ETE3 Tree object
        output_file: Image path (extension selects the format, e.g. .png, .svg, .pdf)
        show_branch_support: Show bootstrap values
        color_groups: Dict mapping leaf_name → color (for coloring taxa)
        width: Image width in pixels
    """
    # Imported here because ETE3's drawing classes need a Qt backend, which
    # headless installs may lack; the analysis functions above work without it
    from ete3 import NodeStyle, TreeStyle

    ts = TreeStyle()
    ts.show_leaf_name = True
    ts.show_branch_support = show_branch_support
    ts.mode = "r"  # 'r' = rectangular, 'c' = circular
    if color_groups:
        for node in t.traverse():
            if node.is_leaf() and node.name in color_groups:
                nstyle = NodeStyle()
                nstyle["fgcolor"] = color_groups[node.name]
                nstyle["size"] = 8
                node.set_style(nstyle)
    t.render(output_file, tree_style=ts, w=width, units="px")
    print(f"Tree saved to: {output_file}")


def midpoint_root(t: Tree) -> Tree:
    """Root tree at midpoint (use when outgroup unknown)."""
    outgroup = t.get_midpoint_outgroup()
    if outgroup is not None and outgroup is not t:
        t.set_outgroup(outgroup)
    return t


def prune_tree(t: Tree, keep_leaves: list) -> Tree:
    """Prune tree to keep only specified leaves."""
    t.prune(keep_leaves, preserve_branch_length=True)
    return t
```
6. Complete Analysis Script
```python
import os

from ete3 import Tree


def full_phylogenetic_analysis(
    input_fasta: str,
    output_dir: str = "phylo_results",
    sequence_type: str = "nt",
    n_threads: int = 4,
    bootstrap: int = 1000,
    use_fasttree: bool = False,
) -> dict:
    """
    Complete phylogenetic pipeline: align → trim → tree → visualize.

    Uses run_mafft, trim_alignment_trimal, run_iqtree, run_fasttree,
    midpoint_root, basic_tree_stats and visualize_tree from the steps above.

    Args:
        input_fasta: Unaligned FASTA
        output_dir: Directory for all output files
        sequence_type: 'nt' (nucleotide) or 'aa' (amino acid/protein)
        n_threads: Number of CPU threads
        bootstrap: Ultrafast bootstrap replicates (IQ-TREE only)
        use_fasttree: Use FastTree instead of IQ-TREE (faster for large datasets)
    """
    os.makedirs(output_dir, exist_ok=True)
    prefix = os.path.join(output_dir, "phylo")

    print("=" * 50)
    print("Step 1: Multiple Sequence Alignment (MAFFT)")
    aligned = run_mafft(input_fasta, f"{prefix}_aligned.fasta",
                        method="auto", n_threads=n_threads)

    print("\nStep 2: Alignment Trimming (TrimAl)")
    try:
        trimmed = trim_alignment_trimal(aligned, f"{prefix}_trimmed.fasta")
    except FileNotFoundError:
        print("TrimAl not found; continuing with the untrimmed alignment")
        trimmed = aligned

    print("\nStep 3: Tree Inference")
    if use_fasttree:
        tree_file = run_fasttree(
            trimmed, f"{prefix}.tree",
            sequence_type=sequence_type,
            model="gtr" if sequence_type == "nt" else "lg",
        )
    else:
        # 'TEST' lets IQ-TREE pick the model for both DNA and protein input
        iqtree_files = run_iqtree(
            trimmed, prefix,
            model="TEST",
            bootstrap=bootstrap,
            n_threads=n_threads,
        )
        tree_file = iqtree_files["tree"]

    print("\nStep 4: Tree Analysis")
    t = Tree(tree_file)
    t = midpoint_root(t)
    stats = basic_tree_stats(t)
    print(f"Tree statistics: {stats}")

    print("\nStep 5: Visualization")
    visualize_tree(t, f"{prefix}_tree.png", show_branch_support=True)

    # Save rooted tree; format=0 keeps the branch support values
    rooted_tree_file = f"{prefix}_rooted.nwk"
    t.write(format=0, outfile=rooted_tree_file)

    results = {
        "aligned_fasta": aligned,
        "trimmed_fasta": trimmed,
        "tree_file": tree_file,
        "rooted_tree": rooted_tree_file,
        "visualization": f"{prefix}_tree.png",
        "stats": stats,
    }
    print("\n" + "=" * 50)
    print("Phylogenetic analysis complete!")
    print(f"Results in: {output_dir}/")
    return results
```
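Because the pipeline shells out to several command-line programs, it can help to check that they are installed before starting a long run. The following is a small sketch using only the standard library. The helper name `missing_tools` is hypothetical, and the executable names in the example are the ones used in this document, which may differ on your system (for example `iqtree` instead of `iqtree2`).

```python
import shutil


def missing_tools(tools) -> list:
    """Return the names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    required = ["mafft", "iqtree2", "FastTree", "trimal"]
    missing = missing_tools(required)
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables found")
```

Running this check first turns a failure halfway through the pipeline into an immediate, readable message.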
IQ-TREE Model Guide
DNA Models
| Model | Description | Use case |
|---|---|---|
| GTR+G | General Time Reversible + Gamma | Most flexible DNA model |
| HKY+G | Hasegawa-Kishino-Yano + Gamma | Separate transition and transversion rates (common) |
| TrN | Tamura-Nei | Two transition rates plus one transversion rate |
| JC | Jukes-Cantor | Simplest; all rates equal |
Protein Models
| Model | Description | Use case |
|---|---|---|
| LG+G4 | Le-Gascuel + Gamma | Good general-purpose protein model |
| WAG | Whelan-Goldman | Widely used |
| JTT | Jones-Taylor-Thornton | Classical model |
| Q.pfam | pfam-trained | For Pfam-like protein families |
| Q.bird | Bird-specific | Vertebrate proteins |
Tip: Use `-m TEST` to let IQ-TREE automatically select the best model.
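Automatic model selection ranks candidate models with an information criterion, by default the Bayesian information criterion, BIC = -2 ln L + k ln n, where ln L is the log-likelihood, k the number of free parameters and n the number of alignment sites. The sketch below illustrates the trade-off. The log-likelihoods and parameter counts are hypothetical numbers chosen for illustration (a real run also counts branch lengths in k), not IQ-TREE output.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_model(candidates: dict, n_sites: int) -> str:
    """candidates maps model name -> (log-likelihood, number of free parameters)."""
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))


if __name__ == "__main__":
    # Hypothetical values: richer models fit better but pay a larger penalty
    candidates = {
        "JC": (-5200.0, 0),
        "HKY+G": (-5050.0, 5),
        "GTR+G": (-5045.0, 9),
    }
    for name, (lnl, k) in candidates.items():
        print(f"{name}: BIC = {bic(lnl, k, 1000):.1f}")
    print("Selected:", best_model(candidates, n_sites=1000))
```

With these numbers GTR+G has the highest likelihood, yet HKY+G has the lowest BIC because its small likelihood deficit costs less than the four extra parameters.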
Best Practices
- Alignment quality first: Poor alignment → unreliable trees; check alignment manually
- Use `linsi` for small (<200 seq), `fftns` or `auto` for large alignments
- Model selection: Always use `-m TEST` for IQ-TREE unless you have a specific reason
- Bootstrap: Use ≥1000 ultrafast bootstraps (`-B 1000`) for branch support
- Root the tree: Unrooted trees can be misleading; use outgroup or midpoint rooting
- FastTree for >5000 sequences: IQ-TREE becomes slow; FastTree is 10–100× faster
- Trim long alignments: TrimAl removes unreliable columns; improves tree accuracy
- Check for recombination in viral/bacterial sequences before building trees (`RDP4`, `GARD`)
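As a quick check on branch support, the sketch below pulls the support values out of a Newick string and lists the weakly supported ones. It assumes supports are stored as numeric internal-node labels on a 0-100 scale, as in IQ-TREE's `.treefile` after `-B`, and uses 95 as the cut-off commonly recommended for ultrafast bootstrap. The function names `support_values` and `weak_supports` are hypothetical, and the regular expression is a simplification that ignores quoted labels.

```python
import re

# A number directly after ')' and before ':', ',', ')' or ';' is read as support
SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)(?=[:,);])")


def support_values(newick: str) -> list:
    """Return internal-node support values in the order they appear."""
    return [float(value) for value in SUPPORT.findall(newick)]


def weak_supports(newick: str, threshold: float = 95.0) -> list:
    """Return the support values that fall below the threshold."""
    return [value for value in support_values(newick) if value < threshold]


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)98:0.05,(C:0.1,D:0.1)72:0.05);"
    print("supports:", support_values(tree))
    print("below 95:", weak_supports(tree))
```

FastTree reports local support on a 0-1 scale, so the threshold would need to be rescaled before applying this check to its trees.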
Additional Resources
- MAFFT: https://mafft.cbrc.jp/alignment/software/
- IQ-TREE 2: http://www.iqtree.org/ | Tutorial: https://www.iqtree.org/workshop/molevol2022
- FastTree: http://www.microbesonline.org/fasttree/
- ETE3: http://etetoolkit.org/
- FigTree (GUI visualization): https://tree.bio.ed.ac.uk/software/figtree/
- iTOL (web visualization): https://itol.embl.de/
- MUSCLE (alternative aligner): https://www.drive5.com/muscle/
- TrimAl (alignment trimming): https://vicfero.github.io/trimal/