Phylogenetics

Overview

Phylogenetic analysis reconstructs the evolutionary history of biological sequences (genes, proteins, genomes) by inferring the branching pattern of descent. This skill covers the standard pipeline:

MAFFT — Multiple sequence alignment
IQ-TREE 2 — Maximum likelihood tree inference with model selection
FastTree — Fast approximate maximum likelihood (for large datasets)
ETE3 — Python library for tree manipulation and visualization

Installation:

```bash
# Conda (recommended for CLI tools)
conda install -c conda-forge -c bioconda mafft iqtree fasttree trimal
# Python library; its image rendering (TreeStyle/render) additionally needs PyQt5
pip install ete3
```

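Before running the workflow, it can help to confirm that the command-line tools are on `PATH`. The following is a minimal standard-library sketch. The executable names (`mafft`, `iqtree2`, `FastTree`, `trimal`) are the ones the functions in this document call, and they may differ on your installation (for example, in letter case). `check_tools` is a hypothetical helper.

```python
import shutil

# Executables invoked by the functions in this workflow (names may differ per install)
REQUIRED_TOOLS = ("mafft", "iqtree2", "FastTree", "trimal")

def check_tools(tools=REQUIRED_TOOLS) -> dict:
    """Return {tool: True/False} depending on whether each executable is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}
```

Typical use: `missing = [t for t, ok in check_tools().items() if not ok]`, then install whatever is listed before starting.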
When to Use This Skill

Use phylogenetics when:

Evolutionary relationships: Which organism/gene is most closely related to my sequence?
Viral phylodynamics: Trace outbreak spread and estimate transmission dates
Protein family analysis: Infer evolutionary relationships within a gene family
Horizontal gene transfer detection: Identify genes whose gene trees conflict with the species tree
Ancestral sequence reconstruction: Infer ancestral protein sequences
Molecular clock analysis: Estimate divergence dates using temporal sampling
GWAS companion: Place variants in evolutionary context (e.g., SARS-CoV-2 variants)
Microbiology: Species phylogeny from 16S rRNA or core genome phylogeny

Standard Workflow

1. Multiple Sequence Alignment with MAFFT

```python
import os
import subprocess

def run_mafft(input_fasta: str, output_fasta: str, method: str = "auto",
              n_threads: int = 4) -> str:
    """
    Align sequences with MAFFT.

    Args:
        input_fasta: Path to unaligned FASTA file
        output_fasta: Path for aligned output
        method: 'auto' (auto-select), 'einsi' (E-INS-i, accurate), 'linsi' (L-INS-i, accurate, slow),
                'fftnsi' (FFT-NS-i, medium), 'fftns' (FFT-NS-2, fast), 'retree2' (same as FFT-NS-2)
        n_threads: Number of CPU threads

    Returns:
        Path to aligned FASTA file
    """
    # Option sets follow the MAFFT manual's command equivalents for each strategy
    methods = {
        "auto": ["--auto"],
        "einsi": ["--genafpair", "--maxiterate", "1000"],
        "linsi": ["--localpair", "--maxiterate", "1000"],
        "fftnsi": ["--retree", "2", "--maxiterate", "1000"],
        "fftns": ["--retree", "2", "--maxiterate", "0"],
        "retree2": ["--retree", "2"],
    }
    if method not in methods:
        raise ValueError(f"Unknown MAFFT method: {method!r}")

    # Build a fresh list so the lookup table is never mutated
    cmd = ["mafft", *methods[method], "--thread", str(n_threads),
           "--inputorder", input_fasta]

    # MAFFT writes the alignment to stdout and progress messages to stderr
    with open(output_fasta, "w") as out:
        result = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE, text=True)

    if result.returncode != 0:
        raise RuntimeError(f"MAFFT failed:\n{result.stderr}")

    # Count aligned sequences as a quick sanity check
    with open(output_fasta) as f:
        n_seqs = sum(1 for line in f if line.startswith(">"))
    print(f"MAFFT: aligned {n_seqs} sequences → {output_fasta}")

    return output_fasta
```

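Counting sequences is a first sanity check. A stronger one confirms that the output is rectangular (every row has the same length) and reports how gappy it is before you build a tree. The sketch below uses a minimal FASTA parser written only for illustration. It keeps just the first word of each header and assumes `-` is the gap character. Biopython's `SeqIO` is a more robust choice if it is available. `parse_fasta` and `alignment_summary` are hypothetical helpers.

```python
def parse_fasta(lines) -> list:
    """Parse FASTA text (any iterable of lines, e.g. an open file) into (id, sequence) pairs."""
    records, name, chunks = [], None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:  # ignore any text before the first header
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def alignment_summary(records: list) -> dict:
    """Check that aligned records share one length and summarize their gap content."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"expected one common length, found {sorted(lengths)}")
    ids = [name for name, _ in records]
    total = sum(len(seq) for _, seq in records)
    gaps = sum(seq.count("-") for _, seq in records)
    return {
        "n_seqs": len(records),
        "n_columns": lengths.pop(),
        "gap_fraction": gaps / total if total else 0.0,
        "duplicate_ids": sorted({i for i in ids if ids.count(i) > 1}),
    }
```

For example: `with open("phylo_results/phylo_aligned.fasta") as f: print(alignment_summary(parse_fasta(f)))`.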
MAFFT method selection guide:

Few sequences (<200), accurate: linsi or einsi

Many sequences (<1000), moderate: fftnsi

Large datasets (>1000): fftns or auto

Ultra-fast (>10000): mafft --retree 1

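The size thresholds in the guide above can be encoded as a small function that returns MAFFT options. The option sets are the MAFFT manual's command equivalents for L-INS-i, FFT-NS-i, FFT-NS-2 and FFT-NS-1. The cut points mirror this guide and are rules of thumb, not MAFFT requirements. `mafft_args_for` is a hypothetical helper.

```python
def mafft_args_for(n_seqs: int) -> list:
    """Map a sequence count to MAFFT options following the size guide."""
    if n_seqs < 1:
        raise ValueError("need at least one sequence")
    if n_seqs < 200:
        return ["--localpair", "--maxiterate", "1000"]    # L-INS-i: accurate, slow
    if n_seqs < 1000:
        return ["--retree", "2", "--maxiterate", "1000"]  # FFT-NS-i: iterative refinement
    if n_seqs <= 10000:
        return ["--retree", "2", "--maxiterate", "0"]     # FFT-NS-2: progressive, fast
    return ["--retree", "1", "--maxiterate", "0"]         # FFT-NS-1: rough guide tree, fastest
```

A command could then be assembled as `["mafft", *mafft_args_for(850), "--thread", "4", "input.fasta"]`.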
2. Trim Alignment (Optional but Recommended)

```python
import shutil
import subprocess

def trim_alignment_trimal(aligned_fasta: str, output_fasta: str,
                          method: str = "automated1") -> str:
    """
    Trim poorly aligned columns with TrimAl.

    Methods:
    - 'automated1': Automatic heuristic (recommended)
    - 'gappyout': Remove gappy columns
    - 'strict': Stricter automatic mode (gap and similarity thresholds)
    """
    cmd = ["trimal", f"-{method}", "-in", aligned_fasta, "-out", output_fasta, "-fasta"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
        failed, message = result.returncode != 0, result.stderr
    except FileNotFoundError:
        # trimal is not installed or not on PATH
        failed, message = True, "trimal executable not found"

    if failed:
        # Fall back to the untrimmed alignment so the pipeline can continue
        print(f"TrimAl warning: {message}; using untrimmed alignment")
        shutil.copy(aligned_fasta, output_fasta)
    return output_fasta
```

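To make "remove gappy columns" concrete, here is a simplified pure-Python column filter. It drops alignment columns whose gap fraction exceeds a fixed cutoff. This is only an illustration and is not TrimAl's algorithm, since TrimAl's `-gappyout` and `-automated1` modes derive their cutoffs from the alignment itself. The 0.5 default and the name `drop_gappy_columns` are assumptions.

```python
def drop_gappy_columns(seqs: list, max_gap_fraction: float = 0.5) -> list:
    """Remove columns where the fraction of '-' characters exceeds max_gap_fraction."""
    if not seqs:
        return []
    n_cols = len(seqs[0])
    if any(len(s) != n_cols for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    # Keep a column only if its gap fraction is at or below the cutoff
    keep = [
        col for col in range(n_cols)
        if sum(s[col] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return ["".join(s[col] for col in keep) for s in seqs]
```

For example, `drop_gappy_columns(["AC-GT", "A--GT", "ACTG-"])` removes only the third column (2 of 3 rows are gaps) and returns `["ACGT", "A-GT", "ACG-"]`.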
3. IQ-TREE 2 — Maximum Likelihood Tree

```python
import os
import subprocess
from typing import Optional, Union

def run_iqtree(aligned_fasta: str, output_prefix: str,
               model: str = "TEST", bootstrap: int = 1000,
               n_threads: Union[int, str] = 4,
               extra_args: Optional[list] = None) -> dict:
    """
    Build a maximum likelihood tree with IQ-TREE 2.

    Args:
        aligned_fasta: Aligned FASTA file
        output_prefix: Prefix for output files
        model: 'TEST' for automatic model selection ('MFP' for the extended ModelFinder
               search), or an explicit model (e.g., 'GTR+G' for DNA; 'LG+G4' or 'JTT+G' for proteins)
        bootstrap: Number of ultrafast bootstrap replicates (IQ-TREE requires at least 1000)
        n_threads: Number of threads, or 'AUTO' to let IQ-TREE choose
        extra_args: Additional IQ-TREE arguments

    Returns:
        Dict with paths to output files
    """
    cmd = [
        "iqtree2",
        "-s", aligned_fasta,
        "--prefix", output_prefix,
        "-m", model,
        "-B", str(bootstrap),   # Ultrafast bootstrap (UFBoot)
        "-T", str(n_threads),
        "--redo",               # Overwrite existing results
    ]
    if extra_args:
        cmd.extend(extra_args)

    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # IQ-TREE messages may appear on either stream, so show both
        raise RuntimeError(f"IQ-TREE failed:\n{result.stdout}\n{result.stderr}")

    # Print the model selection result from the log
    log_file = f"{output_prefix}.log"
    if os.path.exists(log_file):
        with open(log_file) as f:
            for line in f:
                if "Best-fit model" in line:
                    print(f"IQ-TREE: {line.strip()}")

    output_files = {
        "tree": f"{output_prefix}.treefile",
        "log": f"{output_prefix}.log",
        "iqtree": f"{output_prefix}.iqtree",  # Full report
        "model": f"{output_prefix}.model.gz",
    }

    print(f"IQ-TREE: Tree saved to {output_files['tree']}")
    return output_files
```

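`run_iqtree` only prints the "Best-fit model" line. You may want to reuse the selected model programmatically, for example by passing it back as `model=` to skip reselection on a rerun. This sketch assumes the log line reads like `Best-fit model: GTR+F+G4 chosen according to BIC`, which matches recent IQ-TREE 2 logs. Check your own log if the pattern does not match. `parse_best_fit_model` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed IQ-TREE 2 log wording: "Best-fit model: GTR+F+G4 chosen according to BIC"
BEST_FIT_RE = re.compile(r"Best-fit model:\s*(\S+)\s+chosen according to\s+(\w+)")

def parse_best_fit_model(log_text: str) -> Optional[tuple]:
    """Return (model, criterion) from IQ-TREE log text, or None if no match is found."""
    match = BEST_FIT_RE.search(log_text)
    return (match.group(1), match.group(2)) if match else None
```

For example: `with open("phylo_results/phylo.log") as f: best = parse_best_fit_model(f.read())`.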
IQ-TREE model selection guide (models that `-m TEST` commonly selects):

DNA: GTR+G, HKY+G, TrN+G

Protein: LG+G4, WAG+G, JTT+G, Q.pfam+G

Codon (requires `-st CODON`): MG+F3X4

For temporal (molecular clock) analysis, add the options below. `--date` points to a file of sampling dates. `--date-ci` sets the number of branch-length resampling replicates used to compute confidence intervals; it is not a confidence level.

extra_args = ["--date", "dates.txt", "--clock-test", "--date-ci", "100"]

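Automatic model selection compares candidate models with an information criterion. By default, IQ-TREE's ModelFinder ranks models by BIC = k ln(n) - 2 ln L. Here k is the number of free parameters (branch lengths plus model parameters), n is the number of alignment sites, and L is the maximized likelihood. The sketch below ranks models by BIC. The log-likelihoods and parameter counts in `example` are hypothetical placeholders (chosen as if for 12 taxa and 1,000 sites), not real IQ-TREE output.

```python
import math

def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood

def rank_models(fits: dict, n_sites: int) -> list:
    """Sort {model: (lnL, k)} by BIC, best first; returns [(model, bic), ...]."""
    scored = [(name, bic(lnl, k, n_sites)) for name, (lnl, k) in fits.items()]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical fits: (log-likelihood, free parameters incl. 21 branch lengths)
example = {"JC": (-5200.0, 21), "HKY+G4": (-5010.0, 26), "GTR+G4": (-5005.0, 30)}
```

With these numbers, `rank_models(example, 1000)` puts HKY+G4 first (BIC ≈ 10199.6). GTR+G4 has the higher likelihood but lands second (≈ 10217.2), because its four extra parameters cost more than the likelihood gain.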
4. FastTree — Fast Approximate ML

For large datasets (>1000 sequences) where IQ-TREE is too slow:

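```python
import os
import subprocess

def run_fasttree(aligned_fasta: str, output_tree: str,
                 sequence_type: str = "nt", model: str = "gtr",
                 n_threads: int = 4) -> str:
    """
    Build a fast approximate ML tree with FastTree.

    Args:
        sequence_type: 'nt' for nucleotide or 'aa' for amino acid
        model: For nt: 'gtr' (recommended) or 'jc'; for aa: 'lg', 'wag', or 'jtt'
        n_threads: Only used by the multithreaded FastTreeMP build (via OMP_NUM_THREADS);
                   the standard FastTree binary is single-threaded
    """
    model = model.lower()
    # FastTree's defaults are JC (nucleotide) and JTT (protein); other models need a flag
    if sequence_type == "nt":
        model_flags = {"gtr": ["-gtr"], "jc": []}
        cmd = ["FastTree", "-nt"]
    elif sequence_type == "aa":
        model_flags = {"lg": ["-lg"], "wag": ["-wag"], "jtt": []}
        cmd = ["FastTree"]
    else:
        raise ValueError("sequence_type must be 'nt' or 'aa'")
    if model not in model_flags:
        raise ValueError(f"Unsupported model {model!r} for sequence_type={sequence_type!r}")
    cmd += model_flags[model] + [aligned_fasta]

    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    # FastTree writes the Newick tree to stdout and progress to stderr
    with open(output_tree, "w") as out:
        result = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE, text=True, env=env)

    if result.returncode != 0:
        raise RuntimeError(f"FastTree failed:\n{result.stderr}")

    print(f"FastTree: Tree saved to {output_tree}")
    return output_tree
```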

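The size cutoff mentioned above can be turned into a switch for the `use_fasttree` argument of the complete pipeline. The following is a minimal sketch. The 1000-sequence threshold comes from this section and is a rule of thumb, not a hard limit. `count_sequences` and `should_use_fasttree` are hypothetical helper names.

```python
def count_sequences(lines) -> int:
    """Count FASTA records (header lines starting with '>') in any iterable of lines."""
    return sum(1 for line in lines if line.startswith(">"))

def should_use_fasttree(n_seqs: int, threshold: int = 1000) -> bool:
    """Rule of thumb from the text: switch to FastTree above ~1000 sequences."""
    return n_seqs > threshold
```

For example: `with open("input.fasta") as f: use_ft = should_use_fasttree(count_sequences(f))`.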
5. Tree Analysis and Visualization with ETE3

```python
from typing import Optional

from ete3 import NodeStyle, Tree, TreeStyle

def load_tree(tree_file: str) -> Tree:
    """Load a Newick tree file (format 0 reads internal node labels as support values)."""
    t = Tree(tree_file, format=0)
    print(f"Tree: {len(t)} leaves, {len(list(t.traverse()))} nodes")
    return t

def basic_tree_stats(t: Tree, max_sampled_leaves: int = 50) -> dict:
    """Compute basic tree statistics; leaf-distance stats use only the first 50 leaves."""
    leaves = t.get_leaves()
    sample = leaves[:max_sampled_leaves]
    # Each unordered leaf pair once (distance calls grow quadratically, hence the cap)
    distances = [t.get_distance(a, b)
                 for i, a in enumerate(sample) for b in sample[i + 1:]]

    stats = {
        "n_leaves": len(leaves),
        # len(t) returns the number of leaves in ETE3, so count internal nodes explicitly
        "n_internal_nodes": sum(1 for n in t.traverse() if not n.is_leaf()),
        "total_branch_length": sum(n.dist for n in t.traverse()),
        "max_leaf_distance": max(distances) if distances else 0,
        "mean_leaf_distance": sum(distances) / len(distances) if distances else 0,
    }
    return stats

def find_mrca(t: Tree, leaf_names: list) -> Tree:
    """Find the most recent common ancestor of a set of leaves (names or nodes)."""
    return t.get_common_ancestor(*leaf_names)

def visualize_tree(t: Tree, output_file: str = "tree.png",
                   show_branch_support: bool = True,
                   color_groups: Optional[dict] = None,
                   width: int = 800) -> None:
    """
    Render phylogenetic tree to image (ETE3 rendering requires PyQt5).

    Args:
        t: ETE3 Tree object
        output_file: Image path; the extension (.png, .pdf, .svg) selects the format
        show_branch_support: Show bootstrap values
        color_groups: Dict mapping leaf_name → color (for coloring taxa)
        width: Image width in pixels
    """
    ts = TreeStyle()
    ts.show_leaf_name = True
    ts.show_branch_support = show_branch_support
    ts.mode = "r"  # 'r' = rectangular, 'c' = circular

    if color_groups:
        for leaf in t.iter_leaves():
            if leaf.name in color_groups:
                nstyle = NodeStyle()
                nstyle["fgcolor"] = color_groups[leaf.name]
                nstyle["size"] = 8
                leaf.set_style(nstyle)

    t.render(output_file, tree_style=ts, w=width, units="px")
    print(f"Tree saved to: {output_file}")

def midpoint_root(t: Tree) -> Tree:
    """Root tree at midpoint (use when outgroup unknown); modifies t in place."""
    t.set_outgroup(t.get_midpoint_outgroup())
    return t

def prune_tree(t: Tree, keep_leaves: list) -> Tree:
    """Prune tree in place to keep only specified leaves, preserving path lengths."""
    t.prune(keep_leaves, preserve_branch_length=True)
    return t
```

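`midpoint_root` relies on ETE3's `get_midpoint_outgroup()`. Conceptually, midpoint rooting places the root halfway along the longest leaf-to-leaf path. The pure-Python sketch below illustrates that idea on an unrooted tree given as an edge list of `(node_a, node_b, branch_length)`. It is not ETE3's implementation, and the edge-list representation and the names `_farthest` and `midpoint` are assumptions made for this example.

```python
from collections import defaultdict

def _farthest(adj: dict, start):
    """Distances from start; returns (farthest_node, its_distance, parent_map)."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node]:
            if nbr not in dist:  # a tree has exactly one path to every node
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=dist.get)
    return far, dist[far], parent

def midpoint(edges: list) -> tuple:
    """
    Locate the midpoint of the longest leaf-to-leaf path.

    Returns (node, neighbor, offset, diameter): the midpoint lies on the branch
    between node and neighbor, `offset` units away from node.
    """
    if not edges:
        raise ValueError("tree has no branches")
    adj = defaultdict(list)
    for a, b, length in edges:
        adj[a].append((b, float(length)))
        adj[b].append((a, float(length)))
    # Two sweeps find the longest path (valid for non-negative branch lengths)
    end_a, _, _ = _farthest(adj, next(iter(adj)))
    end_b, diameter, parent = _farthest(adj, end_a)
    half, walked, node = diameter / 2.0, 0.0, end_b
    # Walk from end_b back toward end_a until the halfway point is reached
    while parent[node] is not None:
        up = parent[node]
        length = dict(adj[node])[up]
        if walked + length >= half:
            return node, up, half - walked, diameter
        walked += length
        node = up
    return end_b, end_b, 0.0, diameter  # degenerate case: all branch lengths are zero
```

For example, take edges A-X (1.0), B-X (2.0) and X-C (6.0). The longest path runs from B to C (8.0), so `midpoint(...)` returns `('X', 'C', 2.0, 8.0)`: the root sits on the X-C branch, 2.0 from X.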
6. Complete Analysis Script

```python
import os

from ete3 import Tree

def full_phylogenetic_analysis(
    input_fasta: str,
    output_dir: str = "phylo_results",
    sequence_type: str = "nt",
    n_threads: int = 4,
    bootstrap: int = 1000,
    use_fasttree: bool = False,
    trim: bool = True,
) -> dict:
    """
    Complete phylogenetic pipeline: align → trim → tree → visualize.

    Relies on run_mafft, trim_alignment_trimal, run_iqtree, run_fasttree,
    midpoint_root, basic_tree_stats and visualize_tree from the sections above.

    Args:
        input_fasta: Unaligned FASTA
        sequence_type: 'nt' (nucleotide) or 'aa' (amino acid/protein)
        use_fasttree: Use FastTree instead of IQ-TREE (faster for large datasets)
        trim: Trim the alignment with TrimAl before tree inference
    """
    os.makedirs(output_dir, exist_ok=True)
    prefix = os.path.join(output_dir, "phylo")

    print("=" * 50)
    print("Step 1: Multiple Sequence Alignment (MAFFT)")
    aligned = run_mafft(input_fasta, f"{prefix}_aligned.fasta",
                        method="auto", n_threads=n_threads)

    if trim:
        print("\nStep 2: Alignment Trimming (TrimAl)")
        aligned = trim_alignment_trimal(aligned, f"{prefix}_trimmed.fasta")

    print("\nStep 3: Tree Inference")
    if use_fasttree:
        tree_file = run_fasttree(
            aligned, f"{prefix}.tree",
            sequence_type=sequence_type,
            model="gtr" if sequence_type == "nt" else "lg",
            n_threads=n_threads,
        )
    else:
        # IQ-TREE detects DNA vs. protein input itself; TEST selects a model for either
        iqtree_files = run_iqtree(aligned, prefix, model="TEST",
                                  bootstrap=bootstrap, n_threads=n_threads)
        tree_file = iqtree_files["tree"]

    print("\nStep 4: Tree Analysis")
    t = midpoint_root(Tree(tree_file))
    stats = basic_tree_stats(t)
    print(f"Tree statistics: {stats}")

    print("\nStep 5: Visualization")
    visualize_tree(t, f"{prefix}_tree.png", show_branch_support=True)

    # Save the rooted tree; format=0 writes support values on internal nodes
    # (format=1 would write internal node names, which are empty here)
    rooted_tree_file = f"{prefix}_rooted.nwk"
    t.write(format=0, outfile=rooted_tree_file)

    results = {
        "aligned_fasta": aligned,
        "tree_file": tree_file,
        "rooted_tree": rooted_tree_file,
        "visualization": f"{prefix}_tree.png",
        "stats": stats,
    }

    print("\n" + "=" * 50)
    print("Phylogenetic analysis complete!")
    print(f"Results in: {output_dir}/")
    return results
```

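Newick reserves characters such as parentheses, commas, colons, semicolons, square brackets and quotes, and treats whitespace specially. Sequence IDs containing them can therefore break tree files, or be silently renamed by tree builders (IQ-TREE typically replaces such characters and prints a warning). One option is to clean FASTA headers before Step 1. The sketch below does this. The exact character set, the underscore replacement and the name `sanitize_fasta_headers` are assumptions.

```python
import re

# Newick-significant characters plus whitespace (an assumed, conservative set)
UNSAFE_CHARS = re.compile(r"[\s(),:;\[\]']")

def sanitize_fasta_headers(lines):
    """
    Yield FASTA lines, replacing Newick-unsafe characters in each header with '_'.
    The whole header (including any description) becomes the new ID.
    Raises ValueError if two records end up with the same ID.
    """
    seen = set()
    for line in lines:
        if line.startswith(">"):
            new_id = UNSAFE_CHARS.sub("_", line[1:].strip())
            if new_id in seen:
                raise ValueError(f"duplicate sequence ID after cleaning: {new_id!r}")
            seen.add(new_id)
            yield ">" + new_id + ("\n" if line.endswith("\n") else "")
        else:
            yield line
```

For example: `with open("raw.fasta") as src, open("clean.fasta", "w") as dst: dst.writelines(sanitize_fasta_headers(src))`.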
IQ-TREE Model Guide

DNA Models

Model	Description	Use case
`GTR+G4`	General Time Reversible + Gamma	Most flexible DNA model
`HKY+G4`	Hasegawa-Kishino-Yano + Gamma	Separate transition/transversion rates (common)
`TrN+G4`	Tamura-Nei + Gamma	Separate rates for purine and pyrimidine transitions
`JC`	Jukes-Cantor	Simplest; all rates and base frequencies equal

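Under the simplest model in the table, JC, the observed proportion p of differing sites between two aligned sequences converts to an evolutionary distance (expected substitutions per site) via d = -(3/4) ln(1 - (4/3) p). The formula corrects for multiple substitutions hitting the same site. The sketch below skips gapped columns when computing p, which is a common convention but an assumption here. `p_distance` and `jc69_distance` are hypothetical helpers.

```python
import math

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap ('-') in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance always exceeds p. For `"ACGTACGTAC"` vs `"ACGTACGTAA"`, p = 0.1 and the JC distance is about 0.1073.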
Protein Models

Model	Description	Use case
`LG+G4`	Le-Gascuel + Gamma	Often the best-fitting general-purpose protein model
`WAG+G4`	Whelan-Goldman + Gamma	Widely used
`JTT+G4`	Jones-Taylor-Thornton + Gamma	Classical model
`Q.pfam+G4`	Estimated from Pfam alignments + Gamma	Pfam-like protein families
`Q.bird+G4`	Estimated from bird protein data + Gamma	Bird (avian) proteins

Tip: Use `-m TEST` to let IQ-TREE automatically select the best model; `-m MFP` runs the extended ModelFinder search, which also considers FreeRate (+R) models.

Best Practices

Alignment quality first: Poor alignments produce unreliable trees; inspect the alignment manually
Use `linsi` for small alignments (<200 sequences) and `fftns` or `auto` for large ones
Model selection: Use `-m TEST` (or `-m MFP`) with IQ-TREE unless you have a specific reason to fix the model
Bootstrap: Use ≥1000 ultrafast bootstraps (`-B 1000`) for branch support; UFBoot values ≥95 are conventionally treated as strong support
Root the tree: Unrooted trees can be misleading; use outgroup or midpoint rooting
FastTree for very large datasets (>5000 sequences): IQ-TREE becomes slow, while FastTree is roughly 10–100× faster
Trim long alignments: TrimAl removes unreliable columns, which can improve tree accuracy
Check for recombination in viral/bacterial sequences before building trees (`RDP4`, `GARD`)

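One rough way to apply the bootstrap guideline is to pull the support labels out of the Newick tree and count how many clades clear a cutoff. The sketch below assumes each support value is written as a single number right after a closing parenthesis. That is the layout of IQ-TREE's `.treefile` when only `-B` is used, on a 0–100 scale. Combined labels such as `80/95` from adding SH-aLRT are not handled. FastTree's local support values use a 0–1 scale, so pass a cutoff such as 0.95 for FastTree trees. `support_values` and `fraction_supported` are hypothetical helpers.

```python
import re

# A number immediately after ')' is treated as that clade's support value (assumption)
SUPPORT_RE = re.compile(r"\)(\d+(?:\.\d+)?)")

def support_values(newick: str) -> list:
    """Extract internal-node support labels from a Newick string."""
    return [float(v) for v in SUPPORT_RE.findall(newick)]

def fraction_supported(newick: str, cutoff: float = 95.0) -> float:
    """Fraction of labelled clades whose support is >= cutoff (0.0 if none are labelled)."""
    values = support_values(newick)
    return sum(v >= cutoff for v in values) / len(values) if values else 0.0
```

For `"((A:0.1,B:0.2)98:0.05,(C:0.1,D:0.1)60:0.02,E:0.3);"` the supports are `[98.0, 60.0]`, so half of the labelled clades pass the default cutoff of 95.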
Additional Resources

MAFFT: https://mafft.cbrc.jp/alignment/software/
IQ-TREE 2: http://www.iqtree.org/ | Tutorial: https://www.iqtree.org/workshop/molevol2022
FastTree: http://www.microbesonline.org/fasttree/
ETE3: http://etetoolkit.org/
FigTree (GUI visualization): https://tree.bio.ed.ac.uk/software/figtree/
iTOL (web visualization): https://itol.embl.de/
MUSCLE (alternative aligner): https://www.drive5.com/muscle/
TrimAl (alignment trimming): https://vicfero.github.io/trimal/