seaborn


Seaborn - Statistical Data Visualization

Seaborn helps you explore and understand your data through beautiful, informative statistical plots. It automates complex tasks like calculating confidence intervals, aggregating data, and creating faceted grids.

When to Use

  • Visualizing complex relationships between multiple variables (relplot)
  • Examining univariate and bivariate distributions (displot, kdeplot)
  • Comparing categories with statistical summaries (catplot, boxplot, violinplot)
  • Visualizing linear regression models and their uncertainty (regplot, lmplot)
  • Creating heatmaps and cluster maps for large matrices
  • Building multi-plot grids based on data subsets (FacetGrid)
  • Setting high-level aesthetic themes for Matplotlib figures

Reference Documentation

Official docs: https://seaborn.pydata.org/
Example gallery: https://seaborn.pydata.org/examples/index.html
Search patterns: sns.load_dataset, sns.relplot, sns.catplot, sns.set_theme, sns.heatmap

Core Principles

Figure-Level vs. Axes-Level Functions

| Function Type | Examples | Key Characteristic |
|---|---|---|
| Figure-Level | relplot, displot, catplot | Creates its own figure (FacetGrid). Best for subplots (col, row). |
| Axes-Level | scatterplot, histplot, boxplot | Plots onto a specific ax. Best for integration with Matplotlib layouts. |

Use Seaborn For

  • Statistical analysis and exploratory data analysis (EDA).
  • Working directly with Pandas DataFrames in "tidy" (long-form) format.
  • Automatic calculation of error bars (95% bootstrap confidence intervals by default).
  • Rapidly changing visual themes and color palettes.
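
To make the "tidy" requirement concrete, here is a minimal sketch that reshapes a wide table into long form with `DataFrame.melt` and hands it to `sns.barplot`. The table `wide` and its column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical wide-form table: one row per subject, one column per phase
wide = pd.DataFrame({
    "subject": [1, 2, 3, 4],
    "before": [5.1, 4.8, 6.0, 5.5],
    "after": [6.2, 5.9, 6.8, 6.1],
})

# Long-form ("tidy"): one row per observation
tidy = wide.melt(id_vars="subject", var_name="phase", value_name="score")

# Each bar is the mean score per phase; the error bar is a bootstrap CI
ax = sns.barplot(data=tidy, x="phase", y="score")
```

Once the data is long-form, every variable has a column name that can be passed to `x`, `y`, or `hue`.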

Do NOT Use For

  • Very low-level custom graphics (use Matplotlib).
  • Interactive web visualizations (use Plotly).
  • 3D plotting (use Matplotlib mplot3d or PyVista).
  • Network graphs (use NetworkX with Matplotlib).

Quick Reference

Installation

```bash
pip install seaborn
```

Standard Imports

```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Apply the default theme
sns.set_theme()
```

Basic Pattern - Tidy Data Mapping

```python
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.show()

# Load an example dataset
tips = sns.load_dataset("tips")

# Create a scatter plot with semantic mapping
sns.relplot(
    data=tips,
    x="total_bill", y="tip",
    hue="smoker", style="time", size="size",
)
plt.show()
```

Critical Rules

✅ DO

  • Use Tidy Data - Ensure your DataFrame is in "long-form" (one row per observation).
  • Prefer Figure-Level Functions - Use relplot/displot/catplot for better default layouts and faceting.
  • Use the data= parameter - Always pass the DataFrame to keep code clean.
  • Set Themes - Use sns.set_theme(style="whitegrid", palette="muted") early in your script.
  • Leverage hue - Use semantic color mapping to add extra dimensions to 2D plots.
  • Context matters - Use sns.set_context("paper") for publications or sns.set_context("talk") for presentations.
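
The theme and context rules above can be sketched as follows. The style and palette names come from the list; the toy line plot is only there to have something to draw.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# Global settings: apply once, near the top of a script
sns.set_theme(style="whitegrid", palette="muted")

# Contexts scale fonts and line widths; compare two of the presets
paper = sns.plotting_context("paper")
talk = sns.plotting_context("talk")
print(paper["font.size"], talk["font.size"])

# A context can also be applied temporarily with a `with` block
with sns.plotting_context("talk"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title("Larger fonts inside this block only")
```

`sns.set_context` changes the setting for the rest of the session, whereas the `with` form is handy when a single figure needs presentation-sized text.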

❌ DON'T

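  • Pass 1D arrays manually - Avoid sns.scatterplot(x=x_array, y=y_array) when the data already lives in a DataFrame; it gives up the axis labels and legends that Seaborn derives from column names.
  • Rely on the Index - For long-form data, Seaborn mostly ignores the DataFrame index; move the values you need into a column (for example with reset_index()).
  • Overcrowd plots - Too many semantic mappings (hue, size, style) make graphs unreadable.
  • Forget Matplotlib - Axes-level functions return a Matplotlib Axes, and figure-level functions return a FacetGrid that wraps a Matplotlib Figure; use ax.set_title() or g.figure.suptitle() to tweak them.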

Anti-Patterns (NEVER)

```python
import seaborn as sns
import matplotlib.pyplot as plt

# df is a placeholder DataFrame with columns 'species', 'x', 'y', and 'val'

# ❌ BAD: Iterating through groups to plot manually
for s in df['species'].unique():
    subset = df[df['species'] == s]
    plt.scatter(subset['x'], subset['y'], label=s)

# ✅ GOOD: Let Seaborn handle grouping and legend
sns.scatterplot(data=df, x='x', y='y', hue='species')

# ❌ BAD: Mixing Seaborn and Matplotlib titles incorrectly
sns.displot(data=df, x='val')
plt.title("My Title")  # ⚠️ Might apply to the wrong axis in a FacetGrid!

# ✅ GOOD: Use the returned object
g = sns.displot(data=df, x='val')
g.set_axis_labels("Value", "Count")
g.figure.suptitle("Correct Global Title", y=1.05)
```

Relational Plots (relplot)

Scatter and Line Plots

```python
# Multi-faceted scatter plot
sns.relplot(
    data=tips, x="total_bill", y="tip",
    col="time", hue="day", style="sex",
    kind="scatter"
)

# Line plot with automatic aggregation
# (the default is mean + 95% CI; errorbar="sd" shows the standard deviation instead)
fmri = sns.load_dataset("fmri")
sns.relplot(
    data=fmri, x="timepoint", y="signal",
    hue="event", style="region",
    kind="line", errorbar="sd"
)
```

Distribution Plots (displot)

Histograms and KDEs

```python
penguins = sns.load_dataset("penguins")

# Histogram with Kernel Density Estimate
sns.displot(data=penguins, x="flipper_length_mm", hue="species", kde=True)

# Bivariate distribution (KDE contours; kind="hist" gives a heatmap-style view)
sns.displot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species", kind="kde")

# Empirical Cumulative Distribution (ECDF)
sns.displot(data=penguins, x="flipper_length_mm", hue="species", kind="ecdf")
```

Categorical Plots (catplot)

Comparisons and Distribution within categories

```python
# Boxplot (Show quartiles and outliers)
sns.catplot(data=tips, x="day", y="total_bill", kind="box")

# Violin plot (Show density and quartiles)
sns.catplot(data=tips, x="day", y="total_bill", hue="sex", kind="violin", split=True)

# Swarm plot (Show every point without overlap)
sns.catplot(data=tips, x="day", y="total_bill", kind="swarm")

# Bar plot (Show mean and error bars; ("pi", 95) is a 95% percentile interval)
sns.catplot(data=tips, x="day", y="total_bill", kind="bar", errorbar=("pi", 95))
```

Regression Plots

Visualizing Linear Trends

```python
# Simple regression with scatter
sns.regplot(data=tips, x="total_bill", y="tip")

# Faceted regression
sns.lmplot(data=tips, x="total_bill", y="tip", col="smoker", hue="time")

# Logistic regression (for binary data; requires statsmodels)
# df is a placeholder DataFrame with a numeric "variable" and a 0/1 "binary_outcome"
sns.lmplot(data=df, x="variable", y="binary_outcome", logistic=True)
```
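
Seaborn draws the fitted line but does not return its coefficients. One way to get the numbers, sketched here on synthetic data with a known slope of 0.15, is to fit the same straight line with `np.polyfit`; that choice is an assumption of this example, not something the plotting call does for you.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
# Hypothetical bills and tips with a true slope of 0.15
x = rng.uniform(0, 50, size=200)
df = pd.DataFrame({
    "total_bill": x,
    "tip": 1.0 + 0.15 * x + rng.normal(scale=0.5, size=200),
})

# Scatter plus least-squares line; ci=None skips the bootstrap band
ax = sns.regplot(data=df, x="total_bill", y="tip", ci=None)

# Recover the coefficients of the same straight-line fit
slope, intercept = np.polyfit(df["total_bill"], df["tip"], deg=1)
print(f"tip ~ {intercept:.2f} + {slope:.3f} * total_bill")
```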

Matrix Plots

Heatmaps and Clustering

```python
flights = sns.load_dataset("flights").pivot(index="month", columns="year", values="passengers")

# Heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(flights, annot=True, fmt="d", cmap="YlGnBu")

# Cluster map (Hierarchical clustering)
sns.clustermap(flights, standard_scale=1, cmap="mako")
```
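
Unlike most Seaborn functions, `heatmap` expects a wide matrix, which is why the flights data is pivoted first. The sketch below repeats that step on a tiny hand-written table (the passenger counts are invented) so the reshaping is easy to follow.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical long-form table: one row per (month, year) pair
long = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
    "year": [2023, 2024, 2023, 2024, 2023, 2024],
    "passengers": [112, 118, 129, 135, 121, 140],
})

# Wide matrix: months as rows, years as columns
wide = long.pivot(index="month", columns="year", values="passengers")

# annot=True writes each value into its cell; fmt="d" needs integer data
ax = sns.heatmap(wide, annot=True, fmt="d", cmap="YlGnBu")
```

Note that `pivot` sorts plain string labels alphabetically, so the rows here come out as Feb, Jan, Mar; an ordered categorical column keeps calendar order.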

Grid Objects (Advanced)

Custom Multi-plot Layouts

```python
# JointPlot (Bivariate plot + marginals; kind="kde" draws density contours
# instead of the default scatter)
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species", kind="kde")

# PairPlot (All-against-all relations)
sns.pairplot(data=penguins, hue="species", corner=True)

# Custom FacetGrid
g = sns.FacetGrid(tips, col="time", row="sex")
g.map(sns.scatterplot, "total_bill", "tip")
```

Styling and Aesthetics

Themes and Palettes

```python
# Set overall look
sns.set_style("darkgrid")  # white, dark, whitegrid, ticks
sns.set_context("talk")    # paper, notebook, talk, poster

# Custom palettes
sns.set_palette("husl")  # Set global palette
my_pal = sns.color_palette("rocket", as_cmap=True)  # Get palette as a Matplotlib colormap

# Viewing a palette
sns.palplot(sns.color_palette("Set2"))
```
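
It can help to see what `color_palette` actually returns. In this short sketch, a discrete palette behaves like a list of RGB tuples, while `as_cmap=True` yields a continuous Matplotlib colormap; the palette names are the ones used above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
from matplotlib.colors import Colormap
import seaborn as sns

# Discrete palette: a list-like object of (r, g, b) tuples
pal = sns.color_palette("Set2", n_colors=4)
print(pal.as_hex())  # hex strings, handy for reuse outside Seaborn

# Continuous palette: a Matplotlib colormap, suitable for heatmap(cmap=...)
cmap = sns.color_palette("rocket", as_cmap=True)
print(type(cmap).__name__)
```

Discrete palettes suit categorical `hue` mappings, and colormaps suit numeric ones such as heatmap values.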

Practical Workflows

1. Exploratory Data Analysis (EDA) Pipeline

```python
def initial_eda(df, target_col):
    """Generate basic visual summary of a dataset."""
    # 1. Distribution of target (KDE overlay only for numeric targets)
    is_numeric = pd.api.types.is_numeric_dtype(df[target_col])
    sns.displot(data=df, x=target_col, kde=is_numeric)

    # 2. Pairwise relations of numeric features
    sns.pairplot(data=df, hue=target_col if df[target_col].nunique() < 10 else None)

    # 3. Correlation heatmap
    plt.figure(figsize=(12, 10))
    sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt=".2f")

# Example usage:
# iris = sns.load_dataset("iris")
# initial_eda(iris, "species")
```

2. Scientific Result Comparison

```python
def plot_experiment_results(df):
    """Plot results of an experiment with multiple conditions."""
    g = sns.catplot(
        data=df, kind="bar",
        x="condition", y="metric", hue="group",
        palette="viridis", alpha=.6, height=6
    )
    g.despine(left=True)
    g.set_axis_labels("Experimental Condition", "Accuracy (%)")
    g.legend.set_title("User Group")
    return g
```

3. Time-Series Trends by Category

```python
def plot_trends(df, time_col, val_col, cat_col):
    """Visualizes trends over time with confidence intervals."""
    plt.figure(figsize=(12, 6))
    sns.lineplot(
        data=df, x=time_col, y=val_col, hue=cat_col,
        marker="o", err_style="bars"
    )
    plt.xticks(rotation=45)
    plt.tight_layout()
```

Common Pitfalls and Solutions

Legend Outside the Plot

```python
# ❌ Problem: Legend covers data in narrow plots
# ✅ Solution: Move the legend outside the axes with sns.move_legend
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day")  # returns an Axes
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
```

Slow Performance with Large Data

```python
# ❌ Problem: sns.pairplot(large_df) hangs
# ✅ Solution: Sample data or use simpler plots
# (capping n at len(df) keeps the call valid for frames with fewer than 1000 rows)
sns.pairplot(df.sample(n=min(len(df), 1000), random_state=0), hue='category')

# OR use hist instead of scatter
sns.jointplot(data=df, x='x', y='y', kind="hist")
```

Overlapping Labels

```python
# ❌ Problem: Categorical labels on X-axis overlap
# ✅ Solution: Rotate labels using Matplotlib
ax = sns.boxplot(data=df, x='very_long_category_name', y='value')  # returns an Axes
# plt.setp restyles the existing tick labels without re-setting their text
plt.setp(ax.get_xticklabels(), rotation=45, horizontalalignment='right')
```

Best Practices

  1. Use tidy data format - Ensure your DataFrame is in long-form (one row per observation)
  2. Prefer figure-level functions - Use relplot, displot, and catplot for better default layouts and faceting
  3. Always use the data= parameter - Pass the DataFrame directly to keep code clean and readable
  4. Set themes early - Use sns.set_theme() at the beginning of your script for consistent styling
  5. Leverage semantic mappings - Use hue, size, and style to add dimensions to your plots
  6. Choose appropriate context - Use sns.set_context("paper") for publications or sns.set_context("talk") for presentations
  7. Remember Seaborn returns Matplotlib-based objects - Axes-level functions return an Axes and figure-level functions return a FacetGrid; use Matplotlib methods like ax.set_title() for fine-tuning
  8. Don't overcrowd plots - Limit semantic mappings to maintain readability
  9. Use figure-level functions for faceting - They handle subplot layouts automatically
  10. Sample large datasets - Use df.sample() before plotting to improve performance with big data
Seaborn makes statistical visualization a joy by providing high-level abstractions that produce beautiful, publication-quality graphics with minimal effort.