data-science-visualization


Data Visualization

Use this skill for creating effective visualizations: choosing the right library, chart type, and interactivity level for your data and audience.

When to use this skill

  • Choosing a visualization library for a project
  • Creating exploratory charts during EDA
  • Building interactive dashboards
  • Producing publication-quality figures
  • Understanding tradeoffs between libraries

Library selection guide (2026)

| Library | Best For | Interactivity | Learning Curve |
|---|---|---|---|
| Matplotlib | Publication-quality static plots, fine control | Static | Moderate |
| Seaborn | Statistical visualization, quick EDA | Static | Easy |
| Plotly | Interactive web charts, dashboards | High | Easy |
| Altair | Declarative statistical charts, large datasets | Medium | Easy |
| hvPlot/HoloViz | Large data, linked brushing, geospatial | High | Moderate |
| Bokeh | Custom interactive web apps | High | Moderate |

Quick decision tree

Static publication figure?
  → Matplotlib (full control) or Seaborn (quick statistical)

Interactive web/dashboard?
  → Plotly (easiest), Dash (full apps)
  → Panel/HoloViz (complex linked views)
  → Bokeh (custom web apps)

Large datasets (100k+ points)?
  → hvPlot + Datashader (automatic rasterization)
  → Altair (smart aggregation with Vega-Lite)

Declarative grammar preferred?
  → Altair (Vega-Lite) or Plotly Express

Already using Pandas?
  → df.plot() → Matplotlib
  → df.hvplot() → HoloViz
  → px.scatter(df) → Plotly
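
The branches of the decision tree can also be written as a small lookup function. This is only a sketch: `PlotNeeds` and `suggest_libraries` are hypothetical names, and the order in which the branches are checked (large data first, then interactivity) is an assumption rather than something the tree specifies.

```python
from dataclasses import dataclass


@dataclass
class PlotNeeds:
    """Hypothetical description of a plotting task (not part of any library)."""
    interactive: bool = False
    n_points: int = 0
    linked_views: bool = False
    declarative: bool = False


def suggest_libraries(needs: PlotNeeds) -> list[str]:
    """Return candidate libraries, following the branches of the decision tree."""
    # Assumption: data size is checked first, since 100k+ raw points
    # constrains rendering regardless of the other preferences.
    if needs.n_points >= 100_000:
        return ["hvPlot + Datashader", "Altair"]
    if needs.interactive:
        if needs.linked_views:
            return ["Panel/HoloViz", "Bokeh"]
        return ["Plotly", "Dash"]
    if needs.declarative:
        return ["Altair", "Plotly Express"]
    # Default branch: static publication figure
    return ["Matplotlib", "Seaborn"]


print(suggest_libraries(PlotNeeds(interactive=True)))
print(suggest_libraries(PlotNeeds(n_points=2_000_000)))
```

Treat the returned list as a starting shortlist; the tradeoffs in the library selection guide still decide the final pick.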

Core principles

1) Match chart to data and question

| Question | Chart Type |
|---|---|
| Distribution? | Histogram, KDE, boxplot, violin |
| Relationship? | Scatter, line, heatmap (correlation) |
| Composition? | Pie (avoid), stacked bar, treemap |
| Comparison? | Bar, grouped bar, dot plot |
| Trend over time? | Line, area, candlestick |
| Geographic? | Choropleth, scatter map, heatmap |

2) Maximize data-ink ratio

  • Remove unnecessary gridlines, borders, backgrounds
  • Use color purposefully (not decoration)
  • Label directly when possible
  • One message per visualization
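
As a rough illustration of these four points in Matplotlib, the sketch below removes non-data ink, labels each line directly instead of using a legend, and states a single message in the title. The `declutter` helper and the sales figures are made up for the example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def declutter(ax):
    """Hypothetical helper: strip non-data ink from a Matplotlib Axes."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    ax.grid(False)
    ax.set_facecolor("white")
    return ax


fig, ax = plt.subplots(figsize=(6, 4))
years = [2021, 2022, 2023, 2024]                 # made-up illustration data
series = {"A": [3, 4, 6, 9], "B": [5, 5, 4, 4]}
for name, values in series.items():
    (line,) = ax.plot(years, values)
    # Label the line at its last point instead of adding a separate legend
    ax.annotate(name, xy=(years[-1], values[-1]), xytext=(5, 0),
                textcoords="offset points", color=line.get_color(), va="center")
ax.set_xticks(years)
ax.set_ylabel("Units sold")
ax.set_title("Product A overtook B in 2023")     # one message per chart
declutter(ax)
fig.tight_layout()
```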

3) Choose interactivity appropriately

| Audience | Interactivity Level |
|---|---|
| Paper/report | Static (Matplotlib/Seaborn) |
| Presentation | Limited (Plotly static export) |
| Exploratory analysis | High (zoom, pan, filter, hover) |
| Stakeholder dashboard | Medium (linked views, drill-down) |

Quick examples

Matplotlib (fine control)

```python
import matplotlib.pyplot as plt

# x, y, and colors are assumed to be equal-length array-likes defined earlier
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x, y, c=colors, alpha=0.6, edgecolors='none')
ax.set_xlabel('Feature X', fontsize=12)
ax.set_ylabel('Target Y', fontsize=12)
ax.set_title('Relationship Analysis', fontsize=14, fontweight='bold')
# Hide the top and right spines to reduce non-data ink
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
```

Seaborn (statistical)

```python
import seaborn as sns

# Distribution with KDE
sns.histplot(data=df, x='value', hue='category', kde=True, bins=30)

# Correlation heatmap (numeric columns only, since 'category' is non-numeric)
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', center=0)

# Categorical comparison
# seaborn 0.13+ expects hue to be set whenever a palette is passed
sns.boxplot(data=df, x='category', y='value', hue='category',
            palette='viridis', legend=False)
```

Plotly (interactive web)

```python
import plotly.express as px

# Scatter with marginal distributions
fig = px.scatter(df, x='x', y='y', color='category', size='size',
                 marginal_x='histogram', marginal_y='rug',
                 hover_data=['label'])
fig.show()

# Faceted small multiples
fig = px.line(df, x='date', y='value', facet_col='category',
              facet_col_wrap=3, height=800)
fig.show()
```

Altair (declarative, large data)

```python
import altair as alt

# Smart aggregation for large datasets: bin both axes and encode the count as size.
# By default Altair raises MaxRowsError for DataFrames with more than 5,000 rows,
# so larger inputs need a different data transformer or pre-aggregation.
chart = alt.Chart(df).mark_circle().encode(
    x=alt.X('x:Q', bin=alt.Bin(maxbins=50)),
    y=alt.Y('y:Q', bin=alt.Bin(maxbins=50)),
    size='count()',
).interactive()

chart.save('chart.html')  # Standalone HTML file with the data embedded
```

hvPlot/HoloViz (large data, linked views)

```python
import holoviews as hv
import hvplot.pandas  # registers the .hvplot accessor on DataFrames
import panel as pn

# Linked brushing
scatter = df.hvplot.scatter(x='x', y='y', c='category',
                            tools=['box_select'], width=400, height=400)
hist = df.hvplot.hist(y='y', width=400, height=200)

# link_selections ties a selection made on one plot to the other
linked = hv.link_selections(scatter + hist)
layout = pn.Row(linked)
layout.servable()
```

Bokeh (custom web apps)

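```python
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool

# df is assumed to have 'x', 'y', and 'label' columns
source = ColumnDataSource(df)

p = figure(title="Interactive Plot", tools="pan,wheel_zoom,box_select")
# scatter() is used because circle() with a size argument is deprecated since Bokeh 3.4
p.scatter('x', 'y', source=source, size=10, alpha=0.6)

# Tooltips reference ColumnDataSource columns with the @ prefix
hover = HoverTool(tooltips=[("X", "@x"), ("Y", "@y"), ("Label", "@label")])
p.add_tools(hover)

show(p)
```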

Anti-patterns

  • ❌ Pie charts with many slices (use bar charts)
  • ❌ Dual y-axes (hard to read, try normalization or small multiples)
  • ❌ 3D charts (distorts perception)
  • ❌ Rainbow colormaps (use perceptually uniform: viridis, plasma)
  • ❌ Missing labels, titles, or units
  • ❌ Overplotting without handling (sampling, alpha, or Datashader)
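
For the dual y-axes case, one possible form of normalization is to index each series to its first value so both share a single axis. The sketch below assumes that approach; `index_to_base` and the revenue and user figures are invented for illustration.

```python
def index_to_base(values, base=100.0):
    """Rescale a series so its first value equals `base` (index normalization)."""
    first = values[0]
    if first == 0:
        raise ValueError("cannot index a series whose first value is 0")
    return [base * v / first for v in values]


# Two series with very different units and magnitudes (made-up numbers)
revenue_usd = [120_000, 132_000, 150_000, 180_000]
active_users = [800, 1_000, 1_100, 1_200]

revenue_idx = index_to_base(revenue_usd)
users_idx = index_to_base(active_users)

print(revenue_idx)  # [100.0, 110.0, 125.0, 150.0]
print(users_idx)    # [100.0, 125.0, 137.5, 150.0]
```

Both indexed series can then go on one y-axis labeled along the lines of "Index (first period = 100)", which shows relative growth without a second scale.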

Common issues and solutions

| Problem | Solution |
|---|---|
| Overplotting (100k+ points) | Use Datashader (rasterization), hexbin, or 2D histogram |
| Slow interactivity | Reduce data points, use WebGL (Plotly), or pre-aggregate |
| Large file size | Save as JSON (Plotly/Altair) or use static images |
| Color blindness | Use colorblind-friendly palettes (viridis, ColorBrewer) |
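
The 2D histogram and pre-aggregation options can be sketched with NumPy alone: bin the points once, then plot the much smaller grid of counts instead of every point. The synthetic data and the choice of 100 bins per axis are assumptions for the example.

```python
import numpy as np

# Synthetic data standing in for a large scatter (assumption for illustration)
rng = np.random.default_rng(seed=0)
n_points = 200_000
x = rng.normal(size=n_points)
y = 0.5 * x + rng.normal(scale=0.5, size=n_points)

# Pre-aggregate into a 100 x 100 grid of counts
counts, x_edges, y_edges = np.histogram2d(x, y, bins=100)

print(counts.shape)       # (100, 100)
print(int(counts.sum()))  # 200000
```

The grid holds 10,000 cells rather than 200,000 points. To draw it in Matplotlib, `ax.pcolormesh(x_edges, y_edges, counts.T)` is one option; the transpose is needed because `histogram2d` puts x along the first axis.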

Progressive disclosure

  • references/matplotlib-advanced.md
    — Subplots, annotations, custom styles
  • references/seaborn-statistical.md
    — Complex statistical plots
  • references/plotly-dash.md
    — Full dashboards with callbacks
  • references/altair-grammar.md
    — Vega-Lite transformations
  • references/holoviz-datashader.md
    — Large data visualization
  • references/bokeh-server.md
    — Real-time streaming apps

Related skills

  • @data-science-eda
    — Exploration patterns
  • @data-science-interactive-apps
    — Dashboard deployment
  • @data-science-notebooks
    — Notebook-specific visualization

References
