data-science-visualization
Data Visualization
Use this skill for creating effective visualizations: choosing the right library, chart type, and interactivity level for your data and audience.
When to use this skill
- Choosing a visualization library for a project
- Creating exploratory charts during EDA
- Building interactive dashboards
- Producing publication-quality figures
- Understanding tradeoffs between libraries
Library selection guide (2026)
| Library | Best For | Interactivity | Learning Curve |
|---|---|---|---|
| Matplotlib | Publication-quality static plots, fine control | Static | Moderate |
| Seaborn | Statistical visualization, quick EDA | Static | Easy |
| Plotly | Interactive web charts, dashboards | High | Easy |
| Altair | Declarative statistical charts, large datasets | Medium | Easy |
| hvPlot/HoloViz | Large data, linked brushing, geospatial | High | Moderate |
| Bokeh | Custom interactive web apps | High | Moderate |
Quick decision tree
Static publication figure?
→ Matplotlib (full control) or Seaborn (quick statistical)
Interactive web/dashboard?
→ Plotly (easiest), Dash (full apps)
→ Panel/HoloViz (complex linked views)
→ Bokeh (custom web apps)
Large datasets (100k+ points)?
→ hvPlot + Datashader (automatic rasterization)
→ Altair (smart aggregation with Vega-Lite)
Declarative grammar preferred?
→ Altair (Vega-Lite) or Plotly Express
Already using Pandas?
→ df.plot() → Matplotlib
→ df.hvplot() → HoloViz
→ px.scatter(df) → Plotly
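The tree can also be read as a small lookup function. The sketch below is one possible encoding, not a definitive rule set: `PlotNeeds` and `suggest_libraries` are hypothetical names introduced here, and checking dataset size first is an assumption about priority that the tree itself does not state.

```python
from dataclasses import dataclass


@dataclass
class PlotNeeds:
    """Hypothetical description of a plotting task (not part of any library)."""
    interactive: bool = False
    n_points: int = 0
    full_app: bool = False
    linked_views: bool = False


def suggest_libraries(needs: PlotNeeds) -> list[str]:
    """Return candidate libraries, loosely following the decision tree."""
    # Assumption: dataset size is checked first, since 100k+ points
    # need rasterization or aggregation regardless of the output medium.
    if needs.n_points >= 100_000:
        return ["hvPlot + Datashader", "Altair"]
    if needs.interactive:
        if needs.linked_views:
            return ["Panel/HoloViz"]
        if needs.full_app:
            return ["Dash", "Bokeh"]
        return ["Plotly"]
    # Default branch: static publication figure
    return ["Matplotlib", "Seaborn"]


print(suggest_libraries(PlotNeeds(interactive=True)))    # → ['Plotly']
print(suggest_libraries(PlotNeeds(n_points=2_000_000)))  # → ['hvPlot + Datashader', 'Altair']
```

Treat the returned list as a starting point; team familiarity and deployment constraints may reasonably override it.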
Core principles
1) Match chart to data and question
| Question | Chart Type |
|---|---|
| Distribution? | Histogram, KDE, boxplot, violin |
| Relationship? | Scatter, line, heatmap (correlation) |
| Composition? | Pie (avoid), stacked bar, treemap |
| Comparison? | Bar, grouped bar, dot plot |
| Trend over time? | Line, area, candlestick |
| Geographic? | Choropleth, scatter map, heatmap |
2) Maximize data-ink ratio
- Remove unnecessary gridlines, borders, backgrounds
- Use color purposefully (not decoration)
- Label directly when possible
- One message per visualization
3) Choose interactivity appropriately
| Audience | Interactivity Level |
|---|---|
| Paper/report | Static (Matplotlib/Seaborn) |
| Presentation | Limited (Plotly static export) |
| Exploratory analysis | High (zoom, pan, filter, hover) |
| Stakeholder dashboard | Medium (linked views, drill-down) |
Quick examples
Matplotlib (fine control)
```python
import matplotlib.pyplot as plt

# x, y, and colors are assumed to be equal-length array-likes defined elsewhere
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x, y, c=colors, alpha=0.6, edgecolors='none')
ax.set_xlabel('Feature X', fontsize=12)
ax.set_ylabel('Target Y', fontsize=12)
ax.set_title('Relationship Analysis', fontsize=14, fontweight='bold')

# Hide the top and right spines to reduce non-data ink
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
```
Seaborn (statistical)
```python
import seaborn as sns

# df is assumed to be a pandas DataFrame with 'value' and 'category' columns

# Distribution with KDE
sns.histplot(data=df, x='value', hue='category', kde=True, bins=30)

# Correlation heatmap (numeric_only skips non-numeric columns such as 'category')
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', center=0)

# Categorical comparison
# seaborn 0.13+ deprecates palette without hue, so hue is set explicitly
sns.boxplot(data=df, x='category', y='value', hue='category',
            palette='viridis', legend=False)
```
Plotly (interactive web)
```python
import plotly.express as px

# Scatter with marginal distributions
fig = px.scatter(df, x='x', y='y', color='category', size='size',
                 marginal_x='histogram', marginal_y='rug',
                 hover_data=['label'])
fig.show()

# Faceted small multiples
fig = px.line(df, x='date', y='value', facet_col='category',
              facet_col_wrap=3, height=800)
fig.show()
```
Altair (declarative, large data)
```python
import altair as alt

# Smart aggregation for large datasets
# Note: by default Altair raises MaxRowsError above 5,000 rows; pre-aggregate
# or enable the "vegafusion" data transformer for larger frames
chart = alt.Chart(df).mark_circle().encode(
    x=alt.X('x:Q', bin=alt.Bin(maxbins=50)),
    y=alt.Y('y:Q', bin=alt.Bin(maxbins=50)),
    size='count()'
).interactive()
chart.save('chart.html')  # Standalone HTML (loads the Vega JS libraries from a CDN by default)
```
hvPlot/HoloViz (large data, linked views)
```python
import hvplot.pandas  # registers the .hvplot accessor on DataFrames
import panel as pn

# Linked brushing
scatter = df.hvplot.scatter(x='x', y='y', c='category',
                            tools=['box_select'],
                            width=400, height=400)
hist = df.hvplot.hist(y='y', width=400, height=200)

# pn.Row only places the plots side by side; to propagate the box selection
# between them, wrap the plots with HoloViews' link_selections
layout = pn.Row(scatter, hist)
layout.servable()
```
Bokeh (custom web apps)
```python
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool

# df is assumed to have 'x', 'y', and 'label' columns
source = ColumnDataSource(df)
p = figure(title="Interactive Plot", tools="pan,wheel_zoom,box_select")

# scatter() replaces circle() with a size argument, which is deprecated in Bokeh 3.4+
p.scatter('x', 'y', source=source, size=10, alpha=0.6)

hover = HoverTool(tooltips=[("X", "@x"), ("Y", "@y"), ("Label", "@label")])
p.add_tools(hover)
show(p)
```
Anti-patterns
- ❌ Pie charts with many slices (use bar charts)
- ❌ Dual y-axes (hard to read, try normalization or small multiples)
- ❌ 3D charts (they distort perception)
- ❌ Rainbow colormaps (use perceptually uniform: viridis, plasma)
- ❌ Missing labels, titles, or units
- ❌ Overplotting without handling (sampling, alpha, or Datashader)
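As an illustration of the normalization alternative to dual y-axes, the sketch below rebases two series to a common index (first observation = 100) so they can share one axis. The helper name `rebase_to_100` and the sample figures are hypothetical, chosen only for illustration.

```python
def rebase_to_100(values):
    """Rescale a series so its first observation equals 100."""
    base = values[0]
    if base == 0:
        raise ValueError("cannot rebase a series that starts at 0")
    return [100 * v / base for v in values]


# Hypothetical series on very different scales
revenue_usd = [2_000_000, 2_200_000, 2_500_000]
active_users = [400, 500, 460]

revenue_idx = rebase_to_100(revenue_usd)
users_idx = rebase_to_100(active_users)

print(revenue_idx)  # → [100.0, 110.0, 125.0]
print(users_idx)    # → [100.0, 125.0, 115.0]
```

Both indexed series can then be drawn as ordinary lines on a single y-axis labeled "Index (first period = 100)". This shows relative change only, so state the original units in the caption or hover text.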
Common issues and solutions
| Problem | Solution |
|---|---|
| Overplotting (100k+ points) | Use Datashader (rasterization), hexbin, or 2D histogram |
| Slow interactivity | Reduce data points, use WebGL (Plotly), or pre-aggregate |
| Large file size | Save as JSON (Plotly/Altair) or use static images |
| Color blindness | Use colorblind-friendly palettes (viridis, colorbrewer) |
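To make the 2D-histogram remedy for overplotting concrete, here is a minimal pure-Python sketch of the binning step. The function name `bin_2d` and the sample points are hypothetical. In practice `numpy.histogram2d` or Matplotlib's `hexbin` perform the same aggregation far more efficiently.

```python
from collections import Counter


def bin_2d(xs, ys, x_range, y_range, bins=50):
    """Count points per cell of a bins x bins grid (a plain 2D histogram)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in zip(xs, ys):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted range
        # Clamp so that values on the upper edge fall into the last bin
        i = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        j = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[(i, j)] += 1
    return counts


xs = [0.1, 0.15, 0.9, 1.0]
ys = [0.1, 0.12, 0.9, 1.0]
cells = bin_2d(xs, ys, x_range=(0, 1), y_range=(0, 1), bins=2)
print(dict(cells))  # → {(0, 0): 2, (1, 1): 2}
```

Plotting one mark per occupied cell, colored or sized by count, keeps the number of drawn elements bounded by `bins * bins` no matter how many raw points there are.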
Progressive disclosure
- references/matplotlib-advanced.md: Subplots, annotations, custom styles
- references/seaborn-statistical.md: Complex statistical plots
- references/plotly-dash.md: Full dashboards with callbacks
- references/altair-grammar.md: Vega-Lite transformations
- references/holoviz-datashader.md: Large data visualization
- references/bokeh-server.md: Real-time streaming apps
Related skills
- @data-science-eda: Exploration patterns
- @data-science-interactive-apps: Dashboard deployment
- @data-science-notebooks: Notebook-specific visualization