data-visualization

Data Visualization

Create compelling visualizations to explore and communicate data insights.

Quick Start

Matplotlib Basics

python
import matplotlib.pyplot as plt

# Line plot
plt.figure(figsize=(10, 6))
plt.plot(x, y, marker='o', linestyle='-', color='blue', label='Series 1')
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.title('Title')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Bar chart
plt.bar(categories, values, color='skyblue', edgecolor='black')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
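The snippets above assume that `x`, `y`, `categories`, and `values` already exist. As a minimal self-contained sketch, the version below fills them with made-up sample data (an assumption, not part of the original) and draws both charts through Matplotlib's object-oriented interface. The `Agg` backend is selected only so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample data, used purely for illustration
x = np.linspace(0, 10, 50)
y = np.sin(x)
categories = ["A", "B", "C", "D"]
values = [23, 45, 12, 36]

# One figure with two side-by-side axes
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(12, 5))

# Line plot
ax_line.plot(x, y, marker="o", linestyle="-", color="blue", label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("sin(x)")
ax_line.set_title("Line plot")
ax_line.legend()
ax_line.grid(True, alpha=0.3)

# Bar chart
bars = ax_bar.bar(categories, values, color="skyblue", edgecolor="black")
ax_bar.set_xlabel("Categories")
ax_bar.set_ylabel("Values")
ax_bar.set_title("Bar chart")
ax_bar.tick_params(axis="x", labelrotation=45)

fig.tight_layout()
```

Working with explicit `Axes` objects instead of the implicit `plt.*` state tends to scale better once a figure holds more than one chart.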

Seaborn for Statistical Plots

python
import seaborn as sns

# Set style
sns.set_style("whitegrid")

# Distribution
sns.histplot(data=df, x='value', kde=True, bins=30)

# Box plot
sns.boxplot(data=df, x='category', y='value')

# Violin plot
sns.violinplot(data=df, x='category', y='value')

# Heatmap (numeric_only=True skips non-numeric columns such as 'category',
# which would otherwise raise an error in pandas 2.x)
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)

# Pairplot
sns.pairplot(df, hue='target', diag_kind='kde')

Exploratory Data Analysis

python
import matplotlib.pyplot as plt
import seaborn as sns

# Quick overview
df.info()
df.describe()

# Missing values
df.isnull().sum()

# Value counts
df['category'].value_counts().plot(kind='bar')

# Distribution
df.hist(figsize=(12, 10), bins=30)
plt.tight_layout()
plt.show()

# Correlation matrix (numeric columns only)
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', center=0, square=True)
plt.title('Correlation Matrix')
plt.show()
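For the missing-value and correlation steps, a small helper can make the output easier to scan. The sketch below is one possible shape: `missing_summary` is a hypothetical name, and the tiny DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column.

    Only columns with at least one missing value are kept, sorted so the
    worst column comes first.
    """
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": 100 * counts / len(frame),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Made-up example data
df = pd.DataFrame({
    "category": ["a", "b", None, "a"],
    "value": [1.0, np.nan, np.nan, 4.0],
    "year": [2020, 2021, 2022, 2023],
})

summary = missing_summary(df)

# numeric_only=True restricts the correlation to numeric columns
corr = df.corr(numeric_only=True)
```

Here `summary` lists `value` before `category`, and `corr` covers only `value` and `year` because `category` is not numeric.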

Interactive Visualizations with Plotly

python
import plotly.express as px
import plotly.graph_objects as go

# Interactive scatter
fig = px.scatter(df, x='feature1', y='target', color='category', size='value',
                 hover_data=['name', 'date'], title='Interactive Scatter Plot')
fig.show()

# Time series
fig = px.line(df, x='date', y='value', color='category', title='Time Series')
fig.update_xaxes(rangeslider_visible=True)
fig.show()

# 3D scatter
fig = px.scatter_3d(df, x='x', y='y', z='z', color='category', size='value')
fig.show()

Dashboard with Plotly Dash

python
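import dash
import plotly.express as px  # needed for px.line in the callback
from dash import dcc, html, Input, Output

# Assumes df is already loaded with 'category', 'year', 'date' and 'sales' columns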

app = dash.Dash(__name__)

app.layout = html.Div([
    html.H1('Sales Dashboard'),

    dcc.Dropdown(
        id='category-dropdown',
        options=[{'label': cat, 'value': cat}
                for cat in df['category'].unique()],
        value=df['category'].unique()[0]
    ),

    dcc.Graph(id='sales-graph'),

    dcc.RangeSlider(
        id='year-slider',
        min=df['year'].min(),
        max=df['year'].max(),
        value=[df['year'].min(), df['year'].max()],
        marks={str(year): str(year)
              for year in df['year'].unique()}
    )
])

@app.callback(
    Output('sales-graph', 'figure'),
    [Input('category-dropdown', 'value'),
     Input('year-slider', 'value')]
)
def update_graph(selected_category, year_range):
    filtered_df = df[
        (df['category'] == selected_category) &
        (df['year'] >= year_range[0]) &
        (df['year'] <= year_range[1])
    ]
    fig = px.line(filtered_df, x='date', y='sales')
    return fig

if __name__ == '__main__':
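    # app.run is the current entry point; app.run_server is the older name
    # and is no longer available in recent Dash releases
    app.run(debug=True)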
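The filtering inside `update_graph` can be tested without starting a Dash server if it lives in a plain function. The sketch below shows one way to factor it out; `filter_sales` is a hypothetical helper name and the sample data is made up.

```python
import pandas as pd


def filter_sales(frame: pd.DataFrame, category: str, year_range) -> pd.DataFrame:
    """Keep rows of one category whose year lies in [low, high], inclusive."""
    low, high = year_range
    mask = (frame["category"] == category) & frame["year"].between(low, high)
    return frame.loc[mask]


# Made-up data with the columns the dashboard expects
df = pd.DataFrame({
    "category": ["books", "books", "games", "books"],
    "year": [2020, 2021, 2021, 2023],
    "date": pd.to_datetime(["2020-06-01", "2021-06-01", "2021-06-01", "2023-06-01"]),
    "sales": [10, 12, 7, 15],
})

# Same arguments the callback receives from the dropdown and the range slider
result = filter_sales(df, "books", [2020, 2021])
```

Inside the callback, the body would then reduce to `px.line(filter_sales(df, selected_category, year_range), x='date', y='sales')`.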

Subplots

python
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Top left
axes[0, 0].hist(data1, bins=30)
axes[0, 0].set_title('Histogram')

# Top right
axes[0, 1].scatter(x, y)
axes[0, 1].set_title('Scatter')

# Bottom left
axes[1, 0].plot(x, y)
axes[1, 0].set_title('Line Plot')

# Bottom right
axes[1, 1].boxplot([data1, data2, data3])
axes[1, 1].set_title('Box Plot')

plt.tight_layout()
plt.show()
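When every panel needs the same finishing touches, iterating over `axes.flat` avoids repeating the indexing. The sketch below reuses the 2×2 layout with synthetic data (the arrays are assumptions for illustration) and sets titles and a light grid in one loop.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic inputs standing in for data1, data2, data3, x and y
rng = np.random.default_rng(42)
data1, data2, data3 = (rng.normal(loc, 1.0, 200) for loc in (0, 1, 2))
x = np.linspace(0, 10, 100)
y = np.sin(x)

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

axes[0, 0].hist(data1, bins=30)
axes[0, 1].scatter(x, y)
axes[1, 0].plot(x, y)
axes[1, 1].boxplot([data1, data2, data3])

# axes.flat walks the grid row by row: top left, top right, bottom left, bottom right
titles = ["Histogram", "Scatter", "Line Plot", "Box Plot"]
for ax, title in zip(axes.flat, titles):
    ax.set_title(title)
    ax.grid(True, alpha=0.3)

fig.tight_layout()
```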

Visualization Best Practices

  1. Choose the right chart type:
    • Comparison: Bar chart
    • Distribution: Histogram, box plot
    • Relationship: Scatter plot
    • Time series: Line chart
    • Composition: Pie chart, stacked bar
  2. Design principles:
    • Clear labels and titles
    • Appropriate color schemes
    • Remove chart junk
    • Consistent formatting
    • Accessibility (color-blind friendly)
  3. Common pitfalls to avoid:
    • Misleading axes (e.g. a bar chart whose baseline is not zero)
    • Too many colors
    • 3D charts (distort perception)
    • Pie charts with many categories
    • Dual y-axes (confusing)
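Several of these points can be shown in one small chart. The sketch below is an illustration under assumptions: the regions and numbers are made up, and the colors are taken from the Okabe-Ito palette, a commonly used color-blind-friendly set. It draws a comparison as a bar chart with a zero baseline, explicit labels, and reduced clutter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Okabe-Ito palette (color-blind friendly)
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7"]

# Made-up comparison data
categories = ["North", "South", "East", "West"]
values = [102, 98, 105, 95]

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(categories, values, color=OKABE_ITO[:len(categories)])

# Zero baseline: bar length stays proportional to the value
ax.set_ylim(bottom=0)

# Clear labels and title
ax.set_xlabel("Region")
ax.set_ylabel("Units sold")
ax.set_title("Units sold by region")

# Remove chart junk: hide the top and right frame lines
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

fig.tight_layout()
```

With a truncated axis starting at, say, 90, the same data would make East look several times larger than West, although the actual difference is about 10%.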

Color Palettes

python
import seaborn as sns

# Seaborn palettes
sns.color_palette("viridis", as_cmap=True)   # sequential colormap
sns.color_palette("coolwarm", as_cmap=True)  # diverging colormap
sns.color_palette("Set2")                    # qualitative palette (list of colors)

# Custom colors
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A']
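A list of custom colors only takes effect once it is attached to a plot. The sketch below shows two common ways to do that in Matplotlib, as an illustration rather than the only approach: a property cycle for successive lines, and a `ListedColormap` for calls that take a `cmap`. The name `custom4` is an arbitrary label chosen here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from cycler import cycler
from matplotlib.colors import ListedColormap, to_hex

colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A']

# Option 1: a discrete colormap for functions that accept cmap=...
cmap = ListedColormap(colors, name="custom4")

# Option 2: a color cycle, so each new line picks the next color
fig, ax = plt.subplots()
ax.set_prop_cycle(cycler(color=colors))
lines = [ax.plot([0, 1], [i, i + 1], label=f"Series {i + 1}")[0]
         for i in range(len(colors))]
ax.legend()

# Colors actually assigned to the lines, normalized to lowercase hex
used = [to_hex(line.get_color()) for line in lines]
```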

Export Figures

python
import matplotlib.pyplot as plt

# Call savefig before plt.show(); after the window is closed the saved file may be blank

# High-resolution PNG
plt.savefig('figure.png', dpi=300, bbox_inches='tight')

# Vector format (PDF, SVG)
plt.savefig('figure.pdf', bbox_inches='tight')
plt.savefig('figure.svg', bbox_inches='tight')
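When the same figure is needed in several formats, a small wrapper keeps the export settings in one place. The sketch below is a minimal version: `save_figure` is a hypothetical helper, and a temporary directory is used so the example leaves no files in the working directory.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt


def save_figure(fig, stem, out_dir, formats=("png", "pdf", "svg"), dpi=300):
    """Save fig as <out_dir>/<stem>.<fmt> for each format and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for fmt in formats:
        path = out_dir / f"{stem}.{fmt}"
        # dpi matters for raster output (PNG); vector formats scale freely
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths


fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title("Example")

saved = save_figure(fig, "figure", tempfile.mkdtemp())
plt.close(fig)  # release the figure once it has been written
```

Calling `fig.savefig` on the figure object, rather than `plt.savefig`, avoids depending on which figure happens to be current.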