csv-data-visualizer

CSV Data Visualizer

Overview

This skill enables comprehensive data visualization and analysis for CSV files. It provides three main capabilities: (1) creating individual interactive visualizations using Plotly, (2) automatic data profiling with statistical summaries, and (3) generating multi-plot dashboards. The skill is optimized for exploratory data analysis, statistical reporting, and creating presentation-ready visualizations.

When to Use This Skill

Invoke this skill when users request:
  • "Visualize this CSV data"
  • "Create a histogram/scatter plot/box plot from this data"
  • "Show me the distribution of [column]"
  • "Generate a dashboard for this dataset"
  • "Profile this CSV file" or "Analyze this data"
  • "Create a correlation heatmap"
  • "Show trends over time"
  • "Compare [variable] across [categories]"

Core Capabilities

1. Individual Visualizations

Create specific chart types for detailed analysis using the visualize_csv.py script.
Available Chart Types:
Statistical Plots:
bash
# Histogram - distribution of numeric data
python3 scripts/visualize_csv.py data.csv --histogram column_name --bins 30
# Box plot - show quartiles and outliers
python3 scripts/visualize_csv.py data.csv --boxplot column_name
# Box plot grouped by category
python3 scripts/visualize_csv.py data.csv --boxplot salary --group-by department
# Violin plot - distribution with probability density
python3 scripts/visualize_csv.py data.csv --violin column_name --group-by category
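To make the box plot description more concrete, here is a minimal sketch of the quartile and outlier computation that a box plot summarizes. It assumes the common 1.5 × IQR fence rule; the source does not say which rule visualize_csv.py applies, and `box_summary` is a hypothetical helper, not part of the skill's scripts. Note that `statistics.quantiles` needs Python 3.8 or later.

```python
import statistics


def box_summary(values):
    """Return quartiles and 1.5*IQR outliers for a list of numbers.

    Hypothetical helper for illustration; not part of visualize_csv.py.
    """
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Assumed rule: points beyond 1.5 * IQR from the quartiles are outliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Made-up sample with one obviously extreme value.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

In this sample only the value 100 falls outside the fences, which is the kind of point a box plot draws as an individual marker.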

Relationship Analysis:
bash
# Scatter plot with automatic trend line
python3 scripts/visualize_csv.py data.csv --scatter height weight
# Scatter plot with color and size encoding
python3 scripts/visualize_csv.py data.csv --scatter x y --color category --size value
# Correlation heatmap for all numeric columns
python3 scripts/visualize_csv.py data.csv --correlation
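For readers who want to see what a correlation heatmap encodes, the sketch below computes a pairwise correlation matrix in plain Python. It assumes Pearson coefficients, which the source does not confirm for visualize_csv.py; the function names and the sample columns are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up sample data: weight rises with height, score falls with it.
sample = {
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 80],
    "score": [8, 6, 4, 2],
}
matrix = correlation_matrix(sample)
```

Each cell of the heatmap corresponds to one entry of `matrix`, with values near 1 or -1 indicating a strong linear relationship.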

Time Series:
bash
# Line chart for single variable
python3 scripts/visualize_csv.py data.csv --line date sales
# Multiple variables on same chart
python3 scripts/visualize_csv.py data.csv --line date "sales,revenue,profit"

Categorical Data:
bash
# Bar chart (counts categories automatically)
python3 scripts/visualize_csv.py data.csv --bar category
# Pie chart for composition
python3 scripts/visualize_csv.py data.csv --pie region
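The bar chart option is described as counting categories automatically. A rough sketch of that tally, using only the standard library, might look like the following; the sample CSV text and the `category_counts` helper are assumptions for illustration and do not come from the skill's scripts.

```python
import csv
import io
from collections import Counter

# Made-up CSV content standing in for data.csv.
SAMPLE_CSV = """region,sales
North,10
South,7
North,3
East,5
"""


def category_counts(csv_text, column):
    """Count how often each value appears in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


counts = category_counts(SAMPLE_CSV, "region")
print(counts.most_common())
```

The resulting counts are what the bar heights (or pie slices) would represent.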

Output Formats:
Specify output file with desired format extension:
bash
# Interactive HTML (default)
python3 scripts/visualize_csv.py data.csv --histogram age -o output.html
# Static image formats
python3 scripts/visualize_csv.py data.csv --scatter x y -o plot.png
python3 scripts/visualize_csv.py data.csv --correlation -o heatmap.pdf
python3 scripts/visualize_csv.py data.csv --bar category -o chart.svg
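Since the output format is selected by file extension, a small check before running a command can tell whether the static-export dependency is needed. The sketch below is a hypothetical helper based only on the extensions named in this document (HTML, PNG, PDF, SVG) and on the Dependencies section's note that static export needs kaleido; visualize_csv.py may resolve formats differently.

```python
from pathlib import Path

# Static extensions named in this document; anything else is treated as unknown.
STATIC_FORMATS = {".png", ".pdf", ".svg"}


def describe_output(path):
    """Classify an output path as interactive HTML or a static image.

    Hypothetical helper; not part of visualize_csv.py.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".html":
        return {"kind": "interactive", "needs_kaleido": False}
    if suffix in STATIC_FORMATS:
        return {"kind": "static", "needs_kaleido": True}
    raise ValueError(f"Unsupported output extension: {suffix!r}")


print(describe_output("output.html"))
print(describe_output("heatmap.pdf"))
```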

2. Automatic Data Profiling

Generate comprehensive data quality and statistical reports using the data_profile.py script.
Text Report (default):
bash
python3 scripts/data_profile.py data.csv
HTML Report:
bash
python3 scripts/data_profile.py data.csv -f html -o report.html
JSON Report:
bash
python3 scripts/data_profile.py data.csv -f json -o profile.json
What the Profiler Provides:
  • File information (size, dimensions)
  • Dataset overview (shape, memory usage, duplicates)
  • Column-by-column analysis (types, missing data, unique values)
  • Missing data patterns and completeness
  • Statistical summary for numeric columns (mean, std, quartiles, skewness, kurtosis)
  • Categorical column analysis (frequency counts, most/least common values)
  • Data quality checks (high missing data, duplicate rows, constant columns, high cardinality)
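As an illustration of the data quality checks listed above, the following sketch flags columns with a high share of missing values and columns that hold a single constant value. The 40% threshold, the sample CSV, and the `quality_flags` helper are assumptions; the source does not give the cutoffs or logic that data_profile.py actually uses.

```python
import csv
import io

# Made-up CSV; empty cells stand in for missing values.
SAMPLE_CSV = """id,city,status,score
1,Paris,active,3.5
2,,active,4.0
3,Rome,active,
4,,active,2.5
"""

# Assumed threshold; the profiler's real cutoff is not documented here.
HIGH_MISSING_RATIO = 0.4


def quality_flags(csv_text):
    """Return per-column flags for high missing data and constant values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    flags = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        present = [v for v in values if v != ""]
        missing_ratio = 1 - len(present) / len(values)
        flags[column] = {
            "missing_ratio": missing_ratio,
            "high_missing": missing_ratio >= HIGH_MISSING_RATIO,
            # A column is constant when all non-missing values are identical.
            "constant": len(set(present)) == 1,
        }
    return flags


flags = quality_flags(SAMPLE_CSV)
for name, info in flags.items():
    print(name, info)
```

With this sample, `city` is flagged for missing data and `status` is flagged as constant, which are the kinds of findings worth reviewing before choosing a chart.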
When to Use Profiling: Always recommend running data profiling BEFORE creating visualizations when:
  • The user is unfamiliar with the dataset
  • Data quality is unknown
  • Appropriate visualization types still need to be identified
  • A new dataset is being explored for the first time

3. Multi-Plot Dashboards

Create comprehensive dashboards with multiple visualizations using the create_dashboard.py script.
Automatic Dashboard: Analyzes data types and automatically creates appropriate visualizations:
bash
python3 scripts/create_dashboard.py data.csv
Custom output location:
bash
python3 scripts/create_dashboard.py data.csv -o my_dashboard.html
Control number of plots:
bash
python3 scripts/create_dashboard.py data.csv --max-plots 9
Custom Dashboard from Config: Create a JSON configuration file specifying exact plots:
bash
python3 scripts/create_dashboard.py data.csv --config config.json
Dashboard Config Format:
json
{
  "title": "Sales Analysis Dashboard",
  "plots": [
    {"type": "histogram", "column": "revenue"},
    {"type": "box", "column": "revenue", "group_by": "region"},
    {"type": "scatter", "column": "advertising", "group_by": "revenue"},
    {"type": "bar", "column": "product_category"},
    {"type": "correlation"}
  ]
}
Dashboard Plot Types:
  • histogram: Distribution of numeric column
  • box: Box plot, optionally grouped by category
  • scatter: Relationship between two numeric columns
  • bar: Count of categorical values
  • correlation: Heatmap of numeric correlations
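A config file in the format above can also be generated programmatically. The sketch below builds and validates a config against the five documented plot types before serializing it. The `build_config` helper and the rule that every type except correlation needs a "column" key are assumptions drawn from the example, not confirmed behavior of create_dashboard.py.

```python
import json

# Plot types documented for dashboard configs.
SUPPORTED_TYPES = {"histogram", "box", "scatter", "bar", "correlation"}


def build_config(title, plots):
    """Validate plot entries and return a dashboard config dictionary."""
    for plot in plots:
        if plot.get("type") not in SUPPORTED_TYPES:
            raise ValueError(f"Unsupported plot type: {plot.get('type')!r}")
        # Assumption: every type except correlation needs a "column" key.
        if plot["type"] != "correlation" and "column" not in plot:
            raise ValueError(f"Plot {plot['type']!r} is missing 'column'")
    return {"title": title, "plots": plots}


config = build_config(
    "Sales Analysis Dashboard",
    [
        {"type": "histogram", "column": "revenue"},
        {"type": "box", "column": "revenue", "group_by": "region"},
        {"type": "correlation"},
    ],
)
config_json = json.dumps(config, indent=2)
print(config_json)
```

Writing `config_json` to a file such as config.json would give an input for the `--config` option.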

Workflow Decision Tree

Use this decision tree to determine the appropriate approach:
User provides CSV file
├─ "Profile this data" / "Analyze this data" / Unfamiliar dataset
│  └─> Run data_profile.py first
│     Then offer visualization options based on findings
├─ "Create dashboard" / "Overview of the data" / Multiple visualizations needed
│  ├─ User knows exact plots wanted
│  │  └─> Create JSON config → run create_dashboard.py with config
│  └─ User wants automatic dashboard
│     └─> Run create_dashboard.py (auto mode)
└─ Specific visualization requested ("histogram", "scatter plot", etc.)
   └─> Use visualize_csv.py with appropriate flag
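The decision tree above can be sketched as a small routing function. The keyword matching here is a deliberately naive assumption for illustration, and `choose_command` is a hypothetical name; only the script paths and the `--config` flag come from this document.

```python
def choose_command(request, knows_exact_plots=False, unfamiliar_dataset=False):
    """Pick a script invocation by following the decision tree.

    Returns an argument list; the caller still has to add chart-specific flags.
    """
    text = request.lower()
    # Branch 1: profiling requests or unfamiliar data go to the profiler first.
    if unfamiliar_dataset or "profile" in text or "analyze" in text:
        return ["python3", "scripts/data_profile.py", "data.csv"]
    # Branch 2: dashboards, with a config only when exact plots are known.
    if "dashboard" in text or "overview" in text:
        command = ["python3", "scripts/create_dashboard.py", "data.csv"]
        if knows_exact_plots:
            command += ["--config", "config.json"]
        return command
    # Branch 3: a specific chart was requested.
    return ["python3", "scripts/visualize_csv.py", "data.csv"]


print(choose_command("Profile this data"))
print(choose_command("Create dashboard", knows_exact_plots=True))
print(choose_command("Create a histogram from this data"))
```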

Best Practices

Starting Analysis

  1. Always profile first for unfamiliar datasets:
    python3 scripts/data_profile.py data.csv
  2. Review the profiling output to understand:
    • Column data types and ranges
    • Missing data patterns
    • Data quality issues
    • Statistical distributions

Choosing Visualizations

Consult references/visualization_guide.md for detailed guidance. Quick reference:
  • Distribution: Histogram, box plot, violin plot
  • Relationship: Scatter plot, correlation heatmap
  • Time series: Line chart
  • Categories: Bar chart (preferred) or pie chart (use sparingly)
  • Comparison: Box plot grouped by category

Creating Dashboards

  • Automatic dashboard: Good for initial exploration
  • Custom dashboard: Better for presentations or specific analysis goals
  • Limit plots: Keep to 6-9 plots maximum for readability
  • Logical grouping: Group related visualizations together

Output Considerations

  • HTML: Best for interactive exploration (zoom, pan, hover tooltips)
  • PNG/PDF: Best for reports and presentations
  • SVG: Best for publications requiring vector graphics

Dependencies

The scripts require these Python packages:
bash
pip install pandas plotly numpy
For static image export (PNG, PDF, SVG), also install:
bash
pip install kaleido
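Before running the scripts, it can help to check which of these packages are importable in the current environment. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper and the scripts themselves may perform no such check.

```python
import importlib.util

REQUIRED = ["pandas", "plotly", "numpy"]
OPTIONAL = ["kaleido"]  # only needed for PNG, PDF, and SVG export


def missing_packages(names):
    """Return the package names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing required packages:", missing_packages(REQUIRED))
print("Missing optional packages:", missing_packages(OPTIONAL))
```

An empty list for the required packages suggests the scripts' imports should succeed; a missing kaleido only matters for static image export.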

Example Workflows

Exploratory Data Analysis

bash
# 1. Profile the data
python3 scripts/data_profile.py sales_data.csv -f html -o profile.html
# 2. Create automatic dashboard
python3 scripts/create_dashboard.py sales_data.csv -o dashboard.html
# 3. Dive deeper with specific plots
python3 scripts/visualize_csv.py sales_data.csv --scatter price sales --color region
python3 scripts/visualize_csv.py sales_data.csv --boxplot revenue --group-by product
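The exploratory workflow can also be driven from Python when the steps need to run in sequence and stop at the first failure. The sketch below is one possible wrapper, assuming the script paths and flags shown in this document; `eda_commands` and `run_all` are hypothetical names, and the sketch only prints the commands rather than executing them.

```python
import subprocess


def eda_commands(csv_path):
    """Return the three workflow steps as argument lists."""
    return [
        ["python3", "scripts/data_profile.py", csv_path,
         "-f", "html", "-o", "profile.html"],
        ["python3", "scripts/create_dashboard.py", csv_path,
         "-o", "dashboard.html"],
        ["python3", "scripts/visualize_csv.py", csv_path,
         "--scatter", "price", "sales", "--color", "region"],
    ]


def run_all(csv_path):
    """Run each step in order; check=True raises on the first failing step."""
    for command in eda_commands(csv_path):
        subprocess.run(command, check=True)


# Printing instead of running keeps this sketch safe without the scripts present.
for command in eda_commands("sales_data.csv"):
    print(" ".join(command))
```

Calling `run_all("sales_data.csv")` from the skill's root directory would execute the same steps.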

Report Generation

bash
# Create specific visualizations for report
python3 scripts/visualize_csv.py data.csv --histogram age -o fig1_distribution.png
python3 scripts/visualize_csv.py data.csv --scatter income age -o fig2_correlation.png
python3 scripts/visualize_csv.py data.csv --bar category -o fig3_categories.png
# Generate data summary
python3 scripts/data_profile.py data.csv -f html -o data_summary.html

Interactive Dashboard

bash
# Create custom dashboard for presentation
# 1. First, create config.json with desired plots
# 2. Generate dashboard
python3 scripts/create_dashboard.py data.csv --config config.json -o presentation_dashboard.html

Troubleshooting

"Column not found" errors:
  • Run data profiling to see exact column names
  • CSV columns are case-sensitive
  • Check for leading/trailing spaces in column names
Empty or incorrect visualizations:
  • Verify data types (numeric vs categorical)
  • Check for missing data in plotted columns
  • Ensure sufficient non-null values exist
Script execution errors:
  • Verify dependencies are installed:
    pip list | grep plotly
  • Check Python version: Python 3.6+ required
  • For image export issues, install kaleido:
    pip install kaleido
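For "Column not found" errors, the usual culprits named above are case differences and stray spaces in the header. The following sketch looks up the exact header that matches a requested name while ignoring both; the sample header and the `find_column` helper are made up for illustration and are not part of the skill's scripts.

```python
import csv
import io

# Made-up header with a trailing space and mixed case.
SAMPLE_CSV = "Age ,Income,region\n34,52000,North\n"


def find_column(csv_text, requested):
    """Return the exact header matching `requested`, ignoring case and spaces."""
    header = next(csv.reader(io.StringIO(csv_text)))
    for name in header:
        if name.strip().lower() == requested.strip().lower():
            return name
    return None


exact = find_column(SAMPLE_CSV, "age")
print(repr(exact))
```

Passing the exact header (including any trailing space) to the script, or cleaning the header in the CSV, should resolve this class of error.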

Resources

scripts/

  • visualize_csv.py: Main visualization script with all chart types
  • data_profile.py: Automatic data profiling and quality analysis
  • create_dashboard.py: Multi-plot dashboard generator

references/

  • visualization_guide.md: Comprehensive guide for choosing appropriate chart types, best practices, and common patterns