csv-data-visualizer
# CSV Data Visualizer
## Overview
This skill enables comprehensive data visualization and analysis for CSV files. It provides three main capabilities: (1) creating individual interactive visualizations using Plotly, (2) automatic data profiling with statistical summaries, and (3) generating multi-plot dashboards. The skill is optimized for exploratory data analysis, statistical reporting, and creating presentation-ready visualizations.
## When to Use This Skill
Invoke this skill when users request:
- "Visualize this CSV data"
- "Create a histogram/scatter plot/box plot from this data"
- "Show me the distribution of [column]"
- "Generate a dashboard for this dataset"
- "Profile this CSV file" or "Analyze this data"
- "Create a correlation heatmap"
- "Show trends over time"
- "Compare [variable] across [categories]"
## Core Capabilities
### 1. Individual Visualizations
Create specific chart types for detailed analysis using the `visualize_csv.py` script.

**Available Chart Types:**

**Statistical Plots:**
```bash
# Histogram - distribution of numeric data
python3 scripts/visualize_csv.py data.csv --histogram column_name --bins 30

# Box plot - show quartiles and outliers
python3 scripts/visualize_csv.py data.csv --boxplot column_name

# Box plot grouped by category
python3 scripts/visualize_csv.py data.csv --boxplot salary --group-by department

# Violin plot - distribution with probability density
python3 scripts/visualize_csv.py data.csv --violin column_name --group-by category
```
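The quartiles and outliers that a box plot shows can also be computed by hand. The sketch below uses only the standard library (Python 3.8+ for `statistics.quantiles`). It assumes the common 1.5 × IQR whisker rule and linearly interpolated quartiles. `box_stats` is a hypothetical helper and is not part of `visualize_csv.py`. The quartile method used by the script or by Plotly may differ slightly.

```python
import statistics

def box_stats(values):
    """Quartiles and 1.5 * IQR outliers for one numeric column (Python 3.8+)."""
    # method="inclusive" interpolates linearly between sorted data points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Anything beyond the whisker fences is drawn as an individual outlier point
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

salaries = [42_000, 45_000, 47_000, 50_000, 52_000, 55_000, 58_000, 250_000]
print(box_stats(salaries))
```

In this data, the single 250,000 salary falls far above the upper fence. That one point would be drawn separately from the box.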
**Relationship Analysis:**

```bash
# Scatter plot with automatic trend line
python3 scripts/visualize_csv.py data.csv --scatter height weight

# Scatter plot with color and size encoding
python3 scripts/visualize_csv.py data.csv --scatter x y --color category --size value

# Correlation heatmap for all numeric columns
python3 scripts/visualize_csv.py data.csv --correlation
```
**Time Series:**

```bash
# Line chart for single variable
python3 scripts/visualize_csv.py data.csv --line date sales

# Multiple variables on same chart
python3 scripts/visualize_csv.py data.csv --line date "sales,revenue,profit"
```
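Plotly line traces connect points in the order the rows appear, so an unsorted date column draws a zig-zag. The hypothetical `line_series` helper below shows one way to handle a comma-separated y-column argument such as `"sales,revenue,profit"`:

- Split the argument into column names.
- Check each name against the header.
- Pair the y values with chronologically sorted x values.

The sketch assumes ISO 8601 dates (`YYYY-MM-DD`). It is not taken from `visualize_csv.py`, which may parse dates differently.

```python
import csv
import io
from datetime import date

SAMPLE = """date,sales,revenue
2024-03-01,12,120
2024-01-01,10,100
2024-02-01,11,110
"""

def line_series(rows, x_col, y_spec):
    """Split a comma-separated y spec; return sorted x values and one series per y column."""
    y_cols = [c.strip() for c in y_spec.split(",") if c.strip()]
    missing = [c for c in [x_col, *y_cols] if c not in rows[0]]
    if missing:
        raise KeyError(f"Column(s) not found: {missing}")
    # Sort chronologically; assumes ISO 8601 dates such as 2024-01-31
    ordered = sorted(rows, key=lambda r: date.fromisoformat(r[x_col]))
    xs = [r[x_col] for r in ordered]
    series = {c: [float(r[c]) for r in ordered] for c in y_cols}
    return xs, series

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
xs, series = line_series(rows, "date", "sales, revenue")
print(xs)
print(series)
```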
**Categorical Data:**

```bash
# Bar chart (counts categories automatically)
python3 scripts/visualize_csv.py data.csv --bar category

# Pie chart for composition
python3 scripts/visualize_csv.py data.csv --pie region
```
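The bar chart's automatic counting and the pie chart's composition both come down to category frequencies. The stdlib-only sketch below uses `collections.Counter`. `category_counts` and `category_shares` are hypothetical names. Treating empty cells as missing, rather than as a category of their own, is an assumption.

```python
import csv
import io
from collections import Counter

SAMPLE = """region,sales
North,10
South,5
North,7
East,3
North,2
South,8
"""

def category_counts(rows, column):
    """Frequency of each category, most common first (the heights of a count bar chart)."""
    return Counter(r[column] for r in rows if r[column] != "").most_common()

def category_shares(counts):
    """Percentage of the total for each category (the slices of a pie chart)."""
    total = sum(n for _, n in counts)
    return {category: round(100 * n / total, 1) for category, n in counts}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = category_counts(rows, "region")
print(counts)
print(category_shares(counts))
```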
**Output Formats:**

Specify the output file with the desired format extension:

```bash
# Interactive HTML (default)
python3 scripts/visualize_csv.py data.csv --histogram age -o output.html

# Static image formats (require kaleido; see Dependencies)
python3 scripts/visualize_csv.py data.csv --scatter x y -o plot.png
python3 scripts/visualize_csv.py data.csv --correlation -o heatmap.pdf
python3 scripts/visualize_csv.py data.csv --bar category -o chart.svg
```
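The output format follows the `-o` file extension. The sketch below restates that rule in code under these assumptions:

- `output_format` and `needs_kaleido` are hypothetical helpers.
- Only the four extensions listed above are accepted.
- HTML is returned when no output file is given.

In Plotly itself, HTML export goes through `Figure.write_html`. Static formats go through `Figure.write_image`, which in current Plotly versions relies on kaleido.

```python
from pathlib import Path

# The static formats this skill documents; other extensions are rejected here
STATIC_FORMATS = {"png", "pdf", "svg"}

def output_format(path=None):
    """Infer the export format from an -o path; HTML when no path is given (assumed default)."""
    if path is None:
        return "html"
    suffix = Path(path).suffix.lower().lstrip(".")
    if suffix == "html" or suffix in STATIC_FORMATS:
        return suffix
    raise ValueError(f"Unsupported output extension: {suffix or '(none)'}")

def needs_kaleido(fmt):
    """Static image export from Plotly relies on the kaleido package."""
    return fmt in STATIC_FORMATS

for p in [None, "output.html", "plot.png", "heatmap.PDF"]:
    fmt = output_format(p)
    print(p, "->", fmt, "(needs kaleido)" if needs_kaleido(fmt) else "")
```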
### 2. Automatic Data Profiling
Generate comprehensive data quality and statistical reports using the `data_profile.py` script.

**Text Report (default):**

```bash
python3 scripts/data_profile.py data.csv
```

**HTML Report:**

```bash
python3 scripts/data_profile.py data.csv -f html -o report.html
```

**JSON Report:**

```bash
python3 scripts/data_profile.py data.csv -f json -o profile.json
```

**What the Profiler Provides:**
- File information (size, dimensions)
- Dataset overview (shape, memory usage, duplicates)
- Column-by-column analysis (types, missing data, unique values)
- Missing data patterns and completeness
- Statistical summary for numeric columns (mean, std, quartiles, skewness, kurtosis)
- Categorical column analysis (frequency counts, most/least common values)
- Data quality checks (high missing data, duplicate rows, constant columns, high cardinality)
**When to Use Profiling:**
Always recommend running data profiling BEFORE creating visualizations when:
- User is unfamiliar with the dataset
- Data quality is unknown
- Need to identify appropriate visualization types
- Exploring a new dataset for the first time
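To make a few of the profiler's checks concrete, here is a minimal stdlib-only sketch of a hypothetical `profile` function. It reports:

- missing-value counts and percentages
- unique counts and constant columns
- duplicate rows
- mean and standard deviation for columns that parse as numbers

It is not the implementation of `data_profile.py`, which covers more, such as skewness, kurtosis, memory usage, and cardinality checks. Treating empty cells as missing is an assumption.

```python
import csv
import io
import statistics

SAMPLE = """id,city,score,flag
1,Paris,10,y
2,Lyon,,y
3,Paris,14,y
3,Paris,14,y
"""

def profile(rows):
    """Small subset of a data profile: per-column stats plus duplicate rows (rows must be non-empty)."""
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(r.values())  # duplicate = every value matches an earlier row
        if key in seen:
            duplicates += 1
        seen.add(key)

    columns = {}
    for name in rows[0]:
        values = [r[name] for r in rows]
        present = [v for v in values if v != ""]  # empty cells count as missing
        info = {
            "missing": len(values) - len(present),
            "missing_pct": round(100 * (len(values) - len(present)) / len(values), 1),
            "unique": len(set(present)),
            "constant": len(set(present)) == 1,
        }
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []  # not numeric: treat as categorical
        if len(numbers) >= 2:  # stdev needs at least two values
            info["mean"] = statistics.mean(numbers)
            info["std"] = statistics.stdev(numbers)
        columns[name] = info
    return {"rows": len(rows), "duplicate_rows": duplicates, "columns": columns}

report = profile(list(csv.DictReader(io.StringIO(SAMPLE))))
print("rows:", report["rows"], "duplicates:", report["duplicate_rows"])
for name, info in report["columns"].items():
    print(name, info)
```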
### 3. Multi-Plot Dashboards
Create comprehensive dashboards with multiple visualizations using the `create_dashboard.py` script.

**Automatic Dashboard:**

Analyzes data types and automatically creates appropriate visualizations:

```bash
python3 scripts/create_dashboard.py data.csv
```

Custom output location:

```bash
python3 scripts/create_dashboard.py data.csv -o my_dashboard.html
```

Control number of plots:

```bash
python3 scripts/create_dashboard.py data.csv --max-plots 9
```

**Custom Dashboard from Config:**

Create a JSON configuration file specifying exact plots:

```bash
python3 scripts/create_dashboard.py data.csv --config config.json
```

**Dashboard Config Format:**

```json
{
  "title": "Sales Analysis Dashboard",
  "plots": [
    {"type": "histogram", "column": "revenue"},
    {"type": "box", "column": "revenue", "group_by": "region"},
    {"type": "scatter", "column": "advertising", "group_by": "revenue"},
    {"type": "bar", "column": "product_category"},
    {"type": "correlation"}
  ]
}
```

**Dashboard Plot Types:**
- `histogram`: Distribution of a numeric column
- `box`: Box plot, optionally grouped by category
- `scatter`: Relationship between two numeric columns
- `bar`: Count of categorical values
- `correlation`: Heatmap of numeric correlations
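A config can be sanity-checked before running `create_dashboard.py --config`. The hypothetical `validate_config` below checks each entry against the five plot types listed above. The required keys are inferred from the example config; note that the example's `scatter` entry names its second column under `group_by`. These key rules are assumptions, and the script may accept other options.

```python
import json

# Required keys per dashboard plot type, inferred from the example config
# (an assumption: create_dashboard.py may accept other or additional keys).
REQUIRED_KEYS = {
    "histogram": {"column"},
    "box": {"column"},                  # "group_by" is optional
    "scatter": {"column", "group_by"},  # the example uses group_by as the second axis
    "bar": {"column"},
    "correlation": set(),
}

def validate_config(text, csv_columns):
    """Return a list of problems in a dashboard config; an empty list means it looks valid."""
    config = json.loads(text)
    problems = []
    for i, plot in enumerate(config.get("plots", [])):
        kind = plot.get("type")
        if kind not in REQUIRED_KEYS:
            problems.append(f"plot {i}: unknown type {kind!r}")
            continue
        for key in sorted(REQUIRED_KEYS[kind] - set(plot)):
            problems.append(f"plot {i}: {kind} is missing {key!r}")
        # Referenced columns must exist in the CSV header
        for key in ("column", "group_by"):
            if key in plot and plot[key] not in csv_columns:
                problems.append(f"plot {i}: column {plot[key]!r} not found in CSV")
    return problems

CONFIG = """{
  "title": "Sales Analysis Dashboard",
  "plots": [
    {"type": "histogram", "column": "revenue"},
    {"type": "scatter", "column": "advertising"},
    {"type": "pie", "column": "region"}
  ]
}"""

for problem in validate_config(CONFIG, {"revenue", "advertising", "region"}):
    print(problem)
```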
## Workflow Decision Tree
Use this decision tree to determine the appropriate approach:

```
User provides CSV file
│
├─ "Profile this data" / "Analyze this data" / Unfamiliar dataset
│   └─> Run data_profile.py first
│       Then offer visualization options based on findings
│
├─ "Create dashboard" / "Overview of the data" / Multiple visualizations needed
│   ├─ User knows exact plots wanted
│   │   └─> Create JSON config → run create_dashboard.py with config
│   └─ User wants automatic dashboard
│       └─> Run create_dashboard.py (auto mode)
│
└─ Specific visualization requested ("histogram", "scatter plot", etc.)
    └─> Use visualize_csv.py with appropriate flag
```
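The decision tree can also be read as a small routing function. The sketch below checks the branches in the tree's order. The keyword matching is an illustrative heuristic, not something the skill defines. `route` and `CHART_FLAGS` are hypothetical names, and only flags documented in this skill are returned.

```python
# Keyword -> visualize_csv.py flag. The flags are the documented ones;
# the keywords are illustrative guesses. Earlier entries win on overlap.
CHART_FLAGS = {
    "histogram": "--histogram",
    "distribution": "--histogram",
    "violin": "--violin",
    "box": "--boxplot",
    "compare": "--boxplot",
    "scatter": "--scatter",
    "correlation": "--correlation",
    "trend": "--line",
    "line chart": "--line",
    "bar": "--bar",
    "pie": "--pie",
}

def route(request, familiar=True, plots_known=False):
    """Return (script, flag) for a request, following the decision tree branches in order."""
    text = request.lower()
    # Branch 1: profiling requests or an unfamiliar dataset -> profile first
    if "profile" in text or "analyze" in text or not familiar:
        return ("data_profile.py", None)
    # Branch 2: dashboards and overviews; --config when the exact plots are known
    if "dashboard" in text or "overview" in text:
        return ("create_dashboard.py", "--config" if plots_known else None)
    # Branch 3: a specific chart; None means the flag still has to be chosen
    for keyword, flag in CHART_FLAGS.items():
        if keyword in text:
            return ("visualize_csv.py", flag)
    return ("visualize_csv.py", None)

print(route("Create a correlation heatmap"))
print(route("Generate a dashboard for this dataset", plots_known=True))
```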
## Best Practices
### Starting Analysis
- Always profile first for unfamiliar datasets: `python3 scripts/data_profile.py data.csv`
- Review the profiling output to understand:
  - Column data types and ranges
  - Missing data patterns
  - Data quality issues
  - Statistical distributions
### Choosing Visualizations
Consult `references/visualization_guide.md` for detailed guidance. Quick reference:

- Distribution: Histogram, box plot, violin plot
- Relationship: Scatter plot, correlation heatmap
- Time series: Line chart
- Categories: Bar chart (preferred) or pie chart (use sparingly)
- Comparison: Box plot grouped by category
### Creating Dashboards
- Automatic dashboard: Good for initial exploration
- Custom dashboard: Better for presentations or specific analysis goals
- Limit plots: Keep to 6-9 plots maximum for readability
- Logical grouping: Group related visualizations together
### Output Considerations
- HTML: Best for interactive exploration (zoom, pan, hover tooltips)
- PNG/PDF: Best for reports and presentations
- SVG: Best for publications requiring vector graphics
## Dependencies
The scripts require these Python packages:

```bash
pip install pandas plotly numpy
```

For static image export (PNG, PDF, SVG), also install:

```bash
pip install kaleido
```
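As an alternative to `pip list | grep plotly`, the snippet below checks from Python whether each package can be imported, using the standard library's `importlib.util.find_spec`. `missing_packages` is a hypothetical helper. A package can be importable and still be too old for the scripts, and this check does not detect that.

```python
import importlib.util

REQUIRED = ["pandas", "plotly", "numpy"]
OPTIONAL = {"kaleido": "static image export (PNG, PDF, SVG)"}

def missing_packages(names):
    """Names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("pip install " + " ".join(missing) if missing else "Required packages found")
for name, purpose in OPTIONAL.items():
    if missing_packages([name]):
        print(f"Optional: pip install {name}  (needed for {purpose})")
```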
## Example Workflows
### Exploratory Data Analysis
```bash
# 1. Profile the data
python3 scripts/data_profile.py sales_data.csv -f html -o profile.html

# 2. Create automatic dashboard
python3 scripts/create_dashboard.py sales_data.csv -o dashboard.html

# 3. Dive deeper with specific plots
python3 scripts/visualize_csv.py sales_data.csv --scatter price sales --color region
python3 scripts/visualize_csv.py sales_data.csv --boxplot revenue --group-by product
```
### Report Generation
```bash
# Create specific visualizations for report
python3 scripts/visualize_csv.py data.csv --histogram age -o fig1_distribution.png
python3 scripts/visualize_csv.py data.csv --scatter income age -o fig2_correlation.png
python3 scripts/visualize_csv.py data.csv --bar category -o fig3_categories.png

# Generate data summary
python3 scripts/data_profile.py data.csv -f html -o data_summary.html
```
### Interactive Dashboard
```bash
# Create custom dashboard for presentation
# 1. First, create config.json with desired plots
# 2. Generate dashboard
python3 scripts/create_dashboard.py data.csv --config config.json -o presentation_dashboard.html
```
## Troubleshooting

**"Column not found" errors:**
- Run data profiling to see the exact column names
- Column names are case-sensitive
- Check for leading/trailing spaces in column names

**Empty or incorrect visualizations:**
- Verify data types (numeric vs. categorical)
- Check for missing data in the plotted columns
- Ensure enough non-null values exist to plot

**Script execution errors:**
- Verify dependencies are installed: `pip list | grep plotly`
- Check the Python version: Python 3.6+ is required
- For image export issues, install kaleido: `pip install kaleido`
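Column names are matched exactly, so a stray space or a different case produces "Column not found". The hypothetical `resolve_column` helper below is a diagnostic sketch, not part of the scripts. It tries an exact match first, then a case- and whitespace-insensitive match. If both fail, it raises with close matches from the standard library's `difflib.get_close_matches`.

```python
import difflib

def resolve_column(requested, columns):
    """Exact match first, then ignore case and surrounding whitespace; else suggest close names."""
    if requested in columns:
        return requested
    # If two headers normalize to the same key, the later one wins
    by_normalized = {c.strip().lower(): c for c in columns}
    match = by_normalized.get(requested.strip().lower())
    if match is not None:
        return match
    suggestions = difflib.get_close_matches(requested, columns, n=3, cutoff=0.6)
    raise KeyError(f"Column {requested!r} not found; close matches: {suggestions}")

header = ["Date", " Sales ", "Revenue"]
print(repr(resolve_column("sales", header)))
```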
## Resources
### scripts/
- `visualize_csv.py`: Main visualization script with all chart types
- `data_profile.py`: Automatic data profiling and quality analysis
- `create_dashboard.py`: Multi-plot dashboard generator
### references/
- `visualization_guide.md`: Comprehensive guide for choosing appropriate chart types, best practices, and common patterns