csv-data-summarizer

CSV Data Summarizer

This Skill analyzes CSV files and provides comprehensive summaries with statistical insights and visualizations.

When to Use This Skill

Claude should use this Skill whenever the user:

Uploads or references a CSV file
Asks to summarize, analyze, or visualize tabular data
Requests insights from CSV data
Wants to understand data structure and quality

How It Works

⚠️ CRITICAL BEHAVIOR REQUIREMENT ⚠️

DO NOT ASK THE USER WHAT THEY WANT TO DO WITH THE DATA. DO NOT OFFER OPTIONS OR CHOICES. DO NOT SAY "What would you like me to help you with?" DO NOT LIST POSSIBLE ANALYSES.

IMMEDIATELY AND AUTOMATICALLY:

Run the comprehensive analysis
Generate ALL relevant visualizations
Present complete results
NO questions, NO options, NO waiting for user input

THE USER WANTS A FULL ANALYSIS RIGHT AWAY - JUST DO IT.

Automatic Analysis Steps:

The skill intelligently adapts to different data types and industries by inspecting the data first, then determining what analyses are most relevant.

Load the CSV file into a pandas DataFrame and inspect it
Identify the data structure - column types, date columns, numeric columns, categorical columns
Determine relevant analyses based on what's actually in the data:
- Sales/E-commerce data (order dates, revenue, products): Time-series trends, revenue analysis, product performance
- Customer data (demographics, segments, regions): Distribution analysis, segmentation, geographic patterns
- Financial data (transactions, amounts, dates): Trend analysis, statistical summaries, correlations
- Operational data (timestamps, metrics, status): Time-series, performance metrics, distributions
- Survey data (categorical responses, ratings): Frequency analysis, cross-tabulations, distributions
- Generic tabular data: Adapts based on column types found
Only create visualizations that make sense for the specific dataset:
- Time-series plots ONLY if date/timestamp columns exist
- Correlation heatmaps ONLY if multiple numeric columns exist
- Category distributions ONLY if categorical columns exist
- Histograms for numeric distributions when relevant
Generate comprehensive output automatically including:
- Data overview (rows, columns, types)
- Key statistics and metrics relevant to the data type
- Missing data analysis
- Multiple relevant visualizations (only those that apply)
- Actionable insights based on patterns found in THIS specific dataset
Present everything in one complete analysis - no follow-up questions
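
The column-typing and chart-selection rules in the steps above can be sketched with pandas. This is a minimal illustration, not the code in analyze.py: the helper names `profile_columns` and `choose_charts` and the chart labels are hypothetical, and the sketch assumes date columns have already been converted to a datetime dtype (for example with `pd.to_datetime`), because `pd.read_csv` leaves them as strings by default.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> dict[str, list[str]]:
    """Group columns into numeric, date, and categorical buckets by dtype."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    dates = df.select_dtypes(include=["datetime", "datetimetz"]).columns.tolist()
    # Everything else (strings, booleans, pandas categoricals) is treated as categorical.
    categorical = [c for c in df.columns if c not in numeric and c not in dates]
    return {"numeric": numeric, "date": dates, "categorical": categorical}


def choose_charts(profile: dict[str, list[str]]) -> list[str]:
    """Apply the 'only create visualizations that make sense' rules (hypothetical labels)."""
    charts = []
    if profile["date"]:
        charts.append("time_series")            # needs a date/timestamp column
    if len(profile["numeric"]) >= 2:
        charts.append("correlation_heatmap")    # needs multiple numeric columns
    if profile["categorical"]:
        charts.append("category_distribution")  # needs categorical columns
    if profile["numeric"]:
        charts.append("histogram")              # numeric distributions
    return charts
```

For a sales file with an order date, two numeric columns, and a region column, all four chart types are selected; a file containing only text columns gets just the category distribution.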

Example adaptations:

Healthcare data with patient IDs → Focus on demographics, treatment patterns, temporal trends
Inventory data with stock levels → Focus on quantity distributions, reorder patterns, SKU analysis
Web analytics with timestamps → Focus on traffic patterns, conversion metrics, time-of-day analysis
Survey responses → Focus on response distributions, demographic breakdowns, sentiment patterns
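
One crude way to pick a focus like the adaptations above is to match keywords against column names. The keyword lists and the `guess_domain` helper below are hypothetical assumptions for illustration only; the Skill does not document how it classifies a dataset, and plain substring matching will misfire on some column names.

```python
# Hypothetical keyword lists; not taken from analyze.py.
DOMAIN_KEYWORDS = {
    "sales": ["order", "revenue", "product", "price"],
    "customer": ["customer", "segment", "region", "gender"],
    "financial": ["transaction", "amount", "balance", "account"],
    "operational": ["timestamp", "status", "metric", "duration"],
    "survey": ["response", "rating", "question", "answer"],
}


def guess_domain(columns: list[str]) -> str:
    """Return the domain whose keywords appear in the most column names."""
    names = [str(c).lower() for c in columns]
    # Score = number of a domain's keywords found in at least one column name.
    scores = {
        domain: sum(any(kw in name for name in names) for kw in keywords)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to generic handling when no keyword matches at all.
    return best if scores[best] > 0 else "generic"
```

Ties go to whichever domain is listed first in `DOMAIN_KEYWORDS`, so the order of that dictionary acts as a priority list.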

Behavior Guidelines

✅ CORRECT APPROACH - SAY THIS:

"I'll analyze this data comprehensively right now."
"Here's the complete analysis with visualizations:"
"I've identified this as [type] data and generated relevant insights:"
Then IMMEDIATELY show the full analysis

✅ DO:

Immediately run the analysis script
Generate ALL relevant charts automatically
Provide complete insights without being asked
Be thorough and complete in the first response
Act decisively without asking permission

❌ NEVER SAY THESE PHRASES:

"What would you like to do with this data?"
"What would you like me to help you with?"
"Here are some common options:"
"Let me know what you'd like help with"
"I can create a comprehensive analysis if you'd like!"
Any sentence ending with "?" asking for user direction
Any list of options or choices
Any conditional "I can do X if you want"

❌ FORBIDDEN BEHAVIORS:

Asking what the user wants
Listing options for the user to choose from
Waiting for user direction before analyzing
Providing partial analysis that requires follow-up
Describing what you COULD do instead of DOING it

Usage

The Skill provides a Python function, `summarize_csv(file_path)`, that:

Accepts a path to a CSV file
Returns a comprehensive text summary with statistics
Generates multiple visualizations automatically based on data structure
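
A text-only approximation of the kind of summary `summarize_csv(file_path)` returns might look like the following. It is a sketch, not the Skill's implementation: the name `summarize_csv_sketch` is hypothetical, it draws no charts, and it reports only the dataset shape, per-column mean and standard deviation for numeric columns, and missing cells.

```python
import pandas as pd


def summarize_csv_sketch(file_path) -> str:
    """Text-only approximation of summarize_csv(file_path); draws no charts.

    file_path may be a path or any file-like object that pandas.read_csv accepts.
    """
    df = pd.read_csv(file_path)
    numeric = df.select_dtypes(include="number")

    lines = [
        "Dataset Overview",
        f"{len(df):,} rows × {df.shape[1]} columns",
        f"{numeric.shape[1]} numeric columns",
        "",
        "Summary Statistics",
    ]
    # Mean and sample standard deviation for every numeric column.
    for col in numeric.columns:
        lines.append(f"{col}: mean={numeric[col].mean():.2f}, std={numeric[col].std():.2f}")

    # Missing values are reported as a share of all cells (rows × columns).
    missing = int(df.isna().sum().sum())
    total = df.size
    pct = 100 * missing / total if total else 0.0
    lines += ["", f"Missing values: {missing} of {total} cells ({pct:.2f}%)"]
    return "\n".join(lines)
```

Usage: `print(summarize_csv_sketch("resources/sample.csv"))` prints the overview, per-column statistics, and the missing-cell share for the bundled example dataset.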

Example Prompts

"Here's `sales_data.csv`. Can you summarize this file?"

"Analyze this customer data CSV and show me trends."

"What insights can you find in `orders.csv`?"

Example Output

Dataset Overview

5,000 rows × 8 columns
3 numeric columns, 1 date column

Summary Statistics

Average order value: $58.20
Standard deviation: $12.40
Missing values: 100 cells (0.25% of 40,000 cells)

Insights

Sales show upward trend over time
Peak activity in Q4 (Attached: trend plot)
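
Insights like "upward trend" and "peak activity in Q4" could be derived from a date column and a value column roughly as follows. This is an assumed approach (the sign of a least-squares slope over monthly totals, plus quarter totals pooled across years) with a hypothetical helper `trend_and_peak`; the Skill does not specify how it detects trends.

```python
import numpy as np
import pandas as pd


def trend_and_peak(df: pd.DataFrame, date_col: str, value_col: str) -> tuple[str, str]:
    """Return (trend direction, peak quarter) for a value column over time."""
    series = df.set_index(date_col)[value_col].sort_index()

    # Monthly totals; months with no rows count as 0.
    monthly = series.resample("MS").sum()
    # Slope of a least-squares line through the monthly totals (needs >= 2 months).
    slope = np.polyfit(np.arange(len(monthly)), monthly.to_numpy(dtype=float), 1)[0]
    trend = "upward" if slope > 0 else "downward" if slope < 0 else "flat"

    # Quarter totals pooled across years, e.g. all Q4s summed together.
    quarterly = series.groupby(series.index.quarter).sum()
    return trend, f"Q{quarterly.idxmax()}"
```

The date column must already be a datetime dtype for `resample` to work, and at least two calendar months of data are needed for the slope to be meaningful.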

Files

`analyze.py` - Core analysis logic
`requirements.txt` - Python dependencies
`resources/sample.csv` - Example dataset for testing
`resources/README.md` - Additional documentation

Notes

Automatically detects date columns (columns with 'date' in their name)
Handles missing data gracefully
Generates time-series visualizations only when date columns are present
All numeric columns are included in the statistical summary
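
The name-based date detection described in these notes could be sketched like this, together with a simple per-column missing-value count. The helpers `parse_date_columns` and `missing_report` are hypothetical, and both case-insensitive matching and `errors="coerce"` (unparseable values become `NaT` instead of raising) are assumptions about how the script behaves.

```python
import pandas as pd


def parse_date_columns(df: pd.DataFrame) -> list[str]:
    """Convert columns whose name contains 'date' (any case) to datetime, in place."""
    date_cols = [c for c in df.columns if "date" in str(c).lower()]
    for col in date_cols:
        # errors="coerce" turns unparseable values into NaT instead of raising.
        df[col] = pd.to_datetime(df[col], errors="coerce")
    return date_cols


def missing_report(df: pd.DataFrame) -> dict[str, int]:
    """Count missing values (NaN/NaT) per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return {col: int(n) for col, n in counts.items() if n > 0}
```

A name heuristic like this misses columns such as `created_at` or `timestamp` and wrongly matches names such as `candidate`, so checking parse results (for example, how many values became `NaT`) is a sensible safeguard.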
