csv-data-summarizer

CSV Data Summarizer

This Skill analyzes CSV files and provides comprehensive summaries with statistical insights and visualizations.

When to Use This Skill

Claude should use this Skill whenever the user:

Uploads or references a CSV file
Asks to summarize, analyze, or visualize tabular data
Requests insights from CSV data
Wants to understand data structure and quality

How It Works

⚠️ CRITICAL BEHAVIOR REQUIREMENT ⚠️

DO NOT ASK THE USER WHAT THEY WANT TO DO WITH THE DATA. DO NOT OFFER OPTIONS OR CHOICES. DO NOT SAY "What would you like me to help you with?" DO NOT LIST POSSIBLE ANALYSES.

IMMEDIATELY AND AUTOMATICALLY:

Run the comprehensive analysis
Generate ALL relevant visualizations
Present complete results
NO questions, NO options, NO waiting for user input

THE USER WANTS A FULL ANALYSIS RIGHT AWAY - JUST DO IT.

Automatic Analysis Steps:

The skill intelligently adapts to different data types and industries by inspecting the data first, then determining what analyses are most relevant.

Load the CSV file into a pandas DataFrame and inspect it
Identify the data structure: column types, date columns, numeric columns, and categorical columns
Determine relevant analyses based on what's actually in the data:
- Sales/E-commerce data (order dates, revenue, products): Time-series trends, revenue analysis, product performance
- Customer data (demographics, segments, regions): Distribution analysis, segmentation, geographic patterns
- Financial data (transactions, amounts, dates): Trend analysis, statistical summaries, correlations
- Operational data (timestamps, metrics, status): Time-series, performance metrics, distributions
- Survey data (categorical responses, ratings): Frequency analysis, cross-tabulations, distributions
- Generic tabular data: Adapts based on column types found
Only create visualizations that make sense for the specific dataset:
- Time-series plots ONLY if date/timestamp columns exist
- Correlation heatmaps ONLY if multiple numeric columns exist
- Category distributions ONLY if categorical columns exist
- Histograms for numeric distributions when relevant
Generate comprehensive output automatically including:
- Data overview (rows, columns, types)
- Key statistics and metrics relevant to the data type
- Missing data analysis
- Multiple relevant visualizations (only those that apply)
- Actionable insights based on patterns found in THIS specific dataset
Present everything in one complete analysis - no follow-up questions
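The column-identification and chart-selection steps above could be sketched roughly as follows. This is a hypothetical helper, not code from `analyze.py`. The name `plan_analysis`, the chart labels, and the 80% parse threshold are all assumptions. The helper does three things:

- It tries to parse text columns as dates.
- It sorts columns into numeric, date, and categorical groups.
- It applies the "ONLY if" rules for time-series plots, correlation heatmaps, category distributions, and histograms.

```python
import warnings

import pandas as pd


def plan_analysis(df: pd.DataFrame, min_parse_ratio: float = 0.8) -> dict:
    """Hypothetical helper: classify columns and pick the charts that apply."""
    df = df.copy()  # never mutate the caller's DataFrame

    # Try to parse text-like columns as dates (numeric, bool, datetime are skipped).
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_numeric_dtype(s) or pd.api.types.is_datetime64_any_dtype(s):
            continue
        non_null = int(s.notna().sum())
        if non_null == 0:
            continue
        with warnings.catch_warnings():
            # pandas may warn when it cannot infer one date format; silence that here.
            warnings.simplefilter("ignore")
            parsed = pd.to_datetime(s, errors="coerce")
        # Keep the parsed version only if most non-null values look like dates.
        if parsed.notna().sum() >= min_parse_ratio * non_null:
            df[col] = parsed

    numeric = [
        c for c in df.columns
        if pd.api.types.is_numeric_dtype(df[c]) and not pd.api.types.is_bool_dtype(df[c])
    ]
    dates = [c for c in df.columns if pd.api.types.is_datetime64_any_dtype(df[c])]
    categorical = [c for c in df.columns if c not in numeric and c not in dates]

    # Chart rules: time series needs dates, heatmap needs 2+ numeric columns, etc.
    charts = []
    if dates:
        charts.append("time_series")
    if len(numeric) >= 2:
        charts.append("correlation_heatmap")
    if categorical:
        charts.append("category_distribution")
    if numeric:
        charts.append("histogram")

    return {"numeric": numeric, "dates": dates, "categorical": categorical, "charts": charts}
```

Parsing dates from values, rather than relying only on column names, lets columns such as "timestamp" or "created" count as dates too. The threshold keeps mostly-text columns with a few date-like entries classified as categorical.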

Example adaptations:

Healthcare data with patient IDs → Focus on demographics, treatment patterns, temporal trends
Inventory data with stock levels → Focus on quantity distributions, reorder patterns, SKU analysis
Web analytics with timestamps → Focus on traffic patterns, conversion metrics, time-of-day analysis
Survey responses → Focus on response distributions, demographic breakdowns, sentiment patterns
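One crude way to pick a focus like the examples above is to match keywords in column names. The Skill does not say how it recognizes a dataset's domain. The keyword lists, the `guess_domain` function, and the scoring rule below are illustrative assumptions. The focus strings are taken from the examples.

```python
# Hypothetical keyword lists; not taken from the Skill's implementation.
DOMAIN_KEYWORDS = {
    "healthcare": ["patient", "diagnosis", "treatment"],
    "inventory": ["sku", "stock", "reorder"],
    "web_analytics": ["pageview", "session", "conversion"],
    "survey": ["response", "rating", "question"],
}

FOCUS = {
    "healthcare": "demographics, treatment patterns, temporal trends",
    "inventory": "quantity distributions, reorder patterns, SKU analysis",
    "web_analytics": "traffic patterns, conversion metrics, time-of-day analysis",
    "survey": "response distributions, demographic breakdowns, sentiment patterns",
    "generic": "adapt to the column types found",
}


def guess_domain(columns) -> str:
    """Return the domain whose keywords appear in the most column names."""
    names = [str(c).lower() for c in columns]
    # Score = number of a domain's keywords found as a substring of any column name.
    scores = {
        domain: sum(any(k in n for n in names) for k in keywords)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties go to the first domain listed
    return best if scores[best] > 0 else "generic"
```

Substring matching is cheap but noisy. For example, "rating" also matches "operating_cost". A sturdier version would likely combine it with the column-type checks and a look at the values themselves.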

Behavior Guidelines

✅ CORRECT APPROACH - SAY THIS:

"I'll analyze this data comprehensively right now."
"Here's the complete analysis with visualizations:"
"I've identified this as [type] data and generated relevant insights:"
Then IMMEDIATELY show the full analysis

✅ DO:

Immediately run the analysis script
Generate ALL relevant charts automatically
Provide complete insights without being asked
Be thorough and complete in first response
Act decisively without asking permission

❌ NEVER SAY THESE PHRASES:

"What would you like to do with this data?"
"What would you like me to help you with?"
"Here are some common options:"
"Let me know what you'd like help with"
"I can create a comprehensive analysis if you'd like!"
Any sentence ending with "?" asking for user direction
Any list of options or choices
Any conditional "I can do X if you want"

❌ FORBIDDEN BEHAVIORS:

Asking what the user wants
Listing options for the user to choose from
Waiting for user direction before analyzing
Providing partial analysis that requires follow-up
Describing what you COULD do instead of DOING it

Usage

The Skill provides a Python function, `summarize_csv(file_path)`, that:

Accepts a path to a CSV file
Returns a comprehensive text summary with statistics
Generates multiple visualizations automatically based on data structure
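Below is a minimal sketch of what a text-only `summarize_csv(file_path)` might look like with pandas. It is not the Skill's `analyze.py`. Plotting is omitted, and the output layout is only modeled on the example output in this document. `pd.read_csv` also accepts file-like objects, so the function can be tried without a file on disk.

```python
import pandas as pd


def summarize_csv(file_path) -> str:
    """Sketch: load a CSV and return a plain-text summary (no charts)."""
    df = pd.read_csv(file_path)  # accepts a path or a file-like object
    numeric = df.select_dtypes(include="number")

    total_cells = df.size
    missing = int(df.isna().sum().sum())
    share = missing / total_cells if total_cells else 0.0

    lines = [
        "Dataset Overview",
        f"{len(df):,} rows × {df.shape[1]} columns",
        f"{numeric.shape[1]} numeric columns",
        f"Missing values: {share:.2%} ({missing} of {total_cells} cells)",
    ]

    if not numeric.empty:
        lines.append("")
        lines.append("Summary Statistics")
        for col in numeric.columns:
            s = numeric[col]
            # mean/std skip NaN by default; std uses the sample formula (ddof=1)
            lines.append(f"{col}: mean={s.mean():.2f}, std={s.std():.2f}")

    return "\n".join(lines)
```

Returning a string keeps the summary easy to test and to drop into a reply. Chart generation would be a separate step driven by the detected column types.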

Example Prompts

"Here's `sales_data.csv`. Can you summarize this file?"

"Analyze this customer data CSV and show me trends."

"What insights can you find in `orders.csv`?"

Example Output

Dataset Overview

5,000 rows × 8 columns
3 numeric columns, 1 date column

Summary Statistics

Average order value: $58.2
Standard deviation: $12.4
Missing values: 2% (100 cells)

Insights

Sales show an upward trend over time
Peak activity in Q4 (Attached: trend plot)
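A missing-value percentage depends on its denominator. With 5,000 rows × 8 columns there are 40,000 cells, so 100 missing cells is 0.25% of all cells. If those 100 cells fall in 100 different rows, then 100 / 5,000 = 2% of rows are affected. The hypothetical `missing_report` helper below reports both shares, so a summary can state which one it means.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> dict:
    """Hypothetical helper: missing data as a share of cells and of rows."""
    total_cells = df.size
    missing_cells = int(df.isna().sum().sum())
    # A row counts once, however many of its cells are missing.
    rows_with_missing = int(df.isna().any(axis=1).sum())
    return {
        "missing_cells": missing_cells,
        "cell_share": missing_cells / total_cells if total_cells else 0.0,
        "rows_with_missing": rows_with_missing,
        "row_share": rows_with_missing / len(df) if len(df) else 0.0,
    }


# Reproduce the arithmetic: 5,000 rows x 8 columns, 100 missing cells in distinct rows.
example = pd.DataFrame({f"c{i}": [0.0] * 5000 for i in range(8)})
example.loc[:99, "c0"] = float("nan")  # label slice 0..99 inclusive = 100 rows
print(missing_report(example))  # cell_share 0.0025 (0.25%), row_share 0.02 (2%)
```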

Files

`analyze.py` - Core analysis logic
`requirements.txt` - Python dependencies
`resources/sample.csv` - Example dataset for testing
`resources/README.md` - Additional documentation

Notes

Automatically detects date columns (columns whose names contain 'date')
Handles missing data gracefully
Generates visualizations only when date columns are present
All numeric columns are included in statistical summary
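Name-based date detection, as described in the first note, could look like this. The case-insensitive match and the `detect_date_columns` name are assumptions. Values that cannot be parsed become `NaT` rather than raising an error.

```python
import pandas as pd


def detect_date_columns(df: pd.DataFrame):
    """Hypothetical helper: find columns whose name contains 'date' and parse them."""
    # Assumption: case-insensitive match, so 'Ship Date' counts as well as 'order_date'.
    date_cols = [c for c in df.columns if "date" in str(c).lower()]
    out = df.copy()
    for col in date_cols:
        # Unparseable values become NaT instead of raising an error.
        out[col] = pd.to_datetime(out[col], errors="coerce")
    return out, date_cols
```

A plain substring test also flags names such as `candidate_id` or `updated_by`. Checking that most values actually parse as dates is a sensible extra guard.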
