
Data Analysis Skill


Operator Context


This skill operates as an operator for decision-first data analysis, configuring Claude's behavior for structured analytical reasoning with statistical rigor. It implements a Decision-First Framework -- every analysis begins with the decision being supported, works backward to the evidence required, and only then touches the data. This prevents the common failure mode where analysis produces impressive summaries that answer the wrong question.
Core thesis: "Analysis without a decision is just arithmetic."

Hardcoded Behaviors (Always Apply)


  • CLAUDE.md Compliance: Read and follow repository CLAUDE.md files before execution. Project instructions override default behaviors.
  • Over-Engineering Prevention: Analyze what was asked. No speculative analyses, no "while I'm at it" tangents into unrelated metrics.
  • Decision-First Ordering: ALWAYS establish the decision context (Phase 1) before loading data (Phase 3). Starting with data produces technically correct but practically useless analysis because the patterns found may not map to the decision-maker's options.
  • Separate Extraction from Interpretation: Phase 3 (EXTRACT) loads and profiles data. Phase 4 (ANALYZE) interprets it. Never combine these steps. Combining them causes confirmation bias -- you see what you expect instead of what the data shows.
  • Metric Definitions Are Immutable: Once Phase 2 (DEFINE) is complete and data loading begins, metric definitions cannot change silently. If they must change, re-enter Phase 2 and document why. This prevents the common anti-pattern of adjusting definitions to produce favorable results (p-hacking by another name).
  • Uncertainty Quantification: Report confidence intervals, not point estimates. "3-7% lift" is useful; "5% lift" is misleading because it implies false precision.
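The uncertainty-quantification rule can be sketched with the stdlib alone. A minimal sketch, assuming a list of observed per-cohort lifts; the `mean_ci` helper and the sample values are illustrative, not part of the skill:

```python
import math
import statistics

def mean_ci(values, z=1.96):
    """Normal-approximation 95% CI for a mean (assumes a reasonably large N)."""
    n = len(values)
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return m - z * se, m + z * se

# Illustrative per-cohort lift observations
lifts = [0.03, 0.05, 0.07, 0.04, 0.06, 0.05, 0.02, 0.08]
lo, hi = mean_ci(lifts)
# Report the interval, not the bare 5% point estimate
print(f"lift: {statistics.mean(lifts):.1%} (95% CI: {lo:.1%}-{hi:.1%})")
```

The interval, not the point estimate, is what tells the decision-maker how much to trust the number.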

Default Behaviors (ON unless disabled)


  • Communication Style: Lead with insights, not methods. The decision-maker needs "Revenue is declining 3% month-over-month, driven by churning mid-tier accounts" -- not "I performed a linear regression on the time series data using OLS estimation."
  • Artifact Trail: Save artifacts at every phase. Context is ephemeral; files persist. Each phase produces a named artifact that can be audited later.
  • Graceful Tool Degradation: Detect pandas/matplotlib availability via try/except. Use them when available, fall back to stdlib (csv, json, statistics, collections) when not. Analysis quality must be identical -- only presentation differs.
  • Statistical Rigor Gates: Apply all four rigor gates during Phase 4. Violations must be remediated or documented as explicit limitations. See `references/rigor-gates.md` for detailed gate documentation.
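The graceful-degradation default can be sketched as a single entry point that picks a backend at import time. The `column_mean` helper is a hypothetical name, not part of the skill's API:

```python
import csv
import statistics

try:
    import pandas as pd
    HAS_PANDAS = True
except ImportError:
    HAS_PANDAS = False

def column_mean(path, column):
    """Mean of a numeric column: pandas when available, stdlib otherwise."""
    if HAS_PANDAS:
        return float(pd.read_csv(path)[column].mean())
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f) if row[column]]
    return statistics.mean(values)
```

Either branch returns the same number; only the implementation differs, which is exactly what "analysis quality must be identical" demands.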

Optional Behaviors (OFF unless enabled)


  • Visualization Output: Generate matplotlib charts saved as PNG when matplotlib is available and user requests visual output.
  • Multi-Dataset Joins: Join across multiple data files when analysis requires cross-referencing (e.g., user events + revenue data).
  • Exploratory Mode: Skip Phase 1 framing when the user explicitly asks for open-ended exploration ("just show me what's interesting"). Still apply rigor gates and label all findings as exploratory.

What This Skill CAN Do


  • Analyze structured data (CSV, JSON, SQLite exports, log files) to support specific business decisions
  • Profile data quality: row counts, missing values, outliers, date range coverage, type distributions
  • Compute summary statistics with confidence intervals using Python stdlib or pandas
  • Compare groups (cohorts, A/B variants, time periods) with statistical rigor checks
  • Detect trends, distributions, anomalies, and correlations with appropriate caveats
  • Produce decision-oriented reports that lead with insights and state limitations explicitly

What This Skill CANNOT Do


  • Machine learning: No model training, prediction, or hyperparameter tuning. That is a separate capability.
  • Real-time monitoring: This is batch analysis of snapshot data, not live stream processing.
  • Database querying: The skill analyzes data already extracted. It does not connect to databases or APIs. The user provides the data file.
  • Codebase analysis: Use codebase-analyzer for code convention discovery. This skill analyzes business/operational data.
  • Automated recurring reports: Each analysis is a one-shot investigation. Scheduled analysis requires separate automation.


Instructions


Phase 1: FRAME (Do NOT touch data before framing the decision)


Goal: Establish what decision this analysis supports and what evidence would change it.
Why this phase exists: Starting with data before establishing the decision context is the single most common analytical failure. The analyst finds interesting patterns and presents them, but the decision-maker cannot act because the patterns do not map to their options. Framing first ensures every computation serves the decision.
Step 1: Identify the decision
  • What specific decision does this analysis support?
  • Who is the decision-maker?
  • What are their options? (Option A vs. Option B vs. do nothing)
  • What is the current default action if no analysis is performed?
If the user does not articulate a decision, ask: "What will you do differently based on this analysis?" If the answer is "nothing" or "I just want to see the data," switch to Exploratory Mode (optional behavior) and label all output as exploratory.
Step 2: Define evidence requirements
  • What evidence would favor Option A over Option B?
  • What is the minimum evidence threshold for changing the default action?
  • Are there deal-breakers? (e.g., "If churn exceeds 5%, we switch vendors regardless of cost")
Step 3: Save the frame artifact
Save `analysis-frame.md`:

```markdown
# Analysis Frame

## Decision
[What decision is being supported]

## Decision-Maker
[Who will act on this analysis]

## Options
- Option A: [description]
- Option B: [description]
- Default (no action): [what happens if we don't decide]

## Evidence Requirements
- Favors Option A if: [condition]
- Favors Option B if: [condition]
- Minimum threshold: [what bar must be cleared]

## Deal-Breakers
- [condition that forces a specific option regardless]
```

**GATE**: Decision identified, options enumerated, evidence requirements written to file. If the user cannot articulate a decision, explicitly switch to Exploratory Mode and document this in the frame. Proceed only when gate passes.

---

Phase 2: DEFINE (Lock metrics before loading data)


Goal: Define exactly what will be measured, how, and over what population. Write definitions to file before any data is loaded.
Why this phase exists: Defining metrics after seeing data enables (consciously or not) choosing definitions that produce favorable results. Locking definitions first makes the analysis auditable -- anyone can verify whether the definitions were followed.
Step 1: Define metrics
For each metric:
  • Name: Clear, unambiguous label
  • Formula: Exact computation (numerator/denominator for rates, aggregation method for summaries)
  • Population: Who/what is included and excluded
  • Time window: Start date, end date, granularity (daily/weekly/monthly)
  • Segments: How data will be sliced (by region, cohort, plan tier, etc.)
Step 2: Define comparison groups (if applicable)
For each comparison:
  • Group A: Definition and selection criteria
  • Group B: Definition and selection criteria
  • Fairness check: Are groups drawn from the same population and time window?
Step 3: Define success criteria
  • What threshold constitutes a meaningful result?
  • What is the minimum sample size per segment?
  • Is this a one-tailed or two-tailed question?
Step 4: Save definitions artifact
Save `metric-definitions.md`:

```markdown
# Metric Definitions

## Metrics

### [Metric Name]
- Formula: [exact computation]
- Population: [inclusion/exclusion criteria]
- Time window: [start - end, granularity]
- Segments: [how data is sliced]

## Comparison Groups (if applicable)

### Group A: [Name]
- Selection: [criteria]

### Group B: [Name]
- Selection: [criteria]
- Fairness: [same population? same time window?]

## Success Criteria
- Minimum meaningful effect: [threshold]
- Minimum sample per segment: [N]
- Test type: [one-tailed / two-tailed / descriptive only]
```

**GATE**: All metrics defined with formulas and populations. Definitions saved to file. If this is a comparison analysis, fairness checks documented. Proceed only when gate passes.

**Immutability rule**: Once Phase 3 begins, these definitions are locked. If the data reveals that a definition is unworkable (e.g., the column doesn't exist), return to Phase 2, update the definition, and document the change and its reason. Do not silently adjust.

---

Phase 3: EXTRACT (Load data. Assess quality. No interpretation.)


Goal: Load the data, profile its quality, and determine whether it is adequate for the planned analysis. Do NOT interpret results during this phase.
Why extraction is separate from analysis: Combining loading and interpretation causes confirmation bias. When you compute a metric and interpret it in the same breath, you see what you expect. Extracting first forces you to confront data quality issues (missing values, unexpected distributions, date gaps) before they silently distort your conclusions.
Step 1: Detect available tools
```python
try:
    import pandas as pd
    HAS_PANDAS = True
except ImportError:
    HAS_PANDAS = False

try:
    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False
```

If pandas is unavailable, fall back to `csv.DictReader` + the `statistics` module. Analysis quality must be identical.
Step 2: Load and inspect data
Profile the dataset:
  • Row count
  • Column names and inferred types
  • Missing value count per column (absolute and percentage)
  • Date range (if temporal data)
  • Unique value counts for categorical columns
  • Basic distribution stats for numeric columns (min, max, mean, median, stdev)
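The profiling checklist above can be sketched with the stdlib fallback; the `profile` helper name is illustrative:

```python
import csv
from collections import Counter

def profile(path):
    """Row count plus missing-value counts per column."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    missing = Counter()
    for row in rows:
        for col, val in row.items():
            if val is None or val.strip() == "":
                missing[col] += 1
    return {"rows": len(rows), "missing": dict(missing)}
```

A full profile would add type inference, date ranges, and distribution stats per the checklist; the shape is the same.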
Step 3: Assess data quality
Apply the Sample Adequacy gate (see `references/rigor-gates.md`, Gate 1):

| Check | Minimum | Action if Failed |
|-------|---------|------------------|
| Row count vs. population | Report sample fraction | State "N of M" and warn if <5% coverage |
| Time window completeness | No gaps >10% of window | Identify gaps, adjust window or note limitation |
| Segment minimums | 30+ observations per segment | Merge small segments or exclude with disclosure |
| Missing value rate | <20% per critical column | Impute with disclosure or exclude column |
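The segment-minimum row of the table can be checked mechanically; the `segment_gate` helper and the sample rows are invented for illustration:

```python
from collections import Counter

def segment_gate(rows, segment_key, minimum=30):
    """Return segments that fall below the minimum observation count."""
    counts = Counter(r[segment_key] for r in rows)
    return {seg: n for seg, n in counts.items() if n < minimum}

rows = [{"region": "NA"}] * 50 + [{"region": "EU"}] * 12
print(segment_gate(rows, "region"))  # prints {'EU': 12}
```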
Step 4: Save quality report
Save `data-quality-report.md`:

```markdown
# Data Quality Report

## Dataset Overview
- Source: [file path / description]
- Rows: [N]
- Columns: [N]
- Date range: [start - end]

## Column Profiles

| Column | Type | Non-null | Missing % | Unique | Notes |
|--------|------|----------|-----------|--------|-------|
| [name] | [type] | [count] | [pct] | [count] | [flags] |

## Quality Assessment
- Sample adequate (N=[count], population=[est])
- Time window complete (gaps: [none / list])
- Segment minimums met ([list segments below 30])
- Missing values acceptable ([list columns above 20%])

## Quality Issues
[List any issues that affect planned analysis]

## Data Ready: [YES / NO - with reason]
```

**GATE**: Data loaded, quality report saved, all four adequacy checks assessed. If data quality fails, document which analyses are affected and whether remediation is possible (merge segments, narrow time window, exclude columns). Proceed only when gate passes or failures are documented as limitations.

---

Phase 4: ANALYZE (Compute metrics. Apply rigor gates.)


Goal: Compute metrics per the locked definitions from Phase 2, applying statistical rigor gates at every step.
Step 1: Compute primary metrics
Calculate each metric defined in Phase 2 using the exact formula specified. Use Python stdlib or pandas as available:
```python
# stdlib approach
import csv, statistics, collections, math

with open(data_file) as f:
    reader = csv.DictReader(f)
    rows = list(reader)

# Example: conversion rate with confidence interval
successes = sum(1 for r in rows if r['converted'] == '1')
total = len(rows)
rate = successes / total

# Wilson score interval for proportions
z = 1.96  # 95% CI
denominator = 1 + z**2 / total
centre = (rate + z**2 / (2 * total)) / denominator
spread = z * math.sqrt((rate * (1 - rate) + z**2 / (4 * total)) / total) / denominator
ci_lower = centre - spread
ci_upper = centre + spread
```

```python
# pandas approach (when available)
import pandas as pd

df = pd.read_csv(data_file)
rate = df['converted'].mean()
# Bootstrap CI or Wilson as above
```
**Step 2: Apply Comparison Fairness gate** (if comparing groups)

Before interpreting any group comparison, verify (see `references/rigor-gates.md` Gate 2):
- Same time window for all groups
- Same population definition for all groups
- Known confounders identified and documented
- Survivorship bias checked

**Step 3: Apply Multiple Testing Correction** (if testing multiple hypotheses)

See `references/rigor-gates.md` Gate 3:

| Scenario | Correction |
|----------|------------|
| 2-5 comparisons | Report all p-values, flag that they are unadjusted |
| 6+ comparisons | Apply Bonferroni: adjusted threshold = 0.05 / N |
| Exploratory sweep | Label as exploratory, make no causal claims |
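The Bonferroni row of the table can be sketched as follows; the p-values are invented for illustration:

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which of N comparisons remain significant at alpha / N."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}

# Six comparisons, so the adjusted threshold is 0.05 / 6 ~= 0.0083
p_values = {"mobile": 0.004, "desktop": 0.030, "tablet": 0.200,
            "eu": 0.010, "na": 0.049, "apac": 0.008}
significant = bonferroni(p_values)
```

Note that `na` (p=0.049) passes the naive 0.05 bar but fails the adjusted one -- exactly the false positive the correction guards against.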

**Step 4: Apply Practical Significance gate**

See `references/rigor-gates.md` Gate 4:
- Report effect size alongside statistical significance
- Report confidence intervals, not just point estimates
- Assess whether the effect exceeds the minimum actionable threshold from Phase 2
- Provide base rate context: "from 2.1% to 2.3%" not just "+10% lift"
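The base-rate guidance can be sketched as a small formatter; the `describe_lift` name is illustrative:

```python
def describe_lift(baseline, treatment):
    """State absolute change with base-rate context, not just relative lift."""
    abs_change = treatment - baseline
    rel_change = abs_change / baseline
    return (f"from {baseline:.1%} to {treatment:.1%} "
            f"({abs_change:+.1%} absolute, {rel_change:+.0%} relative)")

print(describe_lift(0.021, 0.023))  # from 2.1% to 2.3% (+0.2% absolute, +10% relative)
```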

**Step 5: Save analysis results**

Save `analysis-results.md`:
```markdown
# Analysis Results

## Metrics

### [Metric Name]
- Value: [point estimate]
- 95% CI: [lower - upper]
- Sample: N=[count]

## Comparisons (if applicable)

### [Group A] vs [Group B]
- Group A: [metric] = [value] (N=[count])
- Group B: [metric] = [value] (N=[count])
- Difference: [absolute] ([relative]%)
- 95% CI of difference: [lower - upper]
- Practical significance: [above/below minimum threshold]

## Rigor Gate Results
- Sample Adequacy: [PASS / FAIL - details]
- Comparison Fairness: [PASS / FAIL / N/A - details]
- Multiple Testing: [PASS / FAIL / N/A - details]
- Practical Significance: [PASS / FAIL - details]

## Rigor Violations (if any)
[List violations and their impact on conclusions]
```

**GATE**: All defined metrics computed. Rigor gates applied and results documented. Violations either remediated or recorded as limitations. Proceed only when gate passes.

---

Phase 5: CONCLUDE (Lead with insights. Return to the decision.)


Goal: Translate analytical results into a decision-oriented report. Lead with what the data says, not how you computed it.
Why this phase is separate: Phase 4 produces numbers. Phase 5 produces meaning. Separating them prevents the analyst from burying the insight under methodology. The decision-maker reads Phase 5; the auditor reads Phases 2-4.
Step 1: State the headline finding
One sentence that directly addresses the decision from Phase 1:
  • "The data supports Option A: churn in the test group is 2.3% lower (95% CI: 1.1-3.5%) than control, exceeding the 1% threshold for switching."
  • "The data is inconclusive: while conversion improved by 0.8%, the confidence interval (-0.2% to 1.8%) includes zero."
  • "The data supports neither option: both segments show identical retention within measurement error."
Step 2: Present supporting evidence
Summarize the key metrics that support the headline, in order of importance:
  1. Primary metric with confidence interval
  2. Secondary metrics that reinforce or qualify
  3. Segment breakdowns if they reveal important variation
Step 3: State limitations explicitly
  • What the data does NOT tell you
  • Rigor gate violations and their implications
  • Known confounders that could not be controlled
  • Sample limitations (size, coverage, time window)
Step 4: Return to the decision
Explicitly map findings back to the decision frame:
  • Does the evidence meet the minimum threshold from Phase 1?
  • Are there deal-breakers triggered?
  • What is the recommended action, with stated confidence?
  • What additional data would increase confidence?
Step 5: Save final report
Save `analysis-report.md`:

```markdown
# Analysis Report

## Headline
[One sentence: what the data says about the decision]

## Decision Context
[Recap from Phase 1 frame]

## Key Findings
1. [Primary finding with CI]
2. [Supporting finding]
3. [Qualifying finding or important segment variation]

## Limitations
- [Limitation 1]
- [Limitation 2]

## Recommendation
[Action recommendation with confidence level]

## What Would Increase Confidence
- [Additional data or analysis that would help]

## Appendix: Methodology
- Data source: [file]
- Rows analyzed: [N]
- Time window: [range]
- Tools: [pandas/stdlib]
- Metrics: See metric-definitions.md
- Quality: See data-quality-report.md
- Detailed results: See analysis-results.md
```

**GATE**: Report saved with headline finding, limitations, and explicit recommendation tied back to the decision. All artifact files referenced. Analysis complete.

---

Examples


Example 1: A/B Test Evaluation


User says: "Evaluate this A/B test - here's the CSV of results"
Actions:
  1. FRAME: "Should we ship variant B?" Options: ship B, keep A, extend test. Evidence: conversion lift >1% with 95% CI excluding zero.
  2. DEFINE: Conversion = orders/visitors per variant. Time: test period. Segments: mobile/desktop.
  3. EXTRACT: Load CSV, profile 45k rows, check group sizes balanced, verify no date gaps.
  4. ANALYZE: Variant B conversion 4.2% vs A 3.9%. Difference 0.3% (CI: -0.1% to 0.7%). Fails practical significance -- CI includes zero.
  5. CONCLUDE: "Data is inconclusive. The observed 0.3% lift has a confidence interval that includes zero. Recommend extending the test for 2 more weeks to reach adequate power."
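Step 4's inconclusive verdict can be reproduced with a normal-approximation CI for the difference of two proportions. A sketch under assumed numbers: the success counts below are invented to roughly match the quoted 3.9% and 4.2% rates:

```python
import math

def diff_ci(s_a, n_a, s_b, n_b, z=1.96):
    """Normal-approximation 95% CI for the difference of two proportions."""
    p_a, p_b = s_a / n_a, s_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Illustrative counts: ~3.9% of 22,500 (A) vs ~4.2% of 22,500 (B)
lo, hi = diff_ci(878, 22500, 945, 22500)
print(f"lift: {lo:+.2%} to {hi:+.2%}")  # interval includes zero -> inconclusive
```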

Example 2: Trend Analysis


User says: "What's happening with our monthly revenue? Here's 2 years of data."
Actions:
  1. FRAME: "Is revenue growth slowing, and should we invest in acquisition?" Options: increase spend, maintain, cut.
  2. DEFINE: Revenue = sum of invoice amounts per month. Growth = month-over-month %. Segments: new vs returning customers.
  3. EXTRACT: Load 24 months, verify no missing months, check for outliers (December spike).
  4. ANALYZE: Overall +2.1%/mo but returning customer revenue flat. All growth from new customers. Seasonality adjusted.
  5. CONCLUDE: "Revenue growth is entirely acquisition-driven. Returning customer revenue has been flat for 8 months, suggesting a retention problem. Recommend investigating churn before increasing acquisition spend."
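The month-over-month growth metric from step 2 can be sketched as (the revenue values are illustrative):

```python
def mom_growth(monthly_revenue):
    """Month-over-month growth rates from an ordered monthly revenue series."""
    return [(b - a) / a for a, b in zip(monthly_revenue, monthly_revenue[1:])]

revenue = [100_000, 102_000, 104_040]  # illustrative values
growth = mom_growth(revenue)
print([f"{g:+.1%}" for g in growth])  # ['+2.0%', '+2.0%']
```

Running the same computation segmented by new vs returning customers is what surfaces the flat-retention finding in step 4.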

Example 3: Distribution Profiling


User says: "Our API response times feel slow. Here's a week of latency data."
Actions:
  1. FRAME: "Do we need to optimize the API?" Options: optimize, add caching, do nothing. Threshold: p99 >500ms warrants action.
  2. DEFINE: Latency = request duration in ms. Segments: by endpoint, by hour. Key metrics: p50, p95, p99.
  3. EXTRACT: Load 1.2M requests, check for timestamp gaps, identify endpoints.
  4. ANALYZE: p50=45ms (fine), p99=890ms (exceeds threshold). /search endpoint contributes 73% of p99 violations. Peak hours 2x worse.
  5. CONCLUDE: "p99 latency exceeds the 500ms threshold, concentrated in /search during peak hours. Recommend optimizing /search specifically rather than system-wide caching."

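The p50/p95/p99 metrics from step 2 can be computed with the stdlib; a sketch using `statistics.quantiles` with the inclusive method and synthetic latencies:

```python
import statistics

def latency_percentiles(samples_ms):
    """p50/p95/p99 from raw latency samples."""
    qs = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

samples = list(range(1, 101))  # synthetic 1-100 ms latencies
percentiles = latency_percentiles(samples)
```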

Error Handling


Error: "No decision context provided"


Cause: User provides data without stating what decision it supports ("just analyze this CSV").
Solution: Ask "What will you do differently based on this analysis?" If truly exploratory, switch to Exploratory Mode -- apply rigor gates but label all findings as exploratory with no causal claims.

Error: "Data file cannot be parsed"


Cause: Malformed CSV, unexpected encoding, mixed delimiters, or binary file.
Solution:
  1. Try common encodings: utf-8, latin-1, utf-8-sig
  2. Detect delimiter: comma, tab, semicolon, pipe
  3. If JSON: validate structure, identify if it's array-of-objects or nested
  4. If still failing: ask user for format details. Do not guess.
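Steps 1-2 can be sketched with `csv.Sniffer`; its heuristics can fail on unusual files, in which case fall through to asking the user, per step 4:

```python
import csv

def sniff_csv(path):
    """Try common encodings, then let csv.Sniffer guess the delimiter."""
    for encoding in ("utf-8-sig", "utf-8", "latin-1"):
        try:
            with open(path, encoding=encoding, newline="") as f:
                sample = f.read(4096)
            break
        except UnicodeDecodeError:
            continue  # latin-1 accepts any bytes, so the loop always ends via break
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    except csv.Error:
        raise ValueError("cannot detect delimiter; ask the user for format details")
    return encoding, dialect.delimiter
```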

Error: "Insufficient data for planned segments"


Cause: Metric definitions specify segments (by region, by tier) but some segments have <30 observations.
Solution:
  1. Report which segments are below minimum
  2. Options: merge small segments into "Other", remove segmentation, or accept reduced confidence with disclosure
  3. Return to Phase 2 to adjust definitions if needed, documenting the change
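The "merge into Other" remediation from option 2 can be sketched as follows; `merge_small_segments` is an illustrative name:

```python
from collections import Counter

def merge_small_segments(rows, key, minimum=30, other="Other"):
    """Relabel segments below the minimum sample size as 'Other' (with disclosure)."""
    counts = Counter(r[key] for r in rows)
    return [{**r, key: r[key] if counts[r[key]] >= minimum else other} for r in rows]

rows = [{"tier": "pro"}] * 40 + [{"tier": "trial"}] * 5
merged = merge_small_segments(rows, "tier")
print(Counter(r["tier"] for r in merged))  # Counter({'pro': 40, 'Other': 5})
```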

Error: "Metrics changed after seeing data"


Cause: Analyst realizes original definitions don't work after loading data (column doesn't exist, wrong granularity).
Solution: This is expected and acceptable IF handled properly:
  1. Return explicitly to Phase 2
  2. Document what changed and why
  3. Save updated metric-definitions.md with change log
  4. Do NOT silently adjust -- the change must be visible in the artifact trail


Anti-Patterns


Data-First Analysis


What it looks like: Loading the CSV immediately and computing summary statistics before asking what decision the analysis supports.
Why wrong: Produces technically correct summaries that answer the wrong question. The analyst finds "interesting patterns" that don't map to the decision-maker's options. Hours of work, zero actionable insight.
Do instead: Complete Phase 1 (FRAME) before touching Phase 3 (EXTRACT). If the user pushes back, explain: "I want to make sure we compute the right metrics. What will you do differently based on this analysis?"

Point Estimates Without Uncertainty


What it looks like: "Conversion rate is 4.2%" with no confidence interval, sample size, or context.
Why wrong: 4.2% from 100 observations means something very different from 4.2% from 100,000 observations. Without uncertainty bounds, the decision-maker cannot judge reliability. A 4.2% rate with CI [1.1%, 7.3%] is very different from 4.2% with CI [4.0%, 4.4%].
Do instead: Always report confidence intervals: "4.2% (95% CI: 3.8-4.6%, N=12,400)".

Silent Definition Changes


What it looks like: Defining "active users" as "logged in last 30 days" in Phase 2, then computing it as "logged in last 7 days" in Phase 4 because the data only has 7-day granularity.
Why wrong: This is p-hacking. Changing definitions after seeing data -- even for practical reasons -- invalidates the pre-registration. If the change is benign, it should be documented. If it is not documented, there is no way to audit whether it was benign.
Do instead: Return to Phase 2, update the definition, document the reason, then proceed.
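One lightweight way to make immutability mechanical rather than aspirational is to freeze the Phase 2 definitions in code, so a silent Phase 4 edit fails loudly. A sketch (the class and field names are illustrative, not part of the skill's interface):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class MetricDefinition:
    name: str
    description: str
    window_days: int

# Pre-registered in Phase 2 (DEFINE).
active_users = MetricDefinition("active_users", "distinct users with a login event", 30)

# Phase 4 discovers the data only has 7-day granularity and tries to adjust silently.
try:
    active_users.window_days = 7
except FrozenInstanceError:
    print("Definition is frozen: return to Phase 2, update it, and document why.")
```

A legitimate change means constructing a new `MetricDefinition` in Phase 2 with the reason recorded, not mutating the old one.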

Cherry-Picked Segments

What it looks like: "Conversion improved in the 25-34 age group!" without reporting all other age groups or applying multiple testing correction.
Why wrong: If you test 10 segments, one will likely show significance by chance (5% false positive rate per test). Reporting only the significant one is misleading.
Do instead: Report all segments tested. Apply Bonferroni correction for 6+ comparisons. Label exploratory findings as exploratory.
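The arithmetic behind the correction is simple enough to show inline. A sketch with hypothetical p-values for ten segments (all numbers are invented for illustration):

```python
# Uncorrected alpha = 0.05; with 10 tests, one segment dipping below it by chance is likely.
p_values = {
    "18-24": 0.21, "25-34": 0.03, "35-44": 0.47, "45-54": 0.62, "55-64": 0.18,
    "65-74": 0.55, "75+": 0.71, "unknown": 0.38, "new": 0.09, "returning": 0.44,
}
alpha = 0.05
corrected = alpha / len(p_values)  # Bonferroni: 0.05 / 10 = 0.005

for segment, p in sorted(p_values.items(), key=lambda kv: kv[1]):
    verdict = "significant" if p < corrected else "not significant after correction"
    print(f"{segment}: p={p} -> {verdict}")
```

Here the 25-34 segment (p=0.03) clears the naive 0.05 threshold but fails the corrected 0.005 one -- exactly the cherry-picking trap. Reporting the full loop output, not just the smallest p-value, is the point.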

Methods-First Communication

What it looks like: "I performed a chi-squared test on the contingency table of conversion outcomes stratified by experimental group, yielding a test statistic of 4.12 with 1 degree of freedom..."
Why wrong: The decision-maker needs the insight, not the methodology. Leading with methods buries the finding under jargon. The methodology belongs in the appendix for auditors.
Do instead: Lead with the insight: "Variant B converts 12% better than A (95% CI: 3-21%). The effect is statistically significant and exceeds our 5% threshold for shipping." Put methodology in the appendix.

Anti-Rationalization

See shared-patterns/anti-rationalization-core.md for universal patterns.

Domain-Specific Rationalizations

| Rationalization Attempt | Why It's Wrong | Required Action |
| --- | --- | --- |
| "The user just wants numbers, skip framing" | Numbers without decision context are not actionable. The user may not know they need framing -- that is exactly why the skill enforces it. | Complete Phase 1. Ask "What will you do differently?" |
| "This sample is probably big enough" | "Probably" is not a statistical assessment. Small samples produce wide CIs that cannot support decisions. | Check the actual sample size against the adequacy gate. Report N and CI. |
| "The metric definition is close enough" | Close enough in a numerator or denominator can flip a conclusion. A/B tests have been decided on the wrong metric because "daily active" vs "monthly active" seemed interchangeable. | Use the exact definition from Phase 2. If it must change, return to Phase 2. |
| "This one significant segment is the real finding" | Cherry-picking the significant result from many tests is textbook p-hacking. The one segment may be a false positive. | Report all segments. Apply multiple testing correction. Label as exploratory if warranted. |
| "CIs are too wide, just report the point estimate" | Wide CIs ARE the finding -- they mean the data is insufficient to support a decision. Hiding this misleads the decision-maker. | Report the CI. State that the data is insufficient. Recommend more data. |
| "The analysis is complex, the user won't understand limitations" | Hiding limitations is more misleading than explaining them. Simple language makes limitations accessible. | State limitations in plain language. "We cannot be confident because..." |

Blocker Criteria

阻塞标准

STOP and ask the user (do NOT proceed autonomously) when:
| Situation | Why Stop | Ask This |
| --- | --- | --- |
| No decision context and user resists framing | Analysis without purpose wastes effort | "Help me understand: what will change based on this analysis?" |
| Data format unclear | Parsing errors corrupt analysis | "What format is this data in? What do the columns represent?" |
| Critical columns have >50% missing values | Analysis on mostly-missing data is unreliable | "Column X is 60% missing. Should we exclude it or is there another data source?" |
| Metric definitions contradict each other | Conflicting definitions produce conflicting results | "Metric A and B use different definitions of 'active user'. Which should we standardize on?" |
| Results are ambiguous (CI spans zero for primary metric) | User needs to know the data is inconclusive | State clearly: "The data does not support a confident decision. Here are options for getting more data." |

Never Guess On

  • Column semantics (what does "status" mean? what values are valid?)
  • Population definitions (who is included/excluded from the analysis)
  • Business thresholds (what constitutes a "meaningful" change)
  • Causal claims (correlation is not causation -- do not imply otherwise)

Death Loop Prevention

Retry Limits

  • Maximum 3 attempts to parse a data file before asking the user for format help
  • Maximum 2 definition revisions in Phase 2 before flagging scope concern
  • Maximum 3 rigor gate remediation attempts before documenting as limitation

Recovery Protocol

  1. Detection: Phase cycling (returning to Phase 2 repeatedly), growing artifact count without convergence, same error recurring
  2. Intervention: Simplify the analysis scope. Drop segments, reduce metrics to the single most important one, narrow time window.
  3. Prevention: Frame the decision tightly in Phase 1. Fewer options = fewer metrics = faster convergence.

References

For detailed information:
  • Rigor Gates: references/rigor-gates.md - Detailed statistical gate documentation with examples
  • Output Templates: references/output-templates.md - Templates for different analysis types (A/B test, trend, distribution, cohort)
  • Anti-Patterns: references/anti-patterns.md - Extended anti-pattern catalog with code examples
This skill uses these shared patterns:
  • Anti-Rationalization: shared-patterns/anti-rationalization-core.md - Prevents shortcut rationalizations
  • Verification Checklist - Pre-completion checks
  • Gate Enforcement: shared-patterns/gate-enforcement.md - Phase transition rules