data-analysis

<objective> Enable executive-grade data analysis for VC, PE, and C-suite presentations. Covers data ingestion from any format, SaaS metrics calculations (MRR, LTV, CAC, churn), cohort retention analysis, McKinsey-quality visualizations with Plotly, and Streamlit dashboards. </objective>
<quick_start> Universal data loader:
python
df = load_data("file.csv")  # Supports CSV, Excel, JSON, Parquet, PDF, PPTX
SaaS metrics:
python
metrics = calculate_saas_metrics(df)  # MRR, ARR, LTV, CAC, churn
retention = cohort_retention_analysis(df)  # Retention matrix
McKinsey-style charts: Action titles ("Q4 Revenue Exceeded Target by 23%"), not descriptive titles </quick_start>
<success_criteria> Analysis is successful when:
  • Data loaded and cleaned (dropna, dedup, type conversion)
  • Metrics calculated correctly (MRR, ARR, LTV:CAC, churn, cohort retention)
  • Charts follow McKinsey principles: action titles, data-ink ratio >80%, one message per chart
  • Executive colors used (#003366 primary, #2E7D32 positive, #C62828 negative)
  • Streamlit dashboard runs without errors
  • NO OPENAI: Use Claude for narrative generation if needed </success_criteria>
<core_content> Executive-grade data analysis for VC, PE, C-suite presentations using pandas, polars, Plotly, Altair, and Streamlit.

Quick Reference

| Task | Tools | Output |
|---|---|---|
| Data ingestion | pandas, polars, pdfplumber, python-pptx | DataFrame |
| Wrangling | pandas/polars transforms | Clean dataset |
| Analysis | numpy, scipy, statsmodels | Insights |
| Visualization | Plotly, Altair, Seaborn | Charts |
| Dashboards | Streamlit, DuckDB | Interactive apps |
| Presentations | Plotly export, PDF generation | Investor-ready |

Data Ingestion Patterns

Universal Data Loader

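python
import pandas as pd
from pathlib import Path

def load_data(file_path: str, conn=None) -> pd.DataFrame | list[pd.DataFrame]:
    """Load data from any common format.

    `conn` is a database connection, needed only for .sql files.
    .pptx files return a list of DataFrames, one per table.
    """
    path = Path(file_path)
    suffix = path.suffix.lower()

    def read_sql_file(p):
        # A .sql file holds a query, so it needs a live connection to run against.
        if conn is None:
            raise ValueError("Loading .sql files requires a database connection (conn=...)")
        return pd.read_sql(p.read_text(), conn)

    loaders = {
        '.csv': lambda p: pd.read_csv(p),
        '.xlsx': lambda p: pd.read_excel(p, engine='openpyxl'),
        '.xls': lambda p: pd.read_excel(p, engine='xlrd'),
        '.json': lambda p: pd.read_json(p),
        '.parquet': lambda p: pd.read_parquet(p),
        '.sql': read_sql_file,
        '.md': lambda p: parse_markdown_tables(p),  # parse_markdown_tables must be defined elsewhere
        '.pdf': lambda p: extract_pdf_tables(p),
        '.pptx': lambda p: extract_pptx_tables(p),
    }

    if suffix not in loaders:
        raise ValueError(f"Unsupported format: {suffix}")

    return loaders[suffix](path)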
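
`load_data` dispatches `.md` files to `parse_markdown_tables`, which this document does not define. A minimal sketch of what such a helper might look like, assuming GitHub-style pipe tables with a `|---|---|` separator row and no escaped pipes inside cells; the body is illustrative, not the original implementation:

```python
import pandas as pd
from pathlib import Path


def parse_markdown_tables(md_path) -> pd.DataFrame:
    """Parse pipe-delimited Markdown tables into one DataFrame (illustrative sketch).

    Assumptions: each table has a header row followed by a separator row,
    and cells contain no escaped pipes. All cell values are returned as strings.
    """
    def split_row(line: str) -> list[str]:
        return [cell.strip() for cell in line.strip().strip("|").split("|")]

    def is_separator(line: str) -> bool:
        cells = split_row(line)
        return bool(cells) and all(c and set(c) <= set("-: ") for c in cells)

    lines = Path(md_path).read_text(encoding="utf-8").splitlines()
    frames, i = [], 0
    while i < len(lines) - 1:
        # A table starts where a pipe row is followed by a separator row.
        if "|" in lines[i] and "|" in lines[i + 1] and is_separator(lines[i + 1]):
            header = split_row(lines[i])
            rows, i = [], i + 2
            while i < len(lines) and "|" in lines[i]:
                row = split_row(lines[i])
                if len(row) == len(header):  # skip malformed rows
                    rows.append(row)
                i += 1
            frames.append(pd.DataFrame(rows, columns=header))
        else:
            i += 1
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Because every cell comes back as a string, numeric columns still need `pd.to_numeric` before any metric is calculated.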

PDF Table Extraction

python
import pandas as pd
import pdfplumber

def extract_pdf_tables(pdf_path: str) -> pd.DataFrame:
    """Extract tables from PDF using pdfplumber.

    Each table's first row is used as its header. All tables are stacked
    into one DataFrame, so this works best when they share the same columns.
    """
    all_tables = []

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables = page.extract_tables()
            for table in tables:
                # Skip empty tables and header-only tables.
                if table and len(table) > 1:
                    df = pd.DataFrame(table[1:], columns=table[0])
                    all_tables.append(df)

    return pd.concat(all_tables, ignore_index=True) if all_tables else pd.DataFrame()
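
pdfplumber returns every table cell as a string (or `None`), so financial tables usually need type conversion before analysis. A minimal sketch, assuming US-style formatting (`$`, thousands commas, `%`, parentheses for negatives) and unique column names; `parse_number` and `coerce_numeric_columns` are hypothetical helpers, not part of pdfplumber or pandas:

```python
import re

import pandas as pd


def parse_number(cell):
    """Return a float for cells like '$1,200', '(350)' or '12.5%'; None otherwise.

    Percent signs are dropped, so '12.5%' becomes 12.5 (not 0.125).
    """
    if pd.isna(cell):
        return None
    text = str(cell).strip()
    # "(350)" is accounting notation for -350.
    negative = text.startswith("(") and text.endswith(")")
    text = re.sub(r"[$,%()\s]", "", text)
    if not re.fullmatch(r"[-+]?(\d+\.?\d*|\.\d+)", text):
        return None
    value = float(text)
    return -value if negative else value


def coerce_numeric_columns(df: pd.DataFrame, min_numeric_share: float = 0.8) -> pd.DataFrame:
    """Convert columns whose non-blank cells mostly parse as numbers to float."""
    out = df.copy()
    for col in out.columns:
        cells = out[col].tolist()
        parsed = [parse_number(c) for c in cells]
        filled = [c for c in cells if not pd.isna(c) and str(c).strip() != ""]
        hits = sum(p is not None for p in parsed)
        # Leave text columns (segment names, notes) untouched.
        if filled and hits / len(filled) >= min_numeric_share:
            out[col] = pd.Series(parsed, index=out.index, dtype="float64")
    return out


# Hypothetical rows, shaped like the output of extract_pdf_tables.
raw = pd.DataFrame({
    "segment": ["SMB", "Enterprise"],
    "revenue": ["$1,200", "(350)"],
    "growth": ["12.5%", None],
})
clean = coerce_numeric_columns(raw)
```

The 80% threshold is an arbitrary default chosen for this sketch; a stricter value avoids converting ID-like columns that happen to contain digits.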

PowerPoint Data Extraction

python
import pandas as pd
from pptx import Presentation

def extract_pptx_tables(pptx_path: str) -> list[pd.DataFrame]:
    """Extract all tables from PowerPoint, one DataFrame per table."""
    prs = Presentation(pptx_path)
    tables = []

    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_table:
                # Read every row as a list of cell strings.
                data = [[cell.text for cell in row.cells] for row in shape.table.rows]
                # The first row is treated as the header.
                df = pd.DataFrame(data[1:], columns=data[0])
                tables.append(df)

    return tables

Data Wrangling Patterns

Polars for Performance (up to 30x faster than pandas, depending on workload)

python
import polars as pl

# Lazy evaluation for large datasets
df = (
    pl.scan_csv("large_file.csv")
    .filter(pl.col("revenue") > 0)
    .with_columns([
        (pl.col("revenue") / pl.col("customers")).alias("arpu"),
        pl.col("date").str.to_date().alias("date_parsed"),
    ])
    .group_by("segment")
    .agg([
        pl.col("revenue").sum().alias("total_revenue"),
        pl.col("customers").mean().alias("avg_customers"),
    ])
    .collect()
)

Common Transformations

python
def prepare_for_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """Standard data prep pipeline."""
    return (df
        .dropna(subset=['key_column'])
        .drop_duplicates()
        .assign(
            date=lambda x: pd.to_datetime(x['date']),
            revenue=lambda x: pd.to_numeric(x['revenue'], errors='coerce'),
            month=lambda x: x['date'].dt.to_period('M'),
        )
        .sort_values('date')
        .reset_index(drop=True)
    )

SaaS Metrics Calculations

Core Metrics

python
import pandas as pd

def calculate_saas_metrics(df: pd.DataFrame) -> dict:
    """Calculate key SaaS metrics for investor reporting.

    Assumes one row per customer per month with columns: month, customer_id,
    mrr, status, is_new (bool), sales_cost, marketing_cost.
    """

    # MRR / ARR: ARR annualizes the most recent month
    mrr = df.groupby('month')['mrr'].sum()
    arr = mrr.iloc[-1] * 12

    # Growth rates: month-over-month change for the latest month
    mrr_growth = mrr.pct_change().iloc[-1]

    # Churn: share of MRR on rows marked 'churned', across all rows in df
    churned = df[df['status'] == 'churned']['mrr'].sum()
    total_mrr = df['mrr'].sum()
    churn_rate = churned / total_mrr if total_mrr > 0 else 0

    # CAC & LTV
    total_sales_marketing = df['sales_cost'].sum() + df['marketing_cost'].sum()
    new_customers = df[df['is_new']]['customer_id'].nunique()
    cac = total_sales_marketing / new_customers if new_customers > 0 else 0

    avg_revenue_per_customer = df.groupby('customer_id')['mrr'].mean().mean()
    # Expected lifetime is 1 / churn; falls back to 36 months when churn is zero
    avg_lifespan_months = 1 / churn_rate if churn_rate > 0 else 36
    # Revenue-based LTV (no gross margin applied)
    ltv = avg_revenue_per_customer * avg_lifespan_months

    ltv_cac_ratio = ltv / cac if cac > 0 else 0
    cac_payback_months = cac / avg_revenue_per_customer if avg_revenue_per_customer > 0 else 0

    return {
        'mrr': mrr.iloc[-1],
        'arr': arr,
        'mrr_growth': mrr_growth,
        'churn_rate': churn_rate,
        'cac': cac,
        'ltv': ltv,
        'ltv_cac_ratio': ltv_cac_ratio,
        'cac_payback_months': cac_payback_months,
    }
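
The LTV and payback formulas in `calculate_saas_metrics` reduce to simple arithmetic: expected lifetime is 1 / monthly churn, LTV is average revenue per customer times lifetime, and payback is CAC divided by average revenue per customer. A worked sketch with made-up inputs (illustrative numbers, not benchmarks); `unit_economics` is a hypothetical helper, and like the function above it applies no gross margin:

```python
def unit_economics(arpu: float, monthly_churn: float, cac: float) -> dict:
    """Simple LTV/CAC arithmetic, mirroring calculate_saas_metrics (no gross margin)."""
    if monthly_churn <= 0:
        raise ValueError("monthly_churn must be positive to derive a lifetime")
    lifespan_months = 1 / monthly_churn
    ltv = arpu * lifespan_months
    return {
        "lifespan_months": lifespan_months,
        "ltv": ltv,
        "ltv_cac_ratio": ltv / cac if cac > 0 else 0,
        "cac_payback_months": cac / arpu if arpu > 0 else 0,
    }


# Hypothetical inputs: $500 monthly revenue per customer, 2.5% monthly churn, $4,000 CAC.
result = unit_economics(arpu=500, monthly_churn=0.025, cac=4000)
print(result)
```

With these inputs the expected lifetime is 40 months, LTV is $20,000, LTV:CAC is 5x, and CAC payback is 8 months.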

Cohort Analysis

python
import pandas as pd

def cohort_retention_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """Build cohort retention matrix for investor reporting.

    Expects `date` as a datetime column. Returns retention percentages
    indexed by cohort month, with one column per month since acquisition.
    """
    df = df.copy()  # avoid adding helper columns to the caller's DataFrame

    # Assign cohort (first purchase month)
    df['cohort'] = df.groupby('customer_id')['date'].transform('min').dt.to_period('M')
    df['period'] = df['date'].dt.to_period('M')
    df['cohort_age'] = (df['period'] - df['cohort']).apply(lambda x: x.n)

    # Build retention matrix
    cohort_data = df.groupby(['cohort', 'cohort_age']).agg({
        'customer_id': 'nunique',
        'revenue': 'sum'
    }).reset_index()

    # Pivot for visualization
    cohort_counts = cohort_data.pivot(
        index='cohort',
        columns='cohort_age',
        values='customer_id'
    )

    # Calculate retention percentages (column 0 is the cohort's starting size)
    cohort_sizes = cohort_counts.iloc[:, 0]
    retention = cohort_counts.divide(cohort_sizes, axis=0) * 100

    return retention
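
To make the shape of the retention matrix concrete, here is a small worked example that applies the same cohort steps to a handful of made-up transactions. The customer IDs and dates are invented for illustration:

```python
import pandas as pd

# Hypothetical transactions: three customers acquired in Jan 2024, one in Feb.
orders = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "b", "c", "d", "d"],
    "date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-02",  # a: active all three months
        "2024-01-20", "2024-02-15",                # b: drops after February
        "2024-01-25",                              # c: never returns
        "2024-02-03", "2024-03-09",                # d: February cohort
    ]),
})

# Cohort = month of first purchase; cohort_age = months since that month.
orders["cohort"] = orders.groupby("customer_id")["date"].transform("min").dt.to_period("M")
orders["period"] = orders["date"].dt.to_period("M")
orders["cohort_age"] = (orders["period"] - orders["cohort"]).apply(lambda x: x.n)

# Unique active customers per cohort and age, pivoted to a matrix.
counts = (
    orders.groupby(["cohort", "cohort_age"])["customer_id"]
    .nunique()
    .unstack("cohort_age")
)

# Divide each row by its month-0 size to get retention percentages.
retention = counts.divide(counts[0], axis=0) * 100
print(retention.round(1))
```

The January cohort retains 2 of 3 customers in month 1 (about 67%) and 1 of 3 in month 2 (about 33%). The February cohort has no month-2 observation yet, so that cell is NaN, which is why cohort heatmaps have a triangular shape.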

Executive Visualization

McKinsey/BCG Chart Principles

yaml
mckinsey_style:
  colors:
    primary: "#003366"      # Deep blue
    accent: "#0066CC"       # Bright blue
    positive: "#2E7D32"     # Green
    negative: "#C62828"     # Red
    neutral: "#757575"      # Gray

  typography:
    title: "Georgia, serif"
    body: "Arial, sans-serif"
    size_title: 18
    size_body: 12

  principles:
    - "One message per chart"
    - "Action title (not descriptive)"
    - "Data-ink ratio > 80%"
    - "Remove chartjunk"
    - "Label directly on chart"
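
One way to keep charts consistent with this style config is to turn it into layout arguments once and reuse them. A minimal sketch, assuming the config is held as a Python dict rather than parsed from YAML; `MCKINSEY_STYLE` and `layout_from_style` are hypothetical names introduced here:

```python
MCKINSEY_STYLE = {
    "colors": {
        "primary": "#003366",
        "accent": "#0066CC",
        "positive": "#2E7D32",
        "negative": "#C62828",
        "neutral": "#757575",
    },
    "typography": {
        "title": "Georgia, serif",
        "body": "Arial, sans-serif",
        "size_title": 18,
        "size_body": 12,
    },
}


def layout_from_style(title: str, style: dict = MCKINSEY_STYLE) -> dict:
    """Build keyword arguments for Plotly's fig.update_layout from the style config."""
    typo = style["typography"]
    return {
        "title": {
            "text": f"<b>{title}</b>",
            "font": {"family": typo["title"], "size": typo["size_title"]},
            "x": 0,  # left-align the action title
        },
        "font": {"family": typo["body"], "size": typo["size_body"]},
        "plot_bgcolor": "white",  # no background fill: less non-data ink
        "showlegend": False,      # label series directly on the chart instead
    }


layout = layout_from_style("Q4 Revenue Exceeded Target by 23%")
print(layout["title"]["text"])
```

The resulting dict can be applied to any Plotly figure with `fig.update_layout(**layout)`.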

Plotly Executive Charts

python
import plotly.express as px
import plotly.graph_objects as go

EXEC_COLORS = {
    'primary': '#003366',
    'secondary': '#0066CC',
    'positive': '#2E7D32',
    'negative': '#C62828',
    'neutral': '#757575',
}

def exec_line_chart(df, x, y, title):
    """McKinsey-style line chart."""
    fig = px.line(df, x=x, y=y)

    fig.update_layout(
        title=dict(
            text=f"<b>{title}</b>",
            font=dict(size=18, family="Georgia"),
            x=0,
        ),
        font=dict(family="Arial", size=12),
        plot_bgcolor='white',
        xaxis=dict(showgrid=False, showline=True, linecolor='black'),
        yaxis=dict(showgrid=True, gridcolor='#E0E0E0', showline=True, linecolor='black'),
        margin=dict(l=60, r=40, t=60, b=40),
    )

    fig.update_traces(line=dict(color=EXEC_COLORS['primary'], width=3))

    return fig

def exec_waterfall(values, labels, title):
    """Waterfall chart for revenue/cost breakdown."""
    fig = go.Figure(go.Waterfall(
        orientation="v",
        measure=["relative"] * (len(values) - 1) + ["total"],
        x=labels,
        y=values,
        connector=dict(line=dict(color="rgb(63, 63, 63)")),
        increasing=dict(marker=dict(color=EXEC_COLORS['positive'])),
        decreasing=dict(marker=dict(color=EXEC_COLORS['negative'])),
        totals=dict(marker=dict(color=EXEC_COLORS['primary'])),
    ))

    fig.update_layout(
        title=dict(text=f"<b>{title}</b>", font=dict(size=18, family="Georgia")),
        font=dict(family="Arial", size=12),
        plot_bgcolor='white',
        showlegend=False,
    )

    return fig
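
`exec_waterfall` marks every bar except the last as relative and the last as a total, so its inputs need to be ordered as a starting value, signed changes, and an ending value. A minimal sketch of preparing those inputs for an MRR bridge; `mrr_waterfall_inputs` is a hypothetical helper and the figures are made up:

```python
def mrr_waterfall_inputs(start_mrr, new, expansion, churned, contraction):
    """Return (labels, values) in the order exec_waterfall expects.

    Losses are passed as positive amounts and negated here. The last entry
    is the ending MRR; exec_waterfall tags it as a "total" bar, which Plotly
    draws from the running sum of the preceding bars.
    """
    for name, amount in [("new", new), ("expansion", expansion),
                         ("churned", churned), ("contraction", contraction)]:
        if amount < 0:
            raise ValueError(f"{name} should be a non-negative amount")
    end_mrr = start_mrr + new + expansion - churned - contraction
    labels = ["Starting MRR", "New", "Expansion", "Churn", "Contraction", "Ending MRR"]
    values = [start_mrr, new, expansion, -churned, -contraction, end_mrr]
    return labels, values


# Hypothetical month: $100k starting MRR.
labels, values = mrr_waterfall_inputs(100_000, 12_000, 5_000, 4_000, 1_500)
print(list(zip(labels, values)))
```

These lists can be passed straight through, for example `exec_waterfall(values, labels, "MRR Grew 11.5% on New and Expansion Revenue")`.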

Cohort Heatmap

python
import pandas as pd
import plotly.express as px

def cohort_heatmap(retention_df, title="Customer Retention by Cohort"):
    """Publication-quality cohort retention heatmap."""
    fig = px.imshow(
        retention_df.values,
        labels=dict(x="Months Since Acquisition", y="Cohort", color="Retention %"),
        x=list(retention_df.columns),
        y=[str(c) for c in retention_df.index],
        color_continuous_scale='Blues',
        aspect='auto',
    )

    # Add text annotations (empty cells for months not yet observed are skipped)
    for i, row in enumerate(retention_df.values):
        for j, val in enumerate(row):
            if not pd.isna(val):
                fig.add_annotation(
                    x=j, y=i,
                    text=f"{val:.0f}%",
                    showarrow=False,
                    # White text on dark cells, black text on light cells
                    font=dict(color='white' if val > 50 else 'black', size=10)
                )

    fig.update_layout(
        title=dict(text=f"<b>{title}</b>", font=dict(size=18, family="Georgia")),
        font=dict(family="Arial", size=12),
    )

    return fig

Streamlit Dashboard Template

python
import streamlit as st
import pandas as pd
import plotly.express as px

st.set_page_config(page_title="Executive Dashboard", layout="wide")

# Custom CSS for executive styling
st.markdown("""
<style>
.metric-card {
    background: linear-gradient(135deg, #003366, #0066CC);
    padding: 20px;
    border-radius: 10px;
    color: white;
}
.stMetric label {
    font-family: Georgia, serif;
}
</style>
""", unsafe_allow_html=True)

# Header
st.title("Executive Dashboard")
st.markdown("---")

# KPI Row
col1, col2, col3, col4 = st.columns(4)

with col1:
    st.metric("MRR", f"${mrr:,.0f}", f"{mrr_growth:+.1%}")
with col2:
    st.metric("ARR", f"${arr:,.0f}", f"{arr_growth:+.1%}")
with col3:
    st.metric("LTV:CAC", f"{ltv_cac:.1f}x", delta_color="normal")
with col4:
    st.metric("Churn", f"{churn:.1%}", f"{churn_delta:+.1%}", delta_color="inverse")

# Charts Row
st.markdown("## Revenue Trend")
st.plotly_chart(
    exec_line_chart(df, 'month', 'revenue', 'MRR Growth Exceeds Target'),
    use_container_width=True,
)

# Cohort Analysis
st.markdown("## Cohort Retention")
st.plotly_chart(cohort_heatmap(retention_df), use_container_width=True)
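
The dashboard template uses variables such as `mrr` and `mrr_growth` without showing where they come from. A minimal sketch of one way to derive the KPI row from the dict returned by `calculate_saas_metrics`; `kpi_cards` is a hypothetical helper, and the sample values are made up:

```python
def kpi_cards(metrics: dict) -> list[dict]:
    """Map calculate_saas_metrics output to st.metric arguments (label, value, delta)."""
    growth = f"{metrics['mrr_growth']:+.1%}"
    return [
        {"label": "MRR", "value": f"${metrics['mrr']:,.0f}", "delta": growth},
        # ARR is defined as latest MRR x 12, so its growth rate equals MRR growth.
        {"label": "ARR", "value": f"${metrics['arr']:,.0f}", "delta": growth},
        {"label": "LTV:CAC", "value": f"{metrics['ltv_cac_ratio']:.1f}x", "delta": None},
        # calculate_saas_metrics returns no prior-period churn, so no delta here.
        {"label": "Churn", "value": f"{metrics['churn_rate']:.1%}", "delta": None},
    ]


# Hypothetical metrics dict with the keys calculate_saas_metrics returns.
sample = {"mrr": 111_500, "arr": 1_338_000, "mrr_growth": 0.115,
          "churn_rate": 0.025, "ltv_cac_ratio": 5.0}
cards = kpi_cards(sample)

# In the Streamlit app (not run here):
# for col, card in zip(st.columns(len(cards)), cards):
#     col.metric(**card)
```

Keeping the formatting in a plain function makes the KPI strings testable without launching Streamlit.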

Investor Presentation Patterns

Pitch Deck Metrics Sequence

yaml
investor_metrics_flow:
  1_unit_economics:
    charts: ["CAC vs LTV bar", "LTV:CAC trend line"]
    key_message: "3x+ LTV:CAC proves efficient growth"

  2_mrr_waterfall:
    charts: ["MRR waterfall (new, expansion, churn, contraction)"]
    key_message: "Net revenue retention > 100%"

  3_cohort_retention:
    charts: ["Cohort heatmap", "Revenue retention curve"]
    key_message: "Strong retention = compounding value"

  4_growth_efficiency:
    charts: ["Magic Number", "CAC payback period"]
    key_message: "Efficient growth engine"

  5_projections:
    charts: ["ARR projection with scenarios"]
    key_message: "Clear path to $X ARR"
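
Two metrics named in this flow, net revenue retention and the Magic Number, are not computed elsewhere in this document. A minimal sketch using their commonly cited definitions; the function names and input figures are assumptions for illustration:

```python
def net_revenue_retention(start_mrr, expansion, contraction, churned):
    """NRR for a period: revenue kept from existing customers / starting revenue.

    New-customer revenue is excluded by definition.
    """
    if start_mrr <= 0:
        raise ValueError("start_mrr must be positive")
    return (start_mrr + expansion - contraction - churned) / start_mrr


def magic_number(revenue_this_quarter, revenue_prev_quarter, sales_marketing_prev_quarter):
    """Magic Number: annualized net new revenue per dollar of prior-quarter S&M spend."""
    if sales_marketing_prev_quarter <= 0:
        raise ValueError("sales and marketing spend must be positive")
    return (revenue_this_quarter - revenue_prev_quarter) * 4 / sales_marketing_prev_quarter


# Hypothetical figures for illustration only.
nrr = net_revenue_retention(start_mrr=100_000, expansion=8_000,
                            contraction=1_500, churned=4_000)
magic = magic_number(revenue_this_quarter=330_000, revenue_prev_quarter=300_000,
                     sales_marketing_prev_quarter=100_000)
print(f"NRR: {nrr:.1%}, Magic Number: {magic:.2f}")
```

With these inputs NRR is about 102.5%, which supports the "Net revenue retention > 100%" message, and the Magic Number is 1.2.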

Action Titles (McKinsey Style)

Bad (Descriptive) → Good (Action)

❌ "Revenue by Quarter" ✅ "Q4 Revenue Exceeded Target by 23%"
❌ "Customer Acquisition Cost" ✅ "CAC Decreased 40% While Maintaining Quality"
❌ "Cohort Analysis" ✅ "90-Day Retention Improved to 85%, Up From 72%"
❌ "Market Size" ✅ "TAM of $4.2B with Clear Path to $500M SAM"
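
Action titles of the "exceeded target by N%" form can be generated from the underlying numbers so the headline never drifts from the data. A minimal sketch; `action_title` is a hypothetical helper that only covers the target-variance pattern and assumes higher is better for the metric:

```python
def action_title(metric: str, actual: float, target: float) -> str:
    """Turn a metric and its target into a McKinsey-style action title."""
    if target == 0:
        raise ValueError("target must be non-zero to compute a variance")
    variance = (actual - target) / abs(target)
    if abs(variance) < 0.005:  # within half a percent: treat as on target
        return f"{metric} Met Target"
    verb = "Exceeded" if variance > 0 else "Missed"
    return f"{metric} {verb} Target by {abs(variance):.0%}"


print(action_title("Q4 Revenue", actual=12.3, target=10.0))
# prints "Q4 Revenue Exceeded Target by 23%"
```

For metrics where lower is better, such as CAC or churn, the verb logic would need to be inverted.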

Quick Commands

python
# Load and analyze any file
df = load_data("data.csv")
metrics = calculate_saas_metrics(df)
retention = cohort_retention_analysis(df)

# Generate executive charts
fig = exec_line_chart(df, 'month', 'mrr', 'MRR Growth Accelerating')
fig.write_html("mrr_chart.html")
fig.write_image("mrr_chart.png", scale=2)

# Run Streamlit dashboard
# streamlit run dashboard.py

Integration Notes

  • Pairs with: revenue-ops-skill (metrics), pricing-strategy-skill (modeling)
  • Stack: Python 3.11+, pandas, polars, plotly, altair, streamlit
  • Projects: coperniq-forge (ROI calculators), thetaroom (trading analysis)
  • NO OPENAI: Use Claude for narrative generation

Reference Files

  • reference/chart-gallery.md
    - 20+ chart templates with code
  • reference/saas-metrics.md
    - Complete SaaS KPI definitions
  • reference/streamlit-patterns.md
    - Production dashboard patterns
  • reference/data-wrangling.md
    - Format-specific extraction guides