Analysis Artifacts

When to Use

  • When asked to do a "deep dive" or "analysis" on a question with a non-obvious answer
  • When the analysis requires exploratory querying in BigQuery
  • When the output should be reproducible and shareable (not just a one-off answer)

Workflow

1. Scaffold the analysis directory

At the start of every analysis:
  • Create a new directory in the analyses folder, named according to the existing pattern there
  • Create subdirectories: /assets/queries and /assets/visualizations
  • Create a README.md at the root of the new directory — this is the main readable document for the analysis
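
The scaffolding step can be sketched in a few lines of Python (the directory slug here is illustrative — follow the naming pattern already used in the analyses folder):

```python
from pathlib import Path

# Illustrative slug; match the existing naming pattern in analyses/
analysis_dir = Path("analyses") / "2024-01-user-retention"

# Create the standard subdirectories in one pass
for sub in ("assets/queries", "assets/visualizations"):
    (analysis_dir / sub).mkdir(parents=True, exist_ok=True)

# README.md is the main readable document for the analysis
(analysis_dir / "README.md").touch()
```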

2. Plan the analysis

Always create a plan before starting, whether or not the user asked for one. Steps in the plan should map to the logical sub-questions or sub-areas you've deemed important to explore. Present the plan and wait for a go-ahead before proceeding.

3. Set up the README

Once the plan is approved:
  • Add a title, author, and date to the top of the README
  • Add a Problem Statement section summarizing the analysis question and the sub-pieces you'll explore
  • Add a Cohorts Definition section. This must be extremely explicit about the groups being compared. If comparing two groups (e.g., free vs. paid, new vs. old, before vs. after a milestone), define cohorts in a way that controls for confounding factors. Consider:
    • Signup/activation time (as defined by your product — e.g., first login, first meaningful action); this relates to user tenure
    • Plan type or subscription tier (e.g., free vs. paid)
    • Controlling for observation time window length across cohorts
    • Product-specific usage propensity metrics relevant to the analysis question
    Once defined, respect these cohort definitions in all queries throughout the analysis.
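
One of these controls — equalizing the observation window across cohorts — can be sketched as follows. The 28-day window and the function name are illustrative assumptions, not a prescribed standard:

```python
from datetime import date, timedelta

OBSERVATION_DAYS = 28  # illustrative window length; choose per analysis

def observation_window(activation_date: date) -> tuple[date, date]:
    """Return the [start, end) window of activity to count for a user,
    so every cohort member is observed for the same number of days
    regardless of when they activated."""
    return activation_date, activation_date + timedelta(days=OBSERVATION_DAYS)

# Users who activated on different dates still get equal-length windows
start, end = observation_window(date(2024, 1, 5))
```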

4. Create artifacts as you go

For every material step in the analysis:
  • SQL query artifact: For any BigQuery query that powers a visualization, summary, or key insight, save a .sql file in /assets/queries/ with a descriptive name and a comment block explaining the query's purpose. Only create the file after you're satisfied with the results. Skip trivial or one-off lookup queries.
  • Visualization or table artifact: For each key insight, assess whether it's best conveyed through a chart or a table. Lean toward visualizations. If a visualization, write a Python script to generate it and save both the script and the output image to /assets/visualizations/ with descriptive names. If a table, save it as a .csv in /assets/visualizations/.
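
A visualization script artifact might look like the following minimal sketch, assuming matplotlib is available; the data, filenames, and labels are illustrative:

```python
# Minimal sketch of a visualization script saved alongside its output.
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

weeks = [0, 1, 2, 3, 4]
retention = [1.00, 0.62, 0.48, 0.41, 0.37]  # illustrative values

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(weeks, retention, marker="o")
ax.set_xlabel("Weeks since activation")
ax.set_ylabel("Retention rate")
ax.set_title("Weekly cohort retention")
fig.tight_layout()

# Save the image next to the script so both are committed together
out_dir = Path("assets/visualizations")
out_dir.mkdir(parents=True, exist_ok=True)
fig.savefig(out_dir / "retention_curve.png", dpi=150)
```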

5. Overwriting artifacts

If you need to redo part of the analysis (due to a methodology correction or user feedback), overwrite all associated artifacts:
  • Replace the .sql query file
  • Replace the visualization script and regenerate the image
  • Replace the .csv table file
Note the change to the user when you do this.

6. Summarize the analysis

When the analysis is complete (either at the end of the plan or when the user asks), write the full README:
  • Summarize each step and sub-question in logical document sections
  • Be crisp and concise — avoid unnecessary verbosity
  • Embed saved viz images from /assets/visualizations/ where appropriate
  • Generate markdown tables from .csv files in /assets/visualizations/
  • Include a small reference hyperlink to the associated query file in each section
  • Add a TL;DR section near the top (after Problem Statement, before Cohorts Definition)
  • Add a Key Takeaways section at the end
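
Generating a markdown table from a saved .csv needs only the standard library; a minimal sketch (the column names and values are illustrative):

```python
import csv
from io import StringIO

def csv_to_markdown(csv_text: str) -> str:
    """Render CSV content as a markdown table for embedding in the README."""
    rows = list(csv.reader(StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)

# In practice, read the text from a file under /assets/visualizations/
table = csv_to_markdown("plan,users\nfree,1200\npaid,300\n")
```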

Examples

```bash
analyses/
└── 2024-01-user-retention/
    ├── README.md
    └── assets/
        ├── queries/
        │   ├── cohort_retention_by_week.sql
        │   └── retention_by_plan_type.sql
        └── visualizations/
            ├── retention_curve.py
            ├── retention_curve.png
            └── plan_type_summary.csv
```