Analysis Artifacts

When to Use

  • When asked to do a "deep dive" or "analysis" on a question with a non-obvious answer
  • When the analysis requires exploratory querying in BigQuery
  • When the output should be reproducible and shareable (not just a one-off answer)

Workflow

1. Scaffold the analysis directory

At the start of every analysis:
  • Create a new directory in the analyses folder, named according to the existing pattern there
  • Create subdirectories: /assets/queries and /assets/visualizations
  • Create a README.md at the root of the new directory — this is the main readable document for the analysis
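
The scaffolding step can be sketched in a few lines of Python (the directory slug here is illustrative — follow the naming pattern already used in the analyses folder):

```python
from pathlib import Path

# Illustrative slug; match the existing naming pattern in analyses/
analysis_dir = Path("analyses") / "2024-01-user-retention"

# Create the standard subdirectories in one pass
for sub in ("assets/queries", "assets/visualizations"):
    (analysis_dir / sub).mkdir(parents=True, exist_ok=True)

# README.md is the main readable document for the analysis
(analysis_dir / "README.md").touch()
```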

2. Plan the analysis

Always create a plan before starting, whether or not the user asked for one. Steps in the plan should map to the logical sub-questions or sub-areas you've deemed important to explore. Present the plan and wait for a go-ahead before proceeding.

3. Set up the README

Once the plan is approved:
  • Add a title, author, and date to the top of the README
  • Add a Problem Statement section summarizing the analysis question and the sub-pieces you'll explore
  • Add a Cohorts Definition section. This must be extremely explicit about the groups being compared. If comparing two groups (e.g., free vs. paid, new vs. old, before vs. after a milestone), define cohorts in a way that controls for confounding factors. Consider:
    • Signup/activation time (as defined by your product — e.g., first login, first meaningful action); this relates to user tenure
    • Plan type or subscription tier (e.g., free vs. paid)
    • Controlling for observation time window length across cohorts
    • Product-specific usage propensity metrics relevant to the analysis question
    Once defined, respect these cohort definitions in all queries throughout the analysis.
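
One of these controls — equalizing the observation window across cohorts — can be sketched as follows. The 28-day window and the function name are illustrative assumptions, not a prescribed standard:

```python
from datetime import date, timedelta

OBSERVATION_DAYS = 28  # illustrative window length; choose per analysis

def observation_window(activation_date: date) -> tuple[date, date]:
    """Return the [start, end) window of activity to count for a user,
    so every cohort member is observed for the same number of days
    regardless of when they activated."""
    return activation_date, activation_date + timedelta(days=OBSERVATION_DAYS)

# Users who activated on different dates still get equal-length windows
start, end = observation_window(date(2024, 1, 5))
```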

4. Create artifacts as you go

For every material step in the analysis:
  • SQL query artifact: For any BigQuery query that powers a visualization, summary, or key insight, save a .sql file in /assets/queries/ with a descriptive name and a comment block explaining the query's purpose. Only create the file after you're satisfied with the results. Skip trivial or one-off lookup queries.
  • Visualization or table artifact: For each key insight, assess whether it's best conveyed through a chart or a table. Lean toward visualizations. If a visualization, write a Python script to generate it and save both the script and the output image to /assets/visualizations/ with descriptive names. If a table, save it as a .csv in /assets/visualizations/.
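
A visualization script artifact might look like the following minimal sketch, assuming matplotlib is available; the data, filenames, and labels are illustrative:

```python
# Minimal sketch of a visualization script saved alongside its output.
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

weeks = [0, 1, 2, 3, 4]
retention = [1.00, 0.62, 0.48, 0.41, 0.37]  # illustrative values

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(weeks, retention, marker="o")
ax.set_xlabel("Weeks since activation")
ax.set_ylabel("Retention rate")
ax.set_title("Weekly cohort retention")
fig.tight_layout()

# Save the image next to the script so both are committed together
out_dir = Path("assets/visualizations")
out_dir.mkdir(parents=True, exist_ok=True)
fig.savefig(out_dir / "retention_curve.png", dpi=150)
```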

5. Overwriting artifacts

If you need to redo part of the analysis (due to a methodology correction or user feedback), overwrite all associated artifacts:
  • Replace the .sql query file
  • Replace the visualization script and regenerate the image
  • Replace the .csv table file
Note the change to the user when you do this.

6. Summarize the analysis

When the analysis is complete (either at the end of the plan or when the user asks), write the full README:
  • Summarize each step and sub-question in logical document sections
  • Be crisp and concise — avoid unnecessary verbosity
  • Embed saved viz images from /assets/visualizations/ where appropriate
  • Generate markdown tables from .csv files in /assets/visualizations/
  • Include a small reference hyperlink to the associated query file in each section
  • Add a TL;DR section near the top (after Problem Statement, before Cohorts Definition)
  • Add a Key Takeaways section at the end
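
Generating a markdown table from a saved .csv needs only the standard library; a minimal sketch (the column names and values are illustrative):

```python
import csv
from io import StringIO

def csv_to_markdown(csv_text: str) -> str:
    """Render CSV content as a markdown table for embedding in the README."""
    rows = list(csv.reader(StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)

# In practice, read the text from a file under /assets/visualizations/
table = csv_to_markdown("plan,users\nfree,1200\npaid,300\n")
```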

Examples

```bash
analyses/
└── 2024-01-user-retention/
    ├── README.md
    └── assets/
        ├── queries/
        │   ├── cohort_retention_by_week.sql
        │   └── retention_by_plan_type.sql
        └── visualizations/
            ├── retention_curve.py
            ├── retention_curve.png
            └── plan_type_summary.csv
```