tracing-downstream-lineage

Downstream Lineage: Impacts

Answer the critical question: "What breaks if I change this?"
Use this BEFORE making changes to understand the blast radius.

Impact Analysis

Step 1: Identify Direct Consumers

Find everything that reads from the target:
For Tables:
  1. Search DAG source code: Look for DAGs that SELECT from this table
    • Use list_dags to get all DAGs
    • Use get_dag_source to fetch each DAG's source and search it for table references
    • Look for: FROM target_table, JOIN target_table
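  2. Check for dependent views:
    ```sql
    -- Snowflake: objects (such as views) that depend on the target table.
    -- ACCOUNT_USAGE data can lag recent changes by a few hours.
    SELECT referencing_object_domain, referencing_database,
           referencing_schema, referencing_object_name
    FROM snowflake.account_usage.object_dependencies
    WHERE referenced_object_name = '<TARGET_TABLE>';

    -- Or run SHOW VIEWS and search the definitions in its "text" column
    ```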
  3. Look for BI tool connections:
    • Dashboards often query tables directly
    • Check for common BI patterns in table naming (rpt_, dashboard_)
For DAGs:
  1. Check what the DAG produces: Use get_dag_source to find output tables
  2. Then trace those tables' consumers (recursive)
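
The source search for tables above can be sketched in Python. This is a minimal sketch and does not reproduce the tools' own behavior. It assumes you have already collected each DAG's source text into a dict keyed by DAG ID, for example by calling get_dag_source for every DAG that list_dags returns. The function name find_consuming_dags and the sample DAG IDs are hypothetical.

```python
import re


def find_consuming_dags(dag_sources: dict[str, str], table: str) -> list[str]:
    """Return the IDs of DAGs whose source reads `table` via FROM or JOIN.

    dag_sources maps dag_id -> DAG source text. This shape is an assumption
    (e.g. built from get_dag_source output), not a documented tool format.
    """
    # FROM/JOIN, whitespace, then the exact table name. The negative
    # lookahead stops "fct.orders" from also matching "fct.orders_archive".
    pattern = re.compile(
        rf"\b(?:FROM|JOIN)\s+{re.escape(table)}(?![\w.])",
        re.IGNORECASE,
    )
    return sorted(dag_id for dag_id, src in dag_sources.items() if pattern.search(src))


if __name__ == "__main__":
    # Hypothetical DAG sources for illustration only.
    sources = {
        "transform_daily_sales": "SELECT order_date, SUM(amount) FROM fct.orders GROUP BY 1",
        "build_features": "select * from stg.x s join fct.orders o on s.id = o.id",
        "archive_job": "SELECT * FROM fct.orders_archive",
        "load_orders": "INSERT INTO fct.orders SELECT * FROM stg.orders",
    }
    print(find_consuming_dags(sources, "fct.orders"))
    # ['build_features', 'transform_daily_sales']
```

A plain regex misses table names assembled at runtime, SQL kept in separate .sql files, and quoted identifiers. Treat an empty result as "no obvious consumers", not "no consumers".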

Step 2: Build Dependency Tree

Map the full downstream impact:
SOURCE: fct.orders
    |
    +-- TABLE: agg.daily_sales --> Dashboard: Executive KPIs
    |       |
    |       +-- TABLE: rpt.monthly_summary --> Email: Monthly Report
    |
    +-- TABLE: ml.order_features --> Model: Demand Forecasting
    |
    +-- DIRECT: Looker Dashboard "Sales Overview"
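
Once the direct consumers of each asset are known from the Step 1 searches, building the full tree is a graph traversal. This is a minimal sketch. It assumes a hypothetical consumers mapping from each asset to the assets that read it, filled here with the example above. The cycle guard is there because lineage assembled from text search can contain loops, such as a DAG that reads and rewrites the same table.

```python
def all_downstream(root: str, consumers: dict[str, list[str]]) -> set[str]:
    """Every asset reachable from root (root itself excluded), via iterative DFS."""
    seen: set[str] = set()
    stack = list(consumers.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen or node == root:
            continue
        seen.add(node)
        stack.extend(consumers.get(node, []))
    return seen


def tree_lines(root: str, consumers: dict[str, list[str]]) -> list[str]:
    """Render the dependency tree as indented '+--' lines, marking cycles."""
    lines = [root]

    def walk(node: str, depth: int, path: frozenset[str]) -> None:
        for child in consumers.get(node, []):
            indent = "    " * depth
            if child in path:  # child is already on this branch: a cycle
                lines.append(f"{indent}+-- {child} (cycle)")
                continue
            lines.append(f"{indent}+-- {child}")
            walk(child, depth + 1, path | {child})

    walk(root, 1, frozenset({root}))
    return lines


# Hypothetical consumer map mirroring the example tree above.
EXAMPLE_CONSUMERS = {
    "fct.orders": ["agg.daily_sales", "ml.order_features", "Looker: Sales Overview"],
    "agg.daily_sales": ["rpt.monthly_summary", "Dashboard: Executive KPIs"],
    "rpt.monthly_summary": ["Email: Monthly Report"],
    "ml.order_features": ["Model: Demand Forecasting"],
}

if __name__ == "__main__":
    print("\n".join(tree_lines("fct.orders", EXAMPLE_CONSUMERS)))
```

all_downstream gives the flat set used for counting impacts. tree_lines keeps the parent/child structure needed for the diagram.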

Step 3: Categorize by Criticality

Critical (breaks production):
  • Production dashboards
  • Customer-facing applications
  • Automated reports to executives
  • ML models in production
  • Regulatory/compliance reports
High (causes significant issues):
  • Internal operational dashboards
  • Analyst workflows
  • Data science experiments
  • Downstream ETL jobs
Medium (inconvenient):
  • Ad-hoc analysis tables
  • Development/staging copies
  • Historical archives
Low (minimal impact):
  • Deprecated tables
  • Unused datasets
  • Test data
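
One way to apply these tiers consistently is to tag each downstream asset with a kind and look up its tier. This is a minimal sketch. The kind labels below are hypothetical names for the categories listed above. Defaulting unknown kinds to High is a design choice of this sketch, not a rule from this guide.

```python
from enum import IntEnum


class Criticality(IntEnum):
    LOW = 1       # minimal impact
    MEDIUM = 2    # inconvenient
    HIGH = 3      # causes significant issues
    CRITICAL = 4  # breaks production


# Hypothetical asset-kind labels mapped to the tiers listed above.
TIER_BY_KIND = {
    "production_dashboard": Criticality.CRITICAL,
    "customer_app": Criticality.CRITICAL,
    "executive_report": Criticality.CRITICAL,
    "production_ml_model": Criticality.CRITICAL,
    "compliance_report": Criticality.CRITICAL,
    "operational_dashboard": Criticality.HIGH,
    "analyst_workflow": Criticality.HIGH,
    "ds_experiment": Criticality.HIGH,
    "etl_job": Criticality.HIGH,
    "adhoc_table": Criticality.MEDIUM,
    "dev_copy": Criticality.MEDIUM,
    "archive": Criticality.MEDIUM,
    "deprecated_table": Criticality.LOW,
    "unused_dataset": Criticality.LOW,
    "test_data": Criticality.LOW,
}


def classify(assets: list[tuple[str, str]]) -> list[tuple[str, Criticality]]:
    """Tag (name, kind) pairs with a tier, most critical first.

    Unknown kinds default to HIGH so they get a human review instead of
    silently landing in a low tier.
    """
    tagged = [(name, TIER_BY_KIND.get(kind, Criticality.HIGH)) for name, kind in assets]
    return sorted(tagged, key=lambda item: (-item[1], item[0]))
```

The change as a whole inherits the highest tier among its consumers, for example max(level for _, level in classify(assets)).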

Step 4: Assess Change Risk

For the proposed change, evaluate:
Schema Changes (adding/removing/renaming columns):
  • Which downstream queries will break?
  • Are there SELECT * patterns that will pick up new columns?
  • Which transformations reference the changing columns?
Data Changes (values, volumes, timing):
  • Will downstream aggregations still be valid?
  • Are there NULL handling assumptions that will break?
  • Will timing changes affect SLAs?
Deletion/Deprecation:
  • Full dependency tree must be migrated first
  • Communication needed for all stakeholders
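
For schema changes, you can partly automate the first two questions by scanning the downstream SQL found in Step 1. This is a minimal sketch. It assumes a hypothetical mapping from downstream asset name to its SQL text. It flags queries that mention the column by name, and queries that use SELECT * (or alias.*), which pick up column changes implicitly.

```python
import re


def schema_change_impact(downstream_sql: dict[str, str], column: str) -> dict[str, list[str]]:
    """Flag downstream queries affected by adding, renaming, or dropping `column`.

    downstream_sql maps a downstream asset name to the SQL that reads the
    source table (a hypothetical shape, gathered during Step 1).
    """
    # Whole-word match, so "amount" does not hit "total_amount".
    col_ref = re.compile(rf"\b{re.escape(column)}\b", re.IGNORECASE)
    # SELECT * or SELECT alias.* picks up column changes implicitly.
    select_star = re.compile(r"\bSELECT\s+(?:\w+\.)?\*", re.IGNORECASE)
    return {
        "references_column": sorted(a for a, sql in downstream_sql.items() if col_ref.search(sql)),
        "select_star": sorted(a for a, sql in downstream_sql.items() if select_star.search(sql)),
    }
```

Hits inside comments or string literals are false positives, and a column may be aliased downstream. Review the flagged assets by hand before estimating the blast radius.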

Step 5: Find Stakeholders

Identify who owns downstream assets:
  1. DAG owners: Check the owner argument (usually set in default_args) in DAG definitions; Airflow lists it as the DAG's owners
  2. Dashboard owners: Usually in BI tool metadata
  3. Team ownership: Look for team naming patterns or documentation
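
With the affected DAGs known, grouping them by owner gives the notification list. This is a minimal sketch. It assumes DAG metadata records shaped like {"dag_id": ..., "owners": [...]}, and how you fetch them is left open. Airflow falls back to the default owner "airflow" when a task sets none. That name rarely identifies a real team, so the sketch reports those DAGs as unassigned. The function name and sample DAG IDs are hypothetical.

```python
from collections import defaultdict

# Airflow's default owner when a task sets none; rarely a real team.
DEFAULT_OWNER = "airflow"


def owners_to_notify(dags: list[dict], affected: set[str]) -> dict[str, list[str]]:
    """Map each owner to the affected DAG IDs they own.

    dags: records shaped like {"dag_id": str, "owners": [str, ...]}
    (an assumed shape). DAGs with only the default owner, or no owner,
    are grouped under "<unassigned>" so someone follows up manually.
    """
    by_owner: dict[str, list[str]] = defaultdict(list)
    for dag in dags:
        if dag["dag_id"] not in affected:
            continue
        owners = [o for o in dag.get("owners", []) if o != DEFAULT_OWNER]
        for owner in owners or ["<unassigned>"]:
            by_owner[owner].append(dag["dag_id"])
    return {owner: sorted(ids) for owner, ids in sorted(by_owner.items())}
```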

Output: Impact Report

Summary

"Changing fct.orders will impact X tables, Y DAGs, and Z dashboards"
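
You can generate the summary sentence from the downstream assets found in Steps 1 and 2. This is a minimal sketch. It assumes each asset is a (name, kind) pair whose kind is "table", "DAG", or "dashboard"; these labels were chosen for illustration. Other kinds are ignored by this template.

```python
from collections import Counter


def _count(n: int, noun: str) -> str:
    """'1 DAG' vs '2 DAGs'; these three nouns all pluralize with a trailing s."""
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


def impact_summary(target: str, assets: list[tuple[str, str]]) -> str:
    """Fill the summary template from (name, kind) pairs."""
    counts = Counter(kind for _, kind in assets)  # missing kinds count as 0
    return (
        f"Changing {target} will impact {_count(counts['table'], 'table')}, "
        f"{_count(counts['DAG'], 'DAG')}, and {_count(counts['dashboard'], 'dashboard')}"
    )
```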

Impact Diagram

                    +--> [agg.daily_sales] --> [Executive Dashboard]
                    |
[fct.orders] -------+--> [rpt.order_details] --> [Ops Team Email]
                    |
                    +--> [ml.order_features] --> [Demand Model]

Detailed Impacts

| Downstream | Type | Criticality | Owner | Notes |
|---|---|---|---|---|
| agg.daily_sales | Table | Critical | data-eng | Updated hourly |
| Executive Dashboard | Dashboard | Critical | analytics | CEO views daily |
| ml.order_features | Table | High | ml-team | Retraining weekly |

Risk Assessment

| Change Type | Risk Level | Mitigation |
|---|---|---|
| Add column | Low | Usually none; check SELECT * consumers |
| Rename column | High | Update 3 DAGs, 2 dashboards |
| Delete column | Critical | Full migration plan required |
| Change data type | Medium | Test downstream aggregations |

Recommended Actions

Before making changes:
  1. Notify owners: @data-eng, @analytics, @ml-team
  2. Update downstream DAG: transform_daily_sales
  3. Test dashboard: Executive KPIs
  4. Schedule the change during a low-impact window

Related Skills

  • Trace where data comes from: tracing-upstream-lineage skill
  • Check downstream freshness: checking-freshness skill
  • Debug any broken DAGs: debugging-dags skill
  • Add manual lineage annotations: annotating-task-lineage skill
  • Build custom lineage extractors: creating-openlineage-extractors skill