tracing-downstream-lineage
Downstream Lineage: Impacts
Answer the critical question: "What breaks if I change this?"
Use this BEFORE making changes to understand the blast radius.
Impact Analysis
Step 1: Identify Direct Consumers
Find everything that reads from this target:
For Tables:
- Search DAG source code: Look for DAGs that SELECT from this table
  - Use `list_dags` to get all DAGs
  - Use `get_dag_source` to search for table references
  - Look for: `FROM target_table`, `JOIN target_table`
- Check for dependent views (SQL):
    -- Snowflake
    SELECT * FROM information_schema.view_table_usage
    WHERE table_name = '<target_table>'
    -- Or check SHOW VIEWS and search definitions
- Look for BI tool connections:
  - Dashboards often query tables directly
  - Check for common BI patterns in table naming (rpt_, dashboard_)
For DAGs:
- Check what the DAG produces: Use `get_dag_source` to find output tables
- Then trace those tables' consumers (recursive)
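The source-code search for tables can be sketched in Python. This is a minimal sketch, assuming the DAG source has already been fetched (for example with `get_dag_source`) into a dict that maps DAG id to source text. The function name `find_table_readers` and the sample DAG sources are hypothetical. Regex matching is only a heuristic: it will miss SQL that is built dynamically or table names passed in through variables.

```python
import re

def find_table_readers(dag_sources: dict[str, str], table: str) -> list[str]:
    """Return ids of DAGs whose source contains FROM <table> or JOIN <table>."""
    # Match the table name directly after FROM or JOIN, ignoring case.
    # The negative lookahead rejects longer names such as fct.orders_archive.
    pattern = re.compile(
        r"\b(?:FROM|JOIN)\s+" + re.escape(table) + r"(?![\w.])",
        re.IGNORECASE,
    )
    return sorted(dag_id for dag_id, src in dag_sources.items() if pattern.search(src))

# Hypothetical sample data standing in for fetched DAG source code.
sources = {
    "transform_daily_sales": "INSERT INTO agg.daily_sales SELECT * FROM fct.orders",
    "build_features": "select o.id from stg.users u join fct.orders o on u.id = o.user_id",
    "archive_orders": "SELECT * FROM fct.orders_archive",
}
print(find_table_readers(sources, "fct.orders"))
```

Treat every hit as a candidate consumer and confirm it by reading the matching DAG, since a table name can also appear in comments or in write statements.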
Step 2: Build Dependency Tree
Map the full downstream impact:
    SOURCE: fct.orders
        |
        +-- TABLE: agg.daily_sales --> Dashboard: Executive KPIs
        |       |
        |       +-- TABLE: rpt.monthly_summary --> Email: Monthly Report
        |
        +-- TABLE: ml.order_features --> Model: Demand Forecasting
        |
        +-- DIRECT: Looker Dashboard "Sales Overview"
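A dependency tree like this one can be produced by walking a map of direct consumers recursively. The sketch below assumes the direct consumers of each asset were already collected in Step 1 and stored in a dict. The function name `render_tree` and the shape of `edges` are assumptions made for illustration.

```python
def render_tree(edges: dict[str, list[str]], root: str) -> list[str]:
    """Return indented lines for every asset reachable downstream of root."""
    lines = [root]
    seen = {root}

    def walk(node: str, depth: int) -> None:
        for child in edges.get(node, []):
            # An asset that was already visited is listed again but not expanded,
            # which keeps the walk finite if the lineage contains a cycle.
            marker = " (already listed)" if child in seen else ""
            lines.append("    " * depth + "+-- " + child + marker)
            if child not in seen:
                seen.add(child)
                walk(child, depth + 1)

    walk(root, 1)
    return lines

# Direct consumers per asset, taken from the example tree above.
edges = {
    "fct.orders": ["agg.daily_sales", "ml.order_features", "Looker: Sales Overview"],
    "agg.daily_sales": ["Dashboard: Executive KPIs", "rpt.monthly_summary"],
    "rpt.monthly_summary": ["Email: Monthly Report"],
    "ml.order_features": ["Model: Demand Forecasting"],
}
print("\n".join(render_tree(edges, "fct.orders")))
```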
Step 3: Categorize by Criticality
Critical (breaks production):
- Production dashboards
- Customer-facing applications
- Automated reports to executives
- ML models in production
- Regulatory/compliance reports
High (causes significant issues):
- Internal operational dashboards
- Analyst workflows
- Data science experiments
- Downstream ETL jobs
Medium (inconvenient):
- Ad-hoc analysis tables
- Development/staging copies
- Historical archives
Low (minimal impact):
- Deprecated tables
- Unused datasets
- Test data
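Once each downstream asset has a level, ordering them so the most critical come first makes the report easier to scan. A minimal sketch, assuming each asset is a `(name, criticality)` pair; `sort_by_criticality` is a hypothetical helper, and `tmp.adhoc_orders` is an invented example of an ad-hoc table.

```python
# Levels in descending order of severity, as defined in this step.
CRITICALITY_ORDER = ["Critical", "High", "Medium", "Low"]

def sort_by_criticality(assets: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (name, criticality) pairs so the most critical assets come first."""
    rank = {level: i for i, level in enumerate(CRITICALITY_ORDER)}
    # Ties within a level are broken by name to keep the order stable.
    return sorted(assets, key=lambda asset: (rank[asset[1]], asset[0]))

assets = [
    ("tmp.adhoc_orders", "Medium"),      # hypothetical ad-hoc analysis table
    ("Executive Dashboard", "Critical"),
    ("ml.order_features", "High"),
    ("agg.daily_sales", "Critical"),
]
for name, level in sort_by_criticality(assets):
    print(f"{level:<8} {name}")
```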
Step 4: Assess Change Risk
For the proposed change, evaluate:
Schema Changes (adding/removing/renaming columns):
- Which downstream queries will break?
- Are there SELECT * patterns that will pick up new columns?
- Which transformations reference the changing columns?
Data Changes (values, volumes, timing):
- Will downstream aggregations still be valid?
- Are there NULL handling assumptions that will break?
- Will timing changes affect SLAs?
Deletion/Deprecation:
- Full dependency tree must be migrated first
- Communication needed for all stakeholders
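The two schema-change checks (SELECT * patterns and references to the changing column) can be approximated with a text scan over the downstream queries. This is a rough sketch under stated assumptions: the queries are available as plain SQL strings, `schema_change_risks` is a hypothetical helper, and the column names in the sample are invented. A regex scan can produce false positives, for example when the column name appears in a comment, so a proper SQL parser would be more reliable.

```python
import re

def schema_change_risks(queries: dict[str, str], column: str) -> dict[str, list[str]]:
    """Flag queries that use SELECT * or mention the column being changed."""
    # SELECT * or SELECT alias.* ; COUNT(*) is not matched.
    select_star = re.compile(r"\bSELECT\s+(?:\w+\.)?\*", re.IGNORECASE)
    column_ref = re.compile(r"\b" + re.escape(column) + r"\b", re.IGNORECASE)
    risks: dict[str, list[str]] = {}
    for name, sql in queries.items():
        found = []
        if select_star.search(sql):
            found.append("select_star")
        if column_ref.search(sql):
            found.append("references_column")
        if found:
            risks[name] = found
    return risks

# Hypothetical downstream queries; the column names are assumptions.
queries = {
    "transform_daily_sales": "SELECT order_date, SUM(amount) FROM fct.orders GROUP BY order_date",
    "export_orders": "SELECT * FROM fct.orders",
    "count_orders": "SELECT COUNT(order_id) FROM fct.orders",
}
print(schema_change_risks(queries, "amount"))
```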
Step 5: Find Stakeholders
Identify who owns downstream assets:
- DAG owners: Check the `owners` field in DAG definitions
- Dashboard owners: Usually in BI tool metadata
- Team ownership: Look for team naming patterns or documentation
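When only the DAG source text is available, owner values can be pulled out by parsing the file rather than importing it. The sketch below assumes owners are declared as string literals under an `owner` key (as in an Airflow `default_args` dict) or as an `owner=` keyword argument. `find_dag_owners` and `SomeOperator` are hypothetical names, and owners that are computed at runtime will not be found.

```python
import ast

def find_dag_owners(source: str) -> set[str]:
    """Collect string literals assigned to an "owner" key or keyword argument."""
    owners: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        # Dict literal such as {"owner": "data-eng", ...}
        if isinstance(node, ast.Dict):
            for key, value in zip(node.keys, node.values):
                if (isinstance(key, ast.Constant) and key.value == "owner"
                        and isinstance(value, ast.Constant)
                        and isinstance(value.value, str)):
                    owners.add(value.value)
        # Keyword argument such as SomeOperator(owner="ml-team")
        elif isinstance(node, ast.keyword) and node.arg == "owner":
            if isinstance(node.value, ast.Constant) and isinstance(node.value.value, str):
                owners.add(node.value.value)
    return owners

# Hypothetical DAG source; it is only parsed, never executed.
sample = '''
default_args = {"owner": "data-eng", "retries": 2}
task = SomeOperator(task_id="load", owner="ml-team")
'''
print(sorted(find_dag_owners(sample)))
```

Parsing instead of importing means the scan works even when the DAG's own dependencies are not installed.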
Output: Impact Report
Summary
"Changing `fct.orders` will impact X tables, Y DAGs, and Z dashboards"
Impact Diagram
                        +--> [agg.daily_sales] --> [Executive Dashboard]
                        |
    [fct.orders] -------+--> [rpt.order_details] --> [Ops Team Email]
                        |
                        +--> [ml.features] --> [Demand Model]
Detailed Impacts
| Downstream | Type | Criticality | Owner | Notes |
|---|---|---|---|---|
| agg.daily_sales | Table | Critical | data-eng | Updated hourly |
| Executive Dashboard | Dashboard | Critical | analytics | CEO views daily |
| ml.order_features | Table | High | ml-team | Retraining weekly |
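The summary sentence and the detail table can both be generated from one list of impact records, which keeps the counts and the rows consistent. A minimal sketch, assuming each record is a dict with `name`, `type`, `criticality`, `owner`, and `notes` keys; the helper names `summarize` and `to_markdown` are hypothetical.

```python
from collections import Counter

def summarize(target: str, impacts: list[dict[str, str]]) -> str:
    """Build the one-line summary from the per-type counts."""
    counts = Counter(row["type"] for row in impacts)  # missing types count as 0
    return (f"Changing {target} will impact {counts['Table']} tables, "
            f"{counts['DAG']} DAGs, and {counts['Dashboard']} dashboards")

def to_markdown(impacts: list[dict[str, str]]) -> str:
    """Render the impact records as a markdown table."""
    header = "| Downstream | Type | Criticality | Owner | Notes |"
    separator = "|---|---|---|---|---|"
    rows = [
        f"| {r['name']} | {r['type']} | {r['criticality']} | {r['owner']} | {r['notes']} |"
        for r in impacts
    ]
    return "\n".join([header, separator, *rows])

# Records mirroring the example table above.
impacts = [
    {"name": "agg.daily_sales", "type": "Table", "criticality": "Critical",
     "owner": "data-eng", "notes": "Updated hourly"},
    {"name": "Executive Dashboard", "type": "Dashboard", "criticality": "Critical",
     "owner": "analytics", "notes": "CEO views daily"},
    {"name": "ml.order_features", "type": "Table", "criticality": "High",
     "owner": "ml-team", "notes": "Retraining weekly"},
]
print(summarize("fct.orders", impacts))
print(to_markdown(impacts))
```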
Risk Assessment
| Change Type | Risk Level | Mitigation |
|---|---|---|
| Add column | Low | No action needed |
| Rename column | High | Update 3 DAGs, 2 dashboards |
| Delete column | Critical | Full migration plan required |
| Change data type | Medium | Test downstream aggregations |
Recommended Actions
Before making changes:
- Notify owners: @data-eng, @analytics, @ml-team
- Update downstream DAG: `transform_daily_sales`
- Test dashboard: Executive KPIs
- Schedule change during low-impact window
Related Skills
- Trace where data comes from: tracing-upstream-lineage skill
- Check downstream freshness: checking-freshness skill
- Debug any broken DAGs: debugging-dags skill
- Add manual lineage annotations: annotating-task-lineage skill
- Build custom lineage extractors: creating-openlineage-extractors skill