checking-freshness

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Data Freshness Check

数据新鲜度检查

Quickly determine if data is fresh enough to use.
快速判断数据是否足够新鲜可用。

Freshness Check Process

新鲜度检查流程

For each table to check:
对于每个需要检查的表格:

1. Find the Timestamp Column

1. 查找时间戳列

Look for columns that indicate when data was loaded or updated:
  • _loaded_at
    ,
    _updated_at
    ,
    _created_at
    (common ETL patterns)
  • updated_at
    ,
    created_at
    ,
    modified_at
    (application timestamps)
  • load_date
    ,
    etl_timestamp
    ,
    ingestion_time
  • date
    ,
    event_date
    ,
    transaction_date
    (business dates)
Query INFORMATION_SCHEMA.COLUMNS if you need to see column names.
寻找指示数据加载或更新时间的列:
  • _loaded_at
    _updated_at
    _created_at
    (常见ETL模式)
  • updated_at
    created_at
    modified_at
    (应用程序时间戳)
  • load_date
    etl_timestamp
    ingestion_time
  • date
    event_date
    transaction_date
    (业务日期)
如果需要查看列名,可查询INFORMATION_SCHEMA.COLUMNS。

2. Query Last Update Time

2. 查询最后更新时间

sql
SELECT
    MAX(<timestamp_column>) as last_update,
    CURRENT_TIMESTAMP() as current_time,
    TIMESTAMPDIFF('hour', MAX(<timestamp_column>), CURRENT_TIMESTAMP()) as hours_ago,
    TIMESTAMPDIFF('minute', MAX(<timestamp_column>), CURRENT_TIMESTAMP()) as minutes_ago
FROM <table>
sql
SELECT
    MAX(<timestamp_column>) as last_update,
    CURRENT_TIMESTAMP() as current_time,
    TIMESTAMPDIFF('hour', MAX(<timestamp_column>), CURRENT_TIMESTAMP()) as hours_ago,
    TIMESTAMPDIFF('minute', MAX(<timestamp_column>), CURRENT_TIMESTAMP()) as minutes_ago
FROM <table>

3. Check Row Counts by Time

3. 按时间检查行数

For tables with regular updates, check recent activity:
sql
SELECT
    DATE_TRUNC('day', <timestamp_column>) as day,
    COUNT(*) as row_count
FROM <table>
WHERE <timestamp_column> >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC
对于定期更新的表格,检查近期活动:
sql
SELECT
    DATE_TRUNC('day', <timestamp_column>) as day,
    COUNT(*) as row_count
FROM <table>
WHERE <timestamp_column> >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC

Freshness Status

新鲜度状态

Report status using this scale:
StatusAgeMeaning
Fresh< 4 hoursData is current
Stale4-24 hoursMay be outdated, check if expected
Very Stale> 24 hoursLikely a problem unless batch job
UnknownNo timestampCan't determine freshness
使用以下标准报告状态:
状态时长含义
新鲜< 4小时数据为当前最新
过时4-24小时可能已过期,需确认是否符合预期
严重过时>24小时除非是批处理任务,否则可能存在问题
未知无时间戳无法判断新鲜度

If Data is Stale

若数据过时

Check Airflow for the source pipeline:
  1. Find the DAG: Which DAG populates this table? Use
    list_dags
    and look for matching names.
  2. Check DAG status:
    • Is the DAG paused? Use
      get_dag_details
    • Did the last run fail? Use
      get_dag_stats
    • Is a run currently in progress?
  3. Diagnose if needed: If the DAG failed, use the debugging-dags skill to investigate.
检查Airflow中的源管道:
  1. 查找DAG:哪个DAG负责填充此表格?使用
    list_dags
    查找匹配名称。
  2. 检查DAG状态
    • DAG是否暂停?使用
      get_dag_details
    • 上次运行是否失败?使用
      get_dag_stats
    • 是否有运行中的任务?
  3. 必要时诊断:如果DAG运行失败,使用debugging-dags技能进行调查。

Output Format

输出格式

Provide a clear, scannable report:
FRESHNESS REPORT
================

TABLE: database.schema.table_name
Last Update: 2024-01-15 14:32:00 UTC
Age: 2 hours 15 minutes
Status: Fresh

TABLE: database.schema.other_table
Last Update: 2024-01-14 03:00:00 UTC
Age: 37 hours
Status: Very Stale
Source DAG: daily_etl_pipeline (FAILED)
Action: Investigate with **debugging-dags** skill
提供清晰、易读的报告:
FRESHNESS REPORT
================

TABLE: database.schema.table_name
Last Update: 2024-01-15 14:32:00 UTC
Age: 2 hours 15 minutes
Status: Fresh

TABLE: database.schema.other_table
Last Update: 2024-01-14 03:00:00 UTC
Age: 37 hours
Status: Very Stale
Source DAG: daily_etl_pipeline (FAILED)
Action: Investigate with **debugging-dags** skill

Quick Checks

快速检查

If user just wants a yes/no answer:
  • "Is X fresh?" -> Check and respond with status + one line
  • "Can I use X for my 9am meeting?" -> Check and give clear yes/no with context
如果用户仅需要是/否回答:
  • "X是否新鲜?" -> 检查后返回状态+一行说明
  • "我可以在9点会议上使用X吗?" -> 检查后给出明确的是/否及上下文