checking-freshness
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseData Freshness Check
数据新鲜度检查
Quickly determine if data is fresh enough to use.
快速判断数据是否足够新鲜可用。
Freshness Check Process
新鲜度检查流程
For each table to check:
对于每个需要检查的表格:
1. Find the Timestamp Column
1. 查找时间戳列
Look for columns that indicate when data was loaded or updated:
- ,
_loaded_at,_updated_at(common ETL patterns)_created_at - ,
updated_at,created_at(application timestamps)modified_at - ,
load_date,etl_timestampingestion_time - ,
date,event_date(business dates)transaction_date
Query INFORMATION_SCHEMA.COLUMNS if you need to see column names.
寻找指示数据加载或更新时间的列:
- 、
_loaded_at、_updated_at(常见ETL模式)_created_at - 、
updated_at、created_at(应用程序时间戳)modified_at - 、
load_date、etl_timestampingestion_time - 、
date、event_date(业务日期)transaction_date
如果需要查看列名,可查询INFORMATION_SCHEMA.COLUMNS。
2. Query Last Update Time
2. 查询最后更新时间
sql
SELECT
MAX(<timestamp_column>) as last_update,
CURRENT_TIMESTAMP() as current_time,
TIMESTAMPDIFF('hour', MAX(<timestamp_column>), CURRENT_TIMESTAMP()) as hours_ago,
TIMESTAMPDIFF('minute', MAX(<timestamp_column>), CURRENT_TIMESTAMP()) as minutes_ago
FROM <table>sql
SELECT
MAX(<timestamp_column>) as last_update,
CURRENT_TIMESTAMP() as current_time,
TIMESTAMPDIFF('hour', MAX(<timestamp_column>), CURRENT_TIMESTAMP()) as hours_ago,
TIMESTAMPDIFF('minute', MAX(<timestamp_column>), CURRENT_TIMESTAMP()) as minutes_ago
FROM <table>3. Check Row Counts by Time
3. 按时间检查行数
For tables with regular updates, check recent activity:
sql
SELECT
DATE_TRUNC('day', <timestamp_column>) as day,
COUNT(*) as row_count
FROM <table>
WHERE <timestamp_column> >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESC对于定期更新的表格,检查近期活动:
sql
SELECT
DATE_TRUNC('day', <timestamp_column>) as day,
COUNT(*) as row_count
FROM <table>
WHERE <timestamp_column> >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY 1
ORDER BY 1 DESCFreshness Status
新鲜度状态
Report status using this scale:
| Status | Age | Meaning |
|---|---|---|
| Fresh | < 4 hours | Data is current |
| Stale | 4-24 hours | May be outdated, check if expected |
| Very Stale | > 24 hours | Likely a problem unless batch job |
| Unknown | No timestamp | Can't determine freshness |
使用以下标准报告状态:
| 状态 | 时长 | 含义 |
|---|---|---|
| 新鲜 | < 4小时 | 数据为当前最新 |
| 过时 | 4-24小时 | 可能已过期,需确认是否符合预期 |
| 严重过时 | >24小时 | 除非是批处理任务,否则可能存在问题 |
| 未知 | 无时间戳 | 无法判断新鲜度 |
If Data is Stale
若数据过时
Check Airflow for the source pipeline:
-
Find the DAG: Which DAG populates this table? Useand look for matching names.
list_dags -
Check DAG status:
- Is the DAG paused? Use
get_dag_details - Did the last run fail? Use
get_dag_stats - Is a run currently in progress?
- Is the DAG paused? Use
-
Diagnose if needed: If the DAG failed, use the debugging-dags skill to investigate.
检查Airflow中的源管道:
-
查找DAG:哪个DAG负责填充此表格?使用查找匹配名称。
list_dags -
检查DAG状态:
- DAG是否暂停?使用
get_dag_details - 上次运行是否失败?使用
get_dag_stats - 是否有运行中的任务?
- DAG是否暂停?使用
-
必要时诊断:如果DAG运行失败,使用debugging-dags技能进行调查。
Output Format
输出格式
Provide a clear, scannable report:
FRESHNESS REPORT
================
TABLE: database.schema.table_name
Last Update: 2024-01-15 14:32:00 UTC
Age: 2 hours 15 minutes
Status: Fresh
TABLE: database.schema.other_table
Last Update: 2024-01-14 03:00:00 UTC
Age: 37 hours
Status: Very Stale
Source DAG: daily_etl_pipeline (FAILED)
Action: Investigate with **debugging-dags** skill提供清晰、易读的报告:
FRESHNESS REPORT
================
TABLE: database.schema.table_name
Last Update: 2024-01-15 14:32:00 UTC
Age: 2 hours 15 minutes
Status: Fresh
TABLE: database.schema.other_table
Last Update: 2024-01-14 03:00:00 UTC
Age: 37 hours
Status: Very Stale
Source DAG: daily_etl_pipeline (FAILED)
Action: Investigate with **debugging-dags** skillQuick Checks
快速检查
If user just wants a yes/no answer:
- "Is X fresh?" -> Check and respond with status + one line
- "Can I use X for my 9am meeting?" -> Check and give clear yes/no with context
如果用户仅需要是/否回答:
- "X是否新鲜?" -> 检查后返回状态+一行说明
- "我可以在9点会议上使用X吗?" -> 检查后给出明确的是/否及上下文