data-analytics-engineering
Data Analytics Engineering
Scope
- Define metrics, grains, and dimensional models.
- Build transformation layers and semantic models.
- Implement data quality tests and observability.
- Document datasets, lineage, and ownership.
- Align analytics outputs with BI and product needs.
Ask For Inputs
- Business metrics and decision use cases.
- Source systems, data freshness, and latency needs.
- Existing warehouse, tooling, and orchestration.
- Expected data volumes and change cadence.
- Governance requirements and access controls.
Workflow
- Define metric dictionary and grains.
- Design staging, intermediate, and mart layers.
- Model dimensions and facts with clear keys.
- Build semantic layer and metric definitions.
- Add tests for freshness, nulls, ranges, and duplicates.
- Document lineage, owners, and SLAs.
- Plan rollout, backfills, and validation checks.
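The freshness, null, range, and duplicate tests named in the workflow can be sketched in plain Python over in-memory rows. This is only an illustration: the `orders` rows, the column names, and the thresholds are hypothetical, and in a real project these checks would more likely be declared as tests in the transformation tool and run against warehouse tables.

```python
from datetime import datetime, timedelta, timezone


def check_not_null(rows, column):
    """Return the rows where `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]


def check_range(rows, column, low, high):
    """Return the rows whose non-null value falls outside [low, high]."""
    return [
        r for r in rows
        if r.get(column) is not None and not (low <= r[column] <= high)
    ]


def check_unique(rows, key_columns):
    """Return the key tuples that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        key = tuple(r.get(c) for c in key_columns)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes)


def check_freshness(rows, column, max_age, now):
    """Return True when the newest timestamp is within `max_age` of `now`."""
    stamps = [r[column] for r in rows if r.get(column) is not None]
    return bool(stamps) and (now - max(stamps)) <= max_age


# Hypothetical sample data with one null, one out-of-range value, one duplicate key.
NOW = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
orders = [
    {"order_id": 1, "amount": 20.0, "loaded_at": NOW - timedelta(hours=2)},
    {"order_id": 2, "amount": None, "loaded_at": NOW - timedelta(hours=1)},
    {"order_id": 2, "amount": -5.0, "loaded_at": NOW - timedelta(hours=1)},
]

null_rows = check_not_null(orders, "amount")
out_of_range = check_range(orders, "amount", 0, 10_000)
duplicate_keys = check_unique(orders, ["order_id"])
is_fresh = check_freshness(orders, "loaded_at", timedelta(hours=6), NOW)
```

Each check returns the offending rows or keys rather than a bare boolean, which makes failures easier to report in an alert.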
Outputs
- Metric dictionary and semantic model.
- Data model with schema and grain definitions.
- Transformation plan and dbt or SQLMesh structure.
- Data quality test suite and alerting plan.
- Documentation and ownership map.
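As a rough illustration of what a metric dictionary entry might carry, the sketch below models a definition with an owner, a grain, and a version. Every field name and the `MetricDictionary` class are assumptions made for this example; they are not the format of assets/metric-dictionary.md or of any particular semantic layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    version: int
    owner: str
    grain: tuple       # columns that identify one row of the metric
    expression: str    # aggregation expressed as SQL text
    description: str = ""
    deprecated: bool = False


class MetricDictionary:
    """Hypothetical in-memory registry keyed by (name, version)."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric):
        key = (metric.name, metric.version)
        if key in self._metrics:
            # Definitions are immutable once published; changes need a new version.
            raise ValueError(f"{metric.name} v{metric.version} is already registered")
        self._metrics[key] = metric

    def latest(self, name):
        """Return the highest non-deprecated version of a metric."""
        candidates = [
            m for (n, _), m in self._metrics.items()
            if n == name and not m.deprecated
        ]
        if not candidates:
            raise KeyError(name)
        return max(candidates, key=lambda m: m.version)


metrics = MetricDictionary()
metrics.register(MetricDefinition(
    name="weekly_active_users", version=1, owner="growth-analytics",
    grain=("week_start",), expression="COUNT(DISTINCT user_id)",
))
metrics.register(MetricDefinition(
    name="weekly_active_users", version=2, owner="growth-analytics",
    grain=("week_start",),
    expression="COUNT(DISTINCT CASE WHEN is_bot = FALSE THEN user_id END)",
    description="Excludes bot traffic.",
))
current = metrics.latest("weekly_active_users")
```

Keeping old versions registered instead of overwriting them leaves a record of what a dashboard was computing at any point in time.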
Quality Checks
- Keep metric definitions stable and versioned.
- Treat metrics as APIs: document changes, deprecate safely, and backfill deliberately.
- Define data contracts for core tables (schema, freshness, keys) to control downstream breakage.
- Avoid mixed grains in a single model.
- Ensure tests cover critical joins and aggregates.
- Validate against source of truth and historical baselines.
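A data contract covering schema, freshness, and keys could look roughly like the sketch below. The `DataContract` class, the `fct_orders` table, and its columns are all hypothetical; the point is only to show the three parts of a contract being checked together, with a non-unique key used as a signal of mixed grain or a fan-out join.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    table: str
    columns: dict            # column name -> expected Python type
    primary_key: tuple       # columns that define the grain
    freshness_column: str
    max_staleness: timedelta


def validate(contract, rows, now):
    """Return a list of human-readable contract violations (empty when clean)."""
    violations = []
    for i, row in enumerate(rows):
        missing = set(contract.columns) - set(row)
        if missing:
            violations.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for col, expected in contract.columns.items():
            if row[col] is not None and not isinstance(row[col], expected):
                violations.append(
                    f"row {i}: {col} is {type(row[col]).__name__}, "
                    f"expected {expected.__name__}"
                )
    keys = [tuple(row.get(c) for c in contract.primary_key) for row in rows]
    if len(keys) != len(set(keys)):
        violations.append("primary key is not unique (mixed grain or fan-out join?)")
    stamps = [
        row[contract.freshness_column] for row in rows
        if isinstance(row.get(contract.freshness_column), datetime)
    ]
    if not stamps or now - max(stamps) > contract.max_staleness:
        violations.append("freshness SLA breached")
    return violations


NOW = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
contract = DataContract(
    table="fct_orders",
    columns={"order_id": int, "amount": float, "loaded_at": datetime},
    primary_key=("order_id",),
    freshness_column="loaded_at",
    max_staleness=timedelta(hours=6),
)
good_rows = [
    {"order_id": 1, "amount": 20.0, "loaded_at": NOW - timedelta(hours=1)},
    {"order_id": 2, "amount": 35.5, "loaded_at": NOW - timedelta(hours=1)},
]
bad_rows = [
    {"order_id": 1, "amount": "20", "loaded_at": NOW - timedelta(days=2)},
    {"order_id": 1, "amount": 5.0, "loaded_at": NOW - timedelta(days=2)},
]
good_result = validate(contract, good_rows, NOW)
bad_result = validate(contract, bad_rows, NOW)
```

Here `bad_rows` trips all three parts of the contract: a type mismatch on `amount`, a repeated `order_id`, and a load timestamp older than the allowed staleness.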
Templates
- assets/metric-dictionary.md for metric definitions and owners.
- assets/semantic-layer-spec.md for entities, measures, and dimensions.
- assets/data-quality-test-plan.md for test coverage planning.
Resources
- references/modeling-patterns.md for modeling guidance and data quality patterns.
- references/tool-comparison-2026.md for dbt vs SQLMesh vs Coalesce decision matrix.
- data/sources.json for curated vendor docs and trend-tracking sources (use as a WebSearch seed list).
Related Skills
- Use data-lake-platform for platform architecture.
- Use data-sql-optimization for query tuning.
- Use ai-ml-data-science for modeling and experiments.
Trend Awareness Protocol
IMPORTANT: When users ask recommendation questions about analytics engineering, data modeling, or BI, you MUST use WebSearch to check current trends before answering. If WebSearch is unavailable, use data/sources.json + web browsing and state what you verified vs assumed.
Trigger Conditions
- "What's the best tool for [analytics engineering/data modeling/BI]?"
- "What should I use for [transformation/semantic layer/metrics]?"
- "What's the latest in analytics engineering?"
- "Current best practices for [dbt/metrics layers/data quality]?"
- "Is [tool/approach] still relevant in 2026?"
- "[dbt] vs [SQLMesh] vs [other]?"
- "Best BI tool for [use case]?"
- "SQLMesh acquisition" or "Fivetran transformation"
- "Agentic analytics" or "AI data workflows"
- "Metric debt" or "metric governance"
Required Searches
- Search: "analytics engineering best practices 2026"
- Search: "[dbt/SQLMesh/semantic layer] vs alternatives 2026"
- Search: "analytics engineering trends January 2026"
- Search: "[specific tool] new releases 2026"
- Search: "agentic analytics AI data 2026" (for AI-related queries)
What to Report
After searching, provide:
- Current landscape: What analytics tools/patterns are popular NOW
- Emerging trends: New tools, patterns, or standards gaining traction
- Deprecated/declining: Tools/approaches losing relevance or support
- Recommendation: Based on fresh data, not just static knowledge
Example Topics (verify with fresh search)
- Transformation tools (dbt, SQLMesh, Coalesce)
- Semantic layers (dbt Semantic Layer, Cube, AtScale, warehouse-native)
- Metrics stores and headless BI
- Data quality tools (dbt tests, Elementary, dbt-expectations/Metaplane)
- BI platforms (Metabase, Superset, Lightdash, Hex)
- Data modeling patterns (dimensional, wide tables, activity schema)
- Analytics engineering workflows and CI/CD
- Agentic AI workflows for analytics
- Data mesh and domain-owned data products