data-manager
Data Manager
Overview
Manage data programs, governance operations, and data reliability. This skill covers data roadmaps,
stakeholder coordination, metadata stewardship, lifecycle management, monitoring, incident response,
capacity planning, and SLA frameworks.
Features
- Data roadmap planning with stakeholder alignment and delivery cadence
- Governance operations: stewardship, access reviews, lifecycle enforcement
- Data ops monitoring with incident response and escalation paths
- Team KPI/SLA scorecards and operational metrics
- Cross-functional coordination across engineers, analysts, scientists, and legal
Usage
- Identify the user's data management need (roadmap, governance, ops, or coordination)
- Follow the corresponding workflow below
- Produce structured outputs: roadmaps, governance policies, incident reports, or KPI dashboards
Examples
- User: "Create a data team roadmap"
  Agent: Runs the Program Management workflow and produces a quarterly roadmap with initiatives, dependencies, and stakeholder sign-offs.
- User: "Set up data governance"
  Agent: Runs the Governance Operations workflow and defines stewardship roles, access review cadence, and lifecycle policies.
- User: "Handle a data incident"
  Agent: Runs the Data Ops workflow, triages severity, executes the runbook, and produces a post-incident report with action items.
When to Use
- Own the data roadmap, stakeholder reviews, and data product delivery cadence
- Run governance operations (stewardship, access reviews, lifecycle enforcement)
- Establish data ops monitoring, incident response, and team KPI/SLA scorecards
- Coordinate engineers, analysts, scientists, and legal on cross-functional data work
When NOT to Use
- Deep platform architecture ADRs or ontology design → use `data-architect` or `ontology-engineer`
- Hands-on warehouse SQL optimization or SCD modeling → use `data-warehouse-engineer`
- ML experimentation, model evaluation, or MLOps deployment → use `data-scientist`
- Cloud VPC, Kubernetes, or IaC provisioning → use `infrastructure-engineer`
- Company-wide multi-team technical programs (non-data) → use `technical-program-manager`
Core Workflows
1. Data Program & Product Management
Responsibilities:
- Own the data roadmap aligned to business outcomes
- Translate stakeholder needs into data product requirements
- Coordinate cross-functional data work (engineers, analysts, scientists, legal)
Operational cadence:
| Meeting | Frequency | Attendees | Purpose |
|---|---|---|---|
| Data Leadership Sync | Weekly | Data leads, PMs | Blockers, priorities, resource allocation |
| Stakeholder Reviews | Bi-weekly | Business sponsors | Roadmap alignment, value demonstration |
| Sprint Planning | Bi-weekly | Engineering team | Commitments, estimation, dependencies |
| Retrospectives | Monthly | Full data team | Process improvements, team health |
Data product delivery checklist:
- Define the business question and success criteria
- Identify data sources and validate availability/quality
- Design the data model (see the `data-architect` skill)
- Build with observability (logging, lineage, tests)
- Validate with stakeholders before GA
- Document and train consumers
- Monitor usage and iterate
2. Governance Operations Execution
Core activities:
| Activity | Frequency | Owner | Output |
|---|---|---|---|
| Metadata stewardship | Continuous | Data stewards | Enriched catalog, documented lineage |
| Access reviews | Quarterly | Security + owners | Approved access matrix |
| Data lifecycle enforcement | Monthly | Operations | Archived/deleted per retention policy |
| Quality SLA review | Monthly | Governance lead | Quality scorecard, remediation plan |
| Policy compliance audit | Quarterly | Audit/compliance | Gap report, remediation tickets |
Escalation paths:
- Data incident → On-call engineer → Team lead → Director
- Quality breach → Data steward → Governance committee → CDO
- Access violation → Security team → Legal (if PII exposure)
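The escalation paths above can be sketched as a small lookup. This is a minimal illustration, not a prescribed implementation: the event-type keys, the `escalation_chain` function, and the `pii_exposed` flag are hypothetical names introduced here.

```python
# Hypothetical keys; the contact order mirrors the escalation paths listed above.
ESCALATION_PATHS = {
    "data_incident": ["On-call engineer", "Team lead", "Director"],
    "quality_breach": ["Data steward", "Governance committee", "CDO"],
    "access_violation": ["Security team"],
}


def escalation_chain(event_type: str, pii_exposed: bool = False) -> list[str]:
    """Return the ordered list of contacts for an event type."""
    if event_type not in ESCALATION_PATHS:
        raise ValueError(f"Unknown event type: {event_type}")
    chain = list(ESCALATION_PATHS[event_type])  # copy so callers cannot mutate the table
    # Legal joins the chain only for access violations that involve PII exposure.
    if event_type == "access_violation" and pii_exposed:
        chain.append("Legal")
    return chain
```

Calling `escalation_chain("access_violation", pii_exposed=True)` appends Legal after the security team; without the flag the chain stops at the security team.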
3. Data Operations & Reliability
Monitoring stack:
| Layer | Metrics | Alert Threshold |
|---|---|---|
| Infrastructure | CPU, memory, disk, network | >80% for 5 min |
| Database | Connections, lock waits, replication lag | Replication lag >30s |
| Pipelines | Success rate, duration, row counts | <95% success rate |
| Data quality | Null rate, freshness, duplicates | SLA breach |
| Cost | Daily spend vs budget | >110% of daily budget |
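As a rough sketch, the numeric thresholds in the table could be checked with one function. The layer keys, the `should_alert` name, and the units noted in the comments are assumptions made for illustration. The data quality row is left out because its threshold ("SLA breach") depends on SLAs the table does not specify.

```python
def should_alert(layer: str, value: float, sustained_minutes: float = 0.0) -> bool:
    """Apply the alert thresholds from the monitoring table to one reading."""
    if layer == "infrastructure":
        # value: utilization in percent; must stay above 80% for 5 minutes
        return value > 80 and sustained_minutes >= 5
    if layer == "database":
        # value: replication lag in seconds
        return value > 30
    if layer == "pipelines":
        # value: success rate in percent
        return value < 95
    if layer == "cost":
        # value: daily spend as a percentage of the daily budget
        return value > 110
    raise ValueError(f"No numeric threshold defined for layer: {layer}")
```

For example, `should_alert("infrastructure", 85, sustained_minutes=2)` returns `False` because the reading has not been sustained for five minutes.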
Incident response phases:
- Detect: Alert fires or user reports issue
- Triage: Assess severity (P1-P4), assign owner
- Mitigate: Stop the bleeding (roll back, redirect traffic)
- Resolve: Root cause fix deployed
- Review: Post-mortem within 48 hours for P1-P2
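The phases above form a simple linear sequence, which the sketch below models. The function names are hypothetical, and counting the 48-hour post-mortem window from the resolution time is an assumption; the phase list itself does not say when the clock starts.

```python
from datetime import datetime, timedelta
from typing import Optional

PHASES = ["Detect", "Triage", "Mitigate", "Resolve", "Review"]
SEVERITIES = {"P1", "P2", "P3", "P4"}


def next_phase(current: str) -> Optional[str]:
    """Return the phase that follows `current`, or None after Review."""
    index = PHASES.index(current)  # raises ValueError for an unknown phase
    return PHASES[index + 1] if index + 1 < len(PHASES) else None


def postmortem_due(severity: str, resolved_at: datetime) -> Optional[datetime]:
    """Return the post-mortem deadline for P1-P2 incidents, else None.

    Assumption: the 48 hours are measured from the resolution time.
    """
    if severity not in SEVERITIES:
        raise ValueError(f"Unknown severity: {severity}")
    if severity in {"P1", "P2"}:
        return resolved_at + timedelta(hours=48)
    return None
```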
4. Metrics & SLA Framework
Data team KPIs:
| Category | Metric | Target | Measurement |
|---|---|---|---|
| Reliability | Pipeline success rate | >99% | Airflow/Dagster logs |
| Quality | Data quality score | >95% | dbt tests + Great Expectations |
| Freshness | Data latency (source → warehouse) | <4 hours | Pipeline metadata |
| Cost | Cost per TB processed | Trend down | Cloud billing |
| Productivity | Time from request to production | <2 weeks | Jira/Asana cycle time |
| Adoption | Active data consumers | Grow 10% QoQ | BI tool usage logs |
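A scorecard over these KPIs might look like the following sketch. The metric keys and the `scorecard` function are hypothetical, and reading "Grow 10% QoQ" as "growth of at least 10%" is an assumption. The cost KPI is omitted because "trend down" needs a history of values rather than a single observation.

```python
import operator

# Targets copied from the KPI table; the key names are illustrative.
KPI_TARGETS = {
    "pipeline_success_rate_pct": (">", 99.0),
    "data_quality_score_pct": (">", 95.0),
    "data_latency_hours": ("<", 4.0),
    "request_to_production_weeks": ("<", 2.0),
    "active_consumers_qoq_growth_pct": (">=", 10.0),  # assumed reading of "Grow 10% QoQ"
}

COMPARATORS = {">": operator.gt, "<": operator.lt, ">=": operator.ge}


def scorecard(observed: dict[str, float]) -> dict[str, bool]:
    """Map each observed KPI to True if it meets its target, else False."""
    results = {}
    for name, value in observed.items():
        comparator, target = KPI_TARGETS[name]  # KeyError for an unknown KPI
        results[name] = COMPARATORS[comparator](value, target)
    return results
```

Passing `{"pipeline_success_rate_pct": 99.5, "data_latency_hours": 5.0}` marks the first KPI as met and the second as missed.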
SLA tiers:
| Tier | Description | RTO | RPO | Example |
|---|---|---|---|---|
| Tier 1 | Business-critical dashboards | 1 hour | 0 | Revenue reporting |
| Tier 2 | Operational analytics | 4 hours | 4 hours | Marketing attribution |
| Tier 3 | Research/exploratory | 24 hours | 24 hours | Ad-hoc analysis |
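The tiers can be encoded as data so that a recovery can be checked against its objectives. This is a sketch only: `SlaTier`, `SLA_TIERS`, and `meets_sla` are hypothetical names, and the RTO and RPO values are taken directly from the table above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SlaTier:
    description: str
    rto: timedelta  # recovery time objective: maximum tolerated downtime
    rpo: timedelta  # recovery point objective: maximum tolerated data-loss window


SLA_TIERS = {
    1: SlaTier("Business-critical dashboards", timedelta(hours=1), timedelta(0)),
    2: SlaTier("Operational analytics", timedelta(hours=4), timedelta(hours=4)),
    3: SlaTier("Research/exploratory", timedelta(hours=24), timedelta(hours=24)),
}


def meets_sla(tier: int, downtime: timedelta, data_loss_window: timedelta) -> bool:
    """Return True if a recovery stayed within the tier's RTO and RPO."""
    sla = SLA_TIERS[tier]  # KeyError for an undefined tier
    return downtime <= sla.rto and data_loss_window <= sla.rpo
```

With an RPO of 0, a Tier 1 recovery fails the check as soon as any data is lost, even if service returns well within the hour.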