Migration Planner

Execute safe, zero-downtime migrations with validation and rollback plans.

Migration Patterns

1. Feature Flag Migration (Safest)

```
Phase 1: Deploy new code (disabled)
Phase 2: Enable for 1% traffic
Phase 3: Ramp to 10%, 50%, 100%
Phase 4: Remove old code
```
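The ramp in Phases 2-3 is usually driven by deterministic bucketing, so a given user stays in the same cohort as the percentage grows. A minimal sketch (the `rolloutPercent` value is assumed to come from your flag service):

```typescript
import { createHash } from "crypto";

// Map a user id to a stable bucket in [0, 100).
function bucketFor(userId: string): number {
  const hash = createHash("sha256").update(userId).digest();
  return hash.readUInt32BE(0) % 100;
}

// A user is in the rollout iff their bucket falls below the current percentage.
function isEnabled(userId: string, rolloutPercent: number): boolean {
  return bucketFor(userId) < rolloutPercent;
}
```

Because the bucket is derived from the user id, raising `rolloutPercent` from 1 to 10 only adds users; nobody flips back and forth between old and new code paths.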

2. Dual Write Migration

```
Phase 1: Write to both old and new
Phase 2: Backfill old → new
Phase 3: Read from new (write both)
Phase 4: Stop writing to old
Phase 5: Decommission old
```

3. Blue-Green Deployment

```
Blue (current) → Green (new)
Switch traffic: Blue → Green
Rollback available: Green → Blue
```
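The switch and rollback above amount to atomically repointing traffic while the previous environment stays warm. A minimal in-process sketch (real deployments flip a load balancer or DNS target instead):

```typescript
type Environment = "blue" | "green";

class TrafficSwitch {
  private active: Environment = "blue";

  // Atomically repoint traffic; returns the previous target,
  // which stays warm so rollback is just another switchTo().
  switchTo(env: Environment): Environment {
    const previous = this.active;
    this.active = env;
    return previous;
  }

  current(): Environment {
    return this.active;
  }
}
```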

Complete Migration Plan Template


Migration Plan: MySQL → PostgreSQL

Overview

**What:** Migrate user database from MySQL to PostgreSQL
**Why:** Better JSON support, improved performance
**When:** Q1 2024
**Owner:** Database Team
**Risk Level:** HIGH

Current State

  • MySQL 8.0
  • 500GB data
  • 100K users
  • 1000 writes/min
  • 10,000 reads/min

Target State

  • PostgreSQL 15
  • Same data model
  • No downtime
  • Data validation 100% match

Phases

Phase 1: Dual Write (Week 1-2)

Goal: Write to both databases
Steps:
  1. Deploy PostgreSQL cluster
  2. Create schema in PostgreSQL
  3. Deploy dual-write code
  4. Enable dual writes (MySQL primary, PostgreSQL secondary)
Code:
```typescript
async function createUser(data: CreateUserDto) {
  // Write to MySQL (primary)
  const mysqlUser = await mysql.users.create(data);

  // Write to PostgreSQL (secondary, fire and forget)
  postgres.users.create(data).catch((err) => {
    logger.error("PostgreSQL write failed", err);
  });

  return mysqlUser; // Still trust MySQL
}
```

**Validation:**

- Monitor PostgreSQL write success rate
- Compare row counts daily
- Alert if drift >0.1%

**Rollback:** Disable PostgreSQL writes
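The daily row-count comparison and 0.1% drift alert can be reduced to a pure helper that a scheduled job feeds with the two counts. A sketch (wiring to the databases and the scheduler is left out):

```typescript
// Relative drift between the two row counts, as a fraction of the MySQL count.
function rowCountDrift(mysqlCount: number, postgresCount: number): number {
  if (mysqlCount === 0) return postgresCount === 0 ? 0 : 1;
  return Math.abs(mysqlCount - postgresCount) / mysqlCount;
}

// Alert when drift exceeds the 0.1% threshold from the plan.
function shouldAlert(mysqlCount: number, postgresCount: number): boolean {
  return rowCountDrift(mysqlCount, postgresCount) > 0.001;
}
```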

Phase 2: Backfill (Week 3-4)

Goal: Copy historical data
Steps:
  1. Take MySQL snapshot
  2. Run backfill script in batches
  3. Validate data integrity
  4. Resume from failure automatically
Script:
```python
import time

def backfill():
    last_id = get_last_migrated_id()
    batch_size = 1000

    while True:
        # ORDER BY id makes the keyset pagination deterministic,
        # so the checkpoint always advances past every copied row
        users = mysql.query(
            "SELECT * FROM users WHERE id > %s ORDER BY id LIMIT %s",
            [last_id, batch_size]
        )

        if not users:
            break

        postgres.bulk_insert(users)
        last_id = users[-1]['id']
        save_checkpoint(last_id)

        time.sleep(0.1)  # Rate limit
```
Validation:
  • Row count match
  • Random sample comparison (1000 rows)
  • Checksum comparison
Rollback: Delete PostgreSQL data

Phase 3: Dual Read (Week 5)

Goal: Validate PostgreSQL reads
Steps:
  1. Deploy shadow read code
  2. Read from both (MySQL primary)
  3. Compare results
  4. Log mismatches
Code:
```typescript
async function getUser(id: string) {
  const mysqlUser = await mysql.users.findById(id);

  // Shadow read from PostgreSQL
  postgres.users
    .findById(id)
    .then((pgUser) => {
      if (!deepEqual(mysqlUser, pgUser)) {
        logger.warn("Data mismatch", { id, mysqlUser, pgUser });
        metrics.increment("migration.mismatch");
      }
    })
    .catch((err) => {
      // Swallow shadow-read failures so they never affect the user
      logger.error("PostgreSQL shadow read failed", err);
    });

  return mysqlUser; // Still trust MySQL
}
```
Validation:
  • Mismatch rate <0.01%
  • PostgreSQL query performance acceptable
Rollback: Remove shadow reads
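The `deepEqual` used above is assumed; a minimal structural comparison could look like the sketch below (a production shadow read would more likely use a tested library such as lodash's `isEqual`):

```typescript
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  // Caveat: Date objects have no enumerable keys, so timestamps should be
  // normalized (e.g. toISOString) before rows are compared.
  const aKeys = Object.keys(a as object);
  const bKeys = Object.keys(b as object);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every((key) =>
    deepEqual(
      (a as Record<string, unknown>)[key],
      (b as Record<string, unknown>)[key]
    )
  );
}
```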

Phase 4: Flip Read Traffic (Week 6)

Goal: Read from PostgreSQL
Steps:
  1. Feature flag: read from PostgreSQL (1% traffic)
  2. Monitor errors, latency
  3. Ramp: 1% → 10% → 50% → 100%
  4. Still writing to both
Code:
```typescript
async function getUser(id: string) {
  if (featureFlags.readFromPostgres) {
    return postgres.users.findById(id);
  }
  return mysql.users.findById(id);
}
```
Validation:
  • Error rate unchanged
  • Latency p95 <500ms
  • No user complaints
Rollback: Flip feature flag off

Phase 5: Stop MySQL Writes (Week 7)

Goal: PostgreSQL is now primary
Steps:
  1. Stop writing to MySQL
  2. Keep MySQL running (read-only)
  3. Monitor for issues
Code:
```typescript
async function createUser(data: CreateUserDto) {
  return postgres.users.create(data);
  // No longer writing to MySQL
}
```
Validation:
  • All operations working
  • MySQL not receiving writes
Rollback: Re-enable MySQL writes

Phase 6: Decommission (Week 8)

Goal: Remove MySQL
Steps:
  1. Archive MySQL data
  2. Shutdown MySQL cluster
  3. Remove MySQL client code
Rollback: Not available (point of no return)

Validation Strategy

Data Integrity Checks

```python
def validate_migration():
    # Row counts
    mysql_count = mysql.query("SELECT COUNT(*) FROM users")[0]
    pg_count = postgres.query("SELECT COUNT(*) FROM users")[0]
    assert mysql_count == pg_count

    # Random sampling
    sample = mysql.query("SELECT * FROM users ORDER BY RAND() LIMIT 1000")
    for row in sample:
        pg_row = postgres.query("SELECT * FROM users WHERE id = %s", [row['id']])
        assert row == pg_row

    # Checksums (ORDER BY makes the aggregation deterministic on both sides)
    mysql_checksum = mysql.query(
        "SELECT MD5(GROUP_CONCAT(id, email ORDER BY id)) FROM users")
    pg_checksum = postgres.query(
        "SELECT MD5(STRING_AGG(id::text || email, '' ORDER BY id)) FROM users")
    assert mysql_checksum == pg_checksum
```

Rollback Plans

Phase 1-3 Rollback (Easy)

  • Disable PostgreSQL writes
  • No impact to users
  • Data in MySQL still valid

Phase 4 Rollback (Medium)

  • Flip feature flag
  • Route reads back to MySQL
  • Minor user impact (seconds)

Phase 5+ Rollback (Hard)

  • Must re-enable MySQL writes
  • Potential data loss (writes since phase 5)
  • Requires dual-write resumption

Risk Mitigation

Risk 1: Data Loss

Mitigation:
  • Dual writes until validated
  • Transaction logs captured
  • Continuous backups

Risk 2: Performance Degradation

Mitigation:
  • Load test PostgreSQL
  • Query optimization
  • Connection pooling
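For the connection-pooling item, node-postgres exposes pool sizing directly. A sketch with illustrative numbers (the host name is hypothetical; tune `max` against PostgreSQL's `max_connections` and the number of app instances):

```typescript
import { Pool } from "pg";

// Illustrative settings; the right values depend on instance count
// and PostgreSQL's max_connections.
const pool = new Pool({
  host: "postgres.internal",      // hypothetical host
  database: "users",
  max: 20,                        // connections per app instance
  idleTimeoutMillis: 30_000,      // recycle idle connections
  connectionTimeoutMillis: 2_000, // fail fast if the pool is exhausted
});
```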

Risk 3: Schema Differences

Mitigation:
  • Schema validation script
  • Test migrations in staging
  • Document data type differences

Communication Plan

Stakeholder Updates

```markdown
**Week 0:** Migration announced
**Week 2:** Phase 1 complete (dual writes)
**Week 4:** Backfill complete
**Week 6:** Traffic shifted to PostgreSQL
**Week 8:** Migration complete
```

Status Dashboard

  • Current phase
  • Data sync status (%)
  • Validation results
  • Error rates

Testing Plan

Pre-Migration Testing

  1. Test in development
  2. Full migration in staging
  3. Load test PostgreSQL
  4. Validate rollback procedures

During Migration

  1. Continuous monitoring
  2. Automated validation
  3. Manual spot checks
  4. User acceptance testing

Best Practices

  1. Small batches: Migrate incrementally
  2. Dual write: Keep both systems synchronized
  3. Feature flags: Control rollout
  4. Validate continuously: Don't trust, verify
  5. Rollback ready: Plan for worst case
  6. Monitor closely: Track metrics
  7. Communicate often: Keep stakeholders informed

Output Checklist

  • Migration phases defined (5-7 phases)
  • Dual write implementation
  • Backfill script ready
  • Validation strategy
  • Feature flags configured
  • Rollback plans per phase
  • Risk mitigation strategies
  • Communication plan
  • Monitoring dashboard
  • Testing checklist