disaster-recovery

Disaster Recovery

Implement disaster recovery strategies and procedures.

DR Metrics

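```yaml
recovery_metrics:
  RTO:
    name: Recovery Time Objective
    meaning:
      - Maximum acceptable downtime
      - How long it may take to restore service
  RPO:
    name: Recovery Point Objective
    meaning:
      - Maximum acceptable data loss
      - How much data can be lost, measured as time since the last recoverable point
```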
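
To make the two metrics concrete, the sketch below computes the actual recovery time and recovery point for a single incident. The function names and the incident timestamps are hypothetical, chosen only for illustration; times are passed as UTC epoch seconds so the arithmetic does not depend on any particular `date` implementation.

```bash
# Actual recovery point: age of the newest recoverable data when the failure hit.
rpo_seconds() {
  local last_recovery_point="$1" failure="$2"
  echo $((failure - last_recovery_point))
}

# Actual recovery time: how long the service was down.
rto_seconds() {
  local failure="$1" restored="$2"
  echo $((restored - failure))
}

# Hypothetical incident on 2024-01-01 (UTC epoch seconds):
last_backup=1704074400   # 02:00:00, last good backup
failure=1704077100       # 02:45:00, outage begins
restored=1704082500      # 04:15:00, service restored

echo "Data lost: $(rpo_seconds "$last_backup" "$failure")s"
echo "Downtime:  $(rto_seconds "$failure" "$restored")s"
```

In this made-up incident, 45 minutes of data are lost and the service is down for 90 minutes. Comparing those figures with the agreed RTO and RPO shows whether the objectives were met.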

DR Strategies

| Strategy | RTO | RPO | Cost |
|---|---|---|---|
| Backup & Restore | Hours | Hours | $ |
| Pilot Light | Minutes to hours | Minutes | $$ |
| Warm Standby | Minutes | Seconds | $$$ |
| Multi-Site Active | Near-zero | Near-zero | $$$$ |
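
One way to read the comparison is "pick the cheapest strategy that still meets both targets". The sketch below does that lookup. The second figures attached to each strategy are assumptions chosen only to match the orders of magnitude listed above, not measured values, and `choose_strategy` is a hypothetical helper name.

```bash
# Assumed worst-case figures as name:rto_seconds:rpo_seconds, ordered from
# cheapest to most expensive. These numbers are illustrative placeholders;
# replace them with values measured in your own DR tests.
STRATEGIES="backup-restore:14400:14400
pilot-light:3600:600
warm-standby:600:30
multi-site-active:5:5"

# Print the cheapest strategy whose assumed RTO and RPO both fit the targets.
choose_strategy() {
  local target_rto="$1" target_rpo="$2" name rto rpo
  # A here-document keeps the loop in the current shell, so `return` works.
  while IFS=: read -r name rto rpo; do
    if [ "$rto" -le "$target_rto" ] && [ "$rpo" -le "$target_rpo" ]; then
      echo "$name"
      return 0
    fi
  done <<EOF
$STRATEGIES
EOF
  echo "none"
  return 1
}

choose_strategy 900 60   # targets: 15 minutes of downtime, 1 minute of data
```

With the assumed figures, a 15-minute RTO and 1-minute RPO rule out the two cheaper options and select warm standby.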

AWS Multi-Region

```bash
# Cross-region RDS replica
# A source in another region must be given as an ARN;
# 123456789012 is a placeholder account ID.
aws rds create-db-instance-read-replica \
  --db-instance-identifier dr-replica \
  --source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:prod-db \
  --source-region us-east-1 \
  --region us-west-2

# S3 cross-region replication
# Versioning must already be enabled on the source and destination buckets.
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration file://replication.json
```
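
The S3 command reads its rules from `replication.json`, which the text does not show. A minimal sketch of that file might look like the following, written with a here-document so it can be created in place. The account ID, role name, rule ID, and destination bucket are placeholders, not values from this document.

```bash
# Write a minimal replication configuration for put-bucket-replication.
# Role ARN, rule ID, and destination bucket below are hypothetical placeholders.
cat > replication.json <<'EOF'
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "ID": "dr-replication",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": { "Prefix": "" },
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::dr-bucket" }
    }
  ]
}
EOF
```

The empty prefix applies the rule to every object in the bucket. The IAM role must allow S3 to read from the source bucket and write replicas to the destination, and both buckets need versioning enabled before the configuration is accepted.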

DR Testing

```yaml
dr_test_schedule:
  tabletop: Quarterly
  component_failover: Monthly
  full_failover: Annually

test_checklist:
  - Verify backup integrity
  - Test failover procedures
  - Validate data consistency
  - Measure actual RTO/RPO
  - Document lessons learned
```
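
The "Measure actual RTO/RPO" item can end in a simple pass/fail comparison against the targets. The sketch below assumes the drill already produced measured values in seconds; `check_objective` is a hypothetical helper, and the drill figures are made up for illustration.

```bash
# Compare a measured recovery figure with its target (both in seconds).
# Prints one result line; returns 0 when the target is met, 1 when it is missed.
check_objective() {
  local metric="$1" measured="$2" target="$3"
  if [ "$measured" -le "$target" ]; then
    echo "$metric PASS: measured ${measured}s, target ${target}s"
  else
    echo "$metric FAIL: measured ${measured}s, target ${target}s"
    return 1
  fi
}

# Made-up drill results checked against made-up targets.
check_objective RTO 5400 7200
check_objective RPO 2700 900 || echo "RPO target missed"
```

The non-zero return code on a miss lets a drill script or CI job fail when a target is not met, which feeds directly into the lessons-learned step.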

Best Practices

  • Test DR procedures regularly
  • Automate failover where possible
  • Document all procedures
  • Update runbooks after each test