archiving-databases

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Database Archival System

数据库归档系统

This skill provides automated assistance for database archival system tasks.
该Skill为数据库归档系统任务提供自动化协助。

Prerequisites

前提条件

Before using this skill, ensure:
  • Database credentials with SELECT and DELETE permissions on source tables
  • Access to destination storage (archive table or cloud storage credentials)
  • Network connectivity to cloud storage services if using S3/Azure/GCS
  • Backup of database before first archival run
  • Understanding of data retention requirements and compliance policies
  • Monitoring tools configured to track archival job success
使用该Skill前,请确保:
  • 拥有源表SELECT和DELETE权限的数据库凭据
  • 可访问目标存储(归档表或云存储凭据)
  • 若使用S3/Azure Blob/GCS,需与云存储服务保持网络连通
  • 首次归档运行前已备份数据库
  • 了解数据保留要求与合规政策
  • 已配置监控工具以追踪归档任务的成功状态

Instructions

操作步骤

Step 1: Define Archival Criteria

步骤1:定义归档标准

  1. Identify tables containing historical data for archival
  2. Define age threshold for archival (e.g., records older than 1 year)
  3. Determine additional criteria (status flags, record size, access frequency)
  4. Calculate expected data volume to be archived
  5. Document business requirements and compliance policies
  1. 识别包含需归档历史数据的表
  2. 定义归档的年龄阈值(例如:超过1年的记录)
  3. 确定额外标准(状态标记、记录大小、访问频率)
  4. 计算预计要归档的数据量
  5. 记录业务需求与合规政策

Step 2: Choose Archival Destination

步骤2:选择归档目标

  1. Evaluate options: archive table in same database, separate archive database, or cold storage
  2. For cloud storage: select S3, Azure Blob, or GCS based on infrastructure
  3. Configure destination storage with appropriate security and access controls
  4. Set up compression settings for storage efficiency
  5. Define data format for archived records (CSV, Parquet, JSON)
  1. 评估选项:同一数据库中的归档表、独立归档数据库或冷存储
  2. 若使用云存储:根据基础设施选择S3、Azure Blob或GCS
  3. 为目标存储配置适当的安全与访问控制
  4. 设置压缩配置以提升存储效率
  5. 定义归档记录的数据格式(CSV、Parquet、JSON)

Step 3: Create Archive Schema

步骤3:创建归档架构

  1. Design archive table schema matching source table structure
  2. Add metadata columns (archived_at, source_table, archive_reason)
  3. Create indexes on commonly queried archive columns
  4. For cloud storage: define bucket structure and naming conventions
  5. Test archive schema with sample data
  1. 设计与源表结构匹配的归档表架构
  2. 添加元数据列(archived_at、source_table、archive_reason)
  3. 为常查询的归档列创建索引
  4. 若使用云存储:定义存储桶结构与命名规范
  5. 使用示例数据测试归档架构

Step 4: Implement Archival Logic

步骤4:实现归档逻辑

  1. Write SQL query to identify records meeting archival criteria
  2. Create extraction script to export records from source tables
  3. Implement transformation logic if archive format differs from source
  4. Build verification queries to confirm data integrity after archival
  5. Add transaction handling to ensure atomicity (delete only if archive succeeds)
  1. 编写SQL查询以识别符合归档标准的记录
  2. 创建提取脚本以从源表导出记录
  3. 若归档格式与源格式不同,实现转换逻辑
  4. 构建验证查询以确认归档后的数据完整性
  5. 添加事务处理以确保原子性(仅在归档成功后才删除源数据)

Step 5: Execute Archival Process

步骤5:执行归档流程

  1. Run archival in staging environment first with subset of data
  2. Verify archived data integrity and completeness
  3. Execute archival in production during low-traffic window
  4. Monitor database performance during archival operation
  5. Generate archival report with record counts and storage savings
  1. 先在staging环境中使用部分数据运行归档
  2. 验证归档数据的完整性与完整性
  3. 在低流量时段于生产环境执行归档
  4. 归档操作期间监控数据库性能
  5. 生成包含记录数量与存储节省量的归档报告

Step 6: Automate Retention Policy

步骤6:自动化保留策略

  1. Schedule periodic archival jobs (weekly, monthly)
  2. Configure automated monitoring and alerting for job failures
  3. Implement cleanup of successfully archived records from source tables
  4. Set up expiration policies on archived data per compliance requirements
  5. Document archival schedule and retention periods
  1. 定期调度归档任务(每周、每月)
  2. 配置任务失败的自动监控与告警
  3. 实现从源表清理已成功归档的记录
  4. 根据合规要求设置归档数据的过期策略
  5. 记录归档调度与保留周期

Output

输出

This skill produces:
Archival Scripts: SQL and shell scripts to extract, transform, and load data to archive destination
Archive Tables/Files: Structured storage containing historical records with metadata and timestamps
Verification Reports: Row counts, data checksums, and integrity checks confirming successful archival
Storage Metrics: Database size reduction, archive storage utilization, and cost savings estimates
Archival Logs: Detailed logs of each archival run with timestamps, record counts, and any errors
该Skill可生成:
归档脚本:用于提取、转换并加载数据至归档目标的SQL与Shell脚本
归档表/文件:包含历史记录及元数据与时间戳的结构化存储
验证报告:包含记录行数、数据校验和及完整性检查的报告,以确认归档成功
存储指标:数据库大小缩减量、归档存储利用率及成本节约估算
归档日志:每次归档运行的详细日志,包含时间戳、记录数量及任何错误信息

Error Handling

错误处理

Insufficient Storage Space:
  • Check available disk space on archive destination before execution
  • Implement storage monitoring and alerting
  • Use compression to reduce archive size
  • Clean up old archives per retention policy before new archival
Data Integrity Issues:
  • Run checksums on source data before and after archival
  • Implement row count verification between source and archive
  • Keep source data until archive verification completes
  • Rollback archive transaction if verification fails
Permission Denied Errors:
  • Verify database user has SELECT on source tables and INSERT on archive tables
  • Confirm cloud storage credentials have write permissions
  • Check network security groups allow connections to cloud storage
  • Document required permissions for archival automation
Timeout During Large Archival:
  • Split archival into smaller batches by date ranges
  • Run archival incrementally over multiple days
  • Increase database timeout settings for archival sessions
  • Schedule archival during maintenance windows with extended timeouts
存储空间不足
  • 执行前检查归档目标的可用磁盘空间
  • 实现存储监控与告警
  • 使用压缩以减小归档大小
  • 新归档前根据保留策略清理旧归档
数据完整性问题
  • 归档前后对源数据运行校验和
  • 实现源数据与归档数据的行数验证
  • 在归档验证完成前保留源数据
  • 若验证失败,回滚归档事务
权限拒绝错误
  • 验证数据库用户拥有源表的SELECT权限与归档表的INSERT权限
  • 确认云存储凭据拥有写入权限
  • 检查网络安全组是否允许与云存储建立连接
  • 记录归档自动化所需的权限
大规模归档超时
  • 按日期范围将归档拆分为更小的批次
  • 分多天增量运行归档
  • 增加归档会话的数据库超时设置
  • 在维护窗口调度归档并延长超时时间

Resources

资源

Archival Configuration Templates:
  • PostgreSQL archival:
    {baseDir}/templates/postgresql-archive-config.yaml
  • MySQL archival:
    {baseDir}/templates/mysql-archive-config.yaml
  • S3 cold storage:
    {baseDir}/templates/s3-archive-config.yaml
  • Azure Blob storage:
    {baseDir}/templates/azure-archive-config.yaml
Retention Policy Definitions:
{baseDir}/policies/retention-policies.yaml
Archival Scripts Library:
{baseDir}/scripts/archival/
  • Extract to CSV script
  • Extract to Parquet script
  • S3 upload with compression
  • Archive verification queries
Monitoring Dashboards:
{baseDir}/monitoring/archival-dashboard.json
Cost Analysis Tools:
{baseDir}/tools/storage-cost-calculator.py
归档配置模板
  • PostgreSQL归档:
    {baseDir}/templates/postgresql-archive-config.yaml
  • MySQL归档:
    {baseDir}/templates/mysql-archive-config.yaml
  • S3冷存储:
    {baseDir}/templates/s3-archive-config.yaml
  • Azure Blob存储:
    {baseDir}/templates/azure-archive-config.yaml
保留策略定义
{baseDir}/policies/retention-policies.yaml
归档脚本库
{baseDir}/scripts/archival/
  • 提取至CSV的脚本
  • 提取至Parquet的脚本
  • 带压缩的S3上传脚本
  • 归档验证查询
监控仪表板
{baseDir}/monitoring/archival-dashboard.json
成本分析工具
{baseDir}/tools/storage-cost-calculator.py

Overview

概述

This skill provides automated assistance for the described functionality.
该Skill为上述功能提供自动化协助。

Examples

示例

Example usage patterns will be demonstrated in context.
将在上下文中演示示例使用模式。