archiving-databases

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Database Archival System

数据库归档系统

This skill provides automated assistance for database archival system tasks.

该Skill为数据库归档系统任务提供自动化协助。

Prerequisites

前提条件

Before using this skill, ensure:

Database credentials with SELECT and DELETE permissions on source tables
Access to destination storage (archive table or cloud storage credentials)
Network connectivity to cloud storage services if using S3/Azure/GCS
Backup of database before first archival run
Understanding of data retention requirements and compliance policies
Monitoring tools configured to track archival job success

使用该Skill前，请确保：

拥有源表SELECT和DELETE权限的数据库凭据
可访问目标存储（归档表或云存储凭据）
若使用S3/Azure Blob/GCS，需与云存储服务保持网络连通
首次归档运行前已备份数据库
了解数据保留要求与合规政策
已配置监控工具以追踪归档任务的成功状态

Instructions

操作步骤

Step 1: Define Archival Criteria

步骤1：定义归档标准

Identify tables containing historical data for archival
Define age threshold for archival (e.g., records older than 1 year)
Determine additional criteria (status flags, record size, access frequency)
Calculate expected data volume to be archived
Document business requirements and compliance policies

识别包含需归档历史数据的表
定义归档的年龄阈值（例如：超过1年的记录）
确定额外标准（状态标记、记录大小、访问频率）
计算预计要归档的数据量
记录业务需求与合规政策

Step 2: Choose Archival Destination

步骤2：选择归档目标

Evaluate options: archive table in same database, separate archive database, or cold storage
For cloud storage: select S3, Azure Blob, or GCS based on infrastructure
Configure destination storage with appropriate security and access controls
Set up compression settings for storage efficiency
Define data format for archived records (CSV, Parquet, JSON)

评估选项：同一数据库中的归档表、独立归档数据库或冷存储
若使用云存储：根据基础设施选择S3、Azure Blob或GCS
为目标存储配置适当的安全与访问控制
设置压缩配置以提升存储效率
定义归档记录的数据格式（CSV、Parquet、JSON）

Step 3: Create Archive Schema

步骤3：创建归档架构

Design archive table schema matching source table structure
Add metadata columns (archived_at, source_table, archive_reason)
Create indexes on commonly queried archive columns
For cloud storage: define bucket structure and naming conventions
Test archive schema with sample data

设计与源表结构匹配的归档表架构
添加元数据列（archived_at、source_table、archive_reason）
为常查询的归档列创建索引
若使用云存储：定义存储桶结构与命名规范
使用示例数据测试归档架构

Step 4: Implement Archival Logic

步骤4：实现归档逻辑

Write SQL query to identify records meeting archival criteria
Create extraction script to export records from source tables
Implement transformation logic if archive format differs from source
Build verification queries to confirm data integrity after archival
Add transaction handling to ensure atomicity (delete only if archive succeeds)

编写SQL查询以识别符合归档标准的记录
创建提取脚本以从源表导出记录
若归档格式与源格式不同，实现转换逻辑
构建验证查询以确认归档后的数据完整性
添加事务处理以确保原子性（仅在归档成功后才删除源数据）

Step 5: Execute Archival Process

步骤5：执行归档流程

Run archival in staging environment first with subset of data
Verify archived data integrity and completeness
Execute archival in production during low-traffic window
Monitor database performance during archival operation
Generate archival report with record counts and storage savings

先在staging环境中使用部分数据运行归档
验证归档数据的完整性与完整性
在低流量时段于生产环境执行归档
归档操作期间监控数据库性能
生成包含记录数量与存储节省量的归档报告

Step 6: Automate Retention Policy

步骤6：自动化保留策略

Schedule periodic archival jobs (weekly, monthly)
Configure automated monitoring and alerting for job failures
Implement cleanup of successfully archived records from source tables
Set up expiration policies on archived data per compliance requirements
Document archival schedule and retention periods

定期调度归档任务（每周、每月）
配置任务失败的自动监控与告警
实现从源表清理已成功归档的记录
根据合规要求设置归档数据的过期策略
记录归档调度与保留周期

Output

输出

This skill produces:

Archival Scripts: SQL and shell scripts to extract, transform, and load data to archive destination

Archive Tables/Files: Structured storage containing historical records with metadata and timestamps

Verification Reports: Row counts, data checksums, and integrity checks confirming successful archival

Storage Metrics: Database size reduction, archive storage utilization, and cost savings estimates

Archival Logs: Detailed logs of each archival run with timestamps, record counts, and any errors

该Skill可生成：

归档脚本：用于提取、转换并加载数据至归档目标的SQL与Shell脚本

归档表/文件：包含历史记录及元数据与时间戳的结构化存储

验证报告：包含记录行数、数据校验和及完整性检查的报告，以确认归档成功

存储指标：数据库大小缩减量、归档存储利用率及成本节约估算

归档日志：每次归档运行的详细日志，包含时间戳、记录数量及任何错误信息

Error Handling

错误处理

Insufficient Storage Space:

Check available disk space on archive destination before execution
Implement storage monitoring and alerting
Use compression to reduce archive size
Clean up old archives per retention policy before new archival

Data Integrity Issues:

Run checksums on source data before and after archival
Implement row count verification between source and archive
Keep source data until archive verification completes
Rollback archive transaction if verification fails

Permission Denied Errors:

Verify database user has SELECT on source tables and INSERT on archive tables
Confirm cloud storage credentials have write permissions
Check network security groups allow connections to cloud storage
Document required permissions for archival automation

Timeout During Large Archival:

Split archival into smaller batches by date ranges
Run archival incrementally over multiple days
Increase database timeout settings for archival sessions
Schedule archival during maintenance windows with extended timeouts

存储空间不足：

执行前检查归档目标的可用磁盘空间
实现存储监控与告警
使用压缩以减小归档大小
新归档前根据保留策略清理旧归档

数据完整性问题：

归档前后对源数据运行校验和
实现源数据与归档数据的行数验证
在归档验证完成前保留源数据
若验证失败，回滚归档事务

权限拒绝错误：

验证数据库用户拥有源表的SELECT权限与归档表的INSERT权限
确认云存储凭据拥有写入权限
检查网络安全组是否允许与云存储建立连接
记录归档自动化所需的权限

大规模归档超时：

按日期范围将归档拆分为更小的批次
分多天增量运行归档
增加归档会话的数据库超时设置
在维护窗口调度归档并延长超时时间

Resources

资源

Archival Configuration Templates:

PostgreSQL archival:

{baseDir}/templates/postgresql-archive-config.yaml

MySQL archival:

{baseDir}/templates/mysql-archive-config.yaml

S3 cold storage:

{baseDir}/templates/s3-archive-config.yaml

Azure Blob storage:

{baseDir}/templates/azure-archive-config.yaml

Retention Policy Definitions:

{baseDir}/policies/retention-policies.yaml

Archival Scripts Library:

{baseDir}/scripts/archival/

Extract to CSV script
Extract to Parquet script
S3 upload with compression
Archive verification queries

Monitoring Dashboards:

{baseDir}/monitoring/archival-dashboard.json

Cost Analysis Tools:

{baseDir}/tools/storage-cost-calculator.py

归档配置模板：

PostgreSQL归档：

{baseDir}/templates/postgresql-archive-config.yaml

MySQL归档：

{baseDir}/templates/mysql-archive-config.yaml

S3冷存储：

{baseDir}/templates/s3-archive-config.yaml

Azure Blob存储：

{baseDir}/templates/azure-archive-config.yaml

保留策略定义：

{baseDir}/policies/retention-policies.yaml

归档脚本库：

{baseDir}/scripts/archival/

提取至CSV的脚本
提取至Parquet的脚本
带压缩的S3上传脚本
归档验证查询

监控仪表板：

{baseDir}/monitoring/archival-dashboard.json

成本分析工具：

{baseDir}/tools/storage-cost-calculator.py

Overview

概述

This skill provides automated assistance for the described functionality.

该Skill为上述功能提供自动化协助。

Examples

示例

Example usage patterns will be demonstrated in context.

将在上下文中演示示例使用模式。