archiving-databases
Database Archival System
This skill provides automated assistance for database archival system tasks.
Prerequisites
Before using this skill, ensure:
- Database credentials with SELECT and DELETE permissions on source tables, plus INSERT on archive tables when archiving within a database
- Access to destination storage (archive table or cloud storage credentials)
- Network connectivity to cloud storage services if using S3, Azure Blob, or GCS
- Backup of database before first archival run
- Understanding of data retention requirements and compliance policies
- Monitoring tools configured to track archival job success
Instructions
Step 1: Define Archival Criteria
- Identify tables containing historical data for archival
- Define age threshold for archival (e.g., records older than 1 year)
- Determine additional criteria (status flags, record size, access frequency)
- Calculate expected data volume to be archived
- Document business requirements and compliance policies
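The age threshold and volume estimate above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the `orders` table, its `created_at` column, and the helper names `cutoff_for` and `count_archivable` are assumptions made for the example, not part of the skill.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def cutoff_for(now, max_age_days):
    """Return the ISO date before which records qualify for archival."""
    return (now - timedelta(days=max_age_days)).date().isoformat()


def count_archivable(conn, table, date_column, cutoff_iso):
    """Count rows older than the cutoff to estimate archival volume."""
    # Identifiers cannot be bound as parameters, so they are interpolated;
    # pass only trusted table and column names.
    query = f"SELECT COUNT(*) FROM {table} WHERE {date_column} < ?"
    return conn.execute(query, (cutoff_iso,)).fetchone()[0]


# Demo against an in-memory database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (created_at, status) VALUES (?, ?)",
    [("2020-01-15", "closed"), ("2021-06-01", "closed"), ("2024-03-10", "open")],
)
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
cutoff = cutoff_for(now, 365)
print(cutoff, count_archivable(conn, "orders", "created_at", cutoff))
```

Additional criteria such as a status flag would be extra conditions in the WHERE clause; the same count query then doubles as the volume estimate.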
Step 2: Choose Archival Destination
- Evaluate options: archive table in same database, separate archive database, or cold storage
- For cloud storage: select S3, Azure Blob, or GCS based on infrastructure
- Configure destination storage with appropriate security and access controls
- Set up compression settings for storage efficiency
- Define data format for archived records (CSV, Parquet, JSON)
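As one possible combination of the format and compression choices above, the sketch below serializes rows to CSV and gzip-compresses them in memory using only the standard library. The function names are hypothetical, and Parquet or JSON would need a different writer; uploading the resulting bytes to S3, Azure Blob, or GCS is left out.

```python
import csv
import gzip
import io


def rows_to_gzip_csv(header, rows):
    """Serialize rows as CSV and gzip-compress the result in memory."""
    text = io.StringIO(newline="")
    writer = csv.writer(text)
    writer.writerow(header)
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def gzip_csv_to_rows(blob):
    """Decompress and parse a blob produced by rows_to_gzip_csv."""
    text = gzip.decompress(blob).decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


blob = rows_to_gzip_csv(["id", "created_at"], [[1, "2020-01-15"], [2, "2021-06-01"]])
print(len(blob), gzip_csv_to_rows(blob)[0])
```

Note that CSV does not preserve types: values read back are strings, which is one reason to document the chosen format alongside the archive.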
Step 3: Create Archive Schema
- Design archive table schema matching source table structure
- Add metadata columns (archived_at, source_table, archive_reason)
- Create indexes on commonly queried archive columns
- For cloud storage: define bucket structure and naming conventions
- Test archive schema with sample data
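For the archive-table case, the schema steps above might look like the sketch below. It uses SQLite for portability; the `_archive` table suffix, the column types chosen for the metadata columns, and the function name are assumptions. PostgreSQL or MySQL would read the source columns from `information_schema` instead of `PRAGMA table_info`.

```python
import sqlite3

# Metadata columns named in the step above; the types are an assumption.
METADATA_COLUMNS = (
    ("archived_at", "TEXT NOT NULL"),
    ("source_table", "TEXT NOT NULL"),
    ("archive_reason", "TEXT"),
)


def create_archive_table(conn, source_table):
    """Create <source_table>_archive with the source columns plus metadata."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    source_cols = conn.execute(f"PRAGMA table_info({source_table})").fetchall()
    column_defs = [f"{name} {col_type}" for _, name, col_type, *_ in source_cols]
    column_defs += [f"{name} {col_type}" for name, col_type in METADATA_COLUMNS]
    archive_table = f"{source_table}_archive"
    conn.execute(f"CREATE TABLE {archive_table} ({', '.join(column_defs)})")
    # Index a column that archive queries are likely to filter on.
    conn.execute(
        f"CREATE INDEX idx_{archive_table}_archived_at "
        f"ON {archive_table} (archived_at)"
    )
    return archive_table


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
print(create_archive_table(conn, "orders"))
```

The sketch copies column names and types only; constraints such as the primary key are deliberately not carried over, which may or may not suit a given archive.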
Step 4: Implement Archival Logic
- Write SQL query to identify records meeting archival criteria
- Create extraction script to export records from source tables
- Implement transformation logic if archive format differs from source
- Build verification queries to confirm data integrity after archival
- Add transaction handling to ensure atomicity (delete only if archive succeeds)
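The "delete only if archive succeeds" rule can be illustrated with a single transaction that copies and then deletes. This is a sketch under assumptions: the `orders` and `orders_archive` tables, their columns, and the `'age'` reason value are hypothetical, and it applies only when source and archive live in the same database. Archiving to cloud storage needs a separate verify-then-delete step instead.

```python
import sqlite3


def archive_rows(conn, cutoff_iso, archived_at):
    """Copy rows older than the cutoff into orders_archive, then delete them.

    Both statements run in one transaction, so a failure in either leaves
    the source table unchanged.
    """
    # "with conn" commits on success and rolls back if anything raises.
    with conn:
        copied = conn.execute(
            "INSERT INTO orders_archive "
            "(id, created_at, archived_at, source_table, archive_reason) "
            "SELECT id, created_at, ?, 'orders', 'age' "
            "FROM orders WHERE created_at < ?",
            (archived_at, cutoff_iso),
        ).rowcount
        deleted = conn.execute(
            "DELETE FROM orders WHERE created_at < ?", (cutoff_iso,)
        ).rowcount
        if copied != deleted:
            # Raising inside the block triggers the rollback.
            raise RuntimeError(f"copied {copied} rows but deleted {deleted}")
    return copied
```

Calling `archive_rows(conn, "2023-06-02", "2024-06-01T00:00:00+00:00")` returns the number of rows moved; if the archive table is missing or the counts disagree, the exception propagates and no source rows are removed.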
Step 5: Execute Archival Process
- Run archival in staging environment first with subset of data
- Verify archived data integrity and completeness
- Execute archival in production during low-traffic window
- Monitor database performance during archival operation
- Generate archival report with record counts and storage savings
Step 6: Automate Retention Policy
- Schedule periodic archival jobs (weekly, monthly)
- Configure automated monitoring and alerting for job failures
- Implement cleanup of successfully archived records from source tables
- Set up expiration policies on archived data per compliance requirements
- Document archival schedule and retention periods
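An expiration policy on archived data could be enforced by a periodic job along these lines. The sketch assumes an `orders_archive` table whose `archived_at` column holds UTC ISO 8601 timestamps in one consistent format (so string comparison matches time order); the function name and the retention value are illustrative, and the scheduler that invokes it (cron or similar) is not shown.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_expired(conn, retention_days, now):
    """Delete archive rows older than the retention period; return the count."""
    expiry = (now - timedelta(days=retention_days)).isoformat()
    with conn:  # commit on success, roll back on error
        return conn.execute(
            "DELETE FROM orders_archive WHERE archived_at < ?", (expiry,)
        ).rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_archive (id INTEGER, archived_at TEXT)")
conn.executemany(
    "INSERT INTO orders_archive VALUES (?, ?)",
    [(1, "2015-01-01T00:00:00+00:00"), (2, "2023-01-01T00:00:00+00:00")],
)
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
purged = purge_expired(conn, retention_days=7 * 365, now=now)
print(purged)
```

For archives held in cloud object storage, the provider's own lifecycle or expiration rules are usually a better fit than a custom job.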
Output
This skill produces:
- Archival Scripts: SQL and shell scripts to extract, transform, and load data to archive destination
- Archive Tables/Files: Structured storage containing historical records with metadata and timestamps
- Verification Reports: Row counts, data checksums, and integrity checks confirming successful archival
- Storage Metrics: Database size reduction, archive storage utilization, and cost savings estimates
- Archival Logs: Detailed logs of each archival run with timestamps, record counts, and any errors
Error Handling
Insufficient Storage Space:
- Check available disk space on archive destination before execution
- Implement storage monitoring and alerting
- Use compression to reduce archive size
- Clean up old archives per retention policy before new archival
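When the archive destination is a local or mounted filesystem, the pre-run space check above can be as small as the sketch below. The function name and the 20% headroom factor are assumptions; object stores such as S3 have no equivalent free-space query, so this does not apply to them.

```python
import shutil


def has_room_for(path, required_bytes, safety_factor=1.2):
    """Return True if the filesystem holding path can fit the archive plus headroom."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes * safety_factor


# Example: check the current directory for room to hold a 1 GiB archive.
print(has_room_for(".", 1024 ** 3))
```

An archival job would typically call this before extraction and abort with a clear error when it returns False.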
Data Integrity Issues:
- Compute checksums on the source data before archival and on the archived data afterward, then compare them
- Implement row count verification between source and archive
- Keep source data until archive verification completes
- Roll back the archive transaction if verification fails
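A row-count and checksum comparison between source and archive might be sketched like this. The helper name `table_checksum` and the use of `repr()` to serialize rows are assumptions made for brevity; both queries need the same columns and a deterministic ORDER BY, or the digests will differ even when the data matches.

```python
import hashlib
import sqlite3


def table_checksum(conn, query, params=()):
    """Return (row_count, SHA-256 hex digest) over the rows a query returns."""
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(query, params):
        # repr() is a stand-in serialization; a production job would
        # normalize types explicitly so both sides encode identically.
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_at TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER, created_at TEXT)")
rows = [(1, "2020-01-15"), (2, "2021-06-01")]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
conn.executemany("INSERT INTO orders_archive VALUES (?, ?)", rows)

source = table_checksum(conn, "SELECT id, created_at FROM orders ORDER BY id")
archive = table_checksum(conn, "SELECT id, created_at FROM orders_archive ORDER BY id")
print(source == archive)
```

Only when both the count and the digest match should the job proceed to delete the source rows.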
Permission Denied Errors:
- Verify the database user has SELECT and DELETE on source tables and INSERT on archive tables
- Confirm cloud storage credentials have write permissions
- Check network security groups allow connections to cloud storage
- Document required permissions for archival automation
Timeout During Large Archival:
- Split archival into smaller batches by date ranges
- Run archival incrementally over multiple days
- Increase database timeout settings for archival sessions
- Schedule archival during maintenance windows with extended timeouts
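Splitting a large archival into date-range batches can be sketched with a small generator. The name `date_batches` and the half-open window convention are choices made for this example; each window would then drive one archival transaction with `created_at >= start AND created_at < end` style bounds.

```python
from datetime import date, timedelta


def date_batches(start, end, days_per_batch):
    """Yield half-open (batch_start, batch_end) windows covering [start, end)."""
    if days_per_batch <= 0:
        raise ValueError("days_per_batch must be positive")
    step = timedelta(days=days_per_batch)
    current = start
    while current < end:
        batch_end = min(current + step, end)
        yield current, batch_end
        current = batch_end


for batch_start, batch_end in date_batches(date(2020, 1, 1), date(2020, 1, 10), 4):
    print(batch_start.isoformat(), batch_end.isoformat())
```

Half-open windows avoid archiving a boundary row twice, and the final window is clipped so it never runs past the overall cutoff.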
Resources
Archival Configuration Templates:
- PostgreSQL archival: {baseDir}/templates/postgresql-archive-config.yaml
- MySQL archival: {baseDir}/templates/mysql-archive-config.yaml
- S3 cold storage: {baseDir}/templates/s3-archive-config.yaml
- Azure Blob storage: {baseDir}/templates/azure-archive-config.yaml
Retention Policy Definitions: {baseDir}/policies/retention-policies.yaml
Archival Scripts Library: {baseDir}/scripts/archival/
- Extract to CSV script
- Extract to Parquet script
- S3 upload with compression
- Archive verification queries
Monitoring Dashboards: {baseDir}/monitoring/archival-dashboard.json
Cost Analysis Tools: {baseDir}/tools/storage-cost-calculator.py
Overview
This skill helps define archival criteria, move historical records to archive tables or cloud cold storage, verify the result, and automate retention policies.
Examples
Example usage patterns will be demonstrated in context.