Databases: Production Configuration & Operations
Configure, tune, design schemas, migrate, back up, and review database engines - from single-node dev setups to PCI-compliant production clusters. The goal is correct, performant, durable databases that survive failures, pass audits, and don't wake you up at 3am.
Target versions (March 2026):
- PostgreSQL 18.3 (EOL 2030-11), previous: 17.9, 16.13
- MongoDB 8.0.20 (GA), 8.2.6 (rapid release, EOL 2026-07)
- MariaDB 11.8.6 (LTS, EOL 2028-06), 12.2.2 (rolling GA, EOL 2026-05)
- MySQL 8.4.8 (LTS), 9.6.0 (innovation)
- SQL Server 2025 RTM + CU3 (GA 2025-11-18)
- PgBouncer 1.25.1, Pgpool-II 4.7.1, ProxySQL 3.0.6
This skill covers six domains depending on context:
- Configuration - engine settings, authentication, TLS, tuning parameters
- Schema design - indexing strategy, partitioning, normalization, type selection
- Migration - cross-engine migration, zero-downtime DDL, ORM migration tooling
- Operations - backup/restore, replication, connection pooling, monitoring
- Performance - query plan analysis, index optimization, vacuum/maintenance
- Compliance - PCI-DSS 4.0 encryption, audit logging, key management, data masking
When to use
- Configuring database engines (postgresql.conf, mongod.conf, my.cnf, MSSQL settings)
- Designing or reviewing database schemas (indexes, partitioning, types, constraints)
- Planning or executing cross-engine migrations (MySQL -> PostgreSQL, etc.)
- Setting up backup/restore strategies and PITR
- Configuring replication (streaming, logical, replica sets, GTID)
- Tuning connection pooling (PgBouncer, ProxySQL, application-side pools)
- Analyzing query performance (EXPLAIN, slow query logs, index usage)
- Database-level PCI-DSS 4.0 compliance (encryption, audit logging, access control)
- Evaluating managed vs self-hosted database decisions
- Setting up database monitoring and alerting
When NOT to use
- Deploying databases on Kubernetes (StatefulSets, PVCs, operators) - use kubernetes
- Provisioning managed databases (RDS, Cloud SQL, Atlas) via IaC - use terraform
- Docker Compose for database containers - use docker
- Database-related Ansible playbooks and roles - use ansible
- Application-level database bugs (N+1, transaction misuse, ORM pitfalls) - use code-review
- SQL injection detection, connection string secrets in code - use security-audit
- CI/CD pipelines that run migrations - use ci-cd
AI Self-Check
AI tools consistently produce the same database mistakes. Before returning any generated SQL, schema, migration, or config, verify against this list:
Migrations
- `IF NOT EXISTS`/`IF EXISTS` guards on `ADD COLUMN`/`DROP COLUMN` (bare DDL crashes on re-run)
- Adding a `NOT NULL` column includes a `DEFAULT` value (or two-step: add nullable, backfill, alter NOT NULL)
- Index creation uses `CONCURRENTLY` on PostgreSQL (prevents full table lock)
- Large table changes run in batches, not a single transaction (lock escalation, OOM risk)
- Migration is backward-compatible (old app version can still run against new schema)
- No `DROP TABLE` or `DROP DATABASE` without explicit user confirmation
- Migration is idempotent - can be run twice without error
- Rollback/down migration exists and is tested
- PG enum changes use `ALTER TYPE ... ADD VALUE` outside a transaction (cannot run inside BEGIN/COMMIT, fails silently in some ORMs)
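The guard-and-backfill pattern from the checklist can be sketched in a few lines. This is an illustrative Python/SQLite stand-in (the `users`/`email` names are hypothetical), not engine-specific DDL - on PostgreSQL you would reach for `ADD COLUMN IF NOT EXISTS` directly:

```python
import sqlite3

def column_exists(conn, table, column):
    # PRAGMA table_info returns one row per column; the name is field 1
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))

def add_email_column(conn):
    # Guarded DDL: safe to run twice (idempotent), mirroring IF NOT EXISTS
    if not column_exists(conn, "users", "email"):
        # Step 1: add the column nullable (no long lock, no table rewrite)
        conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
        # Step 2: backfill (batched UPDATEs in production)
        conn.execute("UPDATE users SET email = '' WHERE email IS NULL")
        # Step 3 (engines that support it): ALTER COLUMN ... SET NOT NULL

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
add_email_column(conn)
add_email_column(conn)  # second run is a no-op, not a crash
```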
Schema
- New timestamp columns use `timestamptz` (PG), `DATETIME2` (MSSQL), `DATETIME` (MySQL - `TIMESTAMP` has the 2038 problem)
- Character set is `utf8mb4` for MySQL (not `utf8`, which is 3-byte only), `UTF8` for PG
- Identity columns use `GENERATED ALWAYS AS IDENTITY` over `SERIAL` in PG 10+ (non-bypassable)
- Foreign keys have explicit `ON DELETE` behavior (don't rely on engine defaults)
- Indexes exist on all foreign key columns (PG does NOT auto-create these, unlike MySQL InnoDB)
- Composite index column order follows: equality columns first, range column last, selectivity-ordered
Configuration
- No `trust` or `md5` authentication in `pg_hba.conf` (use `scram-sha-256`)
- MySQL `sql_mode` includes `STRICT_TRANS_TABLES` (prevents silent data truncation)
- TLS enforced for all connections (not just "available")
- `max_connections` is sized for the actual workload, not left at default
- Password authentication uses modern hashing (SCRAM-SHA-256 for PG, `caching_sha2_password` for MySQL)
- No default/example passwords in config files
Bulk Operations
- Bulk inserts are chunked - never pass an unbounded array into a single `INSERT ... VALUES` statement
- Chunk size computed as `floor(max_host_parameters / columns_per_row)`, where host-parameter limits are: PostgreSQL 65,535, MySQL 65,535, SQLite 32,766 (default `SQLITE_MAX_VARIABLE_NUMBER`), MSSQL 2,100
- When the app targets multiple backends, chunk size uses the lowest limit across all supported engines
- Chunked writes stay inside one transaction when replace-all semantics are required (DELETE + INSERT pattern)
- Tests assert that multiple insert calls fire when row count crosses the safe chunk threshold
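The chunking rules above, sketched in Python against SQLite. The tiny `demo_limit` is only there to make the batching visible in a short example; production code would take the real ceilings from `PARAM_LIMITS`:

```python
import sqlite3

# Host-parameter ceilings from the checklist above
PARAM_LIMITS = {"postgresql": 65_535, "mysql": 65_535, "sqlite": 32_766, "mssql": 2_100}

def chunked_insert(conn, rows, limit, columns_per_row=3):
    # floor(limit / columns_per_row) rows per statement
    chunk = limit // columns_per_row
    calls = 0
    with conn:  # one transaction when replace-all / atomic semantics are required
        for i in range(0, len(rows), chunk):
            batch = rows[i:i + chunk]
            placeholders = ",".join(["(?,?,?)"] * len(batch))
            flat = [v for row in batch for v in row]
            conn.execute(f"INSERT INTO t (a, b, c) VALUES {placeholders}", flat)
            calls += 1
    return calls

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a, b, c)")
rows = [(i, i * 2, i * 3) for i in range(100)]

# Multi-backend app: size chunks for the lowest ceiling (MSSQL's 2,100).
# demo_limit is an artificially small ceiling so the chunking is visible here:
# 90 // 3 = 30 rows per statement -> 100 rows need 4 INSERT calls.
demo_limit = 90
n_calls = chunked_insert(conn, rows, limit=demo_limit)
```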
General
- All SQL uses parameterized queries / prepared statements (never string concatenation)
- Connection pool settings don't exceed `max_connections` across all app instances
- Backup strategy tested by actually restoring (backup without restore test = hope, not a strategy)
- Secrets (passwords, connection strings) injected via env vars or secret managers, not config files
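Why parameterization matters, in one runnable sketch (SQLite; the table and payload are illustrative). The concatenated query lets the payload rewrite the predicate; the parameterized one treats it as an opaque value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, owner TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 'alice'), (2, 'bob')")

user_input = "alice' OR '1'='1"  # classic injection payload

# BAD: string concatenation - the payload changes the query's meaning
# and the WHERE clause becomes: owner = 'alice' OR '1'='1'
leaked = conn.execute(
    "SELECT id FROM accounts WHERE owner = '" + user_input + "'"
).fetchall()

# GOOD: parameterized - the payload is compared as a literal string,
# which matches no owner
safe = conn.execute(
    "SELECT id FROM accounts WHERE owner = ?", (user_input,)
).fetchall()
```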
Workflow
Step 1: Determine the domain
Based on the request:
- "Configure PostgreSQL / tune settings" -> Configuration
- "Design a schema / review indexes" -> Schema design
- "Migrate from X to Y / add a column" -> Migration
- "Set up backups / replication / pooling" -> Operations
- "This query is slow / optimize" -> Performance
- "PCI audit / encrypt database" -> Compliance
- "Review this schema/config" -> Apply production checklist + AI self-check
Step 2: Gather requirements
Before writing SQL or config, determine these. If the user doesn't specify, use the sensible defaults shown:
- Engine and version - behavior differs significantly across versions. Default: latest stable (see target versions above).
- Deployment model - self-hosted (bare metal, VM, container, K8s) vs managed (RDS, Cloud SQL, Atlas). Default: assume self-hosted unless context says otherwise.
- Workload type - OLTP (many small transactions) vs OLAP (few large queries) vs mixed. Default: OLTP.
- Data volume - row counts, table sizes, growth rate. Default: medium (1-100GB). If unknown, optimize for growth.
- Compliance - PCI-DSS CDE? HIPAA? What data classification? Default: no compliance scope (but still apply security baseline).
- HA requirements - RTO/RPO targets, multi-AZ, read replicas. Default: single-node with PITR.
- Existing infrastructure - what's already running, what ORMs/drivers are in use. Inspect the codebase if accessible.
Step 3: Build
Follow the domain-specific section below. Always apply the production checklist and AI self-check before finishing.
Common operations - quick-start patterns:
Query optimization (the most frequent request):
1. Get the plan: `EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;` (PG), `EXPLAIN FORMAT=TREE ...;` (MySQL 8.0+), `db.collection.explain('executionStats').find(...)` (MongoDB).
2. Look for: `Seq Scan` on large tables (PG) / `Full Table Scan` (MySQL) / `COLLSCAN` (MongoDB), `Nested Loop` with high row estimates, sorts spilling to disk (`Sort Method: external merge`), and `Rows Removed by Filter` >> `Rows` returned.
2a. Before assuming an index problem, verify semantic correctness: check JOIN conditions for column mismatches, confirm WHERE clause predicates don't unintentionally filter nulls, and verify LEFT JOIN semantics aren't silently converted to INNER JOIN by outer-table filters.
3. Check index usage: `SELECT schemaname, relname, idx_scan, seq_scan FROM pg_stat_user_tables WHERE seq_scan > 100 ORDER BY seq_scan DESC;` (PG; adjust the `schemaname = 'public'` filter for non-public schemas) or `SELECT * FROM sys.schema_unused_indexes;` (MySQL performance_schema).
4. If a missing index is the fix, create it with `CONCURRENTLY` (PG): `CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);` or with `ALGORITHM=INPLACE, LOCK=NONE` (MySQL).
5. Re-run `EXPLAIN ANALYZE` to confirm the planner uses the new index and that estimated vs actual rows are close.
Lock contention / deadlock diagnosis (second most common "it's stuck" scenario):
1. PG: `SELECT pid, age(backend_xact_start), query, wait_event_type, wait_event FROM pg_stat_activity WHERE state != 'idle' AND wait_event IS NOT NULL ORDER BY age(backend_xact_start) DESC;`
2. PG blocked queries: `SELECT blocked.pid AS blocked_pid, blocked.query AS blocked_query, blocking.pid AS blocking_pid, blocking.query AS blocking_query FROM pg_stat_activity blocked JOIN pg_locks bl ON bl.pid = blocked.pid JOIN pg_locks kl ON kl.locktype = bl.locktype AND kl.database IS NOT DISTINCT FROM bl.database AND kl.relation IS NOT DISTINCT FROM bl.relation AND kl.page IS NOT DISTINCT FROM bl.page AND kl.tuple IS NOT DISTINCT FROM bl.tuple AND kl.transactionid IS NOT DISTINCT FROM bl.transactionid AND kl.pid != bl.pid JOIN pg_stat_activity blocking ON blocking.pid = kl.pid WHERE NOT bl.granted AND kl.granted;`
3. MySQL: `SHOW ENGINE INNODB STATUS\G` - look for the `LATEST DETECTED DEADLOCK` section. Also: `SELECT * FROM performance_schema.data_lock_waits;` (MySQL 8.0+).
4. MongoDB: `db.currentOp({"waitingForLock": true})` and check the `mongod` log for `LockTimeout` entries.
5. Fix: kill the blocking query if it's stuck (`SELECT pg_terminate_backend(PID);` / `KILL CONNECTION thread_id;`), then investigate why the lock was held (long transactions, missing indexes on UPDATE/DELETE WHERE clauses, lock ordering bugs in application code).
Connection pooler sizing (second most common):
1. Determine backend budget: `max_connections` minus `superuser_reserved_connections` minus replication slots = available.
2. PgBouncer `default_pool_size` per user/db pair: start at `available / number_of_app_instances`, round down.
3. Set `max_client_conn` to the total connections your app fleet will open (all instances combined).
4. Use `transaction` pool mode for web apps (stateless requests). Use `session` mode only if the app uses prepared statements without PgBouncer 1.21+ `max_prepared_statements`, temp tables, or `SET` commands.
5. Validate: `psql -h pgbouncer-host -p 6432 pgbouncer -c "SHOW POOLS;"` - watch `sv_active` vs `sv_idle` under load.
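The sizing arithmetic above as a sketch. All numbers are illustrative inputs, not recommendations:

```python
# Inputs (illustrative)
max_connections = 200      # postgresql.conf
superuser_reserved = 3     # superuser_reserved_connections
replication_slots = 2
app_instances = 10
conns_per_instance = 50    # client connections each instance may open

# Step 1: backend budget
available = max_connections - superuser_reserved - replication_slots

# Step 2: per user/db pool size, rounded down
default_pool_size = available // app_instances

# Step 3: max_client_conn covers the whole fleet's client side
max_client_conn = app_instances * conns_per_instance
```

The point of the split: `default_pool_size` caps server-side backends, while `max_client_conn` only bounds cheap client sockets queuing in front of them.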
Migration safety check (before running any DDL in production):
1. Verify the migration is idempotent: `IF NOT EXISTS`/`IF EXISTS` guards on all DDL.
2. Check table size: `SELECT pg_size_pretty(pg_total_relation_size('tablename'));` - anything over 1GB needs batched operations or `CONCURRENTLY` indexes.
3. Verify backward compatibility: can the current app version still function after this DDL runs?
4. Test the rollback/down migration against a copy of production data, not just an empty schema.
5. Run during a low-traffic window if the operation takes locks (even brief ones).
Step 4: Validate
PostgreSQL

```bash
psql -c "SHOW config_file;"                    # verify config location
pg_isready -h localhost                        # connection check
psql -c "SELECT * FROM pg_hba_file_rules;"     # verify pg_hba.conf
psql -c "EXPLAIN (ANALYZE, BUFFERS) <query>;"  # query plan analysis
```
MongoDB

```bash
mongosh --eval "db.adminCommand({getCmdLineOpts: 1})"   # verify config
mongosh --eval "rs.status()"                            # replica set health
mongosh --eval "db.collection.explain('executionStats').find({})"
```
MySQL / MariaDB

```bash
mysql -e "SELECT @@sql_mode;"                # verify strict mode
mysql -e "SHOW VARIABLES LIKE 'innodb%';"    # InnoDB settings
mysql -e "EXPLAIN FORMAT=TREE <query>;"      # query plan (MySQL 8.0+)
```
MSSQL

```bash
sqlcmd -Q "SELECT @@VERSION;"
sqlcmd -Q "DBCC CHECKDB ('dbname') WITH NO_INFOMSGS;"   # integrity check
```

Engine Routing
- PostgreSQL: SCRAM, poolers, WAL or logical replication, `pg_stat_statements`, and careful vacuum strategy
- MongoDB: replica-set health, schema validation, oplog sizing, and avoiding fan-out document patterns
- MySQL/MariaDB: strict mode, `utf8mb4`, GTID or Galera choices, and understanding the MySQL/MariaDB divergence
- MSSQL: memory limits, TempDB layout, Query Store, and backup or restore discipline

Read `references/config-templates.md` for copy-pasteable engine configs and `references/backup-patterns.md` for recovery specifics.

Schema, Migration, and Performance
- Favor expand-contract for zero-downtime schema changes.
- Composite index order still follows equality, sort, then range.
- Choose tenant isolation deliberately; PCI-sensitive shared-schema designs need extra scrutiny.
- Treat query-plan review and monitoring as normal operations, not emergency-only work.
- Watch for `WHERE` clauses that nullify `LEFT JOIN` semantics (filtering the outer table converts it to an `INNER JOIN` - move the filter into the `ON` clause or use a subquery).
Major version upgrades
Three approaches, pick by downtime tolerance:
| Method | Downtime | Best for |
|---|---|---|
| `pg_upgrade` | Minutes (metadata copy) | Small-to-medium DBs where brief downtime is acceptable |
| Logical replication | Seconds (cutover only) | Large DBs, zero-downtime requirement, PG 10+ to any higher |
| Dump/restore | Hours (full copy) | Cross-engine, major schema changes, or when logical replication is impractical |
Zero-downtime with logical replication (PG -> PG):
1. Stand up new version replica alongside the primary
2. Create publication on source: `CREATE PUBLICATION upgrade_pub FOR ALL TABLES;`
3. Create subscription on target: `CREATE SUBSCRIPTION upgrade_sub CONNECTION '...' PUBLICATION upgrade_pub;`
4. Wait for initial sync + catchup (monitor `pg_stat_subscription`, replication lag)
5. Migrate sequences: logical replication does not replicate sequence values - copy them manually
6. Test application against the new version (read traffic, connection pooler split). Monitor during split-test: query latency p50/p95/p99 (compare old vs new - regression means planner stats or config drift), error rates by query type (new version may have stricter behavior), connection pool saturation (`SHOW POOLS` in PgBouncer - watch `cl_waiting`), replication lag trend (should stay flat or decrease, never grow during read-only test), and memory/CPU on the new instance. Run for at least one full traffic cycle (24h if your workload is diurnal). Abort and route back to the old primary if error rate exceeds baseline or p99 latency degrades >20%.
7. Cutover: stop writes to old primary, verify lag = 0, point connection pooler to new primary
8. Drop subscription and decommission old primary
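Sequence migration is the step most often forgotten. A sketch that generates the `setval` statements from values read off the source (the sequence names/values are hypothetical, and the headroom guard is an assumption added here, not part of the procedure above):

```python
# Hypothetical values read from the source, e.g. via:
#   SELECT sequencename, last_value FROM pg_sequences WHERE schemaname = 'public';
source_sequences = {"orders_id_seq": 881_204, "users_id_seq": 55_019}

def sequence_sync_sql(seqs, headroom=1_000):
    # Headroom so writes that land around cutover cannot collide with
    # values already consumed on the source
    return [
        f"SELECT setval('{name}', {last_value + headroom});"
        for name, last_value in sorted(seqs.items())
    ]

statements = sequence_sync_sql(source_sequences)
```

Run the emitted statements on the new primary after cutover, per the pitfall below about sequence drift.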
Rollback procedure (if replication stalls or validation fails):
- Lag not reaching zero: Check `pg_stat_subscription` for the `last_msg_send_time` vs `last_msg_receipt_time` delta. Common causes: long-running transactions on source blocking WAL send, tables missing primary keys (forces full-row comparison), or network throughput limits. If lag is stuck, check `pg_replication_slots` on the source for `active = false` - an inactive slot means the subscription dropped. Recreate the subscription; do not proceed with cutover.
- Application validation fails on new version: Route all traffic back to the old primary (update pooler config). The old primary never stopped accepting writes, so no data is lost. Drop the subscription on the new instance: `DROP SUBSCRIPTION upgrade_sub;` - this also drops the replication slot on the source. Investigate, fix, and restart from step 3.
- Post-cutover rollback (writes already went to new primary): This is the hard case. Options: (a) set up reverse logical replication from new -> old before decommissioning the old primary (plan this before cutover if RTO requires it), or (b) restore the old primary from backup + WAL and accept data loss for the cutover window. Option (a) requires the old primary to still be running. Decision point: if you need rollback after cutover, set up reverse replication in step 6 before routing write traffic.
Key pitfalls (check all of these before starting):
- Tables need primary keys or `REPLICA IDENTITY FULL` - check first: `SELECT c.relname FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid WHERE n.nspname = 'public' AND c.relkind = 'r' AND NOT EXISTS (SELECT 1 FROM pg_constraint WHERE conrelid = c.oid AND contype = 'p');`
- DDL is not replicated - schema changes during migration need manual sync on both sides
- Large objects (`lo`) are not replicated
- Sequence values drift - copy them manually after cutover, not before

Read `references/migration-patterns.md` for cross-engine type mapping, ORM migration tooling, and detailed migration patterns.
Pooling, Backup, and Platform Choice
- PostgreSQL pooling is usually non-negotiable; use PgBouncer unless a concrete reason says otherwise.
- Backup discipline means restore testing, encryption, retention limits, and monitoring backup freshness.
- Managed databases reduce toil but do not remove shared-responsibility or compliance review.
- Self-hosted databases buy control at the cost of HA, patching, and operational burden.
Production Checklist
All Engines
- Authentication uses modern hashing (SCRAM-SHA-256, caching_sha2_password, certificate auth)
- TLS enforced for all connections (not just "available")
- No default passwords, no `trust` auth, no passwordless access
- Connection pool in front of the database (PgBouncer, ProxySQL, or application-side)
- `max_connections` sized for actual workload, not default
- Backup strategy implemented, tested, and monitored
- Restore procedure documented and tested (at least monthly)
- Monitoring in place (connections, query performance, replication lag, disk usage)
- Slow query logging enabled with appropriate threshold
- Dead/unused indexes identified and removed
- Character encoding correct (`utf8mb4` for MySQL, `UTF8` for PG)
PostgreSQL-Specific
- `pg_hba.conf`: `hostssl` only, `scram-sha-256` only, CIDR-restricted
- `shared_buffers` = 25% RAM, `effective_cache_size` = 75% RAM
- `work_mem` sized for concurrency (it is allocated per-sort, not per-connection - 64MB is high for OLTP with many connections)
- `random_page_cost = 1.1` for SSD storage
- `statement_timeout` set per-role (not globally - migrations need longer)
- `idle_in_transaction_session_timeout` set (60s is a sensible starting point)
- `pg_stat_statements` enabled
- Autovacuum tuned for large tables (`autovacuum_vacuum_scale_factor`)
- WAL archiving enabled for PITR (`archive_mode = on` + pgBackRest/Barman, or managed backup)
- Foreign key columns have indexes
- pgAudit installed and configured (if PCI scope)
- Patched against CVE-2026-2005 (pgcrypto heap buffer overflow, RCE) - 18.2+ / 17.8+ / 16.12+
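The memory ratios above as arithmetic, including the worst-case `work_mem` math that motivates the per-sort caveat (all numbers illustrative):

```python
ram_gb = 64

# Checklist ratios
shared_buffers_gb = ram_gb * 25 // 100        # -> shared_buffers = '16GB'
effective_cache_size_gb = ram_gb * 75 // 100  # -> effective_cache_size = '48GB'

# work_mem is allocated per sort/hash node, not per connection, so the
# theoretical worst case is roughly:
#   max_connections * sort_nodes_per_query * work_mem
max_connections = 200
sort_nodes_per_query = 2   # assumed average for this sketch
work_mem_mb = 16
worst_case_gb = max_connections * sort_nodes_per_query * work_mem_mb / 1024
```

With these inputs the worst case lands on top of `shared_buffers`, which is why a large `work_mem` plus many connections can run a box out of memory.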
MySQL/MariaDB-Specific
- `sql_mode` includes `STRICT_TRANS_TABLES` (prevents silent data truncation)
- `innodb_buffer_pool_size` = 70% RAM
- `innodb_flush_log_at_trx_commit = 1` for durability
- `require_secure_transport = ON`
- `character-set-server = utf8mb4`
- Binary log enabled for PITR (`log_bin = ON`)
- `innodb_file_per_table = ON`
- MariaDB patched against CVE-2026-32710 (JSON_SCHEMA_VALID crash/RCE) - 11.8.6+ / 11.4.10+
MongoDB-Specific
- `security.authorization: enabled` (never run without auth)
- Replica set with 3+ members (not standalone in production)
- Write concern `w: "majority"` (default in 8.0+)
- Schema validation (`$jsonSchema`) on critical collections
- Patched against MongoBleed (CVE-2025-14847) - 8.0.17+
- Patched against CVE-2026-25611 (pre-auth DoS via compression) - 8.0.18+ / 8.2.4+
- TLS enabled (`net.tls.mode: requireTLS`)
MSSQL-Specific
- `Max Server Memory` set explicitly (not unlimited)
- `MAXDOP` set to core count per NUMA node
- `Cost Threshold for Parallelism` raised from default 5
- TempDB files = min(CPU cores, 8), equal size
- Query Store enabled
- Recovery model = FULL for production databases
- TDE enabled for CDE databases
- Patched against CVE-2026-21262 (privilege escalation) - March 2026 CU+
Compliance (PCI-DSS 4.0)
- Encryption at rest: TDE or column-level (disk-level alone insufficient per Req 3.5.1.2)
- Encryption in transit: TLS 1.2+ enforced on all connections (Req 4)
- Audit logging: pgAudit/audit plugin, shipped to immutable SIEM (Req 10)
- Key management: keys in HSM/KMS, not alongside data (Req 3.6)
- Key rotation: DEKs every 90 days, master keys annually (Req 3.7)
- Access control: separate roles, no shared accounts, MFA for CDE access (Req 7, 8)
- Data masking: PAN display limited to last 4 digits (Req 3.3)
- No cardholder data in non-prod environments (Req 6.5.4)
- Quarterly access review documented (Req 7.2.5)
- Vulnerability scanning includes database engine, not just containers (Req 11.3)
Reference Files
- `references/config-templates.md` - engine configuration templates
- `references/backup-patterns.md` - backup, restore, and PITR patterns
- `references/migration-patterns.md` - cross-engine migration patterns and type-mapping guidance
Rules
These are non-negotiable. Violating any of these is a bug.
- Backups without restore tests are not backups. Test restores monthly. Document the procedure.
- No trust auth in pg_hba.conf. Not in dev, not in Docker, not anywhere. Use scram-sha-256.
- MySQL strict mode ON. STRICT_TRANS_TABLES prevents silent data truncation. Without it, MySQL silently corrupts your data.
- TLS enforced, not just available. require_secure_transport, hostssl, requireTLS. Connections without TLS are a finding.
- Connection pooler for PostgreSQL. PG's process-per-connection model doesn't scale without one. Use PgBouncer.
- Indexes on foreign keys in PostgreSQL. PG doesn't auto-create them. Missing FK indexes cause sequential scans on JOINs and cascading DELETEs.
- utf8mb4 for MySQL, always. utf8 is a lie - it's 3-byte only, can't store emoji or many CJK characters.
- timestamptz for PostgreSQL, always. Bare timestamp stores no timezone, breaks when server timezone changes.
- Parameterized queries everywhere. String concatenation for SQL is a bug, not a shortcut. Doubly true for AI-generated code.
- Disk-level encryption is insufficient for PCI-DSS 4.0. Req 3.5.1.2 requires TDE, column-level, or application-layer encryption.
- Patch MongoBleed (CVE-2025-14847). Self-hosted MongoDB < 8.0.17 / 7.0.28 / 6.0.27 is actively exploitable with no authentication required.
- Patch MongoDB compression DoS (CVE-2026-25611). Pre-auth DoS via crafted OP_COMPRESSED messages. Default config affected (compression enabled since 3.6). Fixed in 8.0.18+ / 8.2.4+ / 7.0.29+.
- Patch PgBouncer (CVE-2025-12819). PgBouncer < 1.25.1 can allow unauthenticated SQL execution when track_extra_parameters includes search_path AND auth_user is set (both non-default). Upgrade regardless - the fix is low-risk.
- Chunk bulk inserts. Never build a single INSERT ... VALUES with an unbounded row list. Compute batch size from the lowest host-parameter ceiling across supported backends (floor(limit / columns_per_row)). Wrap chunks in one transaction when atomicity matters.
- Run the AI self-check. Every generated migration, schema, or config gets verified against the checklist above before returning.
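The bulk-insert rule above can be sketched in client code. A minimal example, assuming illustrative parameter ceilings - verify the numbers against your actual drivers and server versions (e.g. SQLite's default limit has changed across releases):

```python
from itertools import islice

# Per-backend host-parameter ceilings (assumed values; check your driver/server docs).
PARAM_LIMITS = {"postgresql": 65535, "sqlite": 32766, "mssql": 2100}

def batch_size(columns_per_row: int, limits=PARAM_LIMITS) -> int:
    """Rows per INSERT ... VALUES, sized for the most restrictive backend."""
    return max(1, min(limits.values()) // columns_per_row)

def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

# Usage sketch: wrap the loop in one transaction when atomicity matters.
# with conn:  # hypothetical DB-API connection
#     for batch in chunked(rows, batch_size(len(columns))):
#         cur.executemany(insert_sql, batch)
```

With 7 columns per row, the worst-case ceiling of 2100 parameters yields batches of 300 rows.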
Related Skills
- code-review - has a databases.md reference for application-level database bug patterns (transaction misuse, NULL handling, ORM N+1, type coercion). This skill covers engine configuration and operations; code-review covers how the application uses the database.
- security-audit - for SQL injection detection and credential scanning in application code
- kubernetes - for deploying databases on K8s (StatefulSets, operators, PVCs)
- terraform - for provisioning managed databases (RDS, Cloud SQL, Atlas)
- docker - for database containers in Docker Compose
- ansible - for database server configuration management