Databases: Production Configuration & Operations
Configure, tune, design schemas, migrate, back up, and review database engines - from single-node dev setups to PCI-compliant production clusters. The goal is correct, performant, durable databases that survive failures, pass audits, and don't wake you up at 3am.
Target versions (March 2026):
- PostgreSQL 18.3 (EOL 2030-11), previous: 17.9, 16.13
- MongoDB 8.0.20 (GA), 8.2.6 (rapid release, EOL 2026-07)
- MariaDB 11.8.6 (LTS, EOL 2028-06), 12.2.2 (rolling GA, EOL 2026-05)
- MySQL 8.4.8 (LTS), 9.6.0 (innovation)
- SQL Server 2025 RTM + CU3 (GA 2025-11-18)
- PgBouncer 1.25.1, Pgpool-II 4.7.1, ProxySQL 3.0.6
This skill covers six domains depending on context:
- Configuration - engine settings, authentication, TLS, tuning parameters
- Schema design - indexing strategy, partitioning, normalization, type selection
- Migration - cross-engine migration, zero-downtime DDL, ORM migration tooling
- Operations - backup/restore, replication, connection pooling, monitoring
- Performance - query plan analysis, index optimization, vacuum/maintenance
- Compliance - PCI-DSS 4.0 encryption, audit logging, key management, data masking
When to use
- Configuring database engines (postgresql.conf, mongod.conf, my.cnf, MSSQL settings)
- Designing or reviewing database schemas (indexes, partitioning, types, constraints)
- Planning or executing cross-engine migrations (MySQL -> PostgreSQL, etc.)
- Setting up backup/restore strategies and PITR
- Configuring replication (streaming, logical, replica sets, GTID)
- Tuning connection pooling (PgBouncer, ProxySQL, application-side pools)
- Analyzing query performance (EXPLAIN, slow query logs, index usage)
- Database-level PCI-DSS 4.0 compliance (encryption, audit logging, access control)
- Evaluating managed vs self-hosted database decisions
- Setting up database monitoring and alerting
When NOT to use
- Deploying databases on Kubernetes (StatefulSets, PVCs, operators) - use kubernetes
- Provisioning managed databases (RDS, Cloud SQL, Atlas) via IaC - use terraform
- Docker Compose for database containers - use docker
- Database-related Ansible playbooks and roles - use ansible
- Application-level database bugs (N+1, transaction misuse, ORM pitfalls) - use code-review
- SQL injection detection, connection string secrets in code - use security-audit
- CI/CD pipelines that run migrations - use ci-cd
AI Self-Check
AI tools consistently produce the same database mistakes. Before returning any generated SQL, schema, migration, or config, verify against this list:
Migrations
- `IF NOT EXISTS`/`IF EXISTS` guards on `ADD COLUMN`/`DROP COLUMN` (bare DDL crashes on re-run)
- Adding a `NOT NULL` column includes a `DEFAULT` value (or two-step: add nullable, backfill, alter NOT NULL)
- Index creation uses `CONCURRENTLY` on PostgreSQL (prevents full table lock)
- Large table changes run in batches, not a single transaction (lock escalation, OOM risk)
- Migration is backward-compatible (old app version can still run against new schema)
- No `DROP TABLE` or `DROP DATABASE` without explicit user confirmation
- Migration is idempotent - can be run twice without error
- Rollback/down migration exists and is tested
- PG enum changes use `ALTER TYPE ... ADD VALUE` outside a transaction (cannot run inside BEGIN/COMMIT, fails silently in some ORMs)
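The guard-and-backfill pattern from the checklist can be sketched in a few lines. This is an illustrative Python/SQLite stand-in (the `users`/`email` names are hypothetical), not engine-specific DDL - on PostgreSQL you would reach for `ADD COLUMN IF NOT EXISTS` directly:

```python
import sqlite3

def column_exists(conn, table, column):
    # PRAGMA table_info returns one row per column; the name is field 1
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))

def add_email_column(conn):
    # Guarded DDL: safe to run twice (idempotent), mirroring IF NOT EXISTS
    if not column_exists(conn, "users", "email"):
        # Step 1: add the column nullable (no long lock, no table rewrite)
        conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
        # Step 2: backfill (batched UPDATEs in production)
        conn.execute("UPDATE users SET email = '' WHERE email IS NULL")
        # Step 3 (engines that support it): ALTER COLUMN ... SET NOT NULL

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
add_email_column(conn)
add_email_column(conn)  # second run is a no-op, not a crash
```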
Schema
- New timestamp columns use `timestamptz` (PG), `DATETIME2` (MSSQL), `DATETIME` (MySQL - `TIMESTAMP` has the 2038 problem)
- Character set is `utf8mb4` for MySQL (not `utf8`, which is 3-byte only), `UTF8` for PG
- Identity columns use `GENERATED ALWAYS AS IDENTITY` over `SERIAL` in PG 10+ (non-bypassable)
- Foreign keys have explicit `ON DELETE` behavior (don't rely on engine defaults)
- Indexes exist on all foreign key columns (PG does NOT auto-create these, unlike MySQL InnoDB)
- Composite index column order follows: equality columns first, range column last, selectivity-ordered
Configuration
- No `trust` or `md5` authentication in `pg_hba.conf` (use `scram-sha-256`)
- MySQL `sql_mode` includes `STRICT_TRANS_TABLES` (prevents silent data truncation)
- TLS enforced for all connections (not just "available")
- `max_connections` is sized for the actual workload, not left at default
- Password authentication uses modern hashing (SCRAM-SHA-256 for PG, `caching_sha2_password` for MySQL)
- No default/example passwords in config files
Bulk Operations
- Bulk inserts are chunked - never pass an unbounded array into a single `INSERT ... VALUES` statement
- Chunk size computed as `floor(max_host_parameters / columns_per_row)`, where host-parameter limits are: PostgreSQL 65,535, MySQL 65,535, SQLite 32,766 (default `SQLITE_MAX_VARIABLE_NUMBER`), MSSQL 2,100
- When the app targets multiple backends, chunk size uses the lowest limit across all supported engines
- Chunked writes stay inside one transaction when replace-all semantics are required (DELETE + INSERT pattern)
- Tests assert that multiple insert calls fire when row count crosses the safe chunk threshold
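The chunking rules above, sketched in Python against SQLite. The tiny `demo_limit` is only there to make the batching visible in a short example; production code would take the real ceilings from `PARAM_LIMITS`:

```python
import sqlite3

# Host-parameter ceilings from the checklist above
PARAM_LIMITS = {"postgresql": 65_535, "mysql": 65_535, "sqlite": 32_766, "mssql": 2_100}

def chunked_insert(conn, rows, limit, columns_per_row=3):
    # floor(limit / columns_per_row) rows per statement
    chunk = limit // columns_per_row
    calls = 0
    with conn:  # one transaction when replace-all / atomic semantics are required
        for i in range(0, len(rows), chunk):
            batch = rows[i:i + chunk]
            placeholders = ",".join(["(?,?,?)"] * len(batch))
            flat = [v for row in batch for v in row]
            conn.execute(f"INSERT INTO t (a, b, c) VALUES {placeholders}", flat)
            calls += 1
    return calls

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a, b, c)")
rows = [(i, i * 2, i * 3) for i in range(100)]

# Multi-backend app: size chunks for the lowest ceiling (MSSQL's 2,100).
# demo_limit is an artificially small ceiling so the chunking is visible here:
# 90 // 3 = 30 rows per statement -> 100 rows need 4 INSERT calls.
demo_limit = 90
n_calls = chunked_insert(conn, rows, limit=demo_limit)
```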
General
- All SQL uses parameterized queries / prepared statements (never string concatenation)
- Connection pool settings don't exceed `max_connections` across all app instances
- Backup strategy tested by actually restoring (backup without restore test = hope, not a strategy)
- Secrets (passwords, connection strings) injected via env vars or secret managers, not config files
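Why parameterization matters, in one runnable sketch (SQLite; the table and payload are illustrative). The concatenated query lets the payload rewrite the predicate; the parameterized one treats it as an opaque value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, owner TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 'alice'), (2, 'bob')")

user_input = "alice' OR '1'='1"  # classic injection payload

# BAD: string concatenation - the payload changes the query's meaning
# and the WHERE clause becomes: owner = 'alice' OR '1'='1'
leaked = conn.execute(
    "SELECT id FROM accounts WHERE owner = '" + user_input + "'"
).fetchall()

# GOOD: parameterized - the payload is compared as a literal string,
# which matches no owner
safe = conn.execute(
    "SELECT id FROM accounts WHERE owner = ?", (user_input,)
).fetchall()
```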
Workflow
Step 1: Determine the domain
Based on the request:
- "Configure PostgreSQL / tune settings" -> Configuration
- "Design a schema / review indexes" -> Schema design
- "Migrate from X to Y / add a column" -> Migration
- "Set up backups / replication / pooling" -> Operations
- "This query is slow / optimize" -> Performance
- "PCI audit / encrypt database" -> Compliance
- "Review this schema/config" -> Apply production checklist + AI self-check
Step 2: Gather requirements
Before writing SQL or config, determine these. If the user doesn't specify, use the sensible defaults shown:
- Engine and version - behavior differs significantly across versions. Default: latest stable (see target versions above).
- Deployment model - self-hosted (bare metal, VM, container, K8s) vs managed (RDS, Cloud SQL, Atlas). Default: assume self-hosted unless context says otherwise.
- Workload type - OLTP (many small transactions) vs OLAP (few large queries) vs mixed. Default: OLTP.
- Data volume - row counts, table sizes, growth rate. Default: medium (1-100GB). If unknown, optimize for growth.
- Compliance - PCI-DSS CDE? HIPAA? What data classification? Default: no compliance scope (but still apply security baseline).
- HA requirements - RTO/RPO targets, multi-AZ, read replicas. Default: single-node with PITR.
- Existing infrastructure - what's already running, what ORMs/drivers are in use. Inspect the codebase if accessible.
Step 3: Build
Follow the domain-specific section below. Always apply the production checklist and AI self-check before finishing.
Common operations - quick-start patterns:
Query optimization (the most frequent request):
1. Get the plan: `EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;` (PG), `EXPLAIN FORMAT=TREE ...;` (MySQL 8.0+), `db.collection.explain('executionStats').find(...)` (MongoDB).
2. Look for: `Seq Scan` on large tables (PG) / `Full Table Scan` (MySQL) / `COLLSCAN` (MongoDB), `Nested Loop` with high row estimates, sorts spilling to disk (`Sort Method: external merge`), and `Rows Removed by Filter` >> `Rows` returned.
2a. Before assuming an index problem, verify semantic correctness: check JOIN conditions for column mismatches, confirm WHERE clause predicates don't unintentionally filter nulls, and verify LEFT JOIN semantics aren't silently converted to INNER JOIN by outer-table filters.
3. Check index usage: `SELECT schemaname, relname, idx_scan, seq_scan FROM pg_stat_user_tables WHERE seq_scan > 100 ORDER BY seq_scan DESC;` (PG; adjust the `schemaname = 'public'` filter for non-public schemas) or `SELECT * FROM sys.schema_unused_indexes;` (MySQL performance_schema).
4. If a missing index is the fix, create it with `CONCURRENTLY` (PG): `CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);` or with `ALGORITHM=INPLACE, LOCK=NONE` (MySQL).
5. Re-run `EXPLAIN ANALYZE` to confirm the planner uses the new index and that estimated vs actual rows are close.
Lock contention / deadlock diagnosis (second most common "it's stuck" scenario):
1. PG: `SELECT pid, age(backend_xact_start), query, wait_event_type, wait_event FROM pg_stat_activity WHERE state != 'idle' AND wait_event IS NOT NULL ORDER BY age(backend_xact_start) DESC;`
2. PG blocked queries: `SELECT blocked.pid AS blocked_pid, blocked.query AS blocked_query, blocking.pid AS blocking_pid, blocking.query AS blocking_query FROM pg_stat_activity blocked JOIN pg_locks bl ON bl.pid = blocked.pid JOIN pg_locks kl ON kl.locktype = bl.locktype AND kl.database IS NOT DISTINCT FROM bl.database AND kl.relation IS NOT DISTINCT FROM bl.relation AND kl.page IS NOT DISTINCT FROM bl.page AND kl.tuple IS NOT DISTINCT FROM bl.tuple AND kl.transactionid IS NOT DISTINCT FROM bl.transactionid AND kl.pid != bl.pid JOIN pg_stat_activity blocking ON blocking.pid = kl.pid WHERE NOT bl.granted AND kl.granted;`
3. MySQL: `SHOW ENGINE INNODB STATUS\G` - look for the `LATEST DETECTED DEADLOCK` section. Also: `SELECT * FROM performance_schema.data_lock_waits;` (MySQL 8.0+).
4. MongoDB: `db.currentOp({"waitingForLock": true})` and check the `mongod` log for `LockTimeout` entries.
5. Fix: kill the blocking query if it's stuck (`SELECT pg_terminate_backend(PID);` / `KILL CONNECTION thread_id;`), then investigate why the lock was held (long transactions, missing indexes on UPDATE/DELETE WHERE clauses, lock ordering bugs in application code).
Connection pooler sizing (second most common):
1. Determine backend budget: `max_connections` minus `superuser_reserved_connections` minus replication slots = available.
2. PgBouncer `default_pool_size` per user/db pair: start at `available / number_of_app_instances`, round down.
3. Set `max_client_conn` to the total connections your app fleet will open (all instances combined).
4. Use `transaction` pool mode for web apps (stateless requests). Use `session` mode only if the app uses prepared statements without PgBouncer 1.21+ `max_prepared_statements`, temp tables, or `SET` commands.
5. Validate: `psql -h pgbouncer-host -p 6432 pgbouncer -c "SHOW POOLS;"` - watch `sv_active` vs `sv_idle` under load.
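The sizing arithmetic above as a sketch. All numbers are illustrative inputs, not recommendations:

```python
# Inputs (illustrative)
max_connections = 200      # postgresql.conf
superuser_reserved = 3     # superuser_reserved_connections
replication_slots = 2
app_instances = 10
conns_per_instance = 50    # client connections each instance may open

# Step 1: backend budget
available = max_connections - superuser_reserved - replication_slots

# Step 2: per user/db pool size, rounded down
default_pool_size = available // app_instances

# Step 3: max_client_conn covers the whole fleet's client side
max_client_conn = app_instances * conns_per_instance
```

The point of the split: `default_pool_size` caps server-side backends, while `max_client_conn` only bounds cheap client sockets queuing in front of them.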
Migration safety check (before running any DDL in production):
1. Verify the migration is idempotent: `IF NOT EXISTS`/`IF EXISTS` guards on all DDL.
2. Check table size: `SELECT pg_size_pretty(pg_total_relation_size('tablename'));` - anything over 1GB needs batched operations or `CONCURRENTLY` indexes.
3. Verify backward compatibility: can the current app version still function after this DDL runs?
4. Test the rollback/down migration against a copy of production data, not just an empty schema.
5. Run during a low-traffic window if the operation takes locks (even brief ones).
Step 4: Validate
PostgreSQL

```bash
psql -c "SHOW config_file;"                    # verify config location
pg_isready -h localhost                        # connection check
psql -c "SELECT * FROM pg_hba_file_rules;"     # verify pg_hba.conf
psql -c "EXPLAIN (ANALYZE, BUFFERS) <query>;"  # query plan analysis
```
MongoDB

```bash
mongosh --eval "db.adminCommand({getCmdLineOpts: 1})"   # verify config
mongosh --eval "rs.status()"                            # replica set health
mongosh --eval "db.collection.explain('executionStats').find({})"
```
MySQL / MariaDB

```bash
mysql -e "SELECT @@sql_mode;"                # verify strict mode
mysql -e "SHOW VARIABLES LIKE 'innodb%';"    # InnoDB settings
mysql -e "EXPLAIN FORMAT=TREE <query>;"      # query plan (MySQL 8.0+)
```
MSSQL

```bash
sqlcmd -Q "SELECT @@VERSION;"
sqlcmd -Q "DBCC CHECKDB ('dbname') WITH NO_INFOMSGS;"   # integrity check
```

Engine Routing
- PostgreSQL: SCRAM, poolers, WAL or logical replication, `pg_stat_statements`, and careful vacuum strategy
- MongoDB: replica-set health, schema validation, oplog sizing, and avoiding fan-out document patterns
- MySQL/MariaDB: strict mode, `utf8mb4`, GTID or Galera choices, and understanding the MySQL/MariaDB divergence
- MSSQL: memory limits, TempDB layout, Query Store, and backup or restore discipline

Read `references/config-templates.md` for copy-pasteable engine configs and `references/backup-patterns.md` for recovery specifics.

Schema, Migration, and Performance
- Favor expand-contract for zero-downtime schema changes.
- Composite index order still follows equality, sort, then range.
- Choose tenant isolation deliberately; PCI-sensitive shared-schema designs need extra scrutiny.
- Treat query-plan review and monitoring as normal operations, not emergency-only work.
- Watch for `WHERE` clauses that nullify `LEFT JOIN` semantics (filtering the outer table converts it to an `INNER JOIN` - move the filter into the `ON` clause or use a subquery).
Major version upgrades
Three approaches, pick by downtime tolerance:
| Method | Downtime | Best for |
|---|---|---|
| `pg_upgrade` | Minutes (metadata copy) | Small-to-medium DBs where brief downtime is acceptable |
| Logical replication | Seconds (cutover only) | Large DBs, zero-downtime requirement, PG 10+ to any higher |
| Dump/restore | Hours (full copy) | Cross-engine, major schema changes, or when logical replication is impractical |
Zero-downtime with logical replication (PG -> PG):
1. Stand up new version replica alongside the primary
2. Create publication on source: `CREATE PUBLICATION upgrade_pub FOR ALL TABLES;`
3. Create subscription on target: `CREATE SUBSCRIPTION upgrade_sub CONNECTION '...' PUBLICATION upgrade_pub;`
4. Wait for initial sync + catchup (monitor `pg_stat_subscription`, replication lag)
5. Migrate sequences: logical replication does not replicate sequence values - copy them manually
6. Test application against the new version (read traffic, connection pooler split). Monitor during split-test: query latency p50/p95/p99 (compare old vs new - regression means planner stats or config drift), error rates by query type (new version may have stricter behavior), connection pool saturation (`SHOW POOLS` in PgBouncer - watch `cl_waiting`), replication lag trend (should stay flat or decrease, never grow during read-only test), and memory/CPU on the new instance. Run for at least one full traffic cycle (24h if your workload is diurnal). Abort and route back to the old primary if error rate exceeds baseline or p99 latency degrades >20%.
7. Cutover: stop writes to old primary, verify lag = 0, point connection pooler to new primary
8. Drop subscription and decommission old primary
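Sequence migration is the step most often forgotten. A sketch that generates the `setval` statements from values read off the source (the sequence names/values are hypothetical, and the headroom guard is an assumption added here, not part of the procedure above):

```python
# Hypothetical values read from the source, e.g. via:
#   SELECT sequencename, last_value FROM pg_sequences WHERE schemaname = 'public';
source_sequences = {"orders_id_seq": 881_204, "users_id_seq": 55_019}

def sequence_sync_sql(seqs, headroom=1_000):
    # Headroom so writes that land around cutover cannot collide with
    # values already consumed on the source
    return [
        f"SELECT setval('{name}', {last_value + headroom});"
        for name, last_value in sorted(seqs.items())
    ]

statements = sequence_sync_sql(source_sequences)
```

Run the emitted statements on the new primary after cutover, per the pitfall below about sequence drift.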
Rollback procedure (if replication stalls or validation fails):
- Lag not reaching zero: Check `pg_stat_subscription` for the `last_msg_send_time` vs `last_msg_receipt_time` delta. Common causes: long-running transactions on source blocking WAL send, tables missing primary keys (forces full-row comparison), or network throughput limits. If lag is stuck, check `pg_replication_slots` on the source for `active = false` - an inactive slot means the subscription dropped. Recreate the subscription; do not proceed with cutover.
- Application validation fails on new version: Route all traffic back to the old primary (update pooler config). The old primary never stopped accepting writes, so no data is lost. Drop the subscription on the new instance: `DROP SUBSCRIPTION upgrade_sub;` - this also drops the replication slot on the source. Investigate, fix, and restart from step 3.
- Post-cutover rollback (writes already went to new primary): This is the hard case. Options: (a) set up reverse logical replication from new -> old before decommissioning the old primary (plan this before cutover if RTO requires it), or (b) restore the old primary from backup + WAL and accept data loss for the cutover window. Option (a) requires the old primary to still be running. Decision point: if you need rollback after cutover, set up reverse replication in step 6 before routing write traffic.
Key pitfalls (check all of these before starting):
- Tables need primary keys or `REPLICA IDENTITY FULL` - check first: `SELECT c.relname FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid WHERE n.nspname = 'public' AND c.relkind = 'r' AND NOT EXISTS (SELECT 1 FROM pg_constraint WHERE conrelid = c.oid AND contype = 'p');`
- DDL is not replicated - schema changes during migration need manual sync on both sides
- Large objects (`lo`) are not replicated
- Sequence values drift - copy them manually after cutover, not before

Read `references/migration-patterns.md` for cross-engine type mapping, ORM migration tooling, and detailed migration patterns.
Pooling, Backup, and Platform Choice
- PostgreSQL pooling is usually non-negotiable; use PgBouncer unless a concrete reason says otherwise.
- Backup discipline means restore testing, encryption, retention limits, and monitoring backup freshness.
- Managed databases reduce toil but do not remove shared-responsibility or compliance review.
- Self-hosted databases buy control at the cost of HA, patching, and operational burden.
Production Checklist
All Engines
- Authentication uses modern hashing (SCRAM-SHA-256, caching_sha2_password, certificate auth)
- TLS enforced for all connections (not just "available")
- No default passwords, no `trust` auth, no passwordless access
- Connection pool in front of the database (PgBouncer, ProxySQL, or application-side)
- `max_connections` sized for actual workload, not default
- Backup strategy implemented, tested, and monitored
- Restore procedure documented and tested (at least monthly)
- Monitoring in place (connections, query performance, replication lag, disk usage)
- Slow query logging enabled with appropriate threshold
- Dead/unused indexes identified and removed
- Character encoding correct (`utf8mb4` for MySQL, `UTF8` for PG)
PostgreSQL-Specific
- `pg_hba.conf`: `hostssl` only, `scram-sha-256` only, CIDR-restricted
- `shared_buffers` = 25% RAM, `effective_cache_size` = 75% RAM
- `work_mem` sized for concurrency (it is allocated per-sort, not per-connection - 64MB is high for OLTP with many connections)
- `random_page_cost = 1.1` for SSD storage
- `statement_timeout` set per-role (not globally - migrations need longer)
- `idle_in_transaction_session_timeout` set (60s is a sensible starting point)
- `pg_stat_statements` enabled
- Autovacuum tuned for large tables (`autovacuum_vacuum_scale_factor`)
- WAL archiving enabled for PITR (`archive_mode = on` + pgBackRest/Barman, or managed backup)
- Foreign key columns have indexes
- pgAudit installed and configured (if PCI scope)
- Patched against CVE-2026-2005 (pgcrypto heap buffer overflow, RCE) - 18.2+ / 17.8+ / 16.12+
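The memory ratios above as arithmetic, including the worst-case `work_mem` math that motivates the per-sort caveat (all numbers illustrative):

```python
ram_gb = 64

# Checklist ratios
shared_buffers_gb = ram_gb * 25 // 100        # -> shared_buffers = '16GB'
effective_cache_size_gb = ram_gb * 75 // 100  # -> effective_cache_size = '48GB'

# work_mem is allocated per sort/hash node, not per connection, so the
# theoretical worst case is roughly:
#   max_connections * sort_nodes_per_query * work_mem
max_connections = 200
sort_nodes_per_query = 2   # assumed average for this sketch
work_mem_mb = 16
worst_case_gb = max_connections * sort_nodes_per_query * work_mem_mb / 1024
```

With these inputs the worst case lands on top of `shared_buffers`, which is why a large `work_mem` plus many connections can run a box out of memory.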
MySQL/MariaDB-Specific
- `sql_mode` includes `STRICT_TRANS_TABLES` (prevents silent data truncation)
- `innodb_buffer_pool_size` = 70% RAM
- `innodb_flush_log_at_trx_commit = 1` for durability
- `require_secure_transport = ON`
- `character-set-server = utf8mb4`
- Binary log enabled for PITR (`log_bin = ON`)
- `innodb_file_per_table = ON`
- MariaDB patched against CVE-2026-32710 (JSON_SCHEMA_VALID crash/RCE) - 11.8.6+ / 11.4.10+
MongoDB-Specific
- `security.authorization: enabled` (never run without auth)
- Replica set with 3+ members (not standalone in production)
- Write concern `w: "majority"` (default in 8.0+)
- Schema validation (`$jsonSchema`) on critical collections
- Patched against MongoBleed (CVE-2025-14847) - 8.0.17+
- Patched against CVE-2026-25611 (pre-auth DoS via compression) - 8.0.18+ / 8.2.4+
- TLS enabled (`net.tls.mode: requireTLS`)
MSSQL-Specific
- `Max Server Memory` set explicitly (not unlimited)
- `MAXDOP` set to core count per NUMA node
- `Cost Threshold for Parallelism` raised from default 5
- TempDB files = min(CPU cores, 8), equal size
- Query Store enabled
- Recovery model = FULL for production databases
- TDE enabled for CDE databases
- Patched against CVE-2026-21262 (privilege escalation) - March 2026 CU+
Compliance (PCI-DSS 4.0)
- Encryption at rest: TDE or column-level (disk-level alone insufficient per Req 3.5.1.2)
- Encryption in transit: TLS 1.2+ enforced on all connections (Req 4)
- Audit logging: pgAudit/audit plugin, shipped to immutable SIEM (Req 10)
- Key management: keys in HSM/KMS, not alongside data (Req 3.6)
- Key rotation: DEKs every 90 days, master keys annually (Req 3.7)
- Access control: separate roles, no shared accounts, MFA for CDE access (Req 7, 8)
- Data masking: PAN display limited to last 4 digits (Req 3.3)
- No cardholder data in non-prod environments (Req 6.5.4)
- Quarterly access review documented (Req 7.2.5)
- Vulnerability scanning includes database engine, not just containers (Req 11.3)
Reference Files
- `references/config-templates.md` - engine configuration templates
- `references/backup-patterns.md` - backup, restore, and PITR patterns
- `references/migration-patterns.md` - cross-engine migration patterns and type-mapping guidance
Rules
These are non-negotiable. Violating any of these is a bug.
- Backups without restore tests are not backups. Test restores monthly. Document the procedure.
- No trust auth in pg_hba.conf. Not in dev, not in Docker, not anywhere. Use scram-sha-256.
- MySQL strict mode ON. STRICT_TRANS_TABLES prevents silent data truncation. Without it, MySQL silently corrupts your data.
- TLS enforced, not just available. require_secure_transport, hostssl, requireTLS. Connections without TLS are a finding.
- Connection pooler for PostgreSQL. PG's process-per-connection model doesn't scale without one. Use PgBouncer.
- Indexes on foreign keys in PostgreSQL. PG doesn't auto-create them. Missing FK indexes cause sequential scans on JOINs and cascading DELETEs.
- utf8mb4 for MySQL, always. utf8 is a lie - it's 3-byte only, can't store emoji or many CJK characters.
- timestamptz for PostgreSQL, always. Bare timestamp stores no timezone, breaks when server timezone changes.
- Parameterized queries everywhere. String concatenation for SQL is a bug, not a shortcut. Doubly true for AI-generated code.
- Disk-level encryption is insufficient for PCI-DSS 4.0. Req 3.5.1.2 requires TDE, column-level, or application-layer encryption.
- Patch MongoBleed (CVE-2025-14847). Self-hosted MongoDB < 8.0.17 / 7.0.28 / 6.0.27 is actively exploitable with no authentication required.
- Patch MongoDB compression DoS (CVE-2026-25611). Pre-auth DoS via crafted OP_COMPRESSED messages. Default config affected (compression enabled since 3.6). Fixed in 8.0.18+ / 8.2.4+ / 7.0.29+.
- Patch PgBouncer (CVE-2025-12819). PgBouncer < 1.25.1 can allow unauthenticated SQL execution when track_extra_parameters includes search_path AND auth_user is set (both non-default). Upgrade regardless - the fix is low-risk.
- Chunk bulk inserts. Never build a single INSERT ... VALUES with an unbounded row list. Compute batch size from the lowest host-parameter ceiling across supported backends (floor(limit / columns_per_row)). Wrap chunks in one transaction when atomicity matters.
- Run the AI self-check. Every generated migration, schema, or config gets verified against the checklist above before returning.
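The bulk-insert rule above can be sketched in client code. A minimal example, assuming illustrative parameter ceilings - verify the numbers against your actual drivers and server versions (e.g. SQLite's default limit has changed across releases):

```python
from itertools import islice

# Per-backend host-parameter ceilings (assumed values; check your driver/server docs).
PARAM_LIMITS = {"postgresql": 65535, "sqlite": 32766, "mssql": 2100}

def batch_size(columns_per_row: int, limits=PARAM_LIMITS) -> int:
    """Rows per INSERT ... VALUES, sized for the most restrictive backend."""
    return max(1, min(limits.values()) // columns_per_row)

def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

# Usage sketch: wrap the loop in one transaction when atomicity matters.
# with conn:  # hypothetical DB-API connection
#     for batch in chunked(rows, batch_size(len(columns))):
#         cur.executemany(insert_sql, batch)
```

With 7 columns per row, the worst-case ceiling of 2100 parameters yields batches of 300 rows.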
Related Skills
- code-review - has a databases.md reference for application-level database bug patterns (transaction misuse, NULL handling, ORM N+1, type coercion). This skill covers engine configuration and operations; code-review covers how the application uses the database.
- security-audit - for SQL injection detection and credential scanning in application code
- kubernetes - for deploying databases on K8s (StatefulSets, operators, PVCs)
- terraform - for provisioning managed databases (RDS, Cloud SQL, Atlas)
- docker - for database containers in Docker Compose
- ansible - for database server configuration management