data-replication-setup

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Data Replication Setup

数据库复制设置

Overview

概述

Configure database replication for disaster recovery, load distribution, and high availability. Covers master-slave, multi-master replication, and monitoring strategies.
配置数据库复制以实现灾难恢复、负载分发和高可用性。涵盖主从复制、多主复制以及监控策略。

When to Use

适用场景

  • High availability setup
  • Disaster recovery planning
  • Read replica configuration
  • Multi-region replication
  • Replication monitoring and maintenance
  • Failover automation
  • Cross-region backup strategies
  • 高可用性架构搭建
  • 灾难恢复规划
  • 只读副本配置
  • 多区域复制
  • 复制监控与维护
  • 故障转移自动化
  • 跨区域备份策略

PostgreSQL Replication

PostgreSQL 复制

Master-Slave (Primary-Standby) Setup

主从(主备)架构设置

PostgreSQL - Configure Primary Server:
sql
-- On primary server: postgresql.conf
-- wal_level = replica
-- max_wal_senders = 10
-- wal_keep_size = 1GB

-- Create replication user
CREATE ROLE replication_user WITH REPLICATION ENCRYPTED PASSWORD 'secure_password';

-- Allow replication connections: pg_hba.conf
-- host    replication     replication_user   standby_ip/32    md5

-- Enable WAL archiving for continuous backup
-- archive_mode = on
-- archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
PostgreSQL - Set Up Standby Server:
bash
undefined
PostgreSQL - 配置主服务器:
sql
-- On primary server: postgresql.conf
-- wal_level = replica
-- max_wal_senders = 10
-- wal_keep_size = 1GB

-- Create replication user
CREATE ROLE replication_user WITH REPLICATION ENCRYPTED PASSWORD 'secure_password';

-- Allow replication connections: pg_hba.conf
-- host    replication     replication_user   standby_ip/32    md5

-- Enable WAL archiving for continuous backup
-- archive_mode = on
-- archive_command = 'test! -f /archive/%f && cp %p /archive/%f'
PostgreSQL - 配置备服务器:
bash
undefined

On standby server

On standby server

1. Stop PostgreSQL if running

1. Stop PostgreSQL if running

sudo systemctl stop postgresql
sudo systemctl stop postgresql

2. Take base backup from primary

2. Take base backup from primary

pg_basebackup -h primary_ip -D /var/lib/postgresql/14/main
-U replication_user -v -P -W
pg_basebackup -h primary_ip -D /var/lib/postgresql/14/main
-U replication_user -v -P -W

3. Create standby.signal file

3. Create standby.signal file

touch /var/lib/postgresql/14/main/standby.signal
touch /var/lib/postgresql/14/main/standby.signal

4. Configure recovery: recovery.conf

4. Configure recovery: recovery.conf

primary_conninfo = 'host=primary_ip user=replication_user password=password'

primary_conninfo = 'host=primary_ip user=replication_user password=password'

5. Start PostgreSQL

5. Start PostgreSQL

sudo systemctl start postgresql

**Monitor Replication Status:**

```sql
-- On primary: check connected standbys
SELECT pid, usename, application_name, client_addr, state
FROM pg_stat_replication;

-- On primary: check replication lag
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;

-- On standby: check recovery status
SELECT pg_is_wal_replay_paused();
SELECT extract(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) as replication_lag_seconds;
sudo systemctl start postgresql

**监控复制状态:**

```sql
-- On primary: check connected standbys
SELECT pid, usename, application_name, client_addr, state
FROM pg_stat_replication;

-- On primary: check replication lag
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;

-- On standby: check recovery status
SELECT pg_is_wal_replay_paused();
SELECT extract(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) as replication_lag_seconds;

Logical Replication

逻辑复制

PostgreSQL - Logical Replication Setup:
sql
-- On publisher (primary)
CREATE PUBLICATION users_publication FOR TABLE users, orders;

-- Create replication slot
SELECT * FROM pg_create_logical_replication_slot('users_slot', 'pgoutput');

-- On subscriber (standby)
CREATE SUBSCRIPTION users_subscription
CONNECTION 'host=publisher_ip dbname=mydb user=repuser password=pwd'
PUBLICATION users_publication
WITH (copy_data = true);

-- Check subscription status
SELECT subname, subenabled, subconninfo
FROM pg_subscription;

-- Monitor replication status
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
PostgreSQL - 逻辑复制设置:
sql
-- On publisher (primary)
CREATE PUBLICATION users_publication FOR TABLE users, orders;

-- Create replication slot
SELECT * FROM pg_create_logical_replication_slot('users_slot', 'pgoutput');

-- On subscriber (standby)
CREATE SUBSCRIPTION users_subscription
CONNECTION 'host=publisher_ip dbname=mydb user=repuser password=pwd'
PUBLICATION users_publication
WITH (copy_data = true);

-- Check subscription status
SELECT subname, subenabled, subconninfo
FROM pg_subscription;

-- Monitor replication status
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';

MySQL Replication

MySQL 复制

Master-Slave Setup

主从架构设置

MySQL - Configure Master Server:
sql
-- In MySQL config (my.cnf / my.ini)
-- [mysqld]
-- server-id = 1
-- log-bin = mysql-bin
-- binlog-format = ROW

-- Create replication user
CREATE USER 'replication'@'%' IDENTIFIED BY 'replication_password';
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'%';
FLUSH PRIVILEGES;

-- Get binary log position
SHOW MASTER STATUS;
-- File: mysql-bin.000001
-- Position: 154
MySQL - Configure Slave Server:
sql
-- In MySQL config (my.cnf / my.ini)
-- [mysqld]
-- server-id = 2
-- relay-log = mysql-relay-bin
-- binlog-format = ROW

-- Configure replication
CHANGE MASTER TO
  MASTER_HOST = '192.168.1.100',
  MASTER_USER = 'replication',
  MASTER_PASSWORD = 'replication_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;

-- Start replication
START SLAVE;

-- Check slave status
SHOW SLAVE STATUS\G
-- Should show: Slave_IO_Running: Yes, Slave_SQL_Running: Yes
Monitor MySQL Replication:
sql
-- Check slave replication status
SHOW SLAVE STATUS\G

-- Check for replication errors
SHOW SLAVE STATUS\G
-- Look at Last_Error field

-- Stop and resume replication
STOP SLAVE;
-- Fix any issues...
START SLAVE;

-- Monitor replication lag
SHOW SLAVE STATUS\G
-- Check: Seconds_Behind_Master
MySQL - 配置主服务器:
sql
-- In MySQL config (my.cnf / my.ini)
-- [mysqld]
-- server-id = 1
-- log-bin = mysql-bin
-- binlog-format = ROW

-- Create replication user
CREATE USER 'replication'@'%' IDENTIFIED BY 'replication_password';
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'%';
FLUSH PRIVILEGES;

-- Get binary log position
SHOW MASTER STATUS;
-- File: mysql-bin.000001
-- Position: 154
MySQL - 配置从服务器:
sql
-- In MySQL config (my.cnf / my.ini)
-- [mysqld]
-- server-id = 2
-- relay-log = mysql-relay-bin
-- binlog-format = ROW

-- Configure replication
CHANGE MASTER TO
  MASTER_HOST = '192.168.1.100',
  MASTER_USER = 'replication',
  MASTER_PASSWORD = 'replication_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;

-- Start replication
START SLAVE;

-- Check slave status
SHOW SLAVE STATUS\G
-- Should show: Slave_IO_Running: Yes, Slave_SQL_Running: Yes
监控MySQL复制:
sql
-- Check slave replication status
SHOW SLAVE STATUS\G

-- Check for replication errors
SHOW SLAVE STATUS\G
-- Look at Last_Error field

-- Stop and resume replication
STOP SLAVE;
-- Fix any issues...
START SLAVE;

-- Monitor replication lag
SHOW SLAVE STATUS\G
-- Check: Seconds_Behind_Master

Multi-Master Replication

多主复制

MySQL - Circular Replication:
sql
-- Server 1 (Master 1)
-- [mysqld]
-- server-id = 1
-- log-bin = mysql-bin
-- auto_increment_increment = 2
-- auto_increment_offset = 1

CHANGE MASTER TO
  MASTER_HOST = '192.168.1.101',
  MASTER_USER = 'replication',
  MASTER_PASSWORD = 'password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;

START SLAVE;

-- Server 2 (Master 2)
-- [mysqld]
-- server-id = 2
-- log-bin = mysql-bin
-- auto_increment_increment = 2
-- auto_increment_offset = 2

CHANGE MASTER TO
  MASTER_HOST = '192.168.1.100',
  MASTER_USER = 'replication',
  MASTER_PASSWORD = 'password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;

START SLAVE;
MySQL - 环形复制:
sql
-- Server 1 (Master 1)
-- [mysqld]
-- server-id = 1
-- log-bin = mysql-bin
-- auto_increment_increment = 2
-- auto_increment_offset = 1

CHANGE MASTER TO
  MASTER_HOST = '192.168.1.101',
  MASTER_USER = 'replication',
  MASTER_PASSWORD = 'password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;

START SLAVE;

-- Server 2 (Master 2)
-- [mysqld]
-- server-id = 2
-- log-bin = mysql-bin
-- auto_increment_increment = 2
-- auto_increment_offset = 2

CHANGE MASTER TO
  MASTER_HOST = '192.168.1.100',
  MASTER_USER = 'replication',
  MASTER_PASSWORD = 'password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;

START SLAVE;

Replication Monitoring

复制监控

PostgreSQL - Replication Health Check:
sql
-- Create monitoring function
CREATE OR REPLACE FUNCTION check_replication_health()
RETURNS TABLE (
  slot_name name,
  restart_lsn pg_lsn,
  confirmed_flush_lsn pg_lsn,
  lag_bytes bigint,
  status text
) AS $$
BEGIN
  RETURN QUERY
  SELECT
    rs.slot_name,
    rs.restart_lsn,
    rs.confirmed_flush_lsn,
    (pg_wal_lsn_diff(pg_current_wal_lsn(), rs.confirmed_flush_lsn))::bigint,
    CASE
      WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), rs.confirmed_flush_lsn) < 1048576 THEN 'HEALTHY'
      WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), rs.confirmed_flush_lsn) < 10485760 THEN 'WARNING'
      ELSE 'CRITICAL'
    END
  FROM pg_replication_slots rs
  WHERE slot_type = 'physical';
END;
$$ LANGUAGE plpgsql;

SELECT * FROM check_replication_health();
MySQL - Replication Lag Monitoring:
sql
-- Monitor replication lag across multiple slaves
CREATE TABLE replication_monitoring (
  slave_host VARCHAR(50),
  slave_port INT,
  master_log_file VARCHAR(50),
  read_master_log_pos BIGINT,
  relay_log_file VARCHAR(50),
  relay_log_pos BIGINT,
  seconds_behind_master INT,
  checked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert monitoring data
INSERT INTO replication_monitoring
SELECT
  @@hostname,
  @@port,
  Master_Log_File,
  Read_Master_Log_Pos,
  Relay_Log_File,
  Relay_Log_Pos,
  Seconds_Behind_Master,
  CURRENT_TIMESTAMP
FROM INFORMATION_SCHEMA.TABLES
LIMIT 1;  -- Use SHOW SLAVE STATUS values
PostgreSQL - 复制健康检查:
sql
-- Create monitoring function
CREATE OR REPLACE FUNCTION check_replication_health()
RETURNS TABLE (
  slot_name name,
  restart_lsn pg_lsn,
  confirmed_flush_lsn pg_lsn,
  lag_bytes bigint,
  status text
) AS $$
BEGIN
  RETURN QUERY
  SELECT
    rs.slot_name,
    rs.restart_lsn,
    rs.confirmed_flush_lsn,
    (pg_wal_lsn_diff(pg_current_wal_lsn(), rs.confirmed_flush_lsn))::bigint,
    CASE
      WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), rs.confirmed_flush_lsn) < 1048576 THEN 'HEALTHY'
      WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), rs.confirmed_flush_lsn) < 10485760 THEN 'WARNING'
      ELSE 'CRITICAL'
    END
  FROM pg_replication_slots rs
  WHERE slot_type = 'physical';
END;
$$ LANGUAGE plpgsql;

SELECT * FROM check_replication_health();
MySQL - 复制延迟监控:
sql
-- Monitor replication lag across multiple slaves
CREATE TABLE replication_monitoring (
  slave_host VARCHAR(50),
  slave_port INT,
  master_log_file VARCHAR(50),
  read_master_log_pos BIGINT,
  relay_log_file VARCHAR(50),
  relay_log_pos BIGINT,
  seconds_behind_master INT,
  checked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert monitoring data
INSERT INTO replication_monitoring
SELECT
  @@hostname,
  @@port,
  Master_Log_File,
  Read_Master_Log_Pos,
  Relay_Log_File,
  Relay_Log_Pos,
  Seconds_Behind_Master,
  CURRENT_TIMESTAMP
FROM INFORMATION_SCHEMA.TABLES
LIMIT 1;  -- Use SHOW SLAVE STATUS values

Replication Failover

复制故障转移

PostgreSQL - Promote Standby to Primary:
bash
undefined
PostgreSQL - 将备服务器提升为主服务器:
bash
undefined

On standby server

On standby server

Promote standby to primary

Promote standby to primary

pg_ctl promote -D /var/lib/postgresql/14/main
pg_ctl promote -D /var/lib/postgresql/14/main

Or use SQL command

Or use SQL command

SELECT pg_promote();

**MySQL - Promote Slave to Master:**

```sql
-- On slave
-- 1. Stop slave and wait for replication to complete
STOP SLAVE;
SHOW SLAVE STATUS\G  -- Verify Slave_IO_Running and Slave_SQL_Running are OFF

-- 2. Promote to master
RESET SLAVE ALL;

-- 3. Reset binary log
RESET MASTER;

-- 4. Old master becomes new slave
-- Configure old master as slave of new master
CHANGE MASTER TO
  MASTER_HOST = 'new_master_ip',
  MASTER_USER = 'replication',
  MASTER_PASSWORD = 'password',
  MASTER_AUTO_POSITION = 1;

START SLAVE;
SELECT pg_promote();

**MySQL - 将从服务器提升为主服务器:**

```sql
-- On slave
-- 1. Stop slave and wait for replication to complete
STOP SLAVE;
SHOW SLAVE STATUS\G  -- Verify Slave_IO_Running and Slave_SQL_Running are OFF

-- 2. Promote to master
RESET SLAVE ALL;

-- 3. Reset binary log
RESET MASTER;

-- 4. Old master becomes new slave
-- Configure old master as slave of new master
CHANGE MASTER TO
  MASTER_HOST = 'new_master_ip',
  MASTER_USER = 'replication',
  MASTER_PASSWORD = 'password',
  MASTER_AUTO_POSITION = 1;

START SLAVE;

Replication Configuration Best Practices

复制配置最佳实践

PostgreSQL - postgresql.conf settings:
conf
undefined
PostgreSQL - postgresql.conf 设置:
conf
undefined

WAL configuration

WAL configuration

wal_level = replica max_wal_senders = 10 wal_keep_size = 1GB wal_receiver_timeout = 60s wal_receiver_status_interval = 10s
wal_level = replica max_wal_senders = 10 wal_keep_size = 1GB wal_receiver_timeout = 60s wal_receiver_status_interval = 10s

Hot standby

Hot standby

hot_standby = on max_standby_streaming_delay = 3min
hot_standby = on max_standby_streaming_delay = 3min

Replication timeout

Replication timeout

wal_sender_timeout = 300s

**MySQL - my.cnf settings:**

```conf
[mysqld]
wal_sender_timeout = 300s

**MySQL - my.cnf 设置:**

```conf
[mysqld]

Replication configuration

Replication configuration

server-id = 1 log-bin = mysql-bin binlog_format = ROW binlog-row-image = FULL
server-id = 1 log-bin = mysql-bin binlog_format = ROW binlog-row-image = FULL

Slave configuration

Slave configuration

relay-log = mysql-relay-bin relay-log-index = mysql-relay-bin.index relay-log-info-repository = TABLE
relay-log = mysql-relay-bin relay-log-index = mysql-relay-bin.index relay-log-info-repository = TABLE

Safety

Safety

log_replica_updates = ON slave_parallel_workers = 4 slave_parallel_type = LOGICAL_CLOCK
undefined
log_replica_updates = ON slave_parallel_workers = 4 slave_parallel_type = LOGICAL_CLOCK
undefined

Troubleshooting Replication

复制故障排查

PostgreSQL - Replication Issues:
sql
-- Check for missing files
SELECT slot_name, restart_lsn, wal_status
FROM pg_replication_slots;

-- Restart replication slot
SELECT pg_replication_slot_advance('slot_name', pg_current_wal_lsn());

-- Synchronize replication
SYNCHRONOUS_COMMIT = remote_apply;
MySQL - Common Issues:
sql
-- Check duplicate entry error
SHOW SLAVE STATUS\G
-- Look for Last_SQL_Error

-- Skip error
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;

-- Reset replication
RESET SLAVE;
RESET MASTER;
PostgreSQL - 复制问题:
sql
-- Check for missing files
SELECT slot_name, restart_lsn, wal_status
FROM pg_replication_slots;

-- Restart replication slot
SELECT pg_replication_slot_advance('slot_name', pg_current_wal_lsn());

-- Synchronize replication
SYNCHRONOUS_COMMIT = remote_apply;
MySQL - 常见问题:
sql
-- Check duplicate entry error
SHOW SLAVE STATUS\G
-- Look for Last_SQL_Error

-- Skip error
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;

-- Reset replication
RESET SLAVE;
RESET MASTER;

Replication Verification

复制验证

  • Test failover in non-production first
  • Verify data consistency after replication
  • Monitor replication lag continuously
  • Document all replication configurations
  • Test backup/recovery procedures
  • Schedule regular replication audits
  • 先在非生产环境测试故障转移
  • 验证复制后的数据一致性
  • 持续监控复制延迟
  • 记录所有复制配置
  • 测试备份/恢复流程
  • 定期安排复制审计

Resources

参考资源