Patroni Cluster

From Ian Doob Technology
Revision as of 15:09, 3 August 2025 by Ian.readman

Setup

  • Debian1: 192.168.59.101 (Debian 12)
  • Debian2: 192.168.59.102 (Debian 12)
  • Debian3: 192.168.59.103 (Debian 12)

The servers were created using VirtualBox.
Network setup was 2 adapters (Bridged Adapter and NAT).

Installation

sudo apt install postgresql patroni
sudo systemctl stop patroni postgresql
sudo systemctl disable postgresql

etcd was installed from prebuilt binaries

ETCD_RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest|grep tag_name | cut -d '"' -f 4)
curl -sL https://github.com/etcd-io/etcd/releases/download/${ETCD_RELEASE}/etcd-${ETCD_RELEASE}-linux-amd64.tar.gz \
 | sudo tar xz -C /usr/bin --strip=1 --wildcards --no-anchored etcdctl etcd

Configuration

etcd

sudo mkdir -p /var/lib/etcd/
sudo mkdir -p /etc/etcd
sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd
sudo chown -R etcd:etcd /var/lib/etcd/
sudo chmod -R 700 /var/lib/etcd

Create the configuration file at /etc/etcd/etcd.yml

name: 'debian1' 
data-dir: '/var/lib/etcd/data.etcd'
initial-advertise-peer-urls: http://192.168.59.101:2380 
listen-peer-urls: http://192.168.59.101:2380 
advertise-client-urls: http://192.168.59.101:2379 
listen-client-urls: http://192.168.59.101:2379,http://127.0.0.1:2379 
initial-cluster: "debian1=http://192.168.59.101:2380,debian2=http://192.168.59.102:2380,debian3=http://192.168.59.103:2380" 
initial-cluster-state: 'new' 
initial-cluster-token: 'token-01'
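Only name and the node's own IP differ between the three files, so the configuration can be templated. A minimal sketch, assuming NODE_NAME and NODE_IP are set per host; the output path here is a temp file for illustration, on a real node it would be /etc/etcd/etcd.yml:

```shell
# Sketch: template the per-node etcd config.
# NODE_NAME/NODE_IP are placeholders to set on each host;
# OUT would be /etc/etcd/etcd.yml in practice.
NODE_NAME=debian1
NODE_IP=192.168.59.101
OUT="$(mktemp -d)/etcd.yml"
cat > "$OUT" <<EOF
name: '${NODE_NAME}'
data-dir: '/var/lib/etcd/data.etcd'
initial-advertise-peer-urls: http://${NODE_IP}:2380
listen-peer-urls: http://${NODE_IP}:2380
advertise-client-urls: http://${NODE_IP}:2379
listen-client-urls: http://${NODE_IP}:2379,http://127.0.0.1:2379
initial-cluster: "debian1=http://192.168.59.101:2380,debian2=http://192.168.59.102:2380,debian3=http://192.168.59.103:2380"
initial-cluster-state: 'new'
initial-cluster-token: 'token-01'
EOF
echo "wrote $OUT"
```

Run once per node with the matching NODE_NAME/NODE_IP pair; initial-cluster is identical everywhere.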

Configure the systemd unit file (e.g. /etc/systemd/system/etcd.service)

[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network-online.target local-fs.target remote-fs.target time-sync.target
Wants=network-online.target local-fs.target remote-fs.target time-sync.target

[Service]
User=etcd
Type=notify
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=%H
ExecStart=/usr/bin/etcd --config-file /etc/etcd/etcd.yml
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target
Reload systemd and start the service on each node

sudo systemctl daemon-reload
sudo systemctl enable etcd.service
sudo systemctl start etcd.service

Test that everything is working

export ETCDCTL_ENDPOINTS="http://192.168.59.103:2379,http://192.168.59.101:2379,http://192.168.59.102:2379"
etcdctl member list -w table
etcdctl endpoint health -w table
etcdctl endpoint status -w table

+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.59.102:2379 | 3cdde56ec82a0ca0 |  3.5.16 |   53 kB |      true |      false |        13 |        145 |                145 |        |
| http://192.168.59.101:2379 | 406189a3bb67b8bc |  3.5.16 |   53 kB |     false |      false |        13 |        145 |                145 |        |
| http://192.168.59.103:2379 | 939ed44c2e6d5e5b |  3.5.16 |   53 kB |     false |      false |        13 |        145 |                145 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Change the leader if necessary (ETCDCTL_ENDPOINTS is already exported above)

etcdctl move-leader 939ed44c2e6d5e5b

Issues

One of the members would not connect to the cluster. The error message on the leader was “Prober detected unhealthy status …” On the problem member the message was “Failed to publish local member to cluster through raft”

There were no network issues; the problem was solved by removing and re-adding the member.

etcdctl member list -w table
sudo systemctl stop etcd        (on the server being removed)
etcdctl member remove <ID>

etcdctl member add debian2 --peer-urls=http://192.168.59.102:2380

  1. Edit the etcd.yml file on the re-added member, copying the values displayed by 'member add' (including the new order for initial-cluster), and set initial-cluster-state to 'existing' so the node joins the existing cluster rather than bootstrapping a new one.

sudo systemctl start etcd       (on the server being added)

Patroni

sudo mkdir -p /etc/patroni/
sudo chown -R postgres:postgres /etc/patroni/

As the postgres user, create a configuration file. Note that $THIS_NAME and $THIS_IP are placeholders: replace them with each node's own name and IP address, as Patroni does not expand variables in the YAML.

sudo -iu postgres vi /etc/patroni/patroni.yml

namespace: /db/
scope: cluster_1
name: $THIS_NAME

log:

 format: '%(asctime)s %(levelname)s: %(message)s'
 level: INFO
 max_queue_size: 1000
 traceback_level: ERROR
 type: plain

restapi:

 listen: 0.0.0.0:8008
 connect_address: $THIS_IP:8008

etcd3:

 hosts: 
 - 192.168.59.101:2379
 - 192.168.59.102:2379
 - 192.168.59.103:2379

bootstrap:

 dcs:
   ttl: 30
   loop_wait: 10
   retry_timeout: 10
   maximum_lag_on_failover: 1048576
   postgresql:
     use_pg_rewind: true
     use_slots: true
     parameters:
       wal_level: replica
       hot_standby: "on"
       wal_keep_segments: 10
       max_wal_senders: 5
       max_replication_slots: 10
       wal_log_hints: "on"
       logging_collector: 'on'
       max_wal_size: '10GB'
       archive_mode: "on"
       archive_timeout: 600s
       archive_command: "/bin/true"
 initdb: # Note: It needs to be a list (some options need values, others are switches)
 - encoding: UTF8
 - data-checksums
 pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
 - host replication replicator 127.0.0.1/32 trust
 - host replication replicator 0.0.0.0/0 md5
 - host all all 0.0.0.0/0 md5
 - host all all ::0/0 md5
 # Some additional users which need to be created after initializing the new cluster
 users:
   admin:
     password: qaz123
     options:
       - createrole
       - createdb

postgresql:

 listen: 0.0.0.0:5432
 connect_address: $THIS_IP:5432
 data_dir: /var/lib/postgresql/17
 bin_dir: /usr/lib/postgresql/17/bin
 pgpass: /tmp/pgpass0
 authentication:
   replication:
     username: replicator
     password: replPasswd
   superuser:
     username: postgres
     password: qaz123
 parameters:
   unix_socket_directories: "/var/run/postgresql/"

tags:

 nofailover: false
 noloadbalance: false
 clonefrom: false
 nosync: false
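As noted above, Patroni does not expand the $THIS_NAME / $THIS_IP placeholders itself, so they have to be substituted per node before the file is used. A minimal sketch with sed; the template and output here are temp files with a shortened template for illustration, on a real node the output would be /etc/patroni/patroni.yml:

```shell
# Sketch: substitute the per-node placeholders in a patroni.yml template.
# THIS_NAME/THIS_IP are illustrative values for debian1.
THIS_NAME=debian1
THIS_IP=192.168.59.101
TEMPLATE="$(mktemp)"
# A shortened stand-in for the full template shown above.
printf 'name: $THIS_NAME\nrestapi:\n  connect_address: $THIS_IP:8008\n' > "$TEMPLATE"
PATRONI_OUT="$(mktemp)"
# [$] matches a literal dollar sign in the sed pattern.
sed -e "s/[$]THIS_NAME/${THIS_NAME}/g" \
    -e "s/[$]THIS_IP/${THIS_IP}/g" "$TEMPLATE" > "$PATRONI_OUT"
cat "$PATRONI_OUT"
```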

Validate the configuration file

patroni --validate-config /etc/patroni/patroni.yml

Service

sudo vi /etc/systemd/system/multi-user.target.wants/patroni.service

[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=network.target

[Service]
Type=simple

User=postgres
Group=postgres

# Start the patroni process
ExecStart=/usr/bin/patroni /etc/patroni/patroni.yml

# Send HUP to reload from patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID

# Only kill the patroni process, not its children, so it will gracefully stop postgres
KillMode=process

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=30

# Restart the service if it crashed
Restart=on-failure

[Install]
WantedBy=multi-user.target

Set permissions of the data directory to 700

sudo -iu postgres chmod 700 /var/lib/postgresql/17

Start the patroni service

sudo systemctl daemon-reload
sudo systemctl start patroni

Check status

export PATRONICTL_CONFIG_FILE=/etc/patroni/patroni.yml
patronictl -c /etc/patroni/patroni.yml list

+ Cluster: cluster_1 (7439347493790530098) ------+----+-----------+
| Member  | Host           | Role    | State     | TL | Lag in MB |
+---------+----------------+---------+-----------+----+-----------+
| debian1 | 192.168.59.101 | Replica | streaming |  1 |         0 |
| debian2 | 192.168.59.102 | Replica | streaming |  1 |         0 |
| debian3 | 192.168.59.103 | Leader  | running   |  1 |           |
+---------+----------------+---------+-----------+----+-----------+

Change configuration

PATRONICTL_CONFIG_FILE=/etc/patroni/patroni.yml patronictl edit-config --pg archive_command="pgbackrest --stanza=ian archive-push %p"

Issues

Replication showed the following error

ERROR: Requested starting point 0/4000000 is ahead of the WAL flush position of this server 0/3000060

The replica instance needed to be re-initialised

patronictl -c /etc/patroni/patroni.yml reinit <scope> <name>

VIP Manager

Copy the latest release to your server:

sudo dpkg -i vip-manager_2.8.0_Linux_x86_64.deb
sudo systemctl stop vip-manager
sudo vi /etc/patroni/vip-manager.yml

The trigger-key below contains the namespace and scope of the Patroni service.

ip: 192.168.59.100
netmask: 24
interface: enp0s3
trigger-key: "/db/cluster_1/leader"
dcs-type: etcd
dcs-endpoints: http://192.168.59.101:2379

sudo vi /etc/systemd/system/multi-user.target.wants/vip-manager.service

[Unit]
Description=Manages Virtual IP for Patroni
After=network-online.target
Before=patroni.service

[Service]
Type=simple
ExecStart=/usr/bin/vip-manager --config=/etc/patroni/vip-manager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl start vip-manager
sudo journalctl -u vip-manager.service -n 100 -f

PGBackRest

Setup the repository on Debian3

Install pgbackrest on all the servers …

sudo apt install pgbackrest
sudo mkdir -p -m 770 /var/log/pgbackrest
sudo chown postgres:postgres /var/log/pgbackrest
sudo mkdir -p /etc/pgbackrest
sudo touch /etc/pgbackrest/pgbackrest.conf
sudo chmod 640 /etc/pgbackrest/pgbackrest.conf
sudo chown postgres:postgres /etc/pgbackrest/pgbackrest.conf

On debian3, create the directory that will hold the backups …

sudo mkdir -p /etc/pgbackrest/conf.d
sudo mkdir -p /var/lib/pgbackrest
sudo chmod 750 /var/lib/pgbackrest
sudo chown postgres:postgres /var/lib/pgbackrest

Setup SSH

We will need a passwordless SSH connection between the repository and the database servers. For this setup I was using postgres as the user that controls pgbackrest.
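The key-generation step below can be done non-interactively. A minimal sketch; the key directory here is a temp directory for illustration (normally it would be ~/.ssh of the postgres user), and ssh-copy-id is an alternative to copying the public key by hand with scp:

```shell
# Sketch: generate a passphrase-less key without prompts.
# KEYDIR is a temp dir for illustration; normally ~/.ssh of the postgres user.
KEYDIR="$(mktemp -d)"
ssh-keygen -t ed25519 -N '' -f "$KEYDIR/id_ed25519" -q
# On a real host, install the public key on the peer, e.g.:
#   ssh-copy-id -i "$KEYDIR/id_ed25519.pub" postgres@debian3
ls "$KEYDIR"
```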

  1. debian1

ssh-keygen
scp ~/.ssh/id_rsa.pub postgres@debian3:
ssh postgres@debian3

  1. debian3

ssh-keygen
scp ~/.ssh/id_rsa.pub postgres@debian1:
ssh postgres@debian1

Configuration

As the postgres user, create the configuration file on the repository

vi /etc/pgbackrest/pgbackrest.conf

[cybertec]
pg1-host=debian1
pg1-path=/var/lib/postgresql/17

[global]
repo1-cipher-pass=zWaf6XtpjIVZC5444yXB+cgFDFl7MxGlgkZSaoPv
repo1-cipher-type=aes-256-cbc
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
start-fast=y

As the postgres user, create the configuration file on the database server

[cybertec]
pg1-path=/var/lib/postgresql/17

[global]
log-level-file=detail
repo1-host=debian3
repo1-host-user=postgres

Alter the archive_command parameter so that it uses the pgbackrest executable

patronictl edit-config --pg archive_command="pgbackrest --stanza=cybertec archive-push %p"

Create a Stanza on the repository server and confirm it is working

pgbackrest --stanza=cybertec stanza-create
pgbackrest --stanza=cybertec check

Also check if the Stanza is correct on the database server.

pgbackrest --stanza=cybertec check
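Once the check passes, backups can be taken and scheduled from the repository host. A hypothetical crontab fragment for the postgres user on debian3 (the schedule and backup types are illustrative, not part of this setup):

```
# m h dom mon dow  command
30 2 * * 0    pgbackrest --stanza=cybertec --type=full backup
30 2 * * 1-6  pgbackrest --stanza=cybertec --type=diff backup
```

pgbackrest info on the repository then lists the backups held for the stanza.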