Patroni Cluster

From Ian Doob Technology

Setup[edit]

  • Debian1: 192.168.59.101 (Debian 12)
  • Debian2: 192.168.59.102 (Debian 12)
  • Debian3: 192.168.59.103 (Debian 12)

The servers were created using VirtualBox.
Each VM was given two network adapters (Bridged Adapter and NAT).

Installation[edit]

sudo apt install postgresql patroni
sudo systemctl stop patroni postgresql
sudo systemctl disable postgresql

etcd was installed from prebuilt binaries

ETCD_RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest|grep tag_name | cut -d '"' -f 4)
curl -sL https://github.com/etcd-io/etcd/releases/download/${ETCD_RELEASE}/etcd-${ETCD_RELEASE}-linux-amd64.tar.gz \
 | sudo tar xz -C /usr/bin --strip=1 --wildcards --no-anchored etcdctl etcd
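The ETCD_RELEASE line above simply pulls the tag_name field out of GitHub's release JSON; a minimal reproduction against a canned response line shows what the grep/cut pipeline extracts:

```shell
# One line of the GitHub releases API response (canned for illustration)
json='  "tag_name": "v3.5.16",'
# Same extraction as the install snippet: field 4 when split on double quotes
ETCD_RELEASE=$(echo "$json" | grep tag_name | cut -d '"' -f 4)
echo "$ETCD_RELEASE"
```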

Configuration[edit]

etcd[edit]

sudo mkdir -p /var/lib/etcd/
sudo mkdir -p /etc/etcd
sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd
sudo chown -R etcd:etcd /var/lib/etcd/
sudo chmod -R a+rw /var/lib/etcd

Create the configuration file at /etc/etcd/etcd.yml

name: 'debian1' 
data-dir: '/var/lib/etcd/data.etcd'
initial-advertise-peer-urls: http://192.168.59.101:2380 
listen-peer-urls: http://192.168.59.101:2380 
advertise-client-urls: http://192.168.59.101:2379 
listen-client-urls: http://192.168.59.101:2379,http://127.0.0.1:2379 
initial-cluster: "debian1=http://192.168.59.101:2380,debian2=http://192.168.59.102:2380,debian3=http://192.168.59.103:2380" 
initial-cluster-state: 'new' 
initial-cluster-token: 'token-01'
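The same file goes on the other two nodes with only the name and the local addresses changed; for example, on debian2 it would read (a sketch following the values above):

```yaml
name: 'debian2'
data-dir: '/var/lib/etcd/data.etcd'
initial-advertise-peer-urls: http://192.168.59.102:2380
listen-peer-urls: http://192.168.59.102:2380
advertise-client-urls: http://192.168.59.102:2379
listen-client-urls: http://192.168.59.102:2379,http://127.0.0.1:2379
initial-cluster: "debian1=http://192.168.59.101:2380,debian2=http://192.168.59.102:2380,debian3=http://192.168.59.103:2380"
initial-cluster-state: 'new'
initial-cluster-token: 'token-01'
```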

Create the systemd unit file (e.g. /etc/systemd/system/etcd.service)

[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network-online.target local-fs.target remote-fs.target time-sync.target
Wants=network-online.target local-fs.target remote-fs.target time-sync.target

[Service]
User=etcd
Type=notify
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=%H
ExecStart=/usr/bin/etcd --config-file /etc/etcd/etcd.yml
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

Reload systemd, then enable and start the service

sudo systemctl daemon-reload
sudo systemctl enable etcd.service
sudo systemctl start etcd.service

Test that everything is working

export ETCDCTL_ENDPOINTS="http://192.168.59.103:2379,http://192.168.59.101:2379,http://192.168.59.102:2379"
etcdctl member list -w table
etcdctl endpoint health -w table
etcdctl endpoint status -w table 
+----------------------------+------------------+---------+---------+-----------+-----------+------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+--------+
| http://192.168.59.102:2379 | 3cdde56ec82a0ca0 |  3.5.16 |   53 kB |      true |        13 |        145 |        |
| http://192.168.59.101:2379 | 406189a3bb67b8bc |  3.5.16 |   53 kB |     false |        13 |        145 |        |
| http://192.168.59.103:2379 | 939ed44c2e6d5e5b |  3.5.16 |   53 kB |     false |        13 |        145 |        |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+--------+

Change the leader if necessary

etcdctl move-leader 939ed44c2e6d5e5b --endpoints=$ETCDCTL_ENDPOINTS
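move-leader takes a member ID, which can be scraped from the member list table output; the extraction can be sketched against a canned row (the ID and name below are taken from the status table above):

```shell
# One row of `etcdctl member list -w table` output (canned for illustration)
row='| 939ed44c2e6d5e5b | started | debian3 | http://192.168.59.103:2380 | http://192.168.59.103:2379 | false |'
# Split on '|' and strip spaces from the ID column
id=$(echo "$row" | awk -F'|' '{gsub(/ /, "", $2); print $2}')
echo "$id"
```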

Issues[edit]

One of the members would not connect to the cluster.
The error message on the leader was “Prober detected unhealthy status …”
On the problem member the message was “Failed to publish local member to cluster through raft”
There were no network issues; the problem was solved by removing the member and adding it back.

etcdctl member list -w table
sudo systemctl stop etcd   # on the server being removed
etcdctl member remove <ID>
etcdctl member add debian2 --peer-urls=http://192.168.59.102:2380

Edit the etcd.yml files, copying all of the information displayed by the member add command, including the new member order for ETCD_INITIAL_CLUSTER and the cluster state of 'existing'.
On the server being added ..
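The member add command prints the environment values the restarted member must use; translated into etcd.yml form on debian2, that amounts to something like the following (a sketch; copy the exact order and URLs that your member add output shows):

```yaml
initial-cluster: "debian1=http://192.168.59.101:2380,debian3=http://192.168.59.103:2380,debian2=http://192.168.59.102:2380"
initial-cluster-state: 'existing'
```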

sudo systemctl start etcd

Patroni[edit]

sudo mkdir -p /etc/patroni/
sudo chown -R  postgres:postgres /etc/patroni/

As the postgres user, create a configuration file.

sudo -iu postgres
vi /etc/patroni/patroni.yml

namespace: /db/
scope: cluster_1
name: $THIS_NAME

log:
  format: '%(asctime)s %(levelname)s: %(message)s'
  level: INFO
  max_queue_size: 1000
  traceback_level: ERROR
  type: plain

restapi:
  listen: 0.0.0.0:8008
  connect_address: $THIS_IP:8008

etcd3:
  hosts: 
  - 192.168.59.101:2379
  - 192.168.59.102:2379
  - 192.168.59.103:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_size: 160MB # wal_keep_segments (10 x 16MB) was removed in PostgreSQL 13
        max_wal_senders: 5
        max_replication_slots: 10
        wal_log_hints: "on"
        logging_collector: 'on'
        max_wal_size: '10GB'
        archive_mode: "on"
        archive_timeout: 600s
        archive_command: "/bin/true"

  initdb: # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 127.0.0.1/32 trust
  - host replication replicator 0.0.0.0/0 md5
  - host all all 0.0.0.0/0 md5
  - host all all ::0/0 md5

  # Some additional users which need to be created after initializing the new cluster
  users:
    admin:
      password: qaz123
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: $THIS_IP:5432
  data_dir: /var/lib/postgresql/17
  bin_dir: /usr/lib/postgresql/17/bin
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: replPasswd
    superuser:
      username: postgres
      password: qaz123
  parameters:
    unix_socket_directories: "/var/run/postgresql/"

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
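The file above uses $THIS_NAME and $THIS_IP as per-node placeholders; they are this article's own convention and are not expanded by Patroni, so each server needs its own values substituted in. One way to render a per-node file with sed (shown on a two-line sample of the template):

```shell
# Per-node values for debian1
THIS_NAME=debian1
THIS_IP=192.168.59.101
# Substitute the placeholders in the template (two sample lines shown)
rendered=$(sed -e "s/\$THIS_NAME/$THIS_NAME/g" -e "s/\$THIS_IP/$THIS_IP/g" <<'EOF'
name: $THIS_NAME
  connect_address: $THIS_IP:8008
EOF
)
echo "$rendered"
```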

Validate the configuration file

patroni --validate-config /etc/patroni/patroni.yml

Service[edit]

sudo vi /etc/systemd/system/multi-user.target.wants/patroni.service

[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=network.target

[Service]
Type=simple

User=postgres
Group=postgres

# Start the patroni process
ExecStart=/usr/bin/patroni /etc/patroni/patroni.yml

# Send HUP to reload from patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID

# Only kill the patroni process, not its children, so it will gracefully stop postgres
KillMode=process

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=30

# Restart the service if it crashed
Restart=on-failure

[Install]
WantedBy=multi-user.target

Set permissions of the data directory to 700

sudo -iu postgres
chmod 700 /var/lib/postgresql/17
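PostgreSQL refuses to start when the data directory is readable by group or others, which is why mode 700 matters; the resulting mode can be confirmed with stat (demonstrated here on a scratch directory):

```shell
# Create a scratch directory and apply the same mode as the data directory
d=$(mktemp -d)
chmod 700 "$d"
# %a prints the octal permission bits
mode=$(stat -c %a "$d")
echo "$mode"
rmdir "$d"
```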

Start the patroni service

sudo systemctl daemon-reload
sudo systemctl start patroni

Check status

export PATRONICTL_CONFIG_FILE=/etc/patroni/patroni.yml
patronictl -c /etc/patroni/patroni.yml list

+ Cluster: cluster_1 (7439347493790530098) ------+----+-----------+
| Member  | Host           | Role    | State     | TL | Lag in MB |
+---------+----------------+---------+-----------+----+-----------+
| debian1 | 192.168.59.101 | Replica | streaming |  1 |         0 |
| debian2 | 192.168.59.102 | Replica | streaming |  1 |         0 |
| debian3 | 192.168.59.103 | Leader  | running   |  1 |           |
+---------+----------------+---------+-----------+----+-----------+

Change configuration

export PATRONICTL_CONFIG_FILE=/etc/patroni/patroni.yml
patronictl edit-config --pg archive_command="pgbackrest --stanza=ian archive-push %p"

Issues[edit]

Replication showed the following error

ERROR:  Requested starting point 0/4000000 is ahead of the WAL flush 
        position of this server 0/3000060

The replica instance needed to be re-initialised

patronictl -c /etc/patroni/patroni.yml reinit <scope> <name>

VIP Manager[edit]

Copy the latest release to your server.

sudo dpkg -i vip-manager_2.8.0_Linux_x86_64.deb
sudo systemctl stop vip-manager
sudo vi /etc/patroni/vip-manager.yml

The trigger key below contains the namespace and scope of the Patroni service.

ip: 192.168.59.100
netmask: 24
interface: enp0s3
trigger-key: "/db/cluster_1/leader"
dcs-type: etcd
dcs-endpoints: http://192.168.59.101:2379
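The trigger key is simply the Patroni namespace and scope with /leader appended, which is easy to verify against the patroni.yml values above:

```shell
# Values from patroni.yml
namespace=/db/
scope=cluster_1
trigger_key="${namespace}${scope}/leader"
echo "$trigger_key"
```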

Create the systemd unit file

sudo vi /etc/systemd/system/multi-user.target.wants/vip-manager.service

[Unit]
Description=Manages Virtual IP for Patroni
After=network-online.target
Before=patroni.service

[Service]
Type=simple
ExecStart=/usr/bin/vip-manager --config=/etc/patroni/vip-manager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

Reload systemd, start the service and follow the log

sudo systemctl daemon-reload
sudo systemctl start vip-manager
sudo journalctl -u vip-manager.service -n 100 -f

PGBackRest[edit]

Set up the repository on debian3.
Install pgbackrest on all the servers …

sudo apt install pgbackrest
sudo mkdir -p -m 770 /var/log/pgbackrest
sudo chown postgres:postgres /var/log/pgbackrest
sudo mkdir -p /etc/pgbackrest
sudo touch /etc/pgbackrest/pgbackrest.conf
sudo chmod 640 /etc/pgbackrest/pgbackrest.conf
sudo chown postgres:postgres /etc/pgbackrest/pgbackrest.conf

On debian3, create the directory that will hold the backups …

sudo mkdir -p /etc/pgbackrest/conf.d
sudo mkdir -p /var/lib/pgbackrest
sudo chmod 750 /var/lib/pgbackrest
sudo chown postgres:postgres /var/lib/pgbackrest

Setup SSH[edit]

We will need a passwordless SSH connection between the repository and the database servers.
For this setup, postgres is the user that controls pgbackrest.
On each server, append the copied public key to ~/.ssh/authorized_keys, then test the login.

# debian1
ssh-keygen
scp ~/.ssh/id_rsa.pub postgres@debian3:
ssh postgres@debian3
# debian3
ssh-keygen
scp ~/.ssh/id_rsa.pub postgres@debian1:
ssh postgres@debian1

Configuration[edit]

As the postgres user create the configuration file on the repository.

vi /etc/pgbackrest/pgbackrest.conf

[cybertec]
pg1-host=debian1
pg1-path=/var/lib/postgresql/17 

[global]
repo1-cipher-pass=zWaf6XtpjIVZC5444yXB+cgFDFl7MxGlgkZSaoPv
repo1-cipher-type=aes-256-cbc
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
start-fast=y
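The repo1-cipher-pass should be a long random string unique to this repository (do not reuse the one shown above); one simple way to generate a passphrase:

```shell
# 48 random bytes, base64-encoded: a 64-character passphrase
pass=$(head -c 48 /dev/urandom | base64 | tr -d '\n')
echo "${#pass}"
```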

As the postgres user create the configuration file on the database server.

[cybertec]
pg1-path=/var/lib/postgresql/17
 
[global]
log-level-file=detail
repo1-host=debian3 
repo1-host-user=postgres

Alter the archive_command parameter so that it uses the pgbackrest executable.

patronictl edit-config --pg archive_command="pgbackrest --stanza=cybertec archive-push %p"

Create a Stanza on the repository server and confirm it is working.

pgbackrest --stanza=cybertec stanza-create
pgbackrest --stanza=cybertec check

Also check if the Stanza is correct on the database servers.

pgbackrest --stanza=cybertec check
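Once the checks pass, backups can be scheduled from the repository host. A sketch of a crontab entry for the postgres user (the times are an assumption; --type=full and --type=diff are standard pgBackRest backup types):

```
# Weekly full backup on Sunday, differential backups the other nights
30 2 * * 0   pgbackrest --stanza=cybertec --type=full backup
30 2 * * 1-6 pgbackrest --stanza=cybertec --type=diff backup
```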