Patroni Cluster
Setup
- Debian1: 192.168.59.101 (Debian 12)
- Debian2: 192.168.59.102 (Debian 12)
- Debian3: 192.168.59.103 (Debian 12)
The servers were created using VirtualBox.
Network setup was 2 adapters (Bridged Adapter and NAT).
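Later steps (SSH and pgBackRest) refer to the nodes by hostname, so each VM needs to resolve the others. A minimal sketch, assuming the bridged addresses above, is an /etc/hosts entry on every server:
192.168.59.101 debian1
192.168.59.102 debian2
192.168.59.103 debian3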
Installation
sudo apt install postgresql patroni
sudo systemctl stop patroni postgresql
sudo systemctl disable postgresql
etcd was installed from prebuilt binaries
ETCD_RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest | grep tag_name | cut -d '"' -f 4)
curl -sL https://github.com/etcd-io/etcd/releases/download/${ETCD_RELEASE}/etcd-${ETCD_RELEASE}-linux-amd64.tar.gz \
  | sudo tar xz -C /usr/bin --strip=1 --wildcards --no-anchored etcdctl etcd
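A quick sanity check that both binaries landed in /usr/bin (standard version commands):
etcd --version
etcdctl version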
Configuration
etcd
sudo mkdir -p /var/lib/etcd/
sudo mkdir -p /etc/etcd
sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd
sudo chown -R etcd:etcd /var/lib/etcd/
sudo chmod -R a+rw /var/lib/etcd
Create the configuration file at /etc/etcd/etcd.yml. The example below is for debian1; adjust the name and IP addresses on each node.
name: 'debian1'
data-dir: '/var/lib/etcd/data.etcd'
initial-advertise-peer-urls: http://192.168.59.101:2380
listen-peer-urls: http://192.168.59.101:2380
advertise-client-urls: http://192.168.59.101:2379
listen-client-urls: http://192.168.59.101:2379,http://127.0.0.1:2379
initial-cluster: "debian1=http://192.168.59.101:2380,debian2=http://192.168.59.102:2380,debian3=http://192.168.59.103:2380"
initial-cluster-state: 'new'
initial-cluster-token: 'token-01'
Configure the systemd unit file
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network-online.target local-fs.target remote-fs.target time-sync.target
Wants=network-online.target local-fs.target remote-fs.target time-sync.target

[Service]
User=etcd
Type=notify
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=%H
ExecStart=/usr/bin/etcd --config-file /etc/etcd/etcd.yml
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable etcd.service
sudo systemctl start etcd.service
Test that everything is working
export ETCDCTL_ENDPOINTS="http://192.168.59.103:2379,http://192.168.59.101:2379,http://192.168.59.102:2379"
etcdctl member list -w table
etcdctl endpoint health -w table
etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+-----------+------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+--------+
| http://192.168.59.102:2379 | 3cdde56ec82a0ca0 |  3.5.16 |   53 kB |      true |        13 |        145 |        |
| http://192.168.59.101:2379 | 406189a3bb67b8bc |  3.5.16 |   53 kB |     false |        13 |        145 |        |
| http://192.168.59.103:2379 | 939ed44c2e6d5e5b |  3.5.16 |   53 kB |     false |        13 |        145 |        |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+--------+
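Optionally, a throwaway key confirms the cluster accepts writes; test_key and its value are arbitrary:
etcdctl put test_key test_value
etcdctl get test_key
etcdctl del test_key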
Change the leader if necessary
etcdctl move-leader 939ed44c2e6d5e5b --endpoints=$ETCDCTL_ENDPOINTS
Issues
One of the members would not connect to the cluster.
The error message on the leader was “Prober detected unhealthy status …”
On the problem member the message was “Failed to publish local member to cluster through raft”
There were no network issues; the problem was solved by removing and re-adding the member.
etcdctl member list -w table
sudo systemctl stop etcd      # on the server being removed
etcdctl member remove <ID>
etcdctl member add debian2 --peer-urls=http://192.168.59.102:2380
Edit the etcd.yml file, copying all the information displayed by the member add command, including the new order for ETCD_INITIAL_CLUSTER.
On the server being added …
sudo systemctl start etcd
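Once the restarted member rejoins, the member list and health commands from above should report all three endpoints as healthy again:
etcdctl member list -w table
etcdctl endpoint health -w table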
Patroni
sudo mkdir -p /etc/patroni/
sudo chown -R postgres:postgres /etc/patroni/
As the postgres user, create a configuration file.
sudo -iu postgres vi /etc/patroni/patroni.yml

namespace: /db/
scope: cluster_1
name: $THIS_NAME

log:
  format: '%(asctime)s %(levelname)s: %(message)s'
  level: INFO
  max_queue_size: 1000
  traceback_level: ERROR
  type: plain

restapi:
  listen: 0.0.0.0:8008
  connect_address: $THIS_IP:8008

etcd3:
  hosts:
    - 192.168.59.101:2379
    - 192.168.59.102:2379
    - 192.168.59.103:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_segments: 10
        max_wal_senders: 5
        max_replication_slots: 10
        wal_log_hints: "on"
        logging_collector: 'on'
        max_wal_size: '10GB'
        archive_mode: "on"
        archive_timeout: 600s
        archive_command: "/bin/true"

  initdb:  # Note: It needs to be a list (some options need values, others are switches)
    - encoding: UTF8
    - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
    - host replication replicator 127.0.0.1/32 trust
    - host replication replicator 0.0.0.0/0 md5
    - host all all 0.0.0.0/0 md5
    - host all all ::0/0 md5

  # Some additional users which need to be created after initializing the new cluster
  users:
    admin:
      password: qaz123
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: $THIS_IP:5432
  data_dir: /var/lib/postgresql/17
  bin_dir: /usr/lib/postgresql/17/bin
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: replPasswd
    superuser:
      username: postgres
      password: qaz123
  parameters:
    unix_socket_directories: "/var/run/postgresql/"

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
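The file uses $THIS_NAME and $THIS_IP as placeholders for the local hostname and address. One way to fill them in, shown here for debian1 as an illustration (adjust per node):
sudo -u postgres sed -i 's/\$THIS_NAME/debian1/g; s/\$THIS_IP/192.168.59.101/g' /etc/patroni/patroni.yml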
Validate the configuration file
patroni --validate-config /etc/patroni/patroni.yml
Service
sudo vi /etc/systemd/system/multi-user.target.wants/patroni.service

[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=network.target

[Service]
Type=simple
User=postgres
Group=postgres
# Start the patroni process
ExecStart=/usr/bin/patroni /etc/patroni/patroni.yml
# Send HUP to reload from patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID
# Only kill the patroni process, not its children, so it will gracefully stop postgres
KillMode=process
# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=30
# Restart the service if it crashed
Restart=on-failure

[Install]
WantedBy=multi-user.target
Set permissions of the data directory to 700
sudo -iu postgres chmod 700 /var/lib/postgresql/17
Start the patroni service
sudo systemctl daemon-reload
sudo systemctl start patroni
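If the service does not come up cleanly, the systemd journal is the first place to look:
sudo systemctl status patroni
sudo journalctl -u patroni -n 50 -f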
Check status
export PATRONICTL_CONFIG_FILE=/etc/patroni/patroni.yml
patronictl -c /etc/patroni/patroni.yml list

+ Cluster: cluster_1 (7439347493790530098) ------+----+-----------+
| Member  | Host           | Role    | State     | TL | Lag in MB |
+---------+----------------+---------+-----------+----+-----------+
| debian1 | 192.168.59.101 | Replica | streaming |  1 |         0 |
| debian2 | 192.168.59.102 | Replica | streaming |  1 |         0 |
| debian3 | 192.168.59.103 | Leader  | running   |  1 |           |
+---------+----------------+---------+-----------+----+-----------+
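The REST API configured in the restapi section can also be queried directly; for example, the /patroni endpoint returns the node's role, state and timeline as JSON:
curl -s http://192.168.59.101:8008/patroni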
Change configuration
PATRONICTL_CONFIG_FILE=/etc/patroni/patroni.yml patronictl edit-config --pg archive_command="pgbackrest --stanza=ian archive-push %p"
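The running DCS configuration can be inspected afterwards to confirm the change:
patronictl -c /etc/patroni/patroni.yml show-config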
Issues
Replication showed the following error
ERROR: Requested starting point 0/4000000 is ahead of the WAL flush position of this server 0/3000060
The replica instance needed to be re-initialised
patronictl -c /etc/patroni/patroni.yml reinit <scope> <name>
VIP Manager
Copy the latest release to your server.
sudo dpkg -i vip-manager_2.8.0_Linux_x86_64.deb
sudo systemctl stop vip-manager
sudo vi /etc/patroni/vip-manager.yml
The trigger key below contains the namespace and scope of the Patroni service.
ip: 192.168.59.100
netmask: 24
interface: enp0s3
trigger-key: "/db/cluster_1/leader"
dcs-type: etcd
dcs-endpoints: http://192.168.59.101:2379
sudo vi /etc/systemd/system/multi-user.target.wants/vip-manager.service

[Unit]
Description=Manages Virtual IP for Patroni
After=network-online.target
Before=patroni.service

[Service]
Type=simple
ExecStart=/usr/bin/vip-manager --config=/etc/patroni/vip-manager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl start vip-manager
sudo journalctl -u vip-manager.service -n 100 -f
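On the current leader the virtual IP should now be attached to the configured interface (assuming enp0s3 as above):
ip addr show enp0s3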
PGBackRest
Set up the repository on debian3.
Install pgbackrest on all the servers …
sudo apt install pgbackrest
sudo mkdir -p -m 770 /var/log/pgbackrest
sudo chown postgres:postgres /var/log/pgbackrest
sudo mkdir -p /etc/pgbackrest
sudo touch /etc/pgbackrest/pgbackrest.conf
sudo chmod 640 /etc/pgbackrest/pgbackrest.conf
sudo chown postgres:postgres /etc/pgbackrest/pgbackrest.conf
On debian3, create the directory that will hold the backups …
sudo mkdir -p /etc/pgbackrest/conf.d
sudo mkdir -p /var/lib/pgbackrest
sudo chmod 750 /var/lib/pgbackrest
sudo chown postgres:postgres /var/lib/pgbackrest
Setup SSH
We will need a passwordless SSH connection between the repository and the database servers.
For this setup I was using postgres as the user that controls pgbackrest.
# debian1
ssh-keygen
scp ~/.ssh/id_rsa.pub postgres@debian3:
ssh postgres@debian3

# debian3
ssh-keygen
scp ~/.ssh/id_rsa.pub postgres@debian1:
ssh postgres@debian1
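Copying the public key is only half the job; it still has to end up in ~/.ssh/authorized_keys on the remote host. ssh-copy-id does both steps in one go and is a simpler alternative:
# debian1
ssh-copy-id postgres@debian3
# debian3
ssh-copy-id postgres@debian1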
Configuration
As the postgres user, create the configuration file on the repository.
vi /etc/pgbackrest/pgbackrest.conf

[cybertec]
pg1-host=debian1
pg1-path=/var/lib/postgresql/17

[global]
repo1-cipher-pass=zWaf6XtpjIVZC5444yXB+cgFDFl7MxGlgkZSaoPv
repo1-cipher-type=aes-256-cbc
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
start-fast=y
As the postgres user, create the configuration file on the database server.
[cybertec]
pg1-path=/var/lib/postgresql/17

[global]
log-level-file=detail
repo1-host=debian3
repo1-host-user=postgres
Alter the archive_command parameter so that it uses the pgbackrest executable.
patronictl edit-config --pg archive_command="pgbackrest --stanza=cybertec archive-push %p"
Create a Stanza on the repository server and confirm it is working.
pgbackrest --stanza=cybertec stanza-create
pgbackrest --stanza=cybertec check
Also check if the Stanza is correct on the database servers.
pgbackrest --stanza=cybertec check
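With the stanza verified everywhere, a first full backup can be taken from the repository server, and the info command lists the backups and archived WAL for the stanza (standard pgBackRest commands, run as postgres):
pgbackrest --stanza=cybertec --type=full backup
pgbackrest --stanza=cybertec info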