Patroni Cluster

Setup

  • Debian1: 192.168.59.101 (Debian 12)
  • Debian2: 192.168.59.102 (Debian 12)
  • Debian3: 192.168.59.103 (Debian 12)

The servers were created using VirtualBox.
Network setup was 2 adapters (Bridged Adapter and NAT).

Installation

sudo apt install postgresql patroni
sudo systemctl stop patroni postgresql
sudo systemctl disable postgresql

etcd was installed from prebuilt binaries

ETCD_RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest|grep tag_name | cut -d '"' -f 4)
curl -sL https://github.com/etcd-io/etcd/releases/download/${ETCD_RELEASE}/etcd-${ETCD_RELEASE}-linux-amd64.tar.gz \
 | sudo tar xz -C /usr/bin --strip=1 --wildcards --no-anchored etcdctl etcd
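
A quick way to confirm the installation on each node is to run the version commands

etcd --version
etcdctl version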

Configuration

etcd

sudo mkdir -p /var/lib/etcd/
sudo mkdir -p /etc/etcd
sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd
sudo chown -R etcd:etcd /var/lib/etcd/
sudo chmod -R a+rw /var/lib/etcd

Create the configuration file at /etc/etcd/etcd.yml

name: 'debian1' 
data-dir: '/var/lib/etcd/data.etcd'
initial-advertise-peer-urls: http://192.168.59.101:2380 
listen-peer-urls: http://192.168.59.101:2380 
advertise-client-urls: http://192.168.59.101:2379 
listen-client-urls: http://192.168.59.101:2379,http://127.0.0.1:2379 
initial-cluster: "debian1=http://192.168.59.101:2380,debian2=http://192.168.59.102:2380,debian3=http://192.168.59.103:2380" 
initial-cluster-state: 'new' 
initial-cluster-token: 'token-01'
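
Each node gets its own copy of this file; only the name and the advertised/listen addresses change. For example, on debian2 the node-specific lines would be (addresses from the Setup section)

name: 'debian2'
initial-advertise-peer-urls: http://192.168.59.102:2380
listen-peer-urls: http://192.168.59.102:2380
advertise-client-urls: http://192.168.59.102:2379
listen-client-urls: http://192.168.59.102:2379,http://127.0.0.1:2379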

Configure the systemd unit file

[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network-online.target local-fs.target remote-fs.target time-sync.target
Wants=network-online.target local-fs.target remote-fs.target time-sync.target

[Service]
User=etcd
Type=notify
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=%H
ExecStart=/usr/bin/etcd --config-file /etc/etcd/etcd.yml
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable etcd.service
sudo systemctl start etcd.service

Test that everything is working

export ETCDCTL_ENDPOINTS="http://192.168.59.103:2379,http://192.168.59.101:2379,http://192.168.59.102:2379"
etcdctl member list -w table
etcdctl endpoint health -w table
etcdctl endpoint status -w table 
+----------------------------+------------------+---------+---------+-----------+-----------+------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+--------+
| http://192.168.59.102:2379 | 3cdde56ec82a0ca0 |  3.5.16 |   53 kB |      true |        13 |        145 |        |
| http://192.168.59.101:2379 | 406189a3bb67b8bc |  3.5.16 |   53 kB |     false |        13 |        145 |        |
| http://192.168.59.103:2379 | 939ed44c2e6d5e5b |  3.5.16 |   53 kB |     false |        13 |        145 |        |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+--------+

Change the leader if necessary

etcdctl move-leader 939ed44c2e6d5e5b --endpoints=$ETCDCTL_ENDPOINTS
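
Re-run the endpoint status command afterwards to confirm the leader has moved

etcdctl endpoint status -w table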

Issues

One of the members would not connect to the cluster.
The error message on the leader was “Prober detected unhealthy status …”
On the problem member the message was “Failed to publish local member to cluster through raft”
There were no network issues; the problem was solved by removing and re-adding the member.

etcdctl member list -w table
sudo systemctl stop etcd    # on the server being removed
etcdctl member remove <ID>
etcdctl member add debian2 --peer-urls=http://192.168.59.102:2380

Edit the etcd.yml files, copying all of the information displayed, including the new member order for ETCD_INITIAL_CLUSTER.
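
As a rough sketch (based on etcd's documented rejoin behaviour rather than output captured from this cluster), the re-added member's etcd.yml ends up with the member list printed by the member add command and the cluster state set to existing

initial-cluster: "debian1=http://192.168.59.101:2380,debian3=http://192.168.59.103:2380,debian2=http://192.168.59.102:2380"
initial-cluster-state: 'existing'

The stale data directory on that node (here /var/lib/etcd/data.etcd) also has to be emptied before etcd is restarted, since a rejoining member must start with a clean data directory.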
On the server being added:

sudo systemctl start etcd

Patroni

sudo mkdir -p /etc/patroni/
sudo chown -R  postgres:postgres /etc/patroni/

As the postgres user, create a configuration file.

sudo -iu postgres
vi /etc/patroni/patroni.yml

namespace: /db/
scope: cluster_1
name: $THIS_NAME

log:
  format: '%(asctime)s %(levelname)s: %(message)s'
  level: INFO
  max_queue_size: 1000
  traceback_level: ERROR
  type: plain

restapi:
  listen: 0.0.0.0:8008
  connect_address: $THIS_IP:8008

etcd3:
  hosts: 
  - 192.168.59.101:2379
  - 192.168.59.102:2379
  - 192.168.59.103:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_segments: 10
        max_wal_senders: 5
        max_replication_slots: 10
        wal_log_hints: "on"
        logging_collector: 'on'
        max_wal_size: '10GB'
        archive_mode: "on"
        archive_timeout: 600s
        archive_command: "/bin/true"

  initdb: # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 127.0.0.1/32 trust
  - host replication replicator 0.0.0.0/0 md5
  - host all all 0.0.0.0/0 md5
  - host all all ::0/0 md5

  # Some additional users which need to be created after initializing the new cluster
  users:
    admin:
      password: qaz123
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: $THIS_IP:5432
  data_dir: /var/lib/postgresql/17
  bin_dir: /usr/lib/postgresql/17/bin
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: replPasswd
    superuser:
      username: postgres
      password: qaz123
  parameters:
    unix_socket_directories: "/var/run/postgresql/"

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
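
The $THIS_NAME and $THIS_IP placeholders are per-node values; on debian1, for example, the node-specific lines would be

name: debian1

restapi:
  connect_address: 192.168.59.101:8008

postgresql:
  connect_address: 192.168.59.101:5432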

Validate the configuration file

patroni --validate-config /etc/patroni/patroni.yml

Service

sudo vi /etc/systemd/system/multi-user.target.wants/patroni.service

[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=network.target

[Service]
Type=simple

User=postgres
Group=postgres

# Start the patroni process
ExecStart=/usr/bin/patroni /etc/patroni/patroni.yml

# Send HUP to reload from patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID

# Only kill the patroni process, not its children, so it will gracefully stop postgres
KillMode=process

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=30

# Restart the service if it crashed
Restart=on-failure

[Install]
WantedBy=multi-user.target

Set permissions of the data directory to 700

sudo -iu postgres
chmod 700 /var/lib/postgresql/17

Start the patroni service

sudo systemctl daemon-reload
sudo systemctl start patroni
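
Optionally enable the unit so Patroni comes back after a reboot, and check that it started

sudo systemctl enable patroni
sudo systemctl status patroni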

Check status

export PATRONICTL_CONFIG_FILE=/etc/patroni/patroni.yml
patronictl -c /etc/patroni/patroni.yml list

+ Cluster: cluster_1 (7439347493790530098) ------+----+-----------+
| Member  | Host           | Role    | State     | TL | Lag in MB |
+---------+----------------+---------+-----------+----+-----------+
| debian1 | 192.168.59.101 | Replica | streaming |  1 |         0 |
| debian2 | 192.168.59.102 | Replica | streaming |  1 |         0 |
| debian3 | 192.168.59.103 | Leader  | running   |  1 |           |
+---------+----------------+---------+-----------+----+-----------+

Change configuration

export PATRONICTL_CONFIG_FILE=/etc/patroni/patroni.yml
patronictl edit-config --pg archive_command="pgbackrest --stanza=ian archive-push %p"
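
The resulting dynamic configuration can be reviewed with the show-config subcommand

patronictl -c /etc/patroni/patroni.yml show-config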

Issues

Replication showed the following error

ERROR:  Requested starting point 0/4000000 is ahead of the WAL flush 
        position of this server 0/3000060

The replica instance needed to be re-initialised

patronictl -c /etc/patroni/patroni.yml reinit <scope> <name>
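
With the names used in this cluster, re-initialising debian1, for example, would be

patronictl -c /etc/patroni/patroni.yml reinit cluster_1 debian1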

VIP Manager

Copy the latest release to your server.

sudo dpkg -i vip-manager_2.8.0_Linux_x86_64.deb
sudo systemctl stop vip-manager
sudo vi /etc/patroni/vip-manager.yml

The trigger key below contains the namespace and scope of the Patroni service.

ip: 192.168.59.100
netmask: 24
interface: enp0s3
trigger-key: "/db/cluster_1/leader"
dcs-type: etcd
dcs-endpoints: http://192.168.59.101:2379
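
The trigger key can be inspected directly in etcd; with ETCDCTL_ENDPOINTS exported as before, the value of the key should be the name of the current Patroni leader

etcdctl get /db/cluster_1/leader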

sudo vi /etc/systemd/system/multi-user.target.wants/vip-manager.service

[Unit]
Description=Manages Virtual IP for Patroni
After=network-online.target
Before=patroni.service

[Service]
Type=simple
ExecStart=/usr/bin/vip-manager --config=/etc/patroni/vip-manager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl start vip-manager
sudo journalctl -u vip-manager.service -n 100 -f
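
On the node that currently holds the leader key, the virtual IP should now be visible on the configured interface

ip addr show enp0s3 | grep 192.168.59.100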

PGBackRest

Set up the repository on Debian3.
Install pgbackrest on all the servers …

sudo apt install pgbackrest
sudo mkdir -p -m 770 /var/log/pgbackrest
sudo chown postgres:postgres /var/log/pgbackrest
sudo mkdir -p /etc/pgbackrest
sudo touch /etc/pgbackrest/pgbackrest.conf
sudo chmod 640 /etc/pgbackrest/pgbackrest.conf
sudo chown postgres:postgres /etc/pgbackrest/pgbackrest.conf

On debian3, create the directory that will hold the backups …

sudo mkdir -p /etc/pgbackrest/conf.d
sudo mkdir -p /var/lib/pgbackrest
sudo chmod 750 /var/lib/pgbackrest
sudo chown postgres:postgres /var/lib/pgbackrest

Setup SSH

We will need a passwordless SSH connection between the repository and the database servers.
For this setup I was using postgres as the user that controls pgbackrest.

# debian1
ssh-keygen
scp ~/.ssh/id_rsa.pub postgres@debian3:
ssh postgres@debian3
# debian3
ssh-keygen
scp ~/.ssh/id_rsa.pub postgres@debian1:
ssh postgres@debian1
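
The copied key still has to be appended to the target user's authorized_keys before the login is passwordless; a minimal sketch, assuming the key was copied into the postgres home directory

# on the receiving server, as postgres
mkdir -p ~/.ssh && chmod 700 ~/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# back on the sending server, this should now connect without a password prompt
ssh postgres@debian3 hostname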

Configuration

As the postgres user, create the configuration file on the repository.

vi /etc/pgbackrest/pgbackrest.conf

[cybertec]
pg1-host=debian1
pg1-path=/var/lib/postgresql/17 

[global]
repo1-cipher-pass=zWaf6XtpjIVZC5444yXB+cgFDFl7MxGlgkZSaoPv
repo1-cipher-type=aes-256-cbc
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
start-fast=y

As the postgres user, create the configuration file on the database servers (again /etc/pgbackrest/pgbackrest.conf).

[cybertec]
pg1-path=/var/lib/postgresql/17
 
[global]
log-level-file=detail
repo1-host=debian3 
repo1-host-user=postgres

Alter the archive_command parameter so that it uses the pgbackrest executable.

patronictl edit-config --pg archive_command="pgbackrest --stanza=cybertec archive-push %p"

Create a Stanza on the repository server and confirm it is working.

pgbackrest --stanza=cybertec stanza-create
pgbackrest --stanza=cybertec check
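
Once the check passes, a first backup can be taken and inspected from the repository server

pgbackrest --stanza=cybertec backup
pgbackrest --stanza=cybertec info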

Also check if the Stanza is correct on the database servers.

pgbackrest --stanza=cybertec check