Best of Both Worlds: Integrating Pgpool with EDB Postgres Failover Manager

 

EDB Postgres Failover Manager (EFM) is a high availability module from EnterpriseDB (EDB) that monitors the health of Postgres clusters and verifies failures quickly. Should one occur, EFM can automatically promote a Standby node to Master to ensure continued performance and protect against data loss.

Among EDB customers, virtually all of them use EDB Postgres Failover Manager in production within their high availability infrastructures, often alongside other solutions. EDB, in fact, developed EFM with hooks and parameters to ease the tool’s integration with external systems and other software.

One such piece of software is Pgpool, the open source middleware that sits between the database clients and the Postgres server. The Pgpool module helps in connection pooling and load balancing of SELECT queries on multiple standbys in EDB Postgres clusters. EDB has also invested resources in Pgpool development and supports multiple engineers whose work advances and maintains the tool.

A commonly asked question is how best to integrate the two tools because of the benefits they both provide. In the event an EDB Postgres Master fails and EFM promotes a Standby, DBAs would also like to be able to update the Pgpool about the new Master.

The following are the steps for automating the Pgpool update should EFM execute a Master failover:

On Pgpool server:

  1. Disable the auto-failover of Pgpool by modifying the following parameter in pgpool conf:
backend_flag = 'DISALLOW_TO_FAILOVER'

The above parameter is important in order to avoid a split-brain situation between Pgpool and EFM for Postgres Failover/switchover.

For more information, please refer the following link:

http://www.pgpool.net/docs/latest/en/html/runtime-config-backend-settings.html#RUNTIME-CONFIG-BACKEND-CONNECTION-SETTINGS

  1. Configure the conf.

For more information on PCP settings, please use the following link:

http://www.pgpool.net/docs/latest/en/html/pcp-commands.html

On EDB Postgres database nodes:

  1. Install the Pgpool binaries.

Pgpool binaries come with PCP commands. Users can use the PCP command to notify pgpool about new master through EFM

  1. The following is a sample script which uses the PCP command to notify Pgpool about the new master:
#!/bin/bash

###################################################################################
#title           : EFM fencing script for updating pgpool
#description     : This script executes pcp commands. Therefore pcp & pgpool 
#                : should be available on the server
#author          : Vibhor Kumar (vibhor.aim@gmail.com).
#date            : Jan 5 2018
#version         : 1.0
#notes           : Install Vim and Emacs to use this script.
#                : configure the pcppass file for EFM user and set 
#                : the password correctly
#bash_version    : GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
###################################################################################
# quit on any error
set -e
# verify any  undefined shell variables
set -u

###################################################################################
# Code for notifiying pgpool for promote the standby
###################################################################################

NEW_PRIMARY=$1                              # argument from EFM fencing hook
PCP_USER=enterprisedb                       # PCP user name
PCP_PORT=9051                               # PCP port number as in pgpool.conf
PCP_HOST=pgpool                             # hostname of Pgpool-II  
PGPOOL_PATH=/usr/edb/pgpool3.6              # Pgpool-II installation path
PCPPASSFILE=/var/efm/pcppass                # Path to PCPPASS file

PCP_NODE_COUNT=${PGPOOL_PATH}/bin/pcp_node_count
PCP_NODE_INFO=${PGPOOL_PATH}/bin/pcp_node_info
PCP_PROMOTE=${PGPOOL_PATH}/bin/pcp_promote_node
 
export PCPPASSFILE PCP_USER PCP_PORT PGPOOL_PATH \
       PCP_PROMOTE PCP_NODE_INFO PCP_NODE_COUNT

###################################################################################
# find the number of nodes and search for node-id of NEW_PRIMARY
###################################################################################
NO_OF_NODES=$(${PCP_NODE_COUNT} --host=${PCP_HOST} \
                      --username=${PCP_USER} \
                      --port=${PCP_PORT} \
                      --no-password )

for (( i=0 ; i < ${NO_OF_NODES} ; i++ ))
do
   exists=$(${PCP_NODE_INFO} --host=${PCP_HOST} \
                    --username=${PCP_USER} \
                    --port=${PCP_PORT} \
                    --no-password ${i} |grep ${NEW_PRIMARY}|wc -l)
   if [[ ${exists} -eq 1 ]]; then
      NODE_ID=${i}
      break
   fi
done

###################################################################################
# Promote the specific node id using PCP command
###################################################################################
if [[ ! -z ${NODE_ID} ]]; then
    ${PCP_PROMOTE} --host=${PCP_HOST} \
                   --username=${PCP_USER} \
                   --port=${PCP_PORT} \
                   --no-password \
                   --verbose \
                   ${NODE_ID}
fi
exit 0
  1. Modify the following parameters in properties file of EDB Postgres Servers (Master and Standby):
script.fence=/usr/efm-2.1/bin/efm_pgpool_notify %p

Please note the above script only covers the failover scenario of an EDB Postgres cluster. However, the above script can be extended to cover a switchover use case too.

Let’s have a look at how it works in an environment:

  1. Connect to the Pgpool server and verify the nodes:
[root@master /]# psql -h pgpool
psql.bin (9.6.6.11)
Type "help" for help.

edb=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5444 | up     | 0.500000  | primary | 13         | true              | 0
 1       | standby  | 5444 | up     | 0.500000  | standby | 19         | false             | 0
(2 rows)

Above output shows that we have Master and Standby EDB Postgres nodes and both are up and running.

  1. Connect to the EDB Postgres nodes and check the status of EFM. The following is a snapshot:
[root@master /]# /usr/efm-2.1/bin/efm cluster-status efm
Cluster Status: efm
VIP: 

    Agent Type  Address              Agent  DB       Info
    --------------------------------------------------------------
    Witness     pgpool               UP     N/A       
    Standby     standby              UP     UP        
    Master      master               UP     UP        

Allowed node host list:
    master pgpool standby

Membership coordinator: master

Standby priority host list:
    standby

Promote Status:

    DB Type     Address              XLog Loc         Info
    --------------------------------------------------------------
    Master      master               0/E0001E0        
    Standby     standby              0/E0001E0        

    Standby database(s) in sync with master. It is safe to promote.
    1. Perform a failover as shown below:
/usr/efm-2.1/bin/efm promote efm
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original master. Run the 'cluster-status' command for information about the new cluster state.
  1. Connect to Pgpool node and verify the status of the EDB Postgres nodes:
[root@master /]# psql -h pgpool
psql.bin (9.6.6.11)
Type "help" for help.

edb=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5444 | down   | 0.500000  | standby | 13         | false             | 0
 1       | standby  | 5444 | up     | 0.500000  | primary | 20         | true              | 0
(2 rows)

The above shows that the Master server which was primary is now down and the Standby is promoted to primary.

Meet BART – A New Tool for Backup And Recovery Management

EnterpriseDB recently launched a new tool for backup and recovery – named simply EDB Backup and Recovery Tool, or BART. This tool makes the DBA’s life easier by simplifying the tasks for managing their Postgres physical backup and recovery tasks, whether they are PostgreSQL or Postgres Plus Advanced Server deployments.

BART has the following advantages over custom scripts for managing backups:

1. It’s stable and it uses the tool pg_basebackup to take a physical backup. This tool has been well defined and is well-supported by the PostgreSQL community.

2. It catalogs all of the backups users are taking, which is important in terms of:
    i. Listing the type of backups used
   ii. Listing the status of those backups with server information.

3. BART also provides functionality to restore backups, with all required archived WAL files. So automation around this tool will make DBAs’ lives easier for restore and recovery.

4. BART provides an option to validate your backup by using checksum. This is useful for confirming you took a valid backup and it is not corrupted at disk level.

5. BART provides an option to define your retention policy around the backups you are keeping.

Given all of the above advantages, I decided to give this new tool a try and share some tips. To get started, you need the following prerequisites:

1. BART currently requires a Linux 64 bit platform, CentOS 6.x or RHEL 6.x
2. Need to have password-less, direct SSH access to the target machine where you want to restore backups as well as the database servers you want backed up
3. Install the Postgres Plus Advanced Server or PostgreSQL binaries for pg_basebackup

Yum or rpm

To install this tool, you have two options that I will explore below:

1. Yum command
2. Rpm command.

Using the yum command:

To perform a yum command installation, BART users can ask EDB for credentials to the EnterpriseDB yum repository and configure the their local yum repository as follows:

echo &quot;[tools]
name=EnterpriseDB Tools
baseurl=http://username:password@yum.enterprisedb.com/tools/redhat/rhel-$releasever-$basearch
enabled=1
gpgcheck=0&quot; &gt; /etc/yum.repos.d/edbtools.repo

After creating the yum repo, the user can execute the following command to install BART:

 yum install edb-bart

If the user doesn’t want to install the EDB Backup and Recovery Tool using the yum command, then the user can download a free standing rpm using the link below from EDB’s website:

http://www.enterprisedb.com/downloads/postgres-postgresql-downloads

and then enter the rpm install command as follows:

rpm -ivh edb-bart-1.0.1-1.rhel6.x86_64.rpm

After installing BART using the above commands, the user can see the binaries in the directory:/usr/edb-bart-1.0/bin and a sample BART configuration file in /usr/edb-bart-1.0/etc

That’s a very easy installation.

For more information on configuring BART Host and Database Host, the following are some documents that will help:
1. pg_basebackup configuration for PostgreSQL:
http://www.postgresql.org/docs/current/static/app-pgbasebackup.html

2. For direct password less ssh configuration user can refer following link
http://www.enterprisedb.com/docs/en/1.0/bart/EDB_Backup_and_Recovery_Tool_Guide-17.htm#P1008_76316

After the installation of the BART binaries, the user also has to create a BART configuration file.

The following is a sample configuration file for BART:

[BART]
bart-host= enterprisedb@127.0.0.1
backup_path = /opt/backup
pg_basebackup_path = /usr/ppas-9.4/bin/pg_basebackup
logfile = /tmp/bart.log

[PG]
host = 127.0.0.1
port = 5432
user = postgres
description = &quot;Postgres server&quot;

[PPAS94]
host = 127.0.0.1
port = 5444
user = enterprisedb
description = &quot;PPAS 94 server&quot;

Global Configuration Settings

Content under the [BART] tag are called global configuration settings. Under this tag are the following:

1. bart-host: the IP address of the host on which BART is installed. The value for this parameter must be specified in the form: bart_user@bart_host_address, where bart_user is the operating system user account on the BART host that is used to run BART and owns the BART backup catalog directory. bart_host_address is the IP address of the BART host.

2. backup_path: specifies the file system parent directory where all BART database server base backups and archived WAL files are stored. This parameter is required.

3. pg_basebackup_path: specifies the path to the pg_basebackup program of the Postgres database server installed on the BART host.

4. log file: specifies the path to the BART log file. This parameter is optional. If no path to a log file is specified after logfile =, or if the parameter is commented out, BART does not create a log file.

The remaining part of configuration file is self-explanatory. The TAG: [PG]/[PPAS94] part is content for servers which the user wants to back up.

Pg_basebackup Settings

After performing the above configuration on the Backup Server, the user has to do set following settings on the servers that they want to back up. Below are the settings for enabling backup using pg_basebackup.

The user has to set a few parameters in PostgreSQL postgresql.conf file, which he wants to backup:

1. wal_level parameter to archive or hot_standby.
2. archive_mode=on
3. archive_command setting.
4. max_wal_senders to 1 or more than one, since pg_basebackup uses the replication protocol to copy data directory.

For more information on each setting please refer to the following:
1. wal_level:
http://www.postgresql.org/docs/9.4/static/runtime-config-wal.html

2. archive_mode and archive_command:
http://www.postgresql.org/docs/9.4/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVING

3. max_wal_senders:
http://www.postgresql.org/docs/9.4/static/runtime-config-replication.html
http://www.enterprisedb.com/docs/en/1.0/bart/EDB_Backup_and_Recovery_Tool_Guide-19.htm#TopOfPage

With the above settings, the user then needs to update the pg_hba.conf file for the replication connection.

Note: The above settings are for pg_basebackup to take backups using replication protocols. In case users need more information about pg_basebackup and settings, please use the above mentioned link

How BART Works

Now, since we have configured both servers, let’s have a look how BART works.

The following command executes a backup:

 bart -c bart.cfg BACKUP -s ppas94

And below is the output:

[bart@localhost ~]$ bart -c bart.cfg BACKUP -s ppas94

INFO:  creating backup for server 'ppas94'
INFO:  backup identifier: '1413852137762'
6394456/6394456 kB (100%), 1/1 tablespace

INFO:  backup checksum: 7f49ea9653511308710c174f22ec765d
INFO:  backup completed successfully
[bart@localhost ~]$ 

That was an easy way to take a backup. The DBA can also create a job to execute the above command to take backups.

If the user wants to list the backup using BART, the user can use the option SHOW-BACKUPS:

[bart@localhost ~]$ bart -c bart.cfg SHOW-BACKUPS -s ppas94
 Server Name   Backup ID       Backup Time           Backup Size  
                                                                  
 ppas94        1413852137762   2014-10-20 17:43:41   6244.59 MB   

This is useful for knowing what backups a user has available for recovery. The above command gives important information:

1.	Backup ID: It’s a unique ID for the physical backup
2.	Backup Time: Time when backup was taken
3.	Backup Size: Size of backup

This information is useful when a user wants to plan for recovery using backup. This way, the user can also plan for disk size.

Sometimes a user wants to verify their backup state. VERIFY-CHKSUM option is useful in this case:

[bart@localhost ~]$ bart -c bart.cfg VERIFY-CHKSUM -s ppas94 -i 1413852137762
 Server Name   Backup ID       Verify  
                                       
 ppas94        1413852137762   OK      

I have to say, after putting EDB BART through its paces, I think DBAs will enjoy having such a great tool for making Backup Management easy.

In my next post, I will blog about the Recovery process.