Best of Both Worlds: Integrating Pgpool with EDB Postgres Failover Manager

 

EDB Postgres Failover Manager (EFM) is a high availability module from EnterpriseDB (EDB) that monitors the health of Postgres clusters and verifies failures quickly. Should one occur, EFM can automatically promote a Standby node to Master to ensure continued performance and protect against data loss.

Virtually all EDB customers use EDB Postgres Failover Manager in production within their high availability infrastructures, often alongside other solutions. In fact, EDB developed EFM with hooks and parameters that ease the tool’s integration with external systems and other software.

One such piece of software is Pgpool, the open source middleware that sits between database clients and the Postgres server. Pgpool provides connection pooling and load balancing of SELECT queries across multiple standbys in EDB Postgres clusters. EDB has also invested resources in Pgpool development and supports multiple engineers whose work advances and maintains the tool.

A commonly asked question is how best to integrate the two tools, given the benefits each provides. In the event an EDB Postgres Master fails and EFM promotes a Standby, DBAs would also like to update Pgpool about the new Master.

The following are the steps for automating the Pgpool update should EFM execute a Master failover:

On the Pgpool server:

  1. Disable Pgpool’s automatic failover by modifying the following parameter in pgpool.conf:
backend_flag = 'DISALLOW_TO_FAILOVER'

The above parameter is important to avoid a split-brain situation between Pgpool and EFM during a Postgres failover or switchover; an example per-backend configuration follows the link below.

For more information, please refer to the following link:

http://www.pgpool.net/docs/latest/en/html/runtime-config-backend-settings.html#RUNTIME-CONFIG-BACKEND-CONNECTION-SETTINGS
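
For illustration, the flag is set per backend index in pgpool.conf. In a two-backend setup such as the demo environment shown later in this post, the entries might look like the following (the hostnames 'master' and 'standby' and port 5444 are assumptions taken from that environment):

backend_hostname0 = 'master'
backend_port0     = 5444
backend_flag0     = 'DISALLOW_TO_FAILOVER'
backend_hostname1 = 'standby'
backend_port1     = 5444
backend_flag1     = 'DISALLOW_TO_FAILOVER'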

  2. Configure the PCP settings in pcp.conf; a sketch follows the link below.

For more information on PCP settings, please use the following link:

http://www.pgpool.net/docs/latest/en/html/pcp-commands.html
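
As a minimal sketch, the PCP user can be registered in pcp.conf with the pg_md5 utility that ships with Pgpool. The user name 'enterprisedb', the placeholder password, and the paths below are assumptions chosen to match the fencing script later in this post:

# Register the PCP user in pcp.conf (format: username:md5-hashed-password)
PGPOOL_PATH=/usr/edb/pgpool3.6                         # Pgpool installation path (assumption)
PCP_MD5=$(${PGPOOL_PATH}/bin/pg_md5 secret_password)   # hash the chosen PCP password
echo "enterprisedb:${PCP_MD5}" >> ${PGPOOL_PATH}/etc/pcp.conf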

On the EDB Postgres database nodes:

  1. Install the Pgpool binaries.

The Pgpool binaries include the PCP commands; EFM can use a PCP command to notify Pgpool about the new master.
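
Because the fencing script below calls the pcp_* commands with --no-password, each database node needs a PCPPASS file readable by the EFM user. A minimal sketch, assuming the path /var/efm/pcppass used in the script, the PCP settings configured above, and that EFM runs the fencing script as the 'efm' OS user:

# Create the PCPPASS file (format: hostname:port:username:password)
cat > /var/efm/pcppass <<'EOF'
pgpool:9051:enterprisedb:secret_password
EOF
chown efm:efm /var/efm/pcppass   # readable by the EFM agent (user name is an assumption)
chmod 600 /var/efm/pcppass       # pcp commands require restrictive permissions on this file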

  2. The following is a sample script that uses the PCP commands to notify Pgpool about the new master:
#!/bin/bash

###################################################################################
#title           : EFM fencing script for updating pgpool
#description     : This script executes pcp commands. Therefore pcp & pgpool 
#                : should be available on the server
#author          : Vibhor Kumar (vibhor.aim@gmail.com).
#date            : Jan 5 2018
#version         : 1.0
#notes           : configure the pcppass file for the EFM user and set
#                : the password correctly
#bash_version    : GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
###################################################################################
# quit on any error
set -e
# treat undefined shell variables as errors
set -u

###################################################################################
# Notify pgpool to promote the standby
###################################################################################

NEW_PRIMARY=$1                              # argument from EFM fencing hook
PCP_USER=enterprisedb                       # PCP user name
PCP_PORT=9051                               # PCP port number as in pgpool.conf
PCP_HOST=pgpool                             # hostname of Pgpool-II  
PGPOOL_PATH=/usr/edb/pgpool3.6              # Pgpool-II installation path
PCPPASSFILE=/var/efm/pcppass                # Path to PCPPASS file

PCP_NODE_COUNT=${PGPOOL_PATH}/bin/pcp_node_count
PCP_NODE_INFO=${PGPOOL_PATH}/bin/pcp_node_info
PCP_PROMOTE=${PGPOOL_PATH}/bin/pcp_promote_node
 
export PCPPASSFILE PCP_USER PCP_PORT PGPOOL_PATH \
       PCP_PROMOTE PCP_NODE_INFO PCP_NODE_COUNT

###################################################################################
# find the number of nodes and search for node-id of NEW_PRIMARY
###################################################################################
NO_OF_NODES=$(${PCP_NODE_COUNT} --host=${PCP_HOST} \
                      --username=${PCP_USER} \
                      --port=${PCP_PORT} \
                      --no-password )

NODE_ID=""                                  # initialize so the check below is safe with 'set -u'
for (( i=0 ; i < ${NO_OF_NODES} ; i++ ))
do
   exists=$(${PCP_NODE_INFO} --host=${PCP_HOST} \
                    --username=${PCP_USER} \
                    --port=${PCP_PORT} \
                    --no-password ${i} | grep ${NEW_PRIMARY} | wc -l)
   if [[ ${exists} -eq 1 ]]; then
      NODE_ID=${i}
      break
   fi
done

###################################################################################
# Promote the specific node id using PCP command
###################################################################################
if [[ -n "${NODE_ID}" ]]; then
    ${PCP_PROMOTE} --host=${PCP_HOST} \
                   --username=${PCP_USER} \
                   --port=${PCP_PORT} \
                   --no-password \
                   --verbose \
                   ${NODE_ID}
fi
exit 0
  3. Modify the following parameter in the EFM properties file on the EDB Postgres servers (Master and Standby):
script.fence=/usr/efm-2.1/bin/efm_pgpool_notify %p

Please note that the above script covers only the failover scenario of an EDB Postgres cluster; however, it can be extended to cover the switchover use case as well.
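
Before relying on the fencing hook, it is worth verifying PCP connectivity from a database node by hand. A quick check, reusing the values from the script above (all of which are assumptions for your environment):

# Confirm that the pcp_* commands can reach Pgpool from this node
export PCPPASSFILE=/var/efm/pcppass
/usr/edb/pgpool3.6/bin/pcp_node_count --host=pgpool --username=enterprisedb --port=9051 --no-password
/usr/edb/pgpool3.6/bin/pcp_node_info  --host=pgpool --username=enterprisedb --port=9051 --no-password 0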

Let’s have a look at how it works in an environment:

  1. Connect to the Pgpool server and verify the nodes:
[root@master /]# psql -h pgpool
psql.bin (9.6.6.11)
Type "help" for help.

edb=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5444 | up     | 0.500000  | primary | 13         | true              | 0
 1       | standby  | 5444 | up     | 0.500000  | standby | 19         | false             | 0
(2 rows)

The above output shows the Master and Standby EDB Postgres nodes; both are up and running.

  2. Connect to the EDB Postgres nodes and check the status of EFM. The following is a snapshot:
[root@master /]# /usr/efm-2.1/bin/efm cluster-status efm
Cluster Status: efm
VIP: 

    Agent Type  Address              Agent  DB       Info
    --------------------------------------------------------------
    Witness     pgpool               UP     N/A       
    Standby     standby              UP     UP        
    Master      master               UP     UP        

Allowed node host list:
    master pgpool standby

Membership coordinator: master

Standby priority host list:
    standby

Promote Status:

    DB Type     Address              XLog Loc         Info
    --------------------------------------------------------------
    Master      master               0/E0001E0        
    Standby     standby              0/E0001E0        

    Standby database(s) in sync with master. It is safe to promote.

  3. Perform a failover as shown below:
/usr/efm-2.1/bin/efm promote efm
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original master. Run the 'cluster-status' command for information about the new cluster state.
  4. Connect to the Pgpool node and verify the status of the EDB Postgres nodes:
[root@master /]# psql -h pgpool
psql.bin (9.6.6.11)
Type "help" for help.

edb=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5444 | down   | 0.500000  | standby | 13         | false             | 0
 1       | standby  | 5444 | up     | 0.500000  | primary | 20         | true              | 0
(2 rows)

The above output shows that the Master server, which was the primary, is now down, and the Standby has been promoted to primary.

efm_sql_command: Postgres database extension for EFM (EDB Failover Manager 2.1.x)

 

EDB Failover Manager (EFM) continues to gain popularity among Postgres DBAs. EDB customers are using this tool to build highly available EDB Postgres platforms.

EFM’s primary role is to monitor Postgres clusters, notify the DBA of any failures, and automatically promote a standby to function as the master.

Here are some of the high-level features of EFM:

  1. Automatic failover
  2. Switchover to the standby closest to the master in terms of transaction log (XLOG) position
  3. Setting the priority of standbys
  4. Customizing notifications/alerts
  5. Transferring the VIP to the new master on failover (provided the master was configured with a VIP)

DBAs control these actions through the “efm” utility. However, the need to connect to the EFM cluster nodes over ssh does not make for an optimal DBA experience.

Keeping this pain point in mind, we have developed the “efm_sql_command” extension, which exposes the EFM 2.1.x utility commands through the native Postgres interface. DBAs no longer need to ssh to an EFM node to allow or disallow nodes in an EFM cluster. Additionally, through the Postgres interface itself, EFM can also be integrated with other monitoring systems such as EDB Postgres Enterprise Manager, Nagios, etc.

The efm_sql_command extension provides the following functions:

  • efm_sql_command.efm_allow_node(‘ip address’);

This function takes an IP address as an argument and notifies the EFM agents to allow that address to become part of the EFM cluster. It returns the status of the EFM command as 0/1, where 0 means success and 1 means failure. DBAs can now use the SQL interface to notify EFM to allow a new standby (one they are still building) without ssh-ing to one of the EFM nodes.

  • efm_sql_command.efm_disallow_node(‘ip address’);

Similar to efm_allow_node, this function notifies the EFM agents to disallow the IP address from the EFM cluster. It reports the status of the command as 0/1, where 0 means the agents were notified successfully and 1 means the notification failed.

  • efm_sql_command.efm_failover();

This function notifies the EFM cluster agents to perform a manual failover from the master to the closest standby. It returns 0/1, where 0 means the agents were notified successfully and 1 means the notification failed.

  • efm_sql_command.efm_resume_monitoring();

If an agent is not monitoring the local EDB Postgres database, this function makes the EFM agent resume monitoring it.

  • efm_sql_command.efm_set_priority(‘ip address’, ‘priority’);

Sets the failover priority of a standby and returns the status of the command as 0/1, where 0 means success and 1 means failure.

  • efm_sql_command.efm_switchover();

This function performs a switchover to the standby closest to the master and reconfigures the old master as a new standby.

  • efm_sql_command.efm_local_properties

This is a view through which a DBA can inspect the EFM properties.

  • efm_sql_command.efm_nodes_details:

This view provides the details of each node in the EFM cluster. A user can also employ it for other purposes, for example, checking the status of the standbys or how far the standbys lag behind the master.

  • efm_sql_command.efm_cluster_status

This function takes one of the following arguments as text:
‘text’: prints the status of the EFM cluster in plain text.
‘json’: prints the status of the EFM cluster in JSON format.
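
As an illustration, any of these functions and views can be used from a remote client or monitoring host over an ordinary Postgres connection instead of ssh. A sketch, assuming the connection details of the demo environment (host master, port 5444, database edb) and the efm_extension schema used in the snapshots further below:

# Trigger a manual switchover through the SQL interface
psql -h master -p 5444 -d edb -c "SELECT efm_extension.efm_switchover();"

# Check each node's XLog location through the nodes-details view
psql -h master -p 5444 -d edb -c "SELECT * FROM efm_extension.efm_nodes_details WHERE property = 'xlog';"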

This extension also provides the following GUC, which DBAs can set at the cluster, database, or user level:

ALTER SYSTEM SET efm.cluster_name TO 'clustername';
ALTER DATABASE edb SET efm.cluster_name TO 'clustername';
ALTER USER enterprisedb SET efm.cluster_name TO 'clustername';
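
The effective value can then be confirmed from any session, for example (database name assumed):

psql -d edb -c "SHOW efm.cluster_name;"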

The following are some snapshots of efm_sql_command’s functions:

CREATE EXTENSION efm_sql_command;
edb=# select efm_extension.efm_cluster_status('text');
INFO:  efm command is available
                       efm_cluster_status
------------------------------------------------------------------------
 Cluster Status: efm
 VIP:
 
     Agent Type  Address              Agent  DB       Info
     --------------------------------------------------------------
     Idle        172.17.0.2           UP     UNKNOWN
 
 Allowed node host list:
     172.17.0.2
 
 Membership coordinator: 172.17.0.2
 
 Standby priority host list:
     (List is empty.)
 
 Promote Status:
 
 
 Idle Node Status (idle nodes ignored in XLog location comparisons):
 
     Address              XLog Loc         Info
     --------------------------------------------------------------
     172.17.0.2           0/35BFC10        DB is not in recovery.
(23 rows)
edb=# select jsonb_pretty(efm_extension.efm_cluster_status('json')::jsonb);
INFO:  efm command is available 
                         jsonb_pretty                         
--------------------------------------------------------------
 {                                                           +
     "VIP": "",                                              +
     "nodes": {                                              +
         "172.17.0.2": {                                     +
             "db": "UNKNOWN",                                +
             "info": " ",                                    +
             "type": "Idle",                                 +
             "xlog": "0/35BFC10",                            +
             "agent": "UP",                                  +
             "xloginfo": "DB is not in recovery."            +
         }                                                   +
     },                                                      +
     "messages": [                                           +
         "Did not find XLog location for any non-idle nodes."+
     ],                                                      +
     "allowednodes": [                                       +
         "(List",                                            +
         "is",                                               +
         "empty.)"                                           +
     ],                                                      +
     "minimumstandbys": 0,                                   +
     "failoverpriority": [                                   +
     ],                                                      +
     "membershipcoordinator": "172.17.0.2"                   +
 }
(1 row)
edb=# select efm_extension.efm_allow_node('172.17.0.2');
INFO: efm command is available
efm_allow_node
----------------
0
(1 row)
edb=# select efm_extension.efm_disallow_node('172.17.0.2');
INFO: efm command is available
efm_disallow_node
-------------------
0
(1 row)
edb=# select * from efm_extension.efm_nodes_details ;
INFO: efm command is available
  node_ip   | property |          value
------------+----------+--------------------------
 172.17.0.2 | db       | "UNKNOWN"
 172.17.0.2 | info     | " "
 172.17.0.2 | type     | "Idle"
 172.17.0.2 | xlog     | "0/35BC388"
 172.17.0.2 | agent    | "UP"
 172.17.0.2 | xloginfo | "DB is not in recovery."
(6 rows)
edb=# select * from efm_extension.efm_local_properties ;
           name            |              value              
---------------------------+----------------------------------
 efm.license               |
 db.user                   | efm
 db.password.encrypted     | 074b627bf50168881d246c5dd32fd8d0
 db.port                   | 5444
 db.database               | edb
 db.service.owner          | enterprisedb
 db.service.name           | edb-as-9.6
 db.bin                    | /usr/edb/as9.6/bin
 db.recovery.conf.dir      | /pgdata
 jdbc.ssl                  | false
 jdbc.ssl.mode             | verify-ca
 user.email                | vibhor.kumar@enterprisedb.com
 script.notification       |
 bind.address              | 172.17.0.2:5430
 admin.port                | 5431
 is.witness                | false
 local.period              | 10
 local.timeout             | 60
 local.timeout.final       | 10
 remote.timeout            | 10
 node.timeout              | 50
 pingServerIp              | 8.8.8.8
 pingServerCommand         | /bin/ping -q -c3 -w5
 auto.allow.hosts          | true
 db.reuse.connection.count | 0
 auto.failover             | true
 auto.reconfigure          | true
 promotable                | true
 minimum.standbys          | 0
 recovery.check.period     | 2
 auto.resume.period        | 0
 virtualIp                 |
 virtualIp.interface       |
 virtualIp.netmask         |
 script.fence              |
 script.post.promotion     |
 script.resumed            |
 script.db.failure         |
 script.master.isolated    |
 sudo.command              | sudo
 sudo.user.command         | sudo -u %u
 jgroups.loglevel          | INFO
 efm.loglevel              | INFO
 jvm.options               | -Xmx32m
(44 rows)

I look forward to your comments on this topic.