New Replication and Recovery Features in PostgreSQL 9.1

1. Streaming Replication and Continuous Archiving Features.

a. Synchronous Replication
PostgreSQL 9.1 comes with synchronous replication. In synchronous replication, every committed transaction is transferred to the standby synchronously: a commit on the primary does not return until the transaction has been transferred to the slave/replica server. This minimizes the risk of losing committed transactions and gives a high degree of durability.
For setting up synchronous replication, you can look at my blog:

https://vibhorkumar.wordpress.com/2011/10/20/asynchronoussynchronous-streaming-replication-in-postgresql-9-1/ 
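
As a quick reference, a minimal sketch of the settings involved is given below (host, user and application_name are illustrative; the standby is identified by the application_name in its primary_conninfo):

     # postgresql.conf on the Primary
     synchronous_standby_names = 'standby1'   # must match the standby's application_name
     synchronous_commit = on                  # default; commits wait for the synchronous standby

     # recovery.conf on the Standby
     standby_mode = 'on'
     primary_conninfo = 'host=192.168.0.10 port=5432 user=rep_user application_name=standby1'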

2. New parameter: replication_timeout
This new parameter has been added in 9.1. Using it, the primary terminates a replication connection that has been inactive for longer than replication_timeout. This is useful for the primary server to detect a standby crash or a network outage.
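
A minimal sketch of how this can be set is given below (values are illustrative); wal_receiver_status_interval on the standby should stay smaller than replication_timeout on the primary, otherwise healthy connections may be dropped:

     # postgresql.conf on the Primary
     replication_timeout = 60s             # terminate replication connections inactive longer than this

     # postgresql.conf on the Standby
     wal_receiver_status_interval = 10s    # how often the standby reports its status to the primary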

3. New role/permission for Streaming Replication
REPLICATION is a new role attribute added in PostgreSQL 9.1 for streaming replication. A user with the REPLICATION attribute can initiate streaming replication. Some examples are given below:
a. CREATE USER with REPLICATION Privilege:

     CREATE USER rep_user WITH REPLICATION PASSWORD 'password';

b. Assign REPLICATION Privilege to a user:

      ALTER USER username WITH REPLICATION;

The REPLICATION privilege can also be used for taking online backups of PostgreSQL.
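
For example, to allow the rep_user created above to connect for streaming replication or online backups, an entry like the following (IP address and auth method are illustrative) can be added to pg_hba.conf on the primary:

      host    replication     rep_user        192.168.0.20/32         md5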

4. New tool for Base Backup: pg_basebackup
pg_basebackup is a new backup tool introduced in PostgreSQL 9.1. Using this tool, users can take a base backup of a running PostgreSQL database cluster.
To use pg_basebackup, make the following changes in the PostgreSQL cluster:
a. Make the following change in the pg_hba.conf file:

      host    replication     postgres        [IPv4 address of client]/32  trust

b. Make the following changes in the postgresql.conf file of the PostgreSQL cluster:

    archive_command = 'cp -i %p /Users/postgres/archive/%f'
    archive_mode = on # requires restart
    max_wal_senders = 3 # maximum number of concurrent WAL sender processes
    wal_keep_segments = 32 # how many WAL segments (= files) should be kept on the server; adjust to your environment

c. After making the above changes, restart the PostgreSQL cluster and use pg_basebackup to take a backup, as given below:

     pg_basebackup -D /Users/vibhor/testbackup -v -Fp -l Testbackup -h 127.0.0.1 -U postgres

Note: The user running pg_basebackup must have either the SUPERUSER or REPLICATION privilege.
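
For example, the same backup can also be taken as compressed tar archives using the rep_user created earlier (paths and host are illustrative):

     pg_basebackup -D /Users/vibhor/testbackup_tar -Ft -z -x -P -l Testbackup -h 127.0.0.1 -U rep_user

Here -Ft writes tar-format output, -z gzips it, -x includes the required WAL files, and -P shows progress.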

5. Functions to Control Streaming Replication Replay:
PostgreSQL 9.1 has new streaming replication control functions. Information on them is given below:
a. pg_xlog_replay_pause():
Using this function, users can pause recovery on the standby and take a consistent backup of the standby data directory. An example is given below:

postgres=# select  pg_xlog_replay_pause();
 pg_xlog_replay_pause
----------------------

(1 row)

b. pg_is_xlog_replay_paused():
Using this function, users can check whether replay on the standby is paused. An example is given below:

 postgres=# select  pg_is_xlog_replay_paused();
 pg_is_xlog_replay_paused
--------------------------
 t
(1 row)

c. pg_xlog_replay_resume():
Using this function, users can resume replay on the standby if recovery/replay is paused. An example is given below:

postgres=# select pg_xlog_replay_resume();
 pg_xlog_replay_resume
-----------------------

(1 row)

postgres=# select  pg_is_xlog_replay_paused();
 pg_is_xlog_replay_paused
--------------------------
 f
(1 row)
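
Putting these three functions together, a rough sketch of taking a consistent file-level copy of the standby's data directory might look like the following (port and paths are illustrative):

     psql -p 5433 -c "select pg_xlog_replay_pause();"               # stop applying WAL on the standby
     rsync -a /Users/postgres/standby_data/ /backups/standby_copy/  # copy the frozen data directory
     psql -p 5433 -c "select pg_xlog_replay_resume();"              # resume WAL replay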

6. New in Replication Monitoring:
a. View: pg_stat_replication
This view displays information about the WAL sender processes. It contains one row per WAL sender process, showing the process ID, user OID, user name, application name, client host name (if available) and port number, the time at which the server process began execution, and the current WAL sender state and transaction log locations. An example is given below:

postgres=# select * from pg_stat_replication;
 procpid | usesysid | usename  | application_name | client_addr | client_hostname | client_port |          backend_start           |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
---------+----------+----------+------------------+-------------+-----------------+-------------+----------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
    2319 |       10 | postgres | sync_replication | ::1         |                 |       50224 | 2011-11-10 21:39:45.424503+05:30 | streaming | 0/33002798    | 0/33002798     | 0/33002798     | 0/33002798      |             0 | async
(1 row)

Note: The pg_stat_replication view gives this information on the master. Querying pg_stat_replication on the standby returns zero rows.
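
On the standby itself, the corresponding receive and replay positions can be checked with the functions below (the output shown is illustrative):

postgres=# select pg_last_xlog_receive_location(), pg_last_xlog_replay_location();
 pg_last_xlog_receive_location | pg_last_xlog_replay_location
-------------------------------+------------------------------
 0/33002798                    | 0/33002798
(1 row)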

b. Function: pg_last_xact_replay_timestamp():
This function returns the timestamp of the last transaction replayed during recovery, i.e. the time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, it returns NULL. Otherwise, while recovery is still in progress the value increases monotonically; if recovery has completed, it remains static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery, the function returns NULL. An example is given below:

postgres=# select * from pg_last_xact_replay_timestamp();
  pg_last_xact_replay_timestamp
----------------------------------
 2011-11-10 22:17:26.431321+05:30
(1 row)
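
Combined with now(), this function gives a simple estimate of replay lag when run on the standby (the value shown is illustrative):

postgres=# select now() - pg_last_xact_replay_timestamp() as replication_delay;
 replication_delay
-------------------
 00:00:00.512406
(1 row)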

7. New in Hot Standby:
a. New parameter: hot_standby_feedback
This is a new postgresql.conf parameter for the standby server. Using it, users can avoid query cancellations on the standby: it makes hot standby send feedback to the primary (once per wal_receiver_status_interval) about the queries currently executing on the standby, so that the primary postpones cleanup of the old row versions those queries still need.
Note: Setting this parameter may result in bloat on the primary.
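
A minimal sketch of the relevant standby settings is given below (values are illustrative):

    # postgresql.conf on the Standby
    hot_standby = on                      # allow read-only queries during recovery
    hot_standby_feedback = on             # report the standby's oldest query to the primary
    wal_receiver_status_interval = 10s    # how often the feedback is sent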

b. New column: conflicts in pg_stat_database
In PostgreSQL 9.1, a new column, conflicts, has been added to pg_stat_database. This column gives the total number of queries canceled due to conflicts with recovery on the standby. An example is given below:

postgres=# select datname, conflicts from pg_stat_database;
  datname  | conflicts
-----------+-----------
 template1 |         0
 template0 |         0
 postgres  |         0
 korean    |         0
(4 rows)

c. New view: pg_stat_database_conflicts
In 9.1, the pg_stat_database_conflicts view has been added for monitoring and finding queries canceled due to dropped tablespaces, lock timeouts, old snapshots, pinned buffers, and deadlocks.
This view contains one row per database, giving the database OID, database name, and the number of queries that have been canceled in that database due to dropped tablespaces, lock timeouts, old snapshots, pinned buffers, and deadlocks. An example is given below:

postgres=# select * from pg_stat_database_conflicts;
 datid |  datname  | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock
-------+-----------+------------------+------------+----------------+-----------------+----------------
     1 | template1 |                0 |          0 |              0 |               0 |              0
 12172 | template0 |                0 |          0 |              0 |               0 |              0
 12180 | postgres  |                0 |          0 |              0 |               0 |              0
 25354 | korean    |                0 |          0 |              0 |               0 |              0
(4 rows)

Note: Queries against this view should be run on the standby, since that is where the conflicts occur.

d. Increased maximum values for max_standby_archive_delay and max_standby_streaming_delay
In PostgreSQL 9.0, the maximum value for max_standby_archive_delay and max_standby_streaming_delay was 35 minutes. In PostgreSQL 9.1, users can set much larger values. These parameters determine the maximum total time allowed to apply WAL data while queries are running on the standby.
For more detail about these parameters, please refer to the following documentation page:

   http://www.postgresql.org/docs/9.1/static/runtime-config-replication.html#GUC-MAX-STANDBY-ARCHIVE-DELAY
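
For example, the standby can now be configured with delays well beyond the old 35-minute ceiling, or told to wait forever (values are illustrative):

   # postgresql.conf on the Standby
   max_standby_streaming_delay = 2h    # let standby queries delay WAL application for up to 2 hours
   max_standby_archive_delay = -1      # -1 means wait forever when applying WAL from the archive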

e. New Error Code: ERRCODE_T_R_DATABASE_DROPPED

   Error Code      Condition Name
   57P04           database_dropped

Until 9.0, PostgreSQL used ERRCODE_ADMIN_SHUTDOWN for a recovery conflict (on the standby) caused by a database being dropped on the master. In 9.1, ERRCODE_T_R_DATABASE_DROPPED is used for the same situation. This change was made so that connection poolers such as pgpool can handle this situation (where the database no longer exists on the master) correctly.

8. New in Recovery Control:

a. pg_create_restore_point(text) function and recovery_target_name parameter:

PostgreSQL 9.1 comes with the special function pg_create_restore_point(text). Using this function, an admin/DBA can create a named recovery/restore point. pg_create_restore_point returns the transaction log location up to which the user can later restore a hot backup. To support named restore points, PostgreSQL 9.1 has a new recovery.conf parameter, recovery_target_name.
Let's see these two in action.
i. Set the following parameters in the postgresql.conf file of the cluster:

    archive_mode=on
    archive_command='cp -i %p /Users/postgres/archive/%f'

ii. Now, let's take a base backup using pg_basebackup as given below:

edbs-MacBook-Pro:standby postgres$ pg_basebackup -D /Users/postgres/test_backup -v -Fp -l "Standby Backup" -h localhost
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
pg_basebackup: base backup completed

iii. Now, let's connect to the database and create a named restore point using pg_create_restore_point:

postgres=# select pg_create_restore_point('myrecoverylocation');
 pg_create_restore_point
-------------------------
 0/3B000118
(1 row)

To verify the behavior of the recovery_target_name parameter, let's create a table and perform some activity:

postgres=# create table test_id as select id from generate_series(1,1000000) as t(id);
SELECT 1000000
postgres=#

iv. Now, create a recovery.conf file in the backup directory, as given below:

restore_command = 'cp -i /Users/postgres/archive/%f %p'		# e.g. 'cp /mnt/server/archivedir/%f %p'
recovery_target_name = 'myrecoverylocation'

and change the port number in the restored postgresql.conf, as given below:

vi /Users/postgres/test_backup/postgresql.conf
port=5433

v. Now, start the restored PostgreSQL cluster as given below:

pg_ctl -D /Users/postgres/test_backup start

After executing the above command, PostgreSQL performs recovery up to the recovery_target_name location. To verify this, check the PostgreSQL log file, which will show messages like the following:

2011-11-14 17:15:18 IST LOG:  database system was interrupted; last known up at 2011-11-14 17:10:51 IST
2011-11-14 17:15:18 IST LOG:  creating missing WAL directory "pg_xlog/archive_status"
2011-11-14 17:15:18 IST LOG:  starting point-in-time recovery to "myrecoverylocation"

After connecting to the restored database, you can see that the test_id table, which was created after pg_create_restore_point('myrecoverylocation'), is not part of the restored database, as given below:

--Restored Database:
edbs-MacBook-Pro:test_backup postgres$ psql -p 5433
Password:
psql (9.1.1)
Type "help" for help.

postgres=# \dt
             List of relations
 Schema |     Name      | Type  |  Owner
--------+---------------+-------+----------
 public | dept          | table | postgres
 public | emp           | table | postgres
 public | employees     | table | postgres
 public | entities      | table | postgres
 public | jobhist       | table | postgres
 public | like_op_table | table | postgres
 public | person_job    | table | postgres
 public | test          | table | postgres
 public | test_default  | table | postgres
(9 rows)

-- Primary Database:
postgres=# \dt
             List of relations
 Schema |     Name      | Type  |  Owner
--------+---------------+-------+----------
 public | dept          | table | postgres
 public | emp           | table | postgres
 public | employees     | table | postgres
 public | entities      | table | postgres
 public | jobhist       | table | postgres
 public | like_op_table | table | postgres
 public | person_job    | table | postgres
 public | test          | table | postgres
 public | test_default  | table | postgres
 public | test_id       | table | postgres
(10 rows)

b. Standby recovery switches to a new timeline automatically

Until 9.0, when the user specified recovery_target_timeline='latest', PostgreSQL scanned for the latest timeline at the beginning of recovery and picked that as the target. If new timelines appeared during recovery, PostgreSQL stuck with the target chosen at the beginning and the new timelines were ignored; to make PostgreSQL notice a new timeline, the standby had to be restarted. In 9.1, standby servers periodically scan the archive directory for new timelines and switch to a new timeline that appears during recovery.
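
A minimal recovery.conf sketch for a standby that should follow new timelines automatically is given below (paths and conninfo are illustrative):

   standby_mode = 'on'
   primary_conninfo = 'host=192.168.0.10 port=5432 user=rep_user'
   restore_command = 'cp /Users/postgres/archive/%f %p'
   recovery_target_timeline = 'latest'   # in 9.1 the standby keeps re-scanning the archive for new timelines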

c. New parameter: restart_after_crash = on/off

By default, up to 9.0, PostgreSQL automatically restarted whenever a backend crashed, and there was no way to control this (a problem for clusterware solutions, where the clusterware itself tries to restart PostgreSQL). This new parameter gives control over restarting PostgreSQL.
If restart_after_crash is on, PostgreSQL restarts automatically after a backend crash.
If it is off, PostgreSQL does not restart after a backend crash and shuts down instead.
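
For a cluster managed by external clusterware, the setting would typically look like the following in postgresql.conf (a sketch, assuming the clusterware is responsible for restarts):

   restart_after_crash = off   # let the clusterware, not PostgreSQL, decide whether to restart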

New Features in Slony 2.1

Slony 2.1 was released on 19th Oct 2011, so I thought I would look at some of the important improvements made in this new release, which can make users' lives easier.

Before discussing the changes, let's set up Slony replication.

The following scripts can be used for setting up Slony replication.

####   Preamble Script:

cluster name=slonytest;
NODE 1 ADMIN CONNINFO = 'dbname=postgres host=localhost user=postgres port=5432 password=postgres';
NODE 2 ADMIN CONNINFO = 'dbname=repdb host=localhost user=postgres port=5432 password=postgres';

Adding Node script:

### create_nodes.slonik

include <preamble.slonik>;
init cluster (id=1, comment='slonytest node 1');
store node (id=2, comment='slonytest subscriber node 2', event node=1);

Storing Path:

### store_paths.slonik

include <preamble.slonik>;
STORE PATH (SERVER=1, CLIENT=2, CONNINFO='dbname=postgres host=localhost user=postgres port=5432 password=password');
STORE PATH (SERVER=2, CLIENT=1, CONNINFO='dbname=repdb host=localhost user=postgres port=5432 password=passsword');

Master Slon Process conf file:

## Master Slon Conf:

vac_frequency=3
cleanup_interval="10 minutes"
log_level=4
sync_interval=2000
sync_interval_timeout=10000
sync_group_maxsize=6
sync_max_rowsize=8192
sync_max_largemem=5242880
syslog=0
log_pid=false
log_timestamp=true
pid_file='/Users/vibhor/PGtools/slony/master_slon.pid'
syslog_ident=slon
cluster_name='slonytest'
conn_info='dbname=postgres host=localhost user=postgres port=5432 password=postgres'
desired_sync_time=60000
sql_on_connection="SET log_min_duration_statement TO '1000';"

Slave Slon process conf file:

## Slave Slon Conf:
vac_frequency=3
cleanup_interval="10 minutes"
log_level=4
sync_interval=2000
sync_interval_timeout=10000
sync_group_maxsize=6
sync_max_rowsize=8192
sync_max_largemem=5242880
syslog=0
log_pid=false
log_timestamp=true
pid_file='/Users/vibhor/PGtools/slony/slave_slon.pid'
syslog_ident=slon
cluster_name='slonytest'
conn_info='dbname=repdb host=localhost user=postgres port=5432 password=Empid#042'
desired_sync_time=60000
sql_on_connection="SET log_min_duration_statement TO '1000';"

After creating the above configuration, users can set up Slony replication.

Now, let's look at the features added in Slony 2.1.

1. Support for adding tables in bulk

In previous versions of Slony, if a user wanted to add all the tables of a particular schema, he had to write slonik commands like the ones given below:

include <preamble.slonik>;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table (id=1, set id=1, origin=1, fully qualified name='public.dept', comment='slonytest table public.dept');
set add table (id=2, set id=1, origin=1, fully qualified name='public.emp', comment='slonytest table public.emp');
set add table (id=3, set id=1, origin=1, fully qualified name='public.employees', comment='slonytest table public.employees');

This was a fair bit of work: the user had to list all the tables and add them to the slonik command, or write a script to generate the slonik commands.

Now, in Slony 2.1, the user can write a single command to add all tables of a schema to Slony replication, as given below:

include <preamble.slonik>;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table( set id=1, tables='public.*');

A snapshot of the activity is given below:

edbs-MacBook-Pro:slony vibhor$ cat create_set.slonik
include <preamble.slonik>;
create set (id=1, origin=1, comment='slonytest Tables and Sequences');
set add table( set id=1, tables='public.*');
edbs-MacBook-Pro:slony vibhor$
edbs-MacBook-Pro:slony vibhor$ slonik < create_set.slonik

Users can verify the tables added in bulk using the following query:

postgres=# select * from _slonytest.sl_table;
 tab_id | tab_reloid | tab_relname | tab_nspname | tab_set |   tab_idxname   | tab_altered |   tab_comment
--------+------------+-------------+-------------+---------+-----------------+-------------+------------------
      1 |      16438 | dept        | public      |       1 | dept_pk         | f           | replicated table
      2 |      16445 | emp         | public      |       1 | emp_pk          | f           | replicated table
      3 |      16995 | employees   | public      |       1 | employees_pkey  | f           | replicated table
      4 |      16987 | entities    | public      |       1 | entities_pkey   | f           | replicated table
      5 |      16456 | jobhist     | public      |       1 | jobhist_pk      | f           | replicated table
      6 |      16967 | person_job  | public      |       1 | person_job_pkey | f           | replicated table
(6 rows)

If a user wants to add all sequences of a particular schema to Slony replication, he can try something like the following:

include <preamble.slonik>;
set add sequence (set id=1, sequences='public.*seq'); ## adding all sequences of the public schema

Users can also use patterns while adding tables, as given below:

include <preamble.slonik>;
set add table (set id=1, tables='public.slonytest_[1234]', add sequences=true); ## adding tables slonytest_1 through slonytest_4

2. Implicit WAIT FOR

In previous versions of Slony, the user had to use the WAIT FOR clause before executing commands like MERGE SET/MOVE SET:

     SUBSCRIBE SET (ID = 999, PROVIDER = 1, RECEIVER = 2);
     WAIT FOR EVENT (ORIGIN = 1, CONFIRMED = ALL, WAIT ON=1);
     SUBSCRIBE SET (ID = 999, PROVIDER = 1, RECEIVER = 3);
     WAIT FOR EVENT (ORIGIN = 1, CONFIRMED = ALL, WAIT ON=1);
     MERGE SET ( ID = 1, ADD ID = 999, ORIGIN = 1 );

Now, in 2.1, if the user omits the WAIT FOR clause, slonik uses an implicit WAIT FOR whenever it is required (see the sketch after this list). However, keep the following in mind:
a. Do not run multiple slonik scripts in parallel, since parallel slonik scripts may confuse Slony's implicit WAIT FOR behavior.
b. Slonik will WAIT FOR when the event node changes.
c. Slonik will wait before executing SUBSCRIBE SET/DROP NODE/CLONE NODE commands until the primary has caught up with the other slave nodes.
d. The slonik CREATE SET command will wait until all outstanding DROP SET commands are confirmed by the master and slaves in the Slony replication cluster.
e. The WAIT FOR command will not work inside a TRY block (an incompatibility with previous versions).
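
So, in 2.1 the earlier example can be written without the explicit waits, and slonik inserts the required WAIT FOR events itself (a sketch of the same MERGE SET sequence):

     SUBSCRIBE SET (ID = 999, PROVIDER = 1, RECEIVER = 2);
     SUBSCRIBE SET (ID = 999, PROVIDER = 1, RECEIVER = 3);
     MERGE SET ( ID = 1, ADD ID = 999, ORIGIN = 1 );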

3. Support for replicating TRUNCATE
Slony 2.1 introduces replication of the TRUNCATE command to slaves. TRUNCATE replication is possible for PostgreSQL versions >= 8.4.
When a user subscribes a table using Slony 2.1, Slony creates a _slonytest_truncatetrigger trigger on the master to replicate those TRUNCATE commands.
Let's see it at work:

edbs-MacBook-Pro:~ vibhor$ psql -c "truncate table person_job" postgres
TRUNCATE TABLE
edbs-MacBook-Pro:~ vibhor$ psql -c "select * from person_job" repdb
 pid | job
-----+-----
(0 rows)
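
The truncate triggers that Slony creates on the subscribed tables can be inspected in the system catalogs on the master; a query like the one below (the trigger name is taken from the cluster name slonytest) should list one row per replicated table:

postgres=# select tgrelid::regclass as table_name, tgname from pg_trigger where tgname like '_slonytest_truncate%';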

4. Health checks at startup
Slony 2.1 now checks the state of each node during startup and gives a proper message to help fix any issue. The health check validates the configuration information and returns OK if there is no problem. The PL/pgSQL function added for the health check is given below:

CREATE OR REPLACE FUNCTION _slonytest.slon_node_health_check()
 RETURNS boolean
 LANGUAGE plpgsql
AS $function$
declare
		prec record;
		all_ok boolean;
begin
		all_ok := 't'::boolean;
		-- validate that all tables in sl_table have:
		--      sl_table agreeing with pg_class
		for prec in select tab_id, tab_relname, tab_nspname from
		"_slonytest".sl_table t where not exists (select 1 from pg_catalog.pg_class c, pg_catalog.pg_namespace n
				where c.oid = t.tab_reloid and c.relname = t.tab_relname and c.relnamespace = n.oid and n.nspname = t.tab_nspname) loop
				all_ok := 'f'::boolean;
				raise warning 'table [id,nsp,name]=[%,%,%] - sl_table does not match pg_class/pg_namespace', prec.tab_id, prec.tab_relname, prec.tab_nspname;
		end loop;
		if not all_ok then
		   raise warning 'Mismatch found between sl_table and pg_class.  Slonik command REPAIR CONFIG may be useful to rectify this.';
		end if;
		return all_ok;
end
$function$
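
This function can also be run manually on any node to check its configuration; on a healthy node it simply returns true:

postgres=# select _slonytest.slon_node_health_check();
 slon_node_health_check
------------------------
 t
(1 row)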

5. Performance improvement in cases of large backlog
Slony 2.1 comes with a performance improvement for cases of a large replication backlog. In previous versions, users experienced a performance impact when the sl_log_* tables grew and replicating the data to subscribers took a long time, so Slony replication could take a very long time to catch up. The main cause of this performance issue was that Slony used sequential scans while pulling data from the sl_log_* tables. In 2.1, the queries that pull data from the sl_log_* tables have been modified to use index scans.

6. Monitoring thread to provide better monitoring data
2.1 also comes with a new monitoring table, sl_components. This table is very useful for monitoring the Slony threads. A description of sl_components can be found at the following link:

http://slony.info/documentation/2.1/table.sl-components.html

A snapshot of the information displayed by sl_components is given below:

postgres=# select * from sl_components;
       co_actor       | co_pid | co_node | co_connection_pid |   co_activity    |       co_starttime        |  co_event  | co_eventtype
----------------------+--------+---------+-------------------+------------------+---------------------------+------------+--------------
 local_listen         |  26688 |       1 |             26692 | thread main loop | 2011-11-03 12:07:56+05:30 |            | n/a
 local_sync           |  26688 |       0 |             26697 | thread main loop | 2011-11-03 12:08:00+05:30 |            | n/a
 remote listener      |  26688 |       2 |             26695 | thread main loop | 2011-11-03 12:08:00+05:30 |            | n/a
 remoteWorkerThread_2 |  26688 |       2 |             26694 | SYNC             | 2011-11-03 12:08:00+05:30 | 5000004917 | SYNC
 local_monitor        |  26688 |       0 |             26698 | thread main loop | 2011-11-03 00:32:19+05:30 |            | n/a
 local_cleanup        |  26688 |       0 |             26696 | cleanupEvent     | 2011-11-03 12:00:15+05:30 |            | n/a
(6 rows)