Data Replication from Oracle logical standby to EDB Postgres

At EnterpriseDB (EDB), we are sometimes presented with individual customer use cases or requests that call for the development of innovative solutions to support their deployment of the EDB Postgres Platform. For example, one of our customers wanted to leverage Oracle Logical Standby for replication to EDB Postgres, and wanted to know whether the EDB Postgres Replication Server supported replicating from an Oracle logical standby to EDB Postgres.

DBAs and database architects can use the EDB Postgres Replication Server to replicate data from an Oracle Logical Standby to the EDB Postgres Platform. However, setting up the replication requires extra steps. Because this was a very interesting use case, I thought a blog containing the recommended steps for using the EDB Postgres Replication Server with Oracle Logical Standby would provide some guidance for many end users. There are instances where this could support strategic replication deployments where EDB Postgres is complementing a legacy solution.

For this blog, we will use the following IP addresses and hostnames:

  1. EDB Postgres server: 172.17.0.4
  2. Oracle Logical Standby server: 172.17.0.3
  3. Oracle primary server: 172.17.0.2

To use EDB Postgres Replication Server 6.0, users should make sure the Oracle Logical Standby is in STANDBY guard status. The following are the guard statuses for Oracle Logical Standby:

 ALTER DATABASE GUARD mode
Mode      Detail
ALL       (Default) No users (except SYS) can make changes to logical standby data
STANDBY   Users may modify logical standby data unless it is being maintained by local LSPn processes (e.g. via Logical Standby)
NONE      Normal security rules are applied; any user with appropriate privileges can modify logical standby data

DBAs can use the following SQL command to check the guard_status in Oracle:

SQL> SELECT guard_status FROM v$database;
GUARD_S
-------
STANDBY

After confirming the guard status in Oracle Logical Standby, create a database user with the following privileges:

CREATE USER pubuser IDENTIFIED BY oracle;
GRANT DBA TO pubuser;
GRANT CREATE ANY TRIGGER TO pubuser;
GRANT SELECT ANY TABLE TO pubuser;
GRANT LOCK ANY TABLE TO pubuser;

Let’s create the EDB Postgres Replication Server between Oracle Logical Standby and EDB Postgres. We will be leveraging the EDB Postgres Replication Server command line interface for building publications and subscriptions. If you are not familiar with the EDB Postgres Replication Server command line interface options, please refer to the following link for more information:

https://www.enterprisedb.com/docs/en/6.0/repguide/EDB_Postgres_Replication_Server_Users_Guide.1.59.html#

  1. Add Oracle as the publication database in the EDB Postgres Replication Server. The following command can be used to add the database:
[root@epas95 /]# . /usr/ppas-xdb-6.0/etc/sysconfig/xdbReplicationServer-60.config
[root@epas95 /]# ${JAVA_EXECUTABLE_PATH} -jar /usr/ppas-xdb-6.0/bin/edb-repcli.jar \
    -addpubdb \
>   -repsvrfile ${PUBCONF} \
>   -dbtype oracle \
>   -dbhost 172.17.0.3 \
>   -dbport 1521 \
>   -dbuser ${ORAUSER} \
>   -dbpassword ${ORA_ENCRYPT_PASSWD}  \
>   -database ORCL
Adding publication database...
Publication database added successfully. Publication database id:1

Please note the publication database id: 1 and -dbhost: 172.17.0.3, which is the Logical Standby IP address.

  2. Execute the following anonymous PL/SQL block in Oracle Logical Standby as SYSDBA:
BEGIN
 FOR s IN (SELECT TRIGGER_NAME FROM ALL_TRIGGERS WHERE OWNER ='PUBUSER')
 LOOP
         IF DBMS_DDL.IS_TRIGGER_FIRE_ONCE('PUBUSER', s.TRIGGER_NAME)
         THEN
                DBMS_DDL.set_trigger_firing_property( 
                trig_owner => 'PUBUSER',  
                trig_name => s.trigger_name,   
                fire_once => FALSE);
        END IF;
 END LOOP;
END;
/


The DBA/user should execute the above PL/SQL block in Oracle Logical Standby to make sure triggers fire when a process changes the base table. By default, in Oracle Logical Standby, triggers on tables maintained by the standby do not fire in STANDBY guard status.

  3. Add the subscription database in EDB Postgres Replication Server. The following command can be used for adding the subscription database; in our case, we are adding the EDB Postgres database:
[root@epas95 /]# ${JAVA_EXECUTABLE_PATH} -jar /usr/ppas-xdb-6.0/bin/edb-repcli.jar \
-addsubdb \
>   -repsvrfile ${SUBCONF} \
>   -dbtype enterprisedb \
>   -dbhost 172.17.0.4 \
>   -dbport 5432 \
>   -dbuser ${PGUSER} \
>   -dbpassword ${PG_ENCRYPT_PASSWD}  \
>   -database orcl
Adding Subscription Database...
Subscription database added successfully. Subscription Database id:1006

Please note the subscription database id: 1006 and -dbhost: 172.17.0.4, which is the EDB Postgres server IP address.

  4. Before creating a publication and subscription, first change the guard status to NONE in Oracle Logical Standby as given below:
SQL> ALTER DATABASE GUARD NONE;

Database altered.

The above change is needed because the EDB Postgres Replication Server acquires a lock on the tables managed by the logical standby in order to create triggers for the publication.

  5. Create a publication for the table REPLICATION.REPLICATION_TABLE:
[root@epas95 /]# ${JAVA_EXECUTABLE_PATH} -jar /usr/ppas-xdb-6.0/bin/edb-repcli.jar \
-createpub replication_table \
>            -repsvrfile ${PUBCONF} \
>            -pubdbid 1 \
>            -reptype t \
>            -tables REPLICATION.REPLICATION_TABLE \
>            -repgrouptype s
Creating publication...
Tables:[[REPLICATION.REPLICATION_TABLE, TABLE]]
Filter clause:[]
Publication created.
  6. After creating the publication, use the following anonymous block to set the EDB Postgres Replication Server trigger property, so the triggers fire every time there is a change to the base table:
BEGIN
 FOR s IN (SELECT TRIGGER_NAME FROM ALL_TRIGGERS WHERE OWNER ='PUBUSER')
 LOOP
         IF DBMS_DDL.IS_TRIGGER_FIRE_ONCE('PUBUSER', s.TRIGGER_NAME)
         THEN
                DBMS_DDL.set_trigger_firing_property( 
                trig_owner => 'PUBUSER',  
                trig_name => s.trigger_name,   
                fire_once => FALSE);
        END IF;
 END LOOP;
END;
/
  7. Create a subscription for the publication replication_table (created in step 5) using the following command:
[root@epas95 /]# ${JAVA_EXECUTABLE_PATH} -jar /usr/ppas-xdb-6.0/bin/edb-repcli.jar \
-createsub replication_table \
>   -subsvrfile ${SUBCONF} \
>   -subdbid 1006 \
>   -pubsvrfile ${PUBCONF} \
>   -pubname replication_table
Creating subscription...
Subscription created successfully
  8. After creating the subscription, re-execute the anonymous block to set the EDB Postgres Replication Server trigger property:
BEGIN
 FOR s IN (SELECT TRIGGER_NAME FROM ALL_TRIGGERS WHERE OWNER ='PUBUSER')
 LOOP
         IF DBMS_DDL.IS_TRIGGER_FIRE_ONCE('PUBUSER', s.TRIGGER_NAME)
         THEN
                DBMS_DDL.set_trigger_firing_property( 
                trig_owner => 'PUBUSER',  
                trig_name => s.trigger_name,   
                fire_once => FALSE);
        END IF;
 END LOOP;
END;
/
  9. After creating the subscription, revert the guard status to STANDBY in Oracle Logical Standby as shown below:
SQL> ALTER DATABASE GUARD STANDBY;

Database altered.

Now, we are ready to replicate data from Oracle Logical Standby to EDB Postgres.

Let’s first take the snapshot of data from Oracle Logical Standby to EDB Postgres using the following command:

[root@epas95 /]# ${JAVA_EXECUTABLE_PATH} -jar /usr/ppas-xdb-6.0/bin/edb-repcli.jar \
-dosnapshot replication_table \
-repsvrfile ${SUBCONF} \
-verboseSnapshotOutput true
Performing snapshot...
Running EnterpriseDB Migration Toolkit (Build 49.1.5) ...
Source database connectivity info...
conn =jdbc:oracle:thin:@172.17.0.3:1521:ORCL
user =pubuser
password=******
Target database connectivity info...
conn =jdbc:edb://172.17.0.4:5432/orcl?loginTimeout=60&connectTimeout=30
user =pubuser
password=******
Connecting with source Oracle database server...
Connected to Oracle, version 'Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options'
Connecting with target EnterpriseDB database server...
Connected to EnterpriseDB, version '9.5.5.10'
Importing redwood schema REPLICATION...
Table List: 'REPLICATION_TABLE'
Loading Table Data in 8 MB batches...
Disabling FK constraints & triggers on replication.replication_table before truncate...
Truncating table REPLICATION_TABLE before data load...
Disabling indexes on replication.replication_table before data load...
Loading Table: REPLICATION_TABLE ...
[REPLICATION_TABLE] Migrated 10 rows.
[REPLICATION_TABLE] Table Data Load Summary: Total Time(s): 0.04 Total Rows: 10
Enabling FK constraints & triggers on replication.replication_table...
Enabling indexes on replication.replication_table after data load...
Performing ANALYZE on EnterpriseDB database...
Data Load Summary: Total Time (sec): 0.191 Total Rows: 10 Total Size(MB): 0.0
Schema REPLICATION imported successfully.
Migration process completed successfully.
Migration logs have been saved to /var/log/xdb-6.0
******************** Migration Summary ********************
Tables: 1 out of 1
Total objects: 1
Successful count: 1
Failed count: 0
Invalid count: 0
*************************************************************
Snapshot taken successfully.

After taking an initial snapshot, let’s modify the replication table from Oracle primary:

18-JAN-17 REPLICATION@ ORCL PRIMARY> UPDATE replication_table SET id=11 WHERE id=10;
Elapsed: 00:00:00.01
18-JAN-17 REPLICATION@ ORCL PRIMARY> COMMIT;
Elapsed: 00:00:00.01
18-JAN-17 REPLICATION@ ORCL PRIMARY> SELECT * FROM replication_table;
1 FIRST
2 SECOND
3 THIRD
4 FOURTH
5 FIFTH
6 SIXTH
7 SEVENTH
8 EIGHTH
9 NINTH
11 TENTH

Elapsed: 00:00:00.00

After making the changes on Oracle primary, we can execute SELECT command on Oracle Logical Standby to verify if the data change was replicated by Oracle in logical standby mode.

18-JAN-17 REPLICATION@ ORCL STANDBY> SELECT * FROM replication_table;
1 FIRST
2 SECOND
3 THIRD
4 FOURTH
5 FIFTH
6 SIXTH
7 SEVENTH
8 EIGHTH
9 NINTH
11 TENTH

10 rows selected.

Elapsed: 00:00:00.00
  10. Now perform a manual sync to verify the EDB Postgres Replication Server can replicate the above changes to EDB Postgres.
[root@epas95 /]# ${JAVA_EXECUTABLE_PATH} -jar /usr/ppas-xdb-6.0/bin/edb-repcli.jar  -dosynchronize  replication_table -repsvrfile ${SUBCONF} -repgrouptype s
Performing synchronize...
Synchronize done successfully.
[root@epas95 /]# psql orcl
psql.bin (9.5.5.10)
Type "help" for help.
orcl=# select * from replication.replication_table ;
id |   col
----+---------
1 | FIRST
2 | SECOND
3 | THIRD
4 | FOURTH
5 | FIFTH
6 | SIXTH
7 | SEVENTH
8 | EIGHTH
9 | NINTH
11 | TENTH

(10 rows)

The above output shows that after performing -dosynchronize, the EDB Postgres Replication Server was able to replicate the incremental changes on Oracle Logical Standby to EDB Postgres.

Tip: PPAS 9.4 and Global Temporary Tables

Customers who have moved/migrated their databases from Oracle to PPAS frequently ask for Global Temporary Tables in PPAS.

Currently, PPAS doesn’t support Global Temporary Tables. However, there is a way users can achieve this functionality in PPAS.

Before we continue with the implementation, let’s first understand the important characteristics of a Global Temporary Table:
1. A Global Temporary Table provides a predefined structure for storing data.
2. It is an unlogged table, which means any activity on the table will not be logged.
3. The data in a global temporary table is private: data inserted by a session can only be accessed by that session.

Based on the above characteristics of the Global Temporary Table (GTT), we can achieve similar behavior using the following method:
1. Create an UNLOGGED TABLE in PPAS, so that activity on the table won’t be logged.
2. Create a row-level security policy such that each session can see only its own rows (based on the backend pid).
3. Create a process which can clean up data from the GTT based on pids which are no longer active in the database.

Let’s see how we can implement this in Advanced Server.

1. Create an UNLOGGED table with all the required columns, plus an extra pid column defaulting to the session’s backend pid:

CREATE UNLOGGED TABLE test_global_temporary_table(id numeric, col text, pid bigint default pg_backend_pid());

2. Create a function to restrict the visibility of data.

CREATE OR REPLACE FUNCTION verify_pid_context (
    p_schema       TEXT,
    p_object       TEXT
)
RETURN VARCHAR2
IS
    predicate TEXT;
BEGIN
    IF current_setting('is_superuser') = 'on'
    THEN
      predicate := 'true';
    ELSE
      predicate := format('pid = %s', pg_backend_pid());
    END IF;
    RETURN predicate;
END;

3. Apply the security policy based on above function.

DECLARE
  v_object_schema VARCHAR2(30)   := 'public';
  v_object_name VARCHAR2(30)     := 'test_global_temporary_table';
  v_policy_name VARCHAR2(30)     := 'secure_by_pid';
  v_function_schema VARCHAR2(30) := 'public';
  v_policy_function VARCHAR2(30) := 'verify_pid_context';
  v_statement_types VARCHAR2(30) := 'INSERT,UPDATE,DELETE,SELECT';
  v_update_check BOOLEAN         := TRUE;
  v_enable BOOLEAN               := TRUE;
BEGIN
  DBMS_RLS.ADD_POLICY( v_object_schema,
                       v_object_name,
                       v_policy_name,
                       v_function_schema,
                       v_policy_function,
                       v_statement_types,
                       v_update_check,
                       v_enable
                     );
END;

4. Create an updatable view which hides the pid column. All sessions will use this view as the GTT.

   CREATE OR REPLACE VIEW test_global_temporary AS SELECT id, col FROM test_global_temporary_table;

5. Create a backend job which can clean up the table based on stale/old sessions.
For the job, a user/developer can do the following:
a. Use a superuser and execute a DELETE command on the base table (the view hides the pid column, so the DELETE must target the table):
DELETE FROM test_global_temporary_table WHERE pid NOT IN (SELECT pid FROM pg_stat_activity);
b. To schedule the above DELETE command, use one of the following:
i. crontab
ii. The PPAS DBMS_SCHEDULER package.
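The selection logic of that cleanup can be sketched in Python; the rows and active pids below are illustrative stand-ins, not live catalog data:

```python
# Sketch of the cleanup job's selection: rows whose owning backend pid no
# longer appears in pg_stat_activity are stale and safe to delete.
def stale_rows(table_rows, active_pids):
    """Return the rows the DELETE above would remove."""
    return [row for row in table_rows if row["pid"] not in active_pids]

rows = [
    {"id": 1, "col": "FROM pid 32722", "pid": 32722},
    {"id": 1, "col": "FROM pid 32729", "pid": 32729},
]
# Only pid 32722 is still connected, so pid 32729's row is stale.
print(stale_rows(rows, {32722}))
```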

6. GRANT privileges to the database users who will access the Global Temporary Table.

    GRANT ALL ON test_global_temporary TO testdbuser;
    GRANT ALL ON test_global_temporary_table TO testdbuser;

Now, let’s try the above implementation of the Global Temporary Table.

Open two sessions as a normal user (testdbuser) as given below:

[vibhorkumar@localhost ~]$ psql -U testdbuser edb
psql.bin (9.4.4.9)
Type "help" for help.

edb=> 
edb=> select pg_backend_pid();
 pg_backend_pid 
----------------
          32722
(1 row)


edb=> select pg_backend_pid();
 pg_backend_pid 
----------------
          32729
(1 row)

Now, from both sessions, insert some records.
From the first session:

edb=> INSERT INTO test_global_temporary VALUES(1,'FROM pid 32722');
INSERT 0 1
edb=> INSERT INTO test_global_temporary VALUES(2,'FROM pid 32722');
INSERT 0 1
edb=> INSERT INTO test_global_temporary VALUES(3,'FROM pid 32722');
INSERT 0 1

From Second session:

edb=> INSERT INTO test_global_temporary VALUES(1,'FROM pid 32729');
INSERT 0 1
edb=> INSERT INTO test_global_temporary VALUES(2,'FROM pid 32729');
INSERT 0 1
edb=> INSERT INTO test_global_temporary VALUES(3,'FROM pid 32729');
INSERT 0 1

From First Session:

edb=> SELECT * FROM test_global_temporary;
 id |      col       
----+----------------
  1 | FROM pid 32722
  2 | FROM pid 32722
  3 | FROM pid 32722
(3 rows)

From Second Session:

edb=> SELECT * FROm test_global_temporary;
 id |      col       
----+----------------
  1 | FROM pid 32729
  2 | FROM pid 32729
  3 | FROM pid 32729
(3 rows)

This shows that an unlogged table, with the right RLS policy and a backend cleanup job, can be a potential solution for Global Temporary Tables.

Monitoring approach for Streaming Replication with Hot Standby in PostgreSQL 9.3.

The people using PostgreSQL and the Streaming Replication feature seem to ask many of the same questions:

1. How best to monitor Streaming Replication?

2. What is the best way to do that?

3. Are there alternatives, when monitoring on Standby, to using the pg_stat_replication view on Master?

4. How should I calculate replication lag-time, in seconds, minutes, etc.?

In light of these commonly asked questions, I thought a blog would help. The following are some methods I’ve found to be useful.

Monitoring is critical for large infrastructure deployments where you have Streaming Replication for:

1. Disaster recovery

2. High availability

3. Load balancing, when using Streaming Replication with Hot Standby

PostgreSQL has some building blocks for replication monitoring, and the following are some important functions and views which can be used for monitoring the replication:

1. pg_stat_replication view on the master/primary server.

This view helps in monitoring the standby from the master. It gives you the following details:

   pid:              Process ID of the walsender process
   usesysid:         OID of the user used for streaming replication
   usename:          Name of the user used for streaming replication
   application_name: Application name connected to the master
   client_addr:      Address of the standby/streaming replication client
   client_hostname:  Hostname of the standby
   client_port:      TCP port on which the standby communicates with the WAL sender
   backend_start:    Time at which the standby connected to the master
   state:            Current WAL sender state, e.g. streaming
   sent_location:    Last transaction location sent to the standby
   write_location:   Last transaction written to disk on the standby
   flush_location:   Last transaction flushed to disk on the standby
   replay_location:  Last transaction replayed on the standby
   sync_priority:    Priority of the standby for being chosen as the synchronous standby
   sync_state:       Sync state of the standby (async or sync)

e.g.:

postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+---------------------------------
pid              | 1114
usesysid         | 16384
usename          | repuser
application_name | walreceiver
client_addr      | 172.17.0.3
client_hostname  |
client_port      | 52444
backend_start    | 15-MAY-14 19:54:05.535695 -04:00
state            | streaming
sent_location    | 0/290044C0
write_location   | 0/290044C0
flush_location   | 0/290044C0
replay_location  | 0/290044C0
sync_priority    | 0
sync_state       | async

2. pg_is_in_recovery(): Function which tells whether the standby is still in recovery mode or not.

e.g.

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

3. pg_last_xlog_receive_location(): Function which gives the location of the last transaction log received via streaming replication and written to disk by the standby.

e.g.

postgres=# select pg_last_xlog_receive_location();
 pg_last_xlog_receive_location
-------------------------------
 0/29004560
(1 row)

4. pg_last_xlog_replay_location(): Function which gives the location of the last transaction replayed during the recovery process. An example is given below:

postgres=# select pg_last_xlog_replay_location();
 pg_last_xlog_replay_location
------------------------------
 0/29004560
(1 row)

5. pg_last_xact_replay_timestamp(): This function gives the timestamp of the last transaction replayed during recovery. Below is an example:

postgres=# select pg_last_xact_replay_timestamp();
  pg_last_xact_replay_timestamp
----------------------------------
 15-MAY-14 20:54:27.635591 -04:00
(1 row)

Above are some important functions/views, which are already available in PostgreSQL for monitoring the streaming replication.

So, the logical next question is, “What’s the right way to monitor the Hot Standby with Streaming Replication on Standby Server?”

If you have Hot Standby with Streaming Replication, the following are the points you should monitor:

1. Check if your Hot Standby is in recovery mode or not:

For this, you can use the pg_is_in_recovery() function.

2. Check whether Streaming Replication is working or not.

An easy way of doing this is to check the pg_stat_replication view on the master/primary. This view returns rows on the master only when Streaming Replication is working.

3. Check if Streaming Replication is not working and the hot standby is recovering from archived WAL files.

For this, the DBA can either use the PostgreSQL log file to monitor it or utilize the following functions provided in PostgreSQL 9.3:

pg_last_xlog_replay_location();
pg_last_xact_replay_timestamp();

4. Check how far the standby is behind the master.

There are two ways to monitor lag for the standby:

   i. Lag in bytes: For calculating the lag in bytes, users can use the pg_stat_replication view on the master with the pg_xlog_location_diff function. Below is an example:

pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location)

which gives the lag in bytes.
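For intuition, the arithmetic behind pg_xlog_location_diff can be sketched in Python: an LSN such as '0/290044C0' splits at the slash into a high half counting 2^32-byte units and a low hexadecimal offset (a sketch of the computation, not the server's code):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN like '0/290044C0' to an absolute byte position;
    the part before the slash counts 2**32-byte units of the WAL stream."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def lag_bytes(sent: str, replay: str) -> int:
    """Same idea as pg_xlog_location_diff(sent_location, replay_location)."""
    return lsn_to_bytes(sent) - lsn_to_bytes(replay)

print(lag_bytes("0/290044C0", "0/29004460"))  # 96
```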

  ii. Lag in seconds: The following is the SQL most people use to find the lag in seconds:

   SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
                 THEN 0
               ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
          END AS log_delay;
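The CASE logic of that query can be mirrored in Python to make the two branches explicit (the sample values are hypothetical):

```python
from datetime import datetime, timedelta

def log_delay(receive_lsn, replay_lsn, now, last_replay_ts):
    """Mirror of the SQL CASE: zero when the standby has replayed
    everything it received, else seconds since the last replayed
    transaction."""
    if receive_lsn == replay_lsn:
        return 0.0
    return (now - last_replay_ts).total_seconds()

now = datetime(2014, 5, 15, 20, 54, 30)
# Fully caught up: no lag.
print(log_delay("0/29004560", "0/29004560", now, now))
# Replay is behind receive: report time since the last replayed transaction.
print(log_delay("0/29004560", "0/29004460", now, now - timedelta(seconds=3)))
```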

Including the above into your repertoire can give you good monitoring for PostgreSQL.

In a future post, I will include a script that can be used for monitoring Hot Standby with PostgreSQL streaming replication.

New in PostgreSQL 9.3: Data Types

Continuing the series on upcoming features in 9.3, I thought about covering the new improvements coming to data types in PostgreSQL.

1. Increase the maximum length of large objects from 2GB to 4TB

PostgreSQL has supported Large Objects from the beginning. However, large objects in PostgreSQL were limited to 2GB.

From 9.3 onwards, PostgreSQL can store large objects up to 4TB. This became possible by lifting a limitation in the large object API: functions like lo_seek() and lo_tell() cannot return an offset over 2GB, because their offset parameters and return values are 4-byte signed integers. If you do the calculation, 2^31-1 bytes = 2GB.

To overcome this limitation, PostgreSQL 9.3 introduces new APIs: the lo_seek64 and lo_tell64 functions. The libpq interface checks whether lo_tell64/lo_seek64 exist and uses them, otherwise it falls back to the 32-bit lo_seek/lo_tell functions; this way, compatibility is maintained between older releases and the new release. This is a good add-on for PostgreSQL. Thanks to Tatsuo Ishii (pgpool developer and the developer of this add-on).
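The 2GB figure follows directly from the 4-byte signed offset; a quick check of the arithmetic:

```python
# lo_seek()/lo_tell() use a signed 4-byte offset, so the largest
# addressable position in a large object was 2**31 - 1 bytes (~2 GB).
old_limit = 2**31 - 1
print(old_limit)  # 2147483647

# A signed 8-byte offset (lo_seek64/lo_tell64) comfortably covers the
# new 4 TB large-object ceiling.
four_tb = 4 * 1024**4
print(four_tb <= 2**63 - 1)  # True
```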

2. Text timezone designations using ISO “T” format for timestamptz

This is new for timestamptz. In 9.3, a text timezone designation is allowed using the ISO “T” format. This did not work in pre-9.3. Let’s look at one example:

In pre-9.3


worktest=# select '2011-08-29T09:11:14.123 America/Chicago'::timestamptz;
ERROR:  invalid input syntax for type timestamp with time zone: "2011-08-29T09:11:14.123 America/Chicago"
LINE 1: select '2011-08-29T09:11:14.123 America/Chicago'::timestampt...

As the user can see, pre-9.3 complained about it. However, in 9.3 this has been fixed:

pgsqltest=# select '2011-08-29T09:11:14.123 America/Chicago'::timestamptz;
        timestamptz         
----------------------------
 2011-08-29 10:11:14.123-04
(1 row)

Good to see this fix.

3. New operators and Functions for JSON data strings

PostgreSQL 9.3 is coming with some new functions and operators for the JSON data type, which are add-ons for users who use JSON in their applications. Now they can explore new functions and operators for their use cases.
Pre-9.3 had the following functions:

array_to_json
row_to_json

In 9.3, we have the following new operators:

Operator  Right Operand Type  Description                                 Example
->        int                 Get JSON array element                      '[1,2,3]'::json->2
->        text                Get JSON object field                       '{"a":1,"b":2}'::json->'b'
->>       int                 Get JSON array element as text              '[1,2,3]'::json->>2
->>       text                Get JSON object field as text               '{"a":1,"b":2}'::json->>'b'
#>        array of text       Get JSON object at specified path           '{"a":[1,2,3],"b":[4,5,6]}'::json#>'{a,2}'
#>>       array of text       Get JSON object at specified path as text   '{"a":[1,2,3],"b":[4,5,6]}'::json#>>'{a,2}'
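For intuition, the element and path operators above can be approximated on a parsed JSON document in Python (a rough analogue, not the server's implementation):

```python
import json

def get_path(doc, path):
    """Rough analogue of the '#>' operator: walk a parsed JSON document
    along a path of object keys / array indexes."""
    for step in path:
        if isinstance(doc, list):
            doc = doc[int(step)]   # array step, like '[1,2,3]'::json->2
        else:
            doc = doc[step]        # object step, like json->'b'
    return doc

doc = json.loads('{"a":[1,2,3],"b":[4,5,6]}')
print(doc["b"])                   # like ::json->'b'
print(get_path(doc, ["a", "2"]))  # like ::json#>'{a,2}'
```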

And the following new functions:

Function: to_json(anyelement)
Return type: json
Example: to_json('Fred said "Hi."'::text)

Function: json_array_length(json)
Return type: int
Example: json_array_length('[1,2,3,{"f1":1,"f2":[5,6]},4]')

Function: json_each(json)
Return type: SETOF key text, value json
Example: select * from json_each('{"a":"foo", "b":"bar"}')

Function: json_each_text(from_json json)
Return type: SETOF key text, value text
Example: select * from json_each_text('{"a":"foo", "b":"bar"}')

Function: json_extract_path(from_json json, VARIADIC path_elems text[])
Return type: json
Example: json_extract_path('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}','f4')

Function: json_extract_path_text(from_json json, VARIADIC path_elems text[])
Return type: text
Example: json_extract_path_text('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}','f4','f6')

Function: json_object_keys(json)
Return type: SETOF text
Example: json_object_keys('{"f1":"abc","f2":{"f3":"a", "f4":"b"}}')

Function: json_populate_record(base anyelement, from_json json [, use_json_as_text bool=false])
Return type: anyelement
Example: select * from json_populate_record(null::x, '{"a":1,"b":2}')

Function: json_populate_recordset(base anyelement, from_json json [, use_json_as_text bool=false])
Return type: SETOF anyelement
Example: select * from json_populate_recordset(null::x, '[{"a":1,"b":2},{"a":3,"b":4}]')

Function: json_array_elements(json)
Return type: SETOF json
Example: json_array_elements('[1,true,[2,false]]')

4. New functions to support hstore to JSON

PostgreSQL 9.3 is also coming with new functions for converting hstore values to JSON.
The following are the new functions, with examples, for hstore to JSON conversion.

Function: hstore_to_json(hstore)
Return type: json
Description: get hstore as a json value
Example: hstore_to_json('"a key"=>1, b=>t, c=>null, d=>12345, e=>012345, f=>1.234, g=>2.345e+4')
Result: {"a key": "1", "b": "t", "c": null, "d": "12345", "e": "012345", "f": "1.234", "g": "2.345e+4"}

Function: hstore_to_json_loose(hstore)
Return type: json
Description: get hstore as a json value, but attempting to distinguish numerical and Boolean values so they are unquoted in the JSON
Example: hstore_to_json_loose('"a key"=>1, b=>t, c=>null, d=>12345, e=>012345, f=>1.234, g=>2.345e+4')
Result: {"a key": 1, "b": true, "c": null, "d": 12345, "e": "012345", "f": 1.234, "g": 2.345e+4}
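As a rough illustration of the "loose" guessing, the conversion can be sketched in Python (an approximation of the idea, not the server's exact parsing rules):

```python
import json

def loose_value(v):
    """Sketch of hstore_to_json_loose's guessing: keep booleans and plain
    numbers unquoted, leave everything else as a string."""
    if v is None:
        return None
    if v == "t":
        return True
    if v == "f":
        return False
    # a leading zero followed by a digit (e.g. '012345') stays quoted,
    # matching the example result above
    if len(v) > 1 and v[0] == "0" and v[1].isdigit():
        return v
    try:
        return int(v)
    except ValueError:
        pass
    try:
        return float(v)   # handles '1.234' and exponent forms
    except ValueError:
        return v

hstore = {"a key": "1", "b": "t", "c": None, "d": "12345", "e": "012345", "f": "1.234"}
print(json.dumps({k: loose_value(v) for k, v in hstore.items()}))
```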

New in PostgreSQL 9.3: New in Functions

In the series of blogging about new features in 9.3, today I thought about covering the new functions and improvements coming in PostgreSQL.

Let’s look at what’s new in 9.3 in terms of built-in functions:

1. New array functions for one-dimensional arrays

PostgreSQL 9.3 is coming with new functions which can help users manipulate one-dimensional arrays by calling simple functions, in place of crafting their own functions and methods to do the modification.

i. array_remove function

This is a new function added in 9.3, which provides the ability to remove elements from an array. The function takes two arguments:
a. the one-dimensional array from which the user wants to remove elements
b. the element value which the user wants to remove

The syntax of this function is given below:

ARRAY_REMOVE(<one-dimensional array>, element)

An example of array_remove is given below:

postgres=# select array_remove(ARRAY['First','Second','Delete','Four'],'Delete');
    array_remove     
---------------------
 {First,Second,Four}
(1 row)

ii. array_replace function

This is new in 9.3; it helps the user replace any element in an array. This function can be used with multi-dimensional arrays or with single-dimensional arrays.
An example of array_replace is given below:

pgsqltest=# select array_replace(ARRAY[ARRAY[1,2],ARRAY[3,4],ARRAY[56,6],ARRAY[7,8]],56,5);
       array_replace       
---------------------------
 {{1,2},{3,4},{5,6},{7,8}}
(1 row)
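For intuition, the behavior of both array functions can be mirrored in Python (a sketch of the semantics, not the SQL implementation):

```python
def array_remove(arr, elem):
    """One-dimensional removal of every occurrence, like 9.3's array_remove."""
    return [x for x in arr if x != elem]

def array_replace(arr, old, new):
    """Replace elements, recursing into nested arrays like array_replace."""
    return [array_replace(x, old, new) if isinstance(x, list)
            else (new if x == old else x)
            for x in arr]

print(array_remove(['First', 'Second', 'Delete', 'Four'], 'Delete'))
# ['First', 'Second', 'Four']
print(array_replace([[1, 2], [3, 4], [56, 6], [7, 8]], 56, 5))
# [[1, 2], [3, 4], [5, 6], [7, 8]]
```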

2. VARIADIC-labeled argument expansion for the concat and format functions

This is a new addition to the concat and format functions. Pre-9.3, the concat and format functions used to ignore the VARIADIC label and did not produce the right output. However, this has been fixed in 9.3. Let’s look at examples from pre-9.3 and 9.3.

In pre-9.3

worktest=# select concat(variadic array[1,2,3]);
 concat  
---------
 {1,2,3}
(1 row)

In 9.3

pgsqltest=# select concat(variadic array[1,2,3]);
 concat 
--------
 123
(1 row)

You can see above that in 9.3, the VARIADIC label works properly.
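Python's *-argument expansion gives a close analogy for what the VARIADIC label does (an illustrative sketch, not SQL semantics):

```python
def concat(*args):
    """Analogue of SQL concat(): join the text form of every argument."""
    return "".join(str(a) for a in args)

# Unpacking with * mirrors the VARIADIC label: list elements become
# individual arguments, as concat(variadic array[1,2,3]) does in 9.3.
print(concat(*[1, 2, 3]))
# Without unpacking, the whole list is one argument, echoing the
# pre-9.3-style '{1,2,3}' result.
print(concat([1, 2, 3]))
```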

Let’s look at the format() function:

In pre-9.3

worktest=# select format(string_agg('%s',','), variadic array_agg(i)) from generate_series(1,3) g(i);
ERROR:  too few arguments for format

Wow, it errors out. However, this is fixed in 9.3 :-).

In 9.3

pgsqltest=# select format(string_agg('%s',','), variadic array_agg(i)) from generate_series(1,3) g(i);
 format 
--------
 1,2,3
(1 row)

3. Improvement in the format() function to handle field width and left/right alignment

This was added to the format() function in 9.3 and is going to increase its usability for developers. In 9.3, the format function comes with the following enhancements:
i). Proper handling of field width.
ii). Proper handling of left/right alignment.

The above two improvements were missing in pre-9.3. With their addition, the format() function now follows the C sprintf() function.

Let’s look at an example.
In pre-9.3, using a field width resulted in an error message, as given below:

worktest=#  select format('>>%10s<<', 'Hello');
ERROR:  unterminated conversion specifier
worktest=# 

However, 9.3 is coming with proper field width support.

pgsqltest=#  select format('>>%10s<<', 'Hello');
     format     
----------------
 >>     Hello<<
(1 row)

9.3 also comes with proper handling of left/right alignment. Examples are given below:
Left alignment

pgsqltest=# select format('>>%-10s<<', 'Hello');
     format     
----------------
 >>Hello     <<
(1 row)

Right alignment

pgsqltest=# select format('>>%1$10s<<', 'Hello');
     format     
----------------
 >>     Hello<<
(1 row)
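Since format() now follows sprintf() conventions, Python's %-formatting produces the same padding and can be used to sanity-check the examples above:

```python
# A width pads on the left (right alignment); the '-' flag switches to
# left alignment, matching the 9.3 format() examples.
print('>>%10s<<' % 'Hello')    # >>     Hello<<
print('>>%-10s<<' % 'Hello')   # >>Hello     <<
```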

4. Proper handling of negative centuries in the to_char, to_date and to_timestamp functions

In pre-9.3, the behavior of the following functions for negative centuries was wrong or inconsistent:
a. to_char
b. to_date
c. to_timestamp

However, 9.3 comes with a proper fix/handling for negative centuries. Let’s look at the output in pre-9.3 and in 9.3 for each function.

pre-9.3

worktest=# select to_timestamp('4713-01-08 BC','YYYY-DD-MM bc');
         to_timestamp         
------------------------------
 01-AUG-13 00:00:00 -04:56:02
(1 row)

Above, you can see that it displays the wrong result for BC. In 9.3:

pgsqltest=# select to_timestamp('4713-01-08 BC','YYYY-DD-MM bc');
          to_timestamp           
---------------------------------
 4713-08-01 00:00:00-04:56:02 BC
(1 row)

Let’s look at the to_date function. In 9.2/pre-9.3:

worktest=# SELECT to_date('4713-01-08 BC','YYYY-DD-MM bc');
ERROR:  full year must be between -4712 and +9999, and not be 0
worktest=# 

In 9.3

pgsqltest=# SELECT to_date('4713-01-08 BC','YYYY-DD-MM bc');
    to_date    
---------------
 4713-08-01 BC
(1 row)

Above, you can see that in 9.3 the to_date function works correctly; however, to_date in 9.2 wasn’t able to handle it. You can see similar behavior in the to_char function.

5. Improvement in pg_get_viewdef() to print a new line after each SELECT and FROM entry

This is an enhancement made to the pg_get_viewdef function in 9.3, and users will be able to see it in pg_dump plain-format dumps too.

This enhancement improves the readability of view definitions and reduces the line length of the output. Let’s look at how this improvement makes the experience of viewing a view definition much better.

Before 9.3 or in pre-9.3

worktest=# select pg_get_viewdef('pg_tables'::regclass);
                                                                                                    
                                                                                  pg_get_viewdef    
                                                                                                    
                                                                              
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------
 SELECT n.nspname AS schemaname, c.relname AS tablename, pg_get_userbyid(c.relowner) AS tableowner, 
t.spcname AS tablespace, c.relhasindex AS hasindexes, c.relhasrules AS hasrules, c.relhastriggers AS
 hastriggers FROM ((pg_class c LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) LEFT JOIN pg_
tablespace t ON ((t.oid = c.reltablespace))) WHERE (c.relkind = 'r'::"char");
(1 row)

Above, you can see that pg_get_viewdef printed the definition of the view in a single line, which not only makes the view definition hard to read but also produces a very long line.

Let's look at pg_get_viewdef in 9.3:

pgsqltest=# select pg_get_viewdef('pg_tables'::regclass);
                        pg_get_viewdef                        
--------------------------------------------------------------
  SELECT n.nspname AS schemaname,                            +
     c.relname AS tablename,                                 +
     pg_get_userbyid(c.relowner) AS tableowner,              +
     t.spcname AS tablespace,                                +
     c.relhasindex AS hasindexes,                            +
     c.relhasrules AS hasrules,                              +
     c.relhastriggers AS hastriggers                         +
    FROM ((pg_class c                                        +
    LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))  +
    LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))+
   WHERE (c.relkind = 'r'::"char");
(1 row)

which seems much better to me in terms of readability and avoids the long line.

Enjoy!!

New in PostgreSQL 9.3: Server Side languages

Continuing my series of posts about new features coming in PostgreSQL 9.3, I thought I would blog about the server-side language improvements in PostgreSQL. Let's see what's coming.

As a PostgreSQL user, you know that PostgreSQL supports multiple server-side languages. In 9.3, some interesting improvements are coming. Let's look at them one by one.

1. SPI access to the number of rows processed by the COPY command

This is a new feature introduced in 9.3. Before 9.3, if a user ran the COPY command inside a plperl/plpython function, there was no way inside the procedure to find the number of rows processed by COPY. In 9.3, this limitation no longer exists for procedural languages. PostgreSQL supports many languages; I chose to test this with the most widely used ones, plperl and plpython. Below are some snapshots from pre-9.3 and 9.3.

Let's check with plperl first.

The definition of the test table is given below:

   Table "public.test_copy"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | numeric | 

Content of the data file:

cat /tmp/test.data
1
2
3
4
5

Following is a plperl function which can be used for testing in both pre-9.3 and 9.3:

CREATE OR REPLACE FUNCTION test_copy() RETURNS integer
AS $$
    my $rv = spi_exec_query("COPY test_copy FROM '/tmp/test.data'");
    my $status = $rv->{status};
    my $nrows = $rv->{processed};
    return $nrows;
$$ LANGUAGE plperl;

If we execute the COPY command at the psql prompt, we get a message like the one below:

worktest=# COPY test_copy from '/tmp/test.data';
COPY 5

which shows that COPY processed 5 rows and loaded them into the table.

If we use the above plperl function in pre-9.3, we get the following result:

worktest=# select split_part(version(),' ',2) as version;
 version  
----------
 9.2.4.10
(1 row)

worktest=# select test_copy();
 test_copy 
-----------
         0
(1 row)

which shows the function was not able to get the number of rows processed by the COPY command.
However, if we use the same plperl function in 9.3, we get the following result:

postgres=# select split_part(version(),' ',2) as version;
 version  
----------
 9.3beta2
(1 row)

postgres=# select test_copy();
 test_copy 
-----------
         5
(1 row)

which shows that the plperl function in 9.3 is able to get the number of rows processed.

Similarly, we can use the following plpython function to test this new feature:

CREATE FUNCTION result_copy_test(cmd text) RETURNS int
AS $$
plan = plpy.prepare(cmd)
plpy.info(plan.status()) 
result = plpy.execute(plan)
return result.nrows()
$$ LANGUAGE plpythonu;

pre-9.3

worktest=#  SELECT result_copy_test($$ COPY test_copy FROM '/tmp/test.data' $$);
INFO:  True
CONTEXT:  PL/Python function "result_copy_test"
 result_copy_test 
------------------
                0
(1 row)

which shows the function was not able to get the rows processed by the COPY command.

In 9.3

postgres=# SELECT result_copy_test($$ COPY test_copy FROM '/tmp/test.data' $$);
INFO:  True
CONTEXT:  PL/Python function "result_copy_test"
 result_copy_test 
------------------
                5
(1 row)

which shows the function was able to get the rows processed by the COPY command.

2. Allow GET DIAGNOSTICS x = ROW_COUNT to access rows processed by COPY

This is an enhancement in plpgsql: if a user wants to access the number of rows processed by a COPY command inside a plpgsql block, he can now use GET DIAGNOSTICS variable = ROW_COUNT, which was not possible before 9.3.

To check this, a user can either create a sample function or use an anonymous block. I will use a plpgsql DO block to verify this feature. An example is given below:
PL/pgSQL anonymous block

DO $$
  DECLARE
    r int; 
  BEGIN
     COPY test_copy FROM '/tmp/test.data';
     GET DIAGNOSTICS r = row_count;
     RAISE NOTICE 'processed rows => %',r;
  END;
$$ language plpgsql;

pre-9.3

worktest=# DO $$
worktest$#   DECLARE
worktest$#     r int; 
worktest$#   BEGIN
worktest$#      COPY test_copy FROM '/tmp/test.data';
worktest$#      GET DIAGNOSTICS r = row_count;
worktest$#      RAISE NOTICE 'processed rows => %',r;
worktest$#   END;
worktest$# $$ language plpgsql;
NOTICE:  processed rows => 0
DO

which shows pre-9.3 was not able to get the number of processed rows.

In 9.3

postgres=# DO $$
postgres$#   DECLARE
postgres$#     r int; 
postgres$#   BEGIN
postgres$#      COPY test_copy FROM '/tmp/test.data';
postgres$#      GET DIAGNOSTICS r = row_count;
postgres$#      RAISE NOTICE 'processed rows => %',r;
postgres$#   END;
postgres$# $$ language plpgsql;
NOTICE:  processed rows => 5
DO

In 9.3, plpgsql is able to get the number of processed rows. Interesting!

3. Allow use of RETURN with a composite expression in PL/pgSQL

This is a new addition in PL/pgSQL, useful for users who would rather not declare extra variables and want RETURN in PL/pgSQL to return an expression directly.

This feature was not available in pre-9.3 PL/pgSQL. In 9.3, a user can use it easily. Below is an example.

create type footype as (x int, y varchar);

create or replace function foo() returns footype as $$
begin
  return (1, 'hello')::footype;
end;
$$ language plpgsql;

Let's try with pre-9.3 first. When a user tries to create the above function in pre-9.3, he gets the following error message:

worktest=# create or replace function foo() returns footype as $$
worktest$# begin
worktest$#   return (1, 'hello')::footype;
worktest$# end;
worktest$# $$ language plpgsql;
ERROR:  RETURN must specify a record or row variable in function returning row
LINE 3:   return (1, 'hello')::footype;

In 9.3, thanks to this new feature, it works great:

postgres=# create or replace function foo() returns footype as $$
postgres$# begin
postgres$#   return (1, 'hello')::footype;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE FUNCTION
postgres=# 

And the user can see the working output too:

postgres=# select foo();
    foo    
-----------
 (1,hello)
(1 row)

4. New in PL/Python
There are some good improvements coming in plpython too. They will be useful for users who write plpython functions and want to debug them or print some useful information, which was missing in pre-9.3. Let's look at what's new in plpython.

i. Addition of an object string in PL/Python
Before 9.3, whenever a user wanted to look at the information provided by the result object handler in PL/Python, it did not give useful information. In 9.3, it gives some useful information which will be helpful for PL/Python users.

Let's look at an example of how this change is useful. Following is a small function which can be used for testing this new addition:

CREATE FUNCTION test_debug_info() RETURNS text AS $$
    try:
        rv=plpy.execute("SELECT datname FROM pg_catalog.pg_database",5)
        plpy.info(rv);
    except plpy.SPIError:
        return "Not working"
    else:
        return "Working good"
$$ LANGUAGE plpythonu;

pre 9.3

worktest=# select test_debug_info() ;
INFO:  <PLyResult object at 0x7f3594b8f270>
CONTEXT:  PL/Python function "test_debug_info"
 test_debug_info 
-----------------
 Working good
(1 row)

Above, the user can see that INFO did not give very useful information.

In 9.3

postgres=# select test_debug_info() ;
INFO:  <PLyResult status=5 nrows=3 rows=[{'datname': 'template1'}, {'datname': 'template0'}, {'datname': 'postgres'}]>
CONTEXT:  PL/Python function "test_debug_info"
 test_debug_info 
-----------------
 Working good
(1 row)

Above, you can see that INFO gives some useful information about the status, the number of rows, and the rows returned by the query. Very useful if somebody wants to know what the handler contains in PL/Python.

ii. Conversion of OID values to a proper PL/Python numeric type
Before 9.3, the database type OID was treated as a string in PL/Python, which forced developers to convert the string into a numeric data type in plpython before doing any processing on it, i.e. extra coding. In 9.3 they don't have to worry about that any more.
Below is an example to test this feature.

CREATE FUNCTION test_type_conversion_oid(x oid) RETURNS oid AS $$
   plpy.info(x, type(x))
   return x +1
$$ LANGUAGE plpythonu;

Lets look at the pre 9.3

worktest=# select test_type_conversion_oid(123456);
INFO:  ('123456', <type 'str'>)
CONTEXT:  PL/Python function "test_type_conversion_oid"
ERROR:  TypeError: cannot concatenate 'str' and 'int' objects
CONTEXT:  Traceback (most recent call last):
  PL/Python function "test_type_conversion_oid", line 3, in <module>
    return x +1
PL/Python function "test_type_conversion_oid"

which shows that direct numeric operations are not possible with OID in pre-9.3, since OID was treated as a string.

In 9.3

postgres=# select test_type_conversion_oid(123456)
postgres-# ;
INFO:  (123456L, <type 'long'>)
CONTEXT:  PL/Python function "test_type_conversion_oid"
 test_type_conversion_oid 
--------------------------
                   123457
(1 row)

This works great; direct numeric operations with OID are possible. A good addition to plpython.

iii. Handle SPI errors raised explicitly (with PL/Python's RAISE) the same as internal SPI errors
This is a new addition to plpython. Now, in a plpython function body, a user can raise SPIError/exceptions using the plpython "raise" statement, which was missing in pre-9.3. In addition, if the user sets the sqlstate attribute, plpython preserves that change.
Below is an example.

Following is the definition of a function which can be used for testing this addition in pre-9.3 and in 9.3:

CREATE OR REPLACE FUNCTION plpy_raise_spiexception() RETURNS void AS $$
   raise plpy.spiexceptions.DivisionByZero()
$$ LANGUAGE plpythonu;

Let's use a plpgsql anonymous block to test how the raised exception can be handled.
Before 9.3

DO $$
BEGIN
   SELECT plpy_raise_spiexception();
EXCEPTION WHEN division_by_zero THEN
   RAISE NOTICE 'Found Exception';
END
$$ LANGUAGE plpgsql;

ERROR:  spiexceptions.DivisionByZero: 
CONTEXT:  Traceback (most recent call last):
  PL/Python function "plpy_raise_spiexception", line 2, in <module>
    raise plpy.spiexceptions.DivisionByZero()
PL/Python function "plpy_raise_spiexception"
SQL statement "SELECT plpy_raise_spiexception()"
PL/pgSQL function inline_code_block line 3 at SQL statement

As you can see, before 9.3 the raise statement was not working properly for raising exceptions in plpython.

In 9.3

postgres=# DO $$
postgres$# BEGIN
postgres$#    SELECT plpy_raise_spiexception();
postgres$# EXCEPTION WHEN division_by_zero THEN
postgres$#    RAISE NOTICE 'Found Exception';
postgres$# END
postgres$# $$ LANGUAGE plpgsql;
NOTICE:  Found Exception
DO

As you can see, in 9.3 it works great!

pg_xlog_location_diff function for PostgreSQL/PPAS

In PostgreSQL 9.2, the community added the function pg_xlog_location_diff(), which is very useful for finding the difference between two xlog locations in bytes and also for monitoring replication. Details of this function are given at the following link:

http://www.postgresql.org/docs/9.2/static/functions-admin.html#FUNCTIONS-ADMIN-BACKUP

However, this function is not available to users of PostgreSQL/PPAS 9.0/9.1. So, I thought I would write the same function in plpgsql so that users can take advantage of it in 9.0/9.1.

Before presenting the formula and developing the function, let's understand what the xlog and offset are.
Consider a user who calls the pg_current_xlog_location() function and gets the following information:

worktest=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 1/D1012B80
(1 row)

In the above, the first field before the forward slash is the hexadecimal value of the logical xlog file, and the second field, i.e. D1012B80, is the hexadecimal offset inside that logical xlog file.

So, whenever a user sees information for an xlog location, he gets it in the following format:

(hexadecimal) xlog/ (hexadecimal) offset

To calculate the difference between two xlog locations, a user can use the following formula, which gives the difference in hexadecimal:

(FF000000 * xlog1 + offset1) - (FF000000 * xlog2 + offset2)

where FF000000 is the maximum value of the offset, i.e. in PostgreSQL/PPAS the offset value can go from 00000000 to FF000000, which in decimal gives a maximum of 4278190080 bytes (4080MB).
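To see the arithmetic in action outside the database, here is a small Python sketch of the same formula. The function name and the second sample location are mine, for illustration:

```python
# Sketch of the xlog-difference formula above: each location is
# "<xlog>/<offset>" in hexadecimal, and FF000000 is the maximum offset.
MAX_OFFSET = 0xFF000000

def xlog_location_diff(loc1, loc2):
    """(MAX_OFFSET * xlog1 + offset1) - (MAX_OFFSET * xlog2 + offset2)."""
    xlog1, offset1 = (int(part, 16) for part in loc1.split('/'))
    xlog2, offset2 = (int(part, 16) for part in loc2.split('/'))
    return (MAX_OFFSET * xlog1 + offset1) - (MAX_OFFSET * xlog2 + offset2)

# Difference in bytes between two locations in the same logical xlog file:
print(xlog_location_diff('1/D1012B80', '1/D009F578'))  # → 16201224
```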

Based on the above formula, following is a plpgsql function which can be used to get the difference:

CREATE OR REPLACE FUNCTION pg_xlog_location_diff_sql( text, text)
 RETURNS numeric
 LANGUAGE plpgsql
AS 
 $function$
    DECLARE
       offset1 text;
       offset2 text;
       xlog1 text;
       xlog2 text;
       SQL text;
       diff text;
    BEGIN
       /* Extract the Offset and xlog from input in
          offset and xlog variables */
          
       offset1=split_part($1,'/',2);
         xlog1=split_part($1,'/',1);
       offset2=split_part($2,'/',2);
         xlog2=split_part($2,'/',1);
       
       /* Prepare SQL query for calculation based on following formula
         (FF000000 * xlog + offset) - (FF000000 * xlog + offset)
         which gives value in hexadecimal. Since, hexadecimal calculation is cumbersome
         so convert into decimal and then calculate the difference */
       
       SQL='SELECT (x'''||'FF000000'||'''::bigint * x'''||xlog1||'''::bigint 
                                +  x'''||offset1||'''::bigint)'||' 
                - 
                   (x'''||'FF000000'||'''::bigint * x'''||xlog2||'''::bigint 
                                +  x'''||offset2||'''::bigint)';
       EXECUTE SQL into diff;
       
       /* Return the value in numeric by explicit casting  */
       
       RETURN diff::numeric;
    END;
 $function$;
 

Usage example is given below:

worktest=# select pg_xlog_location_diff_sql(pg_current_xlog_location(),'1/D009F578');
 pg_xlog_location_diff_sql 
---------------------------
                  16230472
(1 row)

I hope this will help PostgreSQL/PPAS 9.0/9.1 users.