Get Number of Segments in PostgreSQL 9.0 (PL/Perl)

In my earlier blog post on get_number_of_segments, I used a fairly long query to find the number of segments of a table.
Since PostgreSQL 9.0, the system administration function pg_relation_filepath can be used to find the file path of a relation's relfilenode, so I decided to shorten get_number_of_segments and use pg_relation_filepath instead.
Following is the modified PL/Perl function get_number_of_segments(text), which uses the system administration function pg_relation_filepath(relation regclass):

CREATE OR REPLACE FUNCTION get_number_of_segments(text) returns  table(tablename text, segments int)
as
$$
  ## Build the "ls" command using pg_relation_filepath();
  ## extra segment files carry a ".<n>" suffix.
  my $sql = spi_prepare("select 'ls -1 '||pg_relation_filepath(\$1)||'.*' as cmd",'TEXT');
  my $rv  = spi_exec_prepared($sql, $_[0]);
  my $command = $rv->{rows}[0]->{cmd};

  open(CMD, "$command |");
    my @files = <CMD>;
  close(CMD);
  spi_freeplan($sql);

  ## The base file has no ".<n>" suffix, so total segments = extra files + 1
  return_next({'tablename' => $_[0], 'segments' => scalar(@files) + 1});
  return undef;
$$ language plperlu;

Some Examples:

psql -c " select * from get_number_of_segments('public.test');"
  tablename  | segments
-------------+----------
 public.test |        1
(1 row)

psql -c " select * from get_number_of_segments('public.testa');"
ERROR:  relation "public.testa" does not exist at line 4.
CONTEXT:  PL/Perl function "get_number_of_segments"
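
For reference, pg_relation_filepath() by itself returns the path of the relation's first segment file, relative to the data directory (the OIDs in the output below are only illustrative):

postgres=# select pg_relation_filepath('public.test');
 pg_relation_filepath
----------------------
 base/11955/16384
(1 row)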

PL/Perl Getting Data From External File in PostgreSQL/PPAS

Up to PostgreSQL 9.0, PostgreSQL does not have any feature which can be used to read an external file (the concept of an External Table).

An External Table is a database feature that lets anyone read a flat file as if it were a table in the database.

So, I thought I would give an idea of how a user can implement this in the database using PL/Perl.

Let’s see how you can implement it.

I have a flat file which contains the following data:

Filename: test.data
Location: /tmp/test.data

1,Check1
2,Check2
3,check3
4,check4
5a,check5

Since I know this flat file has two fields (numeric and text), I can write a PL/Perl function which reads the file and extracts the column values of each line on the basis of a passed delimiter; with this, if there is any bad record, it goes to a bad file for later verification if required.

Following is a sample plperl function:

CREATE OR REPLACE FUNCTION read_external(text,text) returns table(id numeric, col1  text)
as
$$
## Written by Vibhor to Read external file
my $badfile = $_[0] . ".bad";
open (BADFILE, ">>", $badfile);   ## bad records are appended here
open (DATFILE, "<", $_[0]);
while ($line = <DATFILE>) {
  chomp($line);
  @line=split($_[1],$line);

if ($line[0] =~ /^[0-9]+$/ && $line[0] ne "")
{
    return_next({id => $line[0], col1 => $line[1]}); }
else {
  print BADFILE "$line\n";
  }
}
close(DATFILE);
close(BADFILE);
return undef;
$$ language plperlu;

I have also created a view which can be used to SELECT the records of the flat file as a table. Following is the result:

postgres=# create or replace view external_table as select * from read_external('/tmp/test.data',',');
CREATE VIEW
postgres=# select * from external_table;
 id |  col1
----+--------
  1 | Check1
  2 | Check2
  3 | check3
  4 | check4
(4 rows)

And I know there is a bad record, 5a,check5, which I can find in the bad file:

postgres=# \! cat /tmp/test.data.bad
5a,check5

PostgreSQL/Postgres Plus Advanced Server Process Details

Following are some of the important processes of PostgreSQL/Postgres Plus Advanced Server.

vibhore@ubuntu:~/edb/edb-postgres$ ps -eaf|grep postgre
1002      1080     1  0 06:07 ?        00:00:00 /opt/PostgresPlus/8.4AS/bin/edb-postgres -D /opt/PostgresPlus/8.4AS/data (Mandatory)
1002      1101  1080  0 06:07 ?        00:00:00 postgres: logger process   (Mandatory)
1002      1103  1080  0 06:07 ?        00:00:01 postgres: writer process                                               (Mandatory)
1002      1104  1080  0 06:07 ?        00:00:01 postgres: wal writer process (Mandatory)
1002      1105  1080  0 06:07 ?        00:00:00 postgres: autovacuum launcher process  (Optional if autovacuum is on)
1002      1106  1080  0 06:07 ?        00:00:00 postgres: stats collector process      (Mandatory)

Above is a snapshot of the processes which exist in PostgreSQL/Postgres Plus Advanced Server.
Postmaster: This is the parent process of all the others; it monitors the other processes and spawns new backend processes when user connection requests arrive. It is the process responsible for host-based and user-based authentication. This process also checks the consistency of the database and, accordingly, performs recovery of the PostgreSQL instance.

Logger Process: This process is responsible for logging the details of activity from startup to shutdown of the PostgreSQL instance. The logging behavior of this process can be controlled by the following categories of parameters:
Category: Where to Log: Parameters (a pg_settings query to check their current values follows the list):
1. log_destination (string): This parameter tells the logger process which methods to use for logging server messages; valid values include stderr, csvlog and syslog.
2. logging_collector (boolean): This parameter allows messages sent to stderr, and CSV-format log output, to be captured and redirected into log files.
3. log_directory (string): This parameter specifies the directory in which log files are created.
4. log_filename (string): When logging_collector is enabled, this parameter sets the file names of the created log files. The value is treated as a strftime pattern, so %-escapes can be used to specify time-varying file names.
5. log_rotation_age (integer): This parameter determines the maximum lifetime of an individual log file. After this many minutes have elapsed, a new log file will be created. Set to zero to disable time-based creation of new log files. This parameter can only be set in the postgresql.conf file or on the server command line.
6. log_rotation_size (integer): This parameter determines the maximum size of an individual log file. After this many kilobytes have been emitted into a log file, a new log file will be created.
7. log_truncate_on_rotation (boolean): This parameter will cause PostgreSQL to truncate (overwrite), rather than append to, any existing log file of the same name. However, truncation will occur only when a new file is being opened due to time-based rotation, not during server startup or size-based rotation.
8. syslog_facility (enum): This parameter determines the syslog “facility” to be used. You can choose from LOCAL0, LOCAL1, LOCAL2, LOCAL3, LOCAL4, LOCAL5, LOCAL6, LOCAL7; the default is LOCAL0. See also the documentation of your system’s syslog daemon.
9. syslog_ident (string): This parameter determines the program name used to identify PostgreSQL messages in syslog logs. The default is postgres.
10. silent_mode (boolean): When this parameter is set, the server will automatically run in the background and disassociate from the controlling terminal. This parameter can only be set at server start.
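
The current values of these parameters can be checked from any session by querying pg_settings; the following is a minimal sketch (the parameter list simply mirrors the list above):

select name, setting
  from pg_settings
 where name in ('log_destination','logging_collector','log_directory','log_filename',
                'log_rotation_age','log_rotation_size','log_truncate_on_rotation',
                'syslog_facility','syslog_ident','silent_mode');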

Category: When to Log: Parameters:
Following are the parameters which control the message level (a small usage sketch follows the list):
1. client_min_messages (enum)
2. log_min_messages (enum)
3. log_error_verbosity (enum)
4. log_min_error_statement (enum)
5. log_min_duration_statement (integer)
Details about the above parameter values can be found at the following link:
http://www.postgresql.org/docs/8.4/static/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS
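
As a small illustration, a superuser can change the statement-duration threshold for the current session with SET (the value below is only an example):

SET log_min_duration_statement = 250;  -- log every statement that runs longer than 250 ms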

Category: What to Log: Parameters (a small SET sketch follows the list):
1. Debug Parameters:
debug_print_parse (boolean)
debug_print_rewritten (boolean)
debug_print_plan (boolean)
These parameters enable various debugging output to be emitted. When set, they print the resulting parse tree, the query rewriter output, or the execution plan for each executed query.
2. debug_pretty_print (boolean): This indents the messages produced by debug_print_parse, debug_print_rewritten, or debug_print_plan. This results in more readable but much longer output than the “compact” format used when it is off.
3. log_checkpoints (boolean): This causes checkpoints to be logged in the server log. Some statistics about each checkpoint are included in the log messages, including the number of buffers written and the time spent writing them.
4. log_connections (boolean): This parameter causes each attempted connection to the server to be logged, as well as successful completion of client authentication.
5. log_disconnections (boolean): This outputs a line in the server log similar to log_connections, but at session termination, and includes the duration of the session.
6. log_duration (boolean): This parameter causes the duration of every completed statement to be logged.
7. log_lock_waits (boolean): It controls whether a log message is produced when a session waits longer than deadlock_timeout to acquire a lock.
8. log_statement (enum): It controls which SQL statements are logged. Valid values are none, ddl, mod, and all. ddl logs all data definition statements, such as CREATE, ALTER, and DROP statements. mod logs all ddl statements, plus data-modifying statements such as INSERT, UPDATE, DELETE, TRUNCATE, and COPY FROM. PREPARE, EXECUTE, and EXPLAIN ANALYZE statements are also logged if their contained command is of an appropriate type.
9. log_temp_files (integer): It controls logging of use of temporary files. Temporary files can be created for sorts, hashes, and temporary query results. A log entry is made for each temporary file when it is deleted.
10. log_timezone (string): It sets the time zone used for timestamps written in the log. Unlike timezone, this value is cluster-wide, so that all sessions will report timestamps consistently.
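
Several of these can be changed on the fly by a superuser with SET; the following is a minimal sketch with illustrative values (others, such as log_checkpoints, can only be changed in postgresql.conf):

SET log_statement = 'ddl';   -- log only DDL statements for this session
SET log_lock_waits = on;     -- log waits longer than deadlock_timeout
SET log_duration = off;      -- do not log per-statement durations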

writer process: This is the background (BG) writer process. It writes dirty shared buffers to disk on the following events:
1. When a checkpoint occurs.
2. When the checkpoint timeout elapses.
3. When the WAL writer finishes the number of segments mentioned in checkpoint_segments.
4. When there is no space left in shared buffers for new blocks.

Parameters which can help in controlling the BG writer's activity are given below (a pg_settings query to check their current values follows the formula):
1. bgwriter_delay (integer): This parameter specifies the delay between activity rounds for the background writer. In each round the writer issues writes for some number of dirty buffers (controllable by the following parameters). It then sleeps for bgwriter_delay milliseconds, and repeats.
2. bgwriter_lru_maxpages (integer): This parameter specifies the maximum number of buffers that will be written to disk in each round. Setting this to zero disables background writing (except for checkpoint activity).
3. bgwriter_lru_multiplier (floating point): This parameter is used in estimating the number of buffers that will be needed during the next round. The formula used is given below:
(average recent need) * bgwriter_lru_multiplier
Dirty buffers are written until there are ((average recent need) * bgwriter_lru_multiplier) many clean, reusable buffers available.
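
The current background writer settings can be checked with a simple pg_settings query; a minimal sketch:

select name, setting, unit from pg_settings where name like 'bgwriter%';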

wal writer process: This process writes all the transactions from the WAL buffer to the WAL files in pg_xlog. It is similar to the log writer in Oracle (which flushes the redo log buffer to the redo log files). The parameters which can be used to control the behavior of the WAL writer can be found at the following link:
http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html
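
For a quick look at a few of the WAL-related settings documented there (the specific parameter names below are just examples picked from that page, as named in the 8.x series):

select name, setting, unit
  from pg_settings
 where name in ('wal_buffers','wal_writer_delay','checkpoint_segments');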

stats collector process:
The collector can count accesses to tables and indexes in both disk-block and individual-row terms. It also tracks the total number of rows in each table, and information about vacuum and analyze actions for each table. It can also count calls to user-defined functions and the total time spent in each one.
Information on the parameters which can be used to control its activity can be found at the following link:
http://www.postgresql.org/docs/8.3/static/runtime-config-statistics.html
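
For example, the per-table counters maintained by the stats collector can be read from the pg_stat_user_tables view; a minimal sketch:

select relname, seq_scan, idx_scan, n_live_tup, last_autovacuum
  from pg_stat_user_tables
 order by seq_scan desc
 limit 5;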

autovacuum process:
Autovacuum is a process which performs the following activities:
1. To recover or reuse disk space occupied by updated or deleted rows.
2. To update data statistics used by the PostgreSQL query planner.
3. To protect against loss of very old data due to transaction ID wraparound.

Up to 8.2, the default value of the autovacuum parameter was off. From 8.3 onwards, its default value is on, which means it is a default process from 8.3 onwards.
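
The autovacuum-related settings can be inspected the same way; a minimal sketch (parameter names as in 8.3/8.4):

select name, setting
  from pg_settings
 where name in ('autovacuum','autovacuum_naptime',
                'autovacuum_vacuum_threshold','autovacuum_analyze_threshold');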

sessionwatch: A small tool to monitor session activities of PostgreSQL

Generally, a DBA has to log in to the system and monitor activity by querying the pg_stat_activity view. However, if someone wants continuous monitoring, say every 2 seconds, he has to run SQL against pg_stat_activity every 2 seconds.

So, for such an activity, I have made the following shell script, which can be used with the watch command on Linux to monitor the activity.


#!/bin/bash

## Prepared by Vibhor Kumar (EDB).

Q="select procpid,now() - query_start as long, waiting, current_query from pg_stat_activity where procpid pg_backend_pid() limit 10;"
while getopts "b:U:d:p:" opt; do
 case $opt in
 b) BIN=$OPTARG;;
 U) PGUSER=$OPTARG;;
 d) DB=$OPTARG;;
 p) port1=$OPTARG;;
 *) echo "Usage: $0 -b <bin directory> -d <database name> -U <user> -p <port>";;
 esac
 done

if [ -z $BIN ];then
 echo "Usage: $0 -b bin directory -d database name -U user -p port"
 exit
fi
if [ -z $PGUSER ];then
 echo "Usage: $0 -b bin directory -d database name -U user -p port"
 exit
fi

if [ -z $DB ];then
 echo "Usage: $0 -b bin directory -d database name -U user -p port"
 exit
fi
if [ -z $port1 ];then
 echo "Usage: $0 -b bin directory -d database name -U user -p port"
 exit
fi

function verifybin_connect()
{
 if [ -f $BIN/psql ];then
 :
 else
 echo "ERROR: psql Not Found!"
 usage
 exit 1
 fi
 Q1="SELECT 'ping';"
 $BIN/psql -U $PGUSER -h localhost -p $port1 -c "$Q1" -d $DB -t >/dev/null 2>/dev/null
 if [ $? -ne 0 ];then
 echo "ERROR: $DB is not pinging on $host"
 exit 1
 fi
}

verifybin_connect
$BIN/psql -c "$Q" -U $PGUSER -d $DB -p $port1

Usage is given below:

vibhore@ubuntu:~$ ./sessionwatch
Usage: ./sessionwatch -b bin directory -d database name -U user -p port

Example of monitoring the activity is given below:

watch ./sessionwatch -d edb -U enterprisedb -p 5444 -b /opt/PostgresPlus/8.4AS/bin
Every 2.0s: ./sessionwatch -d edb -U enterprisedb -p 5444 -b /opt/PostgresPlus/8.4AS/bin Sun Apr 24 09:25:50 2011

procpid | long | waiting | current_query
---------+------------------------+---------+---------------
 1292 | 00:00:25.582061 | f |
 1314 | 03:13:59.158917 | f |
 1172 | 00:00:04.899938 | f |
 1324 | 08:18:58.855179 | f |
 1330 | 00:00:18.448067 | f |
 1819 | 2 days 06:11:09.667475 | f |
 1836 | 00:00:00.637558 | f |
(7 rows)

PL/Perl Functions for Getting the Number of Segments and Executing Shell Commands

 
I was going through one of the user postings on the EnterpriseDB forum, in which the user asked whether there is any function in PostgreSQL/Postgres Plus Advanced Server which can be used for finding the number of segments of a table/relation on the server.

So, I thought to write about it. Currently, PostgreSQL/Postgres Plus Advanced Server doesn't come with any such function. However, if someone wants, he can write a PL/Perl function which finds the number of segments of a table, as I have done in the following sample code:

PL/Perl program to get the number of segments of a table:

CREATE OR REPLACE FUNCTION get_number_of_segments(text,text) returns  table(tablename text, segments int)
as
$$
 my $sql = "select 'ls -1 '||case reltablespace when 0 then setting||'/base/'||pg_database.oid||'/'||relfilenode||'.*|grep -v vm|grep -v fsm|wc -l' else (select spclocation||'/'||pg_database.oid||'/'||relfilenode||'.*|grep -v vm|grep -v fsm|wc -l'  from pg_tablespace where pg_class.reltablespace=pg_tablespace.oid) END as location from pg_class, pg_database,pg_settings,pg_namespace where name='data_directory' and relname='$_[0]' and datname=current_database() and pg_namespace.oid=pg_class.relnamespace and nspname='$_[1]';";
    my $relname = $_[1].".".$_[0];
  my $rv = spi_exec_query($sql);
  my $nrows = $rv->{processed};
  if ($nrows == 0) {
   return_next({'tablename' => $relname , 'segments' => 0});
   return undef;
}
  my $cmd = $rv->{rows}[0];
  my $command = $cmd -> {location};
  open(CMD, "$command |");
    my $count = <CMD>;   ## first line of output is the file count from "wc -l"
  close(CMD);
  chomp($count);

  return_next({'tablename' => $relname , 'segments' => $count + 1});
  return;
$$ language plperlu;

Example of usage:

select * from get_number_of_segments('test2','enterprisedb');
     tablename      | segments
--------------------+----------
 enterprisedb.test2 |        1
(1 row)

Similarly, sometimes people also ask about a function which can execute a shell command while connected to the database. Following is sample code which can be used for executing a shell command through a PL/Perl function.

PL/Perl program to execute a shell command:

CREATE OR REPLACE FUNCTION execute_shell(text) returns setof text
as
$$
## Run the shell command, capturing both stdout and stderr,
## and return one row per line of output.
my $output = `$_[0] 2>&1`;
my @output = split(/[\n\r]+/, $output);
foreach my $out (@output)
{
  return_next($out);
}
return undef;
$$ language plperlu;

Example:

edb=# select execute_shell('ls -ltr');
                               execute_shell
---------------------------------------------------------------------------
 total 100
 -rw------- 1 enterprisedb enterprisedb     4 Apr 12 11:18 PG_VERSION
 drwx------ 2 enterprisedb enterprisedb  4096 Apr 12 11:18 pg_twophase
 drwx------ 4 enterprisedb enterprisedb  4096 Apr 12 11:18 pg_multixact
 -rw------- 1 enterprisedb enterprisedb  1631 Apr 12 11:18 pg_ident.conf
 drwx------ 2 enterprisedb enterprisedb  4096 Apr 12 11:18 pg_subtrans
 drwx------ 2 enterprisedb enterprisedb  4096 Apr 12 11:18 pg_clog
 drwx------ 3 enterprisedb enterprisedb  4096 Apr 12 11:18 pg_xlog
 drwx------ 7 enterprisedb enterprisedb  4096 Apr 12 11:21 base
 -rw------- 1 enterprisedb enterprisedb   963 Apr 13 11:54 server.key.org
 -rw------- 1 enterprisedb enterprisedb   887 Apr 13 11:54 server.key
 -rw------- 1 enterprisedb enterprisedb   749 Apr 13 11:54 server.csr
 -rw-r--r-- 1 enterprisedb enterprisedb 19474 Apr 13 11:55 postgresql.conf
 -rw------- 1 enterprisedb enterprisedb   928 Apr 13 11:58 server.crt
 -rw------- 1 enterprisedb enterprisedb    77 Apr 13 12:01 postmaster.opts
 -rw------- 1 enterprisedb enterprisedb  3452 Apr 13 12:29 pg_hba.conf
 drwx------ 2 enterprisedb enterprisedb  4096 Apr 16 02:47 pg_tblspc
 drwx------ 3 enterprisedb enterprisedb  4096 Apr 16 02:47 tblspc_1
 -rw------- 1 enterprisedb enterprisedb    55 Apr 16 04:32 postmaster.pid
 drwxr-xr-x 2 enterprisedb enterprisedb  4096 Apr 17 00:00 pg_log
 drwx------ 2 enterprisedb enterprisedb  4096 Apr 17 02:33 global
 drwx------ 2 enterprisedb enterprisedb  4096 Apr 17 13:20 pg_stat_tmp
(22 rows)
edb=# select execute_shell('ls -lt t');
                 execute_shell
------------------------------------------------
 ls: cannot access t: No such file or directory
(1 row)

ssh: connect to host xxx.xx.xx.xxxx port 22: connection refused

While configuring Ubuntu for PostgreSQL, when I was trying to connect to the Ubuntu server, I got the following error message:

edbs-MacBook-Pro:~ vibhor$ ssh vibhor@172.16.82.130  
ssh: connect to host 172.16.82.130 port 22: Connection refused  

While resolving this issue, I found that the Ubuntu 10.10 installation did not have openssh-server installed, which is why the message was coming, so I installed openssh-server using the following command:

root@ubuntu:~# apt-get install openssh-server  
Reading package lists... Done  
Building dependency tree         
Reading state information... Done  
The following packages were automatically installed and are no longer required:  
  linux-headers-2.6.35-22 linux-headers-2.6.35-22-generic  
Use 'apt-get autoremove' to remove them.  
Suggested packages:  
  rssh molly-guard openssh-blacklist openssh-blacklist-extra  
The following NEW packages will be installed:  
  openssh-server  
0 upgraded, 1 newly installed, 0 to remove and 7 not upgraded.  
Need to get 302kB of archives.  
After this operation, 819kB of additional disk space will be used.  
Get:1 http://us.archive.ubuntu.com/ubuntu/ maverick-updates/main openssh-server i386 1:5.5p1-4ubuntu5 [302kB]  
Fetched 302kB in 5s (52.0kB/s)           
Preconfiguring packages ...  
Selecting previously deselected package openssh-server.  
(Reading database ... 147353 files and directories currently installed.)  
Unpacking openssh-server (from .../openssh-server_1%3a5.5p1-4ubuntu5_i386.deb) ...  
Processing triggers for ureadahead ...  
Processing triggers for ufw ...  
Processing triggers for man-db ...  
Setting up openssh-server (1:5.5p1-4ubuntu5) ...  
Creating SSH2 RSA key; this may take some time ...  
Creating SSH2 DSA key; this may take some time ...  
 * Stopping OpenBSD Secure Shell server sshd                                                             [ OK ]   
ssh start/running, process 17665  

After installation, I was able to connect to Ubuntu Server:

edbs-MacBook-Pro:~ vibhor$ ssh vibhore@172.16.82.130  
vibhore@172.16.82.130's password:   
Linux ubuntu 2.6.35-28-generic #49-Ubuntu SMP Tue Mar 1 14:40:58 UTC 2011 i686 GNU/Linux  
Ubuntu 10.10  
Welcome to Ubuntu!  
 * Documentation:  https://help.ubuntu.com/  
7 packages can be updated.  
0 updates are security updates.  
  
 
The programs included with the Ubuntu system are free software;  
the exact distribution terms for each program are described in the  
individual files in /usr/share/doc/*/copyright.  
  
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by  
applicable law.  

To diagnose the above ssh connection issue, a user can execute the following command to check whether the ssh daemon is running or not:

ps -eaf|grep sshd  

If sshd is running fine, the user can also check whether a firewall is blocking port 22:

netstat -nav|grep :22 

After fixing the firewall issue, the user can restart the service as given below:

/etc/init.d/ssh restart