All posts by Denish

Preventing Human Errors in Postgres

Whenever discussion on Disaster Recovery Planning for the databases comes up, “Preventing & Recovering from Human Error” topic arises because human tends to make mistakes and engineers are humans :)  Having best DBAs around and with restrictions &  best practices/policies in-place,  you still can’t afford to ignore the fact – “an engineer could make mistake”.  For disaster recovery planning , this factor should be taken into consideration.

Fixing or recovering from human errors, specially inside databases, is costly , which is   accepted fact :(  Having better access, restrictions and good practices in-place could help to avoid some of the silly mistakes but you never know !! What if the Superuser (DBAs or anyone having super privileges) makes the mistake?

While dealing with databases, there are number of unknown and unimaginable ways, the engineer could make  mistakes, but I’m going to talk about couple of known and specific mistakes and how Postgres features could be used  to prevent those mistakes:


Last week, Payal, one of my colleagues , asked me ; Is there anyway to prevent superuser from dropping  production database? My immediate response was: superuser(DBAs) can’t afford to make that mistake !!!!

She asked me another question; What if I convert production db to template database so it can _NOT_ be dropped even you are superuser? As you can see below, template database can not be dropped:

testdb=# drop database template1;
ERROR: cannot drop a template database

My  immediate response was;  hrmm.. there  could be security issue. The topic looked interesting to me so I started digging further into the topic on making production db to template database. As the Document suggests,  datistemplate can be set to indicate that a database is intended as a template for CREATE DATABASE. If this flag is set, the database can be cloned by any user with CREATEDB privileges; if it is not set, only superusers and the owner of the database can clone it.

As you might have realized now , there is a little risk because a user with CREATEDB can close the production database. Fortunately, CREATEDB is not default permission, you have to grant it explicitly.  The gain of preventing DROP DATABASE is much higher than underlying security issue. This is still experimental thought !! If you see any problems setting production db to template database, please shout out!  The below example converts the “proddb” to template database by setting datistemplate flag to true and try to drop the database using superuser:

testdb=# update pg_database set datistemplate=true where datname=’proddb';
testdb=# drop database proddb;
ERROR: cannot drop a template database

Preventing DROP TABLE

After coming up possible fix on DROP DATABASE, we decided to investigate further on similar ways of preventing dropping table accidentally !!

Actually, Postgres 9.3 introduced new feature “System level Trigger“. This can be used to prevent table drop in production. The below example shows how to prevent specific table to be dropped from the database:

Function to prevent single table drop

RETURNS event_trigger
LANGUAGE plpgsql
AS $$
obj record;
FOR obj in SELECT * from pg_event_trigger_dropped_objects()
IF obj.object_name = ‘users’ AND obj.schema_name = ‘testing’ THEN
RAISE EXCEPTION ‘command % is disabled for this table’, tg_tag;

Place System Level Trigger

CREATE EVENT TRIGGER confirm_drop on sql_drop WHEN TAG IN (‘DROP TABLE’) EXECUTE PROCEDURE confirm_drop();

Let’s try to drop USERS table…

postgres=# drop table testing.users;
ERROR: command DROP TABLE is disabled for this table
STATEMENT: drop table testing.users;
ERROR: command DROP TABLE is disabled for this table

The above output shows even superuser can not drop the table. You could use similar approach to prevent TABLE DROP on SCHEMA or ALL TABLES in the production database.

However,  I can think off two possible downside of this approach ; (i) The system level trigger feature is not available until Postgres 9.3 (ii) You have to take the system level trigger into consideration while  actually dropping individual table or  any table in schema or databases.

I’d still rely on best practices for making sure the human errors can be avoided by putting required restrictions in place and following best practices for schema changes . Though, PostgreSQL allows us technical capability to prevent these mistakes by providing very low overhead technical solutions. Is it worth to take advantage of it? If you are planning on attending PGConf NYC or PGCon in Ottawa this year, let’s discuss in-person?

Thoughts/Comments/Concerns are welcome.

Postgres Security & PGConf NYC 2014

As per my knowledge, I haven’t heard or attended  Postgres conference with dedicated “Security” track . Rarely, you find one or two talks about security over couple of Postgres conferences but  never a dedicated track !  It is  indisputable to say that  Security is one of the most critical areas to consider for selecting  database for the application. I’m really excited  to attend and speak  about Postgres Security features at PGConf  NYC 2014 conference, which is going to be held on April 3-4, 2014 in New York and will have dedicated “security” track ! Hats off to organizers  :)

My employer, OmniTI, is one of the Gold sponsors for the event.  If you are planning to attend the conference, please use code “OMNITINYC14” at checkout, it will allow you to receive a 10% discount on both the Early Bird and Regular ticket prices – the code is valid through March 15th.

If you want to make friends in Postgres community and eager to learn  advance features  in Postgres & more specifically Security features,  you probably don’t want to miss out this rare opportunity!

Send me note, if you want to catchup in-person in NYC.  Looking forward to see you there !

Can Postgres 9.2 upgrade cause “Out of memory” error?

Earlier this week, we got an email from with the need of “a serious Postgres expert” to solve the problem they have been experiencing and blocker for number of projects at hand. They heard good things about OmniTI from technology clubs and communities in NYC. is an online charity that makes it easy for anyone to help students in need. Public school teachers from every corner of America post classroom project requests on their site, and you can give any amount to the project that most inspires you.

This year, in beginning of July , they migrated Postgres database server from virtual hardware to high capacity bare-metal server and upgraded their databases from Postgres 8.2 to Postgres 9.2. As everyone hope after upgrade, website was much faster in response time and they should be happy after upgrading their database to Postgres 9.2. That is the case for them as well :)  Yes, they are happy Postgres user except some of the queries used to run without any issue  are causing Out of Memory errors now ! Sometimes, the queries were causing segmentation fault by Signal 11 :(

Weird, right ?

Here is the email received that describes the problem:

We’ve been happy Pg users for years now and have a pretty good command of what’s going on. We recently upgraded to 9.2.x and moved onto new hardware at the same time. Everything’s screaming fast as we’d hoped and working well, but… Now our most-intensive queries are failing with an “out of memorySQL state: 53200″ error. Not in production mind you, these are long-running queries we execute manually against a slave to infrequently do big exports.

On our old Pg version 8.2.x and much skimpier hardware, the job would take forever but complete, which was fine for the purpose. Now it’s on newer software with much more memory and CPU, but failing to complete. It seems to be failing on reports that use temporary tables that weren’t analyzed, and large queries during “hash join” and “merge join” operations. We’ve surely got something configured wrong, but we’ve been banging our heads against the wall and are out of ideas, eg. we’ve tried cranking work_mem way up, disabling hashjoin, no dice.

We requested to have conference call to get more details but we couldn’t able to attend conference call next day because of next day scheduled visit to NYC .  When we mentioned that we are visiting NYC tomorrow, they requested us, if we could stop by their Office ! We said yes because their office is on the way to OmniTI’s NYC office :)

In late evening, a day before visit, we sent out email to get some background and more details about environment so we could come up with some ideas before meeting:

1. OS version : uname -a
2. Postgres Version(please provide output of this query): select version();
3. Total Memory on server : free -m
4.  Actual error with query from pg_log. If you could provide log file with contents 1 hour before and after actual error, that will be helpful.
5. postgresql.conf file(s) for pre and post upgrade. Postgrs 8.2 and Postgres 9.2
6. Output of show commands in production  : show all;
7. Explain output of query on pg9.2 . If you have explain output from earlier Postgres 8.2 version available, that will be helpful as well.

While riding on the train next day, we received answers on the questions via email.
Excerpts from log file…

2013-10-08 18:24:43 EDT [13131]: [4-1] user=XXX,db=dc_query ERROR: out of memory
2013-10-08 18:24:43 EDT [13131]: [5-1] user=XXX,db=dc_query DETAIL: Failed on request of size 24.
2013-10-08 19:21:12 EDT [2001]: [2527-1] user=,db= LOG: server process (PID 17415) was terminated by signal 11: Segmentation fault
2013-10-08 19:21:12 EDT [2001]: [2528-1] user=,db= DETAIL: Failed process was running:

We analyzed answers and it looked like the problem characteristics kind of matches with the similar situation that we came across for other client.
Problem characteristics:

  1. Upgraded database from Postgres 8.2 to Postgres 9.2
  2. Query is failing with Out of Memory
  3. Explain plan is damn big ! The query is doing lots of joins !

The only but big difference was that we did not come across segmentation fault. We discussed about the problem internally in the team. Everyone kind of agree that it’s issue with max_locks_per_transaction. max_locks_per_transaction controls the average number of object locks allocated for each transaction; individual transactions can lock more objects as long as the locks of all transactions fit in the lock table. By default, max_locks_per_transaction is set to 64, which means that Postgres is prepared to track up to ( 64 X number of open transactions) locks. The problem is that Postgres 8.2 with same setting query runs without any issue but query fails with Out of Memory error on Postgres 9.2.  Not good, right?

After morning meetings in NYC, we visited DonorsChoose office and as we planned to test the first thing is to change max_locks_per_transaction parameter with higher value and restart database to reload the settings, we just gave it a try on QA database server.

You know what, the change solved the problem !

They couldn’t able to run query since they upgraded to Postgres 9.2, but now they could :)  While running the query, we noticed that the query required around 150 locks but the default value is only 64. The query is joining 50+ tables because it is data-warehosue query and having couple of unions!

You must be thinking that why the query worked on Postgres 8.2 but it doesn’t work on Postgres 9.2 ? My theory behind is that max_locks_per_transaction value wasn’t being honored until the implementation of true serialization isolation level and inclusion of max_pred_locks_per_transaction parameter in Postgres 9.1. If you know any concrete reason, I will be happy to learn more about it.

In any case, if you come across similar problem of “Out of Memory” error  after upgrading your database server to Postgres 9, you know the first thing to check is max_locks_per_transaction!

A week(s) of Conferences!

This week, I will be attending Surge (Scalability Conference) in Washington D.C. Next week, I will be speaking/attending  PgOpen (Postgres conference)  in Chicago. If you are planning to attend any of these conferences, it will be nice opportunity to meet in-person and catch up on technologies & stories specifically about databases !

See you soon!!

Postgres 9.2: monitoring temp files generation in real time

With other great new features, PostgreSQL 9.2 makes DBAs life easy by providing more metrics  in statistics collector views. Out of those , pg_stat_database caught my eyes because it provides a lot more details compare to Postgres 9.1 and other previous versions.

Postgres 9.1

postgres=# \d pg_stat_database
View "pg_catalog.pg_stat_database"
Column | Type | Modifiers
datid | oid |
datname | name |
numbackends | integer |
xact_commit | bigint |
xact_rollback | bigint |
blks_read | bigint |
blks_hit | bigint |
tup_returned | bigint |
tup_fetched | bigint |
tup_inserted | bigint |
tup_updated | bigint |
tup_deleted | bigint |
conflicts | bigint |
stats_reset | timestamp with time zone |

Postgres 9.2

postgres=# \d pg_stat_database
View "pg_catalog.pg_stat_database"
Column | Type | Modifiers
datid | oid |
datname | name |
numbackends | integer |
xact_commit | bigint |
xact_rollback | bigint |
blks_read | bigint |
blks_hit | bigint |
tup_returned | bigint |
tup_fetched | bigint |
tup_inserted | bigint |
tup_updated | bigint |
tup_deleted | bigint |
conflicts | bigint |
temp_files | bigint |
temp_bytes | bigint |
deadlocks | bigint |
blk_read_time | double precision |
blk_write_time | double precision |
stats_reset | timestamp with time zone |

As you have noticed above, pg_stat_database has number of  new columsn in  Postgres 9.2 compare to previous versions.  I’m here to discuss temp_files and temp_bytes columns.

Documentation is self explanatory :
                  temp_files : Number of temporary files created by queries in this database. All temporary files are counted, regardless of why the temporary file was created (e.g., sorting or hashing), and regardless of the log_temp_files setting.

                  temp_bytes: Total amount of data written to temporary files by queries in this database. All temporary files are counted, regardless of why the temporary file was created, and regardless of the log_temp_files setting.

postgres=# select datname,temp_files,temp_bytes from pg_stat_database where datname='XXX;
datname | temp_files | temp_bytes
XXX | 107309 | 4650188504602
(1 rows)

As you are aware that temp_files generation could cause performance issues . Historically, it was possible to track temp files generation by enabling log_temp_files setting in postgresql.conf and analyze logs using pg_badger. Now, Postgres 9.2 allows you to monitor temp files generation in real time! It is trivial to add above query to start monitoring, trending and alerting using Circonus (See below graph) .

Left Y:  counter of number of files generated
Right Y: counter of temp files size (in MB) generated


Inserting JSON data into Postgres using JDBC driver

EDIT:  Marcus(1st comment provider) helped me to write much cleaner and secure code. It doesn’t require CAST function and uses  PGobject with jdbc’s setObject. You could download updated code from git-repo. Thanks Marcus !!

One of the clients of OmniTI requested help to provide sample application to insert JSON data into Postgres using Java JDBC driver . I’m not Java expert so it took a while for me to write a simple java code to insert data. TBH, I took help to write test application from one of our Java engineers at OmniTI. Now, test application is ready and next step is to make it work with JSON datatype ! After struggling a little to find out work around for string escaping in JAVA code, I stumbled upon data type issue! Here is the test application code to connect to my local Postgres installation and insert JSON data into sample table:

postgres=# \d sample
Table "public.sample"
Column | Type | Modifiers
id | integer |
data | json |
denishs-MacBook-Air-2:java denish$ java -cp $CLASSPATH PgJSONExample
-------- PostgreSQL JDBC Connection Testing ------------
PostgreSQL JDBC Driver Registered!
You made it, take control your database now!
Something exploded running the insert: ERROR: column "data" is of type json but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
Position: 42

After some research , I found out that there is no standard JSON type on java side so adding support for json to postgres jdbc is not straight forward ! StackOverflow answer helped me for testing out the JSON datatype handling at psql level. As Craig mentioned in the answer that the correct way to solve this problem is to write a custom Java mapping type that uses the JDBC setObject method. This can be a tricky though.  A simpler workaround is to tell PostgreSQL to cast implicitly from text to json:
postgres=# create cast (text as json) without function as implicit;

The WITHOUT FUNCTION clause is used because text and json have the same on-disk and in-memory representation, they’re basically just aliases for the same data type. AS IMPLICIT tells PostgreSQL it can convert without being explicitly told to, allowing things like this to work:

postgres=# prepare test(text) as insert into sample (data) values ($1);
postgres=# execute test('{}');
postgres=# select data from sample;
(1 row)

Awesome ! That worked :-) Let’s try similar approach in Java application code.

denishs-MacBook-Air-2:java denish$ export CLASSPATH=/usr/share/postgresql/java/postgresql-9.2-1002.jdbc4.jar:
denishs-MacBook-Air-2:java denish$ javac -classpath $CLASSPATH
denishs-MacBook-Air-2:java denish$ java -cp $CLASSPATH PgJSONExample
-------- PostgreSQL JDBC Connection Testing ------------
PostgreSQL JDBC Driver Registered!
You made it, take control your database now!
postgres=# select * from sample;
id | data
1 | {"username":"denish","posts":10122,"emailaddress":""}
(1 row)

Yay! It worked as well :-)

Next in my list to figure out installing PL/Java on Mac and/or Linux !! Let me know, if you have instructions for installation and test application using PL/Java.

When was the database created in Postgres cluster ?

Continuous Integration (CI)  using automated open source tools such as Jenkins and Hudson  is getting adoption rapidly. These tools help developers to gain confidence for creating more robust code rapidly by improving testing and QA process. The flexibility of these softwares add other challenges for the DBAs!

One of our client came across challenge to cleanup databases after X number of days from the Jenkins CI database because each run create seperate database and database names are not standard because they are provided by users. If they don’t cleanup old database, the cluster will have hundreds of databases at the end of the week. We tried to standardize database names but you can’t control users to make mistakes or input db names :-) On the other hand, Postgres’s system catalog view doesn’t provide database creation date.  How can I find out databases older than X days and drop them?

I came across this blog entry that answers my question but I was looking for easier way! I found easier way to get the database creation time with single query! Yay  :-) Following is the query that can be used to find the database creation time. The query should return correct created_date as long as you haven’t run pg_upgrade on the data directory.  I thought to share here so it will be useful for others!

SELECT datname, (pg_stat_file(‘base/’||oid||’/PG_VERSION’)).modification AS datcreated
FROM pg_database;


postgres=# SELECT datname, (pg_stat_file(‘base/’||oid||’/PG_VERSION’)).modification AS datcreated
postgres-# FROM pg_database;
datname | datcreated
template1 | 2013-03-28 16:04:13-04
template0 | 2013-03-28 16:04:14-04
postgres | 2013-03-28 16:04:14-04
rangetypes | 2013-03-28 16:14:42-04
puppet | 2013-03-28 16:23:13-04
omniti | 2013-04-20 10:02:22-04
(6 rows)


Ideally,  pg_database system catalog view should include database_created timestamp ! Hopefully, that day will come sooner than later :-)

Feel free to comment, if you have any other ideas for getting this details or you see any corner cases with above query that I haven’t mentioned here  :-)