All posts by Denish Patel

An easy way to reduce outage window for PostgreSQL Upgrade!

PostgreSQL 9.2 release provides lots of great features. Recently, one of the clients at OmniTI required upgrade of their couple of PostgreSQL production databases running on PostgreSQL version 9.0 to PostgreSQL 9.2. The client is running database servers on Amazon EC2 instances. For the failover purpose, they run 3 instances in the setup, one instance is master database and next two instances are slave of master database. Alike others, they were looking for zero outage solution for PostgreSQL upgrade but unfortunately there isn’t one exist now!

There are couple of options for the PostgreSQL upgrade:

  1. pg_dump/pg_restore entire database
  2. Use pg_upgrade for in-place upgrade
  3. Use 3rd party replication system i.e Slony/Bucardo

Evaluation of options:

Option #1 :

Whenever, the upgrade requirement with minimum outage come to us, we always check option #2 . pg_upgrade provides ability to upgrade database without dump/restore all the data. Unfortunately, pg_upgrade –check test failed because the databases are using ltree data type. One of the limitations of pg_upgrade is that it does not work if the ltree contrib module is installed in a database.

Meanwhile, the requirement came from client that they want to consolidate both environment into one server so eliminate cost and maintenance for running 6 instances for 2  production databases and similar number of instances for stage environments. This new requirement eliminated option of even think about using pg_upgrade.

Option #2 :

Next option is to optimize dump/restore process in such a way that the total outage window can be minimized. I started collecting stats about their database size and large tables in the database and dump/restore timing.

1st production database (X DB) :

  • Total database size : 22GB
  • Top 2 large total tables size: 12GB
  • Dump/restore duration  : 25 minutes ( dropping indices before restore and creating at the end , with 4 parallel workers  for restore)
  • Estimated outage required with application testing: 30 minutes

2nd production database  (Y DB)  :

  • Total database size size:  50GB
  • Top 10  large total tables size: 45GB
  • Dump/restore duration   : 1.5 hours (dropping indices before restore and creating at the end , with 4 parallel workers  for restore)
  • Estimated outage required with application testing : 1.45 hours

Both of the above estimations are providing accurate dump/restore timing  because  I ran dump on existing prod servers and restore on proposed new server.

Option # 3:

I could use Slony or Bucardo replication systems for replicating tables for upgrade purpose but even though Slonly/Bucardo systems are around for a while , they are very complicated to setup, manage and debug in-case of the problems. It might be only me but I did not want to introduce complex replication system for upgrade purpose!

Further looking tables/schema usage:

Now, I started looking into more into optimizing dump/restore options. Digging into more details for large tables using pg_stat_all_tables and schema details , I collected following facts:

  1. For X DB, top 2 largest tables are Insert only tables. yay!
  2. For Y DB, same top 2 tables are insert only but there are some more large tables with insert/update/delete
  3. No Foreign Keys on the tables for both databases.

So, I started looking into Simple table level replication options. Let me make it clear that table replication still works with FKs but its is important to know because I don’t have to worry about disabling foreign keys during replication at table level and enable later.


I didn’t have to look far because Keith Fiske,one of my colleagues, recently came up with Mimeo extension. Mimeo is very simple OmniTi’s home grown replication system for replicating databases at table level over dblink between two PostgreSQL databases. Usually, Mimeo helps us replicating production instance to DataWarehouse system but in this case I decided to give a try to use for replicating tables temporarily during upgrade process.

Why Mimeo?

Mimeo is extremely easy to setup and understand because all code resides as sql/pgsql functions. Mimeo provides very good documentation but I will give you overall idea. Mimeo is installed as extension. All the code related to replication resides under mimeo schema and it tracks all the replication functions using another extension called pg_jobmon to keep track of functions executions.  For now, Mimeo supports  following replications:

  • Inserters : Replicate Insert only tables.
  • Updaters : Replicate tables based on updated_tsz. You could place trigger on source table on production to keep the updated_tsz updated but most of the times your application is already taking care of. This replication method does not support DELETEs on tables.
  • DML: This method supports Insert/Update/Delete on the table but the table should have Primary Key to keep track of the changes. Mimeo places trigger on source table on production database to keep track of rows into mimeo.tablename_pgq tables . A pull request from destination (replicated) table to fetch these rows from queue table to grab latest data only for changed rows and apply them on destination table.
  • Snap : Grab entire table from source table and truncate destination table to refresh completely. it’s very useful method for small tables.

That was brief overview of the Mimeo. The tool is still under development and looking for more testers and contributors.

For now, Let’s get back to upgrade !

Upgrade X DB:

First production environment was easy because there are only 2 large tables and both are INSERT-only tables. After you have packages installed for dblink, jobmon and mimeo , you could install them into database as extension.

On new database server on PostgreSQL 9.2:

create schema dblink;
create schema jobmon;
create schema mimeo;
create extension dblink schema dblink ;
create extension pg_jobmon schema jobmon ;
create  extension mimeo schema mimeo;

After mimeo installation and setting up new production DB servers on PostgreSQL 9.2 with master-slave setup, I followed following steps to upgrade X DB:

  1. Freeze schema changes on X DB production database server running on PostgreSQL 9.0.
  2. pg_dump entire database schema dump and restored on PostgreSQL 9.2 database
  3. pg_dump two large tables : t1 & t2 and restored on PostgreSQL 9.2 database
  4. Setup Mimeo replication for t1 and t2 tables using refresh_updater method.
  5. pg_dump all but t1 & t2 tables data and pg_restore with 4 parallel processes (-j 4) on PostgreSQL 9.2 with  (~15 minutes). To expedite the restore process, I dropped indices on couple of large tables before the restore and put it back after the restore.
  6. Reset sequences for t1 and t2 on PostgreSQL 9.2
  7. Open up upgraded database for applications!!

Keep in mind that only step 5, 6 & 7 needs to be executed during outage period. The total outage for upgrade of this production database environment was ~ 15 minutes .

Upgrade Y DB:

Second production environment is using same PostgreSQL cluster on new database server but different database name. This database is larger than first one and have more tables with all kinds of transactions.

After analyzing table stats, I came up with group of tables:

relation | size | n_tup_ins | n_tup_upd | n_tup_del
<< Replicate tables based on incremental Primary Key, No trigger required . inserter replication >>

1. t1 | 15 GB | 22310924 | 0 | 0 <——– insert only
2. t2 | 789 MB | 3176894 | 0 | 0 <——– insert only
3. t3  | 13 MB | 7379 | 0 | 0 <——– insert only

<<  Insert/Update/Delete on tables , DML replication>>

4. t4 | 4515 MB | 1233555 | 17966613 | 0 <–insert /update/delete
5. t5 | 2520 MB | 5004129 | 21599077 | 0 <–insert /update/delete
6. t6 | 1310 MB | 4041253 | 519 | 0 <–insert /update/delete
7. t7 | 1123 MB | 1020479 | 2050512 | 0 <–insert /update
8. t8 | 75 MB | 875275 | 19047 | 0 <–insert /update
9. t9 | 43 MB | 30509 | 8976539 | 0 <–insert /update
10. t10 | 22 MB | 12338 | 16201 | 0 <–insert /update
11. t11 | 12 MB | 11574 | 199 | 0 <–insert /update
12. t12 | 10168 kB | 12283 | 11192 | 0 <–insert /update

<< static tables >>

13.  t13 | 43 MB | 0 | 0 | 0 <– static tables, never changes
14. t14 | 39 MB | 0 | 0 | 0 <– static tables, never changes
15. t15 | 18 MB | 0 | 0 | 0 <– static tables, never changes
16. t16 | 16 MB | 0 | 0 | 0 <– static tables, never changes
17. t17 | 14 MB | 0 | 0 | 0 <– static tables, never changes

I followed same procedure to install mimeo on this database as described above but for DML replication you need to execute an extra step by creating mimeo schema on source (production database server running on PostgreSQL 9.0) with proper permissions for mimeo replication role. All _pgq tables and trigger functions on source tables reside under this mimeo schema on source database.

On source database server :

CREATE schema mimeo;
ALTER SCHEMA mimeo OWNER TO <mimeo_role>;
GRANT TRIGGER ON <source_table> TO <mimeo_role>;

After mimeo installation and creating up new production DB on same cluster of PostgreSQL 9.2 with master-slave setup, I followed following steps to upgrade Y DB:

  1. Freeze schema changes on Y DB production database server running on PostgreSQL 9.0.
  2. pg_dump entire database schema dump and restored on PostgreSQL 9.2 database
  3. Disable triggers on replicated table on destination database
  4. Setup replication trigger on DML group replicated tables using mimeo.dml_maker function by executing on PostgreSQL 9.2 database server.
  5. pg_dump 17 tables : t1 – t17  from source database and pg_restore on PostgreSQL 9.2 database
  6. Setup Mimeo replication for t1-t3 using refresh_updater and t4-12 tables using refresh_dml. I did not setup replication for static tables.
  7. pg_dump all but t1to t17 tables data and pg_restore with 4 parallel processes (-j 4) on PostgreSQL 9.2 with  (~5 minutes).
  8. Enable triggers and reset sequences for t1 to t17 on PostgreSQL 9.2
  9. Compare and Verify count and/or max(id) for t1 to t17 tables between PostgreSQL 9.0 database and upgraded PostgreSQL 9.2 database server
  10. Open up upgraded database for applications !!

As above, only steps 7-10  need to be executed during outage period. The total outage for upgrade of this production database environment was about 15 minutes.

In conclusion, Mimeo helped our client to upgrade their database servers with minimal outage.  Hopefully, it will help you on your next production database upgrade to reduce outage window.

2 PG Days in 2 days apart !!

Want to learn more about PostgreSQL? Are you at driving distance from Washington DC and/or Newyork area? If any of anwers is yes then you shound’t miss this opportunity to attend 2 PostgreSQL Days within 2 days apart in your area with plenty of knowledge sharing talks and networking events!br

1. Friday,March 30th,2012 PG Day in Washington DC

2. Monday,April 2nd,2012  PG day in Newyork

I will be attending both of the these days. If you are attending, will love to meet you in-person. br /br /See you soon!!

monitor bucardo replication lag using circonus

         I have been using circonus for monitoring, trending and alerting for any database metrics for quite a long time now. The circonus interface makes the monitoring, trending and alerting setup painless and you can see graph flowing in minutes. Another good thing about Circonus is that you can monitor anything that you can query from database ! This week, the task at my hand was to find a way to monitor bucardo replication lag.  `bucardo_ctl status sync_name` provides very important information that you can rely for trending and alerting purposes.

$ bucardo_ctl status my_slave
Sync name: my_sync
Current state: WAIT:22s (PID = 19500)
Type: pushdelta
Source herd/database: slave_herd / master_herd
Target database: my_slave
Tables in sync: 318
Last good: 23s (time to run: 1m 21s)
Last good time: Feb 29, 2012 15:27:14 Target: my_slave
Ins/Upd/Del: 142 / 0 / 0
Last bad: 1h 45m 9s (time to run: 19m 57s)
Last bad time: Feb 29, 2012 13:42:29 Target: my_slave
Latest bad reason: MCP removing stale q entry
PID file: /var/run/bucardo/
PID file created: Wed Feb 29 13:42:33 2012
Status: active
Limitdbs: 0
Priority: 0
Checktime: none
Overdue time: 00:00:00
Expired time: 00:00:00
Stayalive: yes Kidsalive: yes
Rebuild index: 0 Do_listen: no
Ping: yes Makedelta: no
Onetimecopy: 0

All the information provided by `bucardo_ctl status` command is important but most interesting thing to monitor is “Last good:”. Last good shows bucardo replication lag on slave server.

Trending in circonus:

Now, I have metric identified for the monitoring. Next step is to find the best way to put the metric into the monitoring tool. After some poking around the output and ways to monitor stuff, I decided to grab the SQL query from bucardo_ctl perl script and stick it into circonus monitoring. Most of the time spent for setting up this monitor was to grab right query from big perl script (bucardo_ctl) and mapping the metric required from the query. After that, here is the query that I plugged  into Circonus in no-time.

‘bucardo_last_good’ , round(extract(epoch FROM now()-ended))
(SELECT * FROM bucardo.q WHERE sync = ‘my_sync’ AND cdate >= now() – interval ‘3 days’
SELECT * FROM freezer.master_q
WHERE sync = ‘my_sync’ AND cdate >= now() – interval ‘3 days’) AS foo
WHERE ended is NOT NULL AND aborted IS NULL

Alerting in circonus:

bucardo_ctl status shows Last good status to “unknown” if replication is broken.

Name Type State PID Last_good Time I/U/D Last_bad Time
my_sync| P |WAIT:35s|7620| unknown | | |36s |1m58s

       In circonus, you could setup rules and relevant severity levels. The most important part is that if the query doesn’t return any row it should page(“unknown” condition) . Circonus provides rule for alerts in case of a metric is absent. Now, I am all set with alerts as well.

Yay! bucardo replication is under monitoring and trending without any hassle! Hopefully, this post will help you next time when you try to put bucardo replication lag under monitoring.

Half a decade with OmniTi, PostgreSQL, FOSS …

By end of this month, I am completing 5 years with OmniTI. Before joining OmniTI, I was an Oracle DBA and worked mostly with closed source databases. I am grateful & fortunate that Theo and Robert provided me opportunity and guidance to work and contribute with open source technologies and most importantly an open source database PostgreSQL ! At OmniTI, if you don’t have hesitation to learn new technlogy, you will have smooth learning experience alongside working with your known technology for the clients. Most importantly, you can have access to technology leaders in the office. Ironically, I never worked for single company longer than a year before joining OmniTI in my career but now I am looking forward to learn and contribute to FOSS community working with OmniTI.

Since joining, I haven’t spent a day without learning a new thing . Are you bored at routine work? Come join OmniTI’s adventure, here I work!

Bravo to OmniTI & PosgreSQL and FOSS community!!

What is pg_extractor ?

In my recent blog post, I wrote about PostgreSQL DBA Handyman toolset. In the list of tools, getddl is one of them. If you are using getddl to get DDL schema and track the daily changes in SVN for production databases, you should consider moving that process to use pg_extractor instead. pg_extractor is the more advance and robust tool for extracting schema as well data using pg_dump. Keith Fiske, an author of the tool, described tool in detail in his blog post. Thanks to Keith for making the schema extraction tool more robust and taking it to next level !

Hopefully, it will help you to have more control over your database in smarter way!

“Shrink” MS SQL Server transaction logfile

          Microsoft SQL Server transactions log file “too large” is common problem. We need to manually shrink the transaction log file as one time operation and after that setting up proper recovery policy and/or keeping mirror server available all the time , we could  avoid  infinite growth of transactions log in future.
          Transaction log file growth is dependent on one of the following Recovery method used for the instance:
 1. Simple
 2. Full
 3. Bulk-Logged

          If your database is in the Simple recovery mode, then the transaction log is truncated of inactive transaction after the checkpoint process occurs.  The checkpoint process writes all modified data pages from memory to disk.  When the checkpoint is performed, the inactive portion of the transaction log is marked as reusable.
          SQL Server 2005 databases are set to the Full recovery model by default.  With the Full or Bulk-Logged recovery mode, inactive transactions remain in the transaction log file until after a Checkpoint is processed and a transaction log backup is made.  It’s important to note that a full backup does not remove inactive transactions from the transaction log.  If  database recovery model is set to Full or Bulk-Logged, then it is absolutely IMPORTANT that we make transaction log backups to go along with  full backups because full backup does not remove inactive transactions from the transaction log. The transaction log backup performs a truncation of the inactive portion of the transaction log and allow them to used for future transactions. Note that truncation of log file does not  shrink already bloated log file that’s the operation we have to do manually one time  and if we set proper procedure in future, we could avoid unnecessary growth of log file. Other factors affecting transaction log to grow are:

  1. Replicated mirrored server falls behind master server
  2. Long running transactions

To solve #1:
          Database mirroring is uses transaction logs for replication in that it requires that the transactions remain in the log until the record has been written to disk on the mirror server. If the mirror server  falls behind the master server , the amount of active log space will grow. In this case, you might need to stop database mirroring, take a log backup that truncates the log, apply that log backup to the mirror database and restart mirroring.

To solve #2: ( it’s non-issue for most of the OLTP servers) :
          If there are open transactions, DBCC OPENTRAN will provide a session_id or SPID of the connection that has the transaction open.  You can pass this session_id to sp_who2 to determine which user has the connection open.

              Following queries can be ued to detemine recovery model and transaction log truncation interval:

SELECT name, recovery_model_desc  FROM sys.databases;

master SIMPLE
tempdb SIMPLE
model FULL
ReportServer SIMPLE
ReportServerTempDB SIMPLE

SELECT name, log_reuse_wait_desc FROM sys.databases;

ReportServer NOTHING
ReportServerTempDB NOTHING

          Results from above queries reveals that PRODUCTION_DB database is setup for FULL database backup and transaction log will not be truncated until LOG_BACKUP.  Make sure that you take full backup followed by log backup that will make transactions log file’s inactive blocks to be available for re-write and prevent from growing further. Keep in mind that it will not shrink transaction log file . To shrink the transaction log file you can follow steps described in official docs  OR following is the simple example of identifying and shrinking log file after taking log file backup:

                SELECT name FROM sys.database_files WHERE type_desc = ‘LOG’

         Once I have log file name, I can use the DBCC command to shrink the file. In the following command I try to shrink my log file down to 1GB.
                 DBCC SHRINKFILE (’PRODUCTION_DB_log.log’, 1000)     
       Run the shrink command couple of times during the off-peak hours. Also, please  make sure that your databases are NOT set to auto-shrink. Auto-shrink database might encounter performance problems.

          Hopefully, it will help you next time you come across transaction log file “too large” issue on MS SQL server installation.

Faster & Better VACUUM FULL

                In presentation, I discussed in detail about Bloat issue in PostgreSQL and methods to remove Bloat from the tables/indexes. Now a days, PostgreSQL9.0 is common and  the widely used version for the production use and it’s vital to remind about changes in most important bloat removal tool called “VACUUM FULL”. Until PostgreSQL 9.0, VACUUM FULL was tardy and DBA always stayed away from it and used CLUSTER instead. (Checkout presentation for difference between CLUSTER vs VACUUM FULL)
                The VACUUM FULL statement recovers free space from a table to reduce its size from bloated tables, mostly when VACUUM itself hasn’t been run frequently enough. Before PostgreSQL 9.0 , it was tardy and slow because of the way it was executed: records were read and moved one by one from their source block to a block closer to the beginning of the table. Once the end of the table was emptied, this empty part was removed. This method was very inefficient: moving records one by one creates a lot of random IO.  Additionally, during this reorganization, indexes had to be maintained, making everything even more costly, and fragmenting indexes. It was therefore advised to reindex a table just after a VACUUM FULL.
                Now, the VACUUM FULL statement, as of PostgreSQL 9.0, creates a new table from the current one, copying all the records sequentially. Once all records are copied, index are created back, and the old table is destroyed and replaced. This has the advantage of being much faster. VACUUM FULL still needs an EXCLUSIVE LOCK  during entire operation. The only drawback of this method compared to the old one, is that VACUUM FULL can use as much as two times the size of the table and indexes on disk, as it is creating a new versions of it.

Let’s compare run-time of VACUUM FULL on PostgreSQL 8.4 vs PostgreSQL 9.0

postgres=# create table vacuumtest(id int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index “vacuumtest_pkey” for table “vacuumtest”
postgres=# insert into vacuumtest select generate_series(1,10000000);
INSERT 0 10000000
postgres=# delete from vacuumtest where id%4=0;
DELETE 2500000
postgres=# vacuum vacuumtest;

On 8.4:
postgres=# vacuum full vacuumtest ;
Time: 61418.197 ms
postgres=# reindex table vacuumtest;
Time: 12212.815 ms

On 9.0:
postgres=# vacuum full vacuumtest ;
Time: 32640.714 ms

                Above results show that VACCUM FULL on PostgreSQL 9.0 is way faster than previous versions. Moreover, VACUUM FULL has couple of advantages over CLUSTER :  it’s faster than CLUSTER because it doesn’t have to build new table using ORDER by clause & you can run VACUUM FULL on tables on which there isn’t any index.
                If you are running any bloat removal tool on the production database, i would recommend you to revisit vacuum parameters and tighten them up a little that makes regular vacuum to run more frequent so it will help to reduce frequency of running more intrusive bloat removal tools!!