Last week, I received a request to increase PostgreSQL’s config parameter default_statistics_target on one of our very large DB clusters. The developers filed a ticket to increase default_statistics_target to 10,000 at the cluster level because one of their queries had started doing a full sequential scan instead of an index scan.
By default, a Postgres cluster sets default_statistics_target to 100, but you can set it as high as 10,000. This parameter defines how many values are stored in the list of most common values, and also determines how many rows the ANALYZE process inspects.
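For context, the cluster-wide change requested in the ticket would look something like the following sketch (it needs superuser privileges, and the reload makes the new value visible to subsequent sessions):

```sql
-- Check the current value (100 unless it has been overridden)
SHOW default_statistics_target;

-- Raise it for the whole cluster, as the ticket requested
ALTER SYSTEM SET default_statistics_target = 10000;

-- Apply the change without a restart
SELECT pg_reload_conf();
```

As discussed below, I would be cautious about actually running this at the cluster level.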
Out of curiosity, I dug into the Postgres ANALYZE code to figure out how many rows are scanned by ANALYZE for a given default_statistics_target value. As you can see on line 1763, ANALYZE inspects up to 300 times the statistics target in rows.
1763 stats->minrows = 300 * attr->attstattarget;
If the default_statistics_target value is 100, ANALYZE could scan up to 300 * 100 = 30,000 rows to gather accurate statistics.
As per the Postgres documentation on ANALYZE, blindly increasing the value at the cluster level can hurt smaller queries: setting default_statistics_target to its maximum can significantly increase query planning time, which eventually degrades performance for simple queries. The next logical step for me is to find the optimal default_statistics_target value for the specific table with the slow query. I will probably start by increasing the value to 1,000, then raise or lower it based on EXPLAIN ANALYZE results.
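Rather than changing the cluster-wide default, Postgres lets you raise the target for just the column behind the slow query, which avoids the planning-time penalty everywhere else. A minimal sketch, assuming a hypothetical orders table with an indexed customer_id column:

```sql
-- Raise the statistics target only for the problem column
-- (table and column names here are hypothetical)
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;

-- Re-gather statistics so the planner sees the larger sample
ANALYZE orders;

-- Check whether the planner now chooses the index scan
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
```

A per-column SET STATISTICS value overrides default_statistics_target for that column only, so the rest of the cluster keeps the cheaper 100-value sample.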
Hopefully these findings will help when you tune default_statistics_target for your own database server. Let me know if you have alternate views or ideas for tuning this setting. Thanks!