One of the databases that support our application has some tables where data is updated very frequently. In PostgreSQL world, this means the table must be vacuumed very frequently to remove the old versions of the updated rows, clearing up space for new rows or new versions of existing rows. On very frequently updated tables, where insertions occur occasionally, but updates very often (several per second), recent versions of PostgreSQL should trigger their auto-vacuum daemon at a very high rate, like every 2 or 3 minutes, for those tables.
Despite this, our database kept growing like crazy, with no apparent reason. I decided to investigate what was going on.
The vacuum verbose
is a neat command you can use in the PostgreSQL console to check what’s going on with the built-in cleaning service. This will run a vacuum on every table, giving you extensive report on how many used and unused rows exist in each table, and how many of the unused were deleted and their space reclaimed by PostgsreSQL.
The most interesting output of this command are the lines (one per table) like: DETAIL: 2942 dead row versions cannot be removed yet.
. This means you have, in this example, 2942 rows on that table that are not used any more, but for some reason, vacuum was not able to delete them.
Browsing the web, I found out the most probable cause of this issue is the existence of a transaction that started a long time ago, and never finished. PostgreSQL will only delete rows older than the start timestamp of the oldest transaction still running. Newer versions might be needed by those transactions, so they are kept in place, even if their death sentence is already declared.
So, I started looking at what was going on inside the database engine, searching for those pending transactions. A lot of useful information about the current status of a PostgreSQL database can be found in many tables that exist in memory during PostgreSQL execution, that can be queried as any regular table. The one that was important for me was pg_stat_activity
, which contains a list of all the current transactions.
By running the select * from pg_stat_activity;
SQL command, I saw that each instance of my application had create two database connections, and for each instance, one of the connections had a transaction that started at the time the application instance was launched, but never committed or rolled back.
After some time of poking around, reading code, and talking to the right people, the problem was revealed. By default, WebObjects applications will open two connections to each database they talk to. One for the actual data queries, and the other to ask the database, at startup time, for what’s called the jdcb2info. This is a bunch of information needed to map between PostgreSQL and Java data types. And, due to one of those very old bugs that strangely nobody fixed yet that make WO fun, the transaction where that information is requested is never committed. So, the transaction will remain open, and PostgreSQL won’t vacuum a thing until you quit your application.
Although there’s not yet a solution for the problem itself (hopefully there will be, as I filled a bug in Radar), you can avoid it, by using the jdcb2info bundled in the PostgreSQL plugin instead of requesting it from the DB. To do this, simply append ?useBundledJdbcInfo=true
to your DB connection URL, like jdbc:postgresql://localhost/dbname?useBundledJdbcInfo=true
. This way, the second connection is not created at all, and no pending transaction will occur.
And of course, after doing this, the DB happily vacuumed hundreds of thousands of dead rows and stopped growing like crazy. Also, a tip: if you really need to vacuum the database right now, but you can’t stop your apps and interrupt service, simply obtain the pids of the PostgreSQL processes that are maintaing the offending connections from the pg_stat_activity
table, and kill them, using the unix command kill <pid>
command. The connections will die, but your apps will not.