Jay Taylor's notes
back to listing indexPablowe » Blog Archive » Approximating Postgres Replication Delay
[web search]Approximating Postgres Replication Delay
Coming from the MySQL world, I’m used to being able to easily determine the replication delay (in seconds) via the SHOW SLAVE STATUS command:
mysql> show slave status\G *************************** 1. row *************************** ... Seconds_Behind_Master: 0 ... 1 row in set (0.00 sec)
Unfortunately, there is no such comparable command in PostgreSQL. The official docs propose the following approach:
$ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host) pg_current_xlog_location -------------------------- 0/2000000 (1 row) $ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby host) pg_last_xlog_receive_location ------------------------------- 0/2000000 (1 row) $ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby host) pg_last_xlog_replay_location ------------------------------ 0/2000000 (1 row)
The problem with this is that it’s not an atomic unit of work and is GUARANTEED to be inaccurate (it can be done quickly, minimizing the margin of error). Also, it involves remote connections to multiple hosts, which I’m not interested in for my use case. If we want to calculate this difference from the master we can simply issue the following command:
postgres=# SELECT pg_stat_replication.*, pg_current_xlog_location() FROM pg_stat_replication; -[ RECORD 1 ]------------+------------------------------ procpid | 15839 usesysid | 16384 usename | repl application_name | postgres client_addr | 10.0.0.2 client_hostname | client_port | 50353 backend_start | 2012-08-30 00:52:48.957179+00 state | streaming sent_location | 0/5530528 write_location | 0/5530528 flush_location | 0/5530528 replay_location | 0/5530528 sync_priority | 0 sync_state | async pg_current_xlog_location | 0/5530528
and compare pg_current_xlog_location with the write location. This approach works, but will become a bit more interesting in 9.2, with the addition of the pg_xlog_location_diff function, which calculates the difference between two transaction log locations.
Getting the replication delay from only the slave, though, is a bit more circuitous. We’ll need to use the pg_last_xact_replay_timestamp() function, which gets time stamp of last transaction replayed during recovery. This by itself won’t be terribly interesting, but we can approximate the delay as follows:
postgres=# SELECT NOW() - pg_last_xact_replay_timestamp() AS replication_delay; replication_delay ------------------- 00:00:00.893494 (1 row)
Here, the replica is less than one second behind the master. In a very busy system, with many writes per second, this number will remain fairly accurate. However, in a system where there are few writes, the “replication_delay” will continually grow because the last replayed transaction timestamp isn’t increasing (this is generally the same limitation as MySQL’s SHOW SLAVE STATUS output). We can get around this, though, by introducing an external heartbeat, which will minimize the drift that will manifest with this approach.