PostgreSQL comes with the ability to do incremental backups and point-in-time recovery out of the box. Read on to learn more about the settings and procedures to achieve this.
It Starts With WAL Files
WAL stands for Write Ahead Log. WALs are used in nearly all modern RDBMS systems to provide durable and atomic transactions.
Changes to the data contained in a PostgreSQL database cluster managed by a single PostgreSQL server process are possible only via transactions. The modifications made to the data by transactions are recorded as an ordered sequence of WAL records. These records are written into fixed-length files called WAL segment files, or simply WAL files.
WAL files live in $PGDATA/pg_wal, where $PGDATA is the data directory for the database cluster. On a default Debian installation, for example, the WAL file directory for the main cluster is /var/lib/postgresql/10/main/pg_wal. Here is what it looks like:
WAL files are generated incrementally, in sequence, starting from cluster creation. They keep getting generated for as long as modifications happen to the cluster. The WAL file mechanism is essential to the working of PostgreSQL, and cannot be turned off.
After the changes are first written out as WAL records, they have to be applied to the on-disk representation of the data itself. This process is called checkpointing, and happens in the background automatically (it can also be forced manually). The point until which checkpointing was done is called the REDO point. Checkpointing is also an essential part of Postgres architecture and cannot be turned off.
WAL File Retention
In the normal course of PostgreSQL server operation, WAL files will keep getting written into the pg_wal directory. But why keep them around?
One reason is crash recovery. If the PostgreSQL server crashes and restarts, it applies the changes recorded in the WAL since the last REDO point to the data files (the same work that checkpointing does). This guarantees that the data files are consistent with the last completed transaction.
Another reason is related to streaming replication. Streaming replication works by sending WAL records over to standby servers, which store these locally and perform checkpoints. Standbys can lag behind the server they are replicating from (called the primary). For example, if the primary has generated 100 WAL records and the standby has received and applied the first 80, the most recent 20 are required to be available so that the standby can receive and apply from record 81 onwards.
But surely the very old WAL files can be deleted? Yes. PostgreSQL can be instructed to retain the most recent WAL files and delete the older ones. There are three relevant configuration options:
- wal_keep_segments - sets the minimum number of most recent WAL files to be retained in the WAL file directory
- max_wal_size - specifies the maximum total size of WAL files in the WAL file directory. If this is exceeded, older ones are deleted. However, there might be reasons (including a high value for wal_keep_segments) that prevent this setting from being honored.
- min_wal_size - specifies a minimum total size for WAL files. As long as the actual size stays below this value, no files will be deleted.
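For example, these could go into postgresql.conf; the values here are only illustrative and should be tuned for your workload:

```
wal_keep_segments = 64    # always keep at least the 64 most recent WAL files
max_wal_size = 2GB        # soft limit on how large the WAL can grow between checkpoints
min_wal_size = 80MB       # the WAL is not shrunk below this size
```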
In real life it is neither possible nor necessary to store all previous WAL files under the pg_wal directory.
WAL File Archival
The real value of WAL files is that they are a stream of changes that can be recorded and replayed to get a consistent replica of a PostgreSQL cluster. PostgreSQL provides a way by which we can copy out (or “archive”) each WAL file after it gets created – the archive_command configuration option.
This option specifies a shell command string that is invoked for each completed WAL file. Here are some examples:
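In these sketches the destination path is a placeholder; %p expands to the path of the WAL file to archive and %f to just its file name:

```
# Copy each completed WAL file into a locally mounted archive directory.
archive_command = 'cp %p /mnt/server/archivedir/%f'

# Same, but refuse to overwrite a file that already exists in the archive.
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```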
There are two other options that must also be set:
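These are wal_level, which must not be minimal (otherwise the WAL does not carry enough information for archiving), and archive_mode, which enables the archiver process:

```
wal_level = replica    # the default in PostgreSQL 10; 'minimal' won't do
archive_mode = on      # takes effect only after a server restart
```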
WAL Compression
You can compress the WAL files before copying them into a long-term/safe storage location. Separately, there is an option called wal_compression. Turning this on will cause PostgreSQL to compress the full-page images contained in WAL records. The WAL files themselves will still be of the same size (typically 16 MB), but will contain a sequence of compressed records rather than plain records.
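For example, compression of whole files can be done by the archive command itself (the archive path is a placeholder), independently of wal_compression:

```
# Compress each WAL file on its way to the archive; the restore_command used
# during recovery must then decompress it again.
archive_command = 'gzip < %p > /mnt/server/archivedir/%f.gz'

# Alternatively (or additionally), compress full-page images inside WAL records.
wal_compression = on
```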
Continuous Archiving
WAL archiving is also called continuous archiving and is, in effect, incremental backup.
Before starting this process of incremental backup, a full backup is required. This establishes a baseline upon which WAL files can be incrementally restored. A full backup can be taken either by:
- shutting down the Postgres server process and copying the cluster data directory (while preserving permissions), or
- using the pg_basebackup tool on a running Postgres server (see the example below).
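A minimal pg_basebackup invocation might look like this; the connection settings and target directory are placeholders for your environment:

```
# Copy the entire cluster from a running server into ./backupdir.
# -P reports progress; -X stream also copies the WAL needed to make the backup consistent.
pg_basebackup -h localhost -U postgres -D backupdir -P -X stream
```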
Point-In-Time Recovery (PITR)
PITR refers to PostgreSQL’s ability to start from the restore of a full backup, then progressively fetch and apply archived WAL files up to a specified timestamp.
To do this, we have to create a file called “recovery.conf” in the restored cluster data directory and start up a Postgres server for that data directory. The recovery.conf file contains the target timestamp, and looks like this:
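A minimal recovery.conf looks something like this; the archive path and the timestamp are placeholders:

```
# Fetch the WAL file PostgreSQL asks for (%f) from the archive, into the path it expects (%p).
restore_command = 'cp /mnt/server/archivedir/%f %p'

# Replay changes only up to this moment.
recovery_target_time = '2020-01-01 12:00:00'
```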
The restore_command specifies how to fetch a WAL file required by PostgreSQL. It is the inverse of archive_command. The recovery_target_time specifies the time until when we need the changes.
When a PostgreSQL server process starts up and discovers a recovery.conf file in the data directory, it starts up in a special mode called “recovery mode”. While in recovery mode, regular read-write connections are refused (read-only queries are allowed if hot standby is enabled, which is the default in PostgreSQL 10). Postgres fetches WAL files and applies them until the recovery target (in this case, changes up to the specified timestamp) is reached. When the target is reached, the server by default pauses WAL replay (other actions are possible). At this point, you are supposed to examine the state of the restore, and if everything looks OK, unpause to exit recovery mode and continue normal operation.
Putting It All Together
That was a whole bunch of theory and text; let’s try it out and see how it all works in practice.
First let’s initialize a new cluster:
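Assuming the PostgreSQL binaries (initdb, pg_ctl, psql) are on the PATH and we work in the current directory:

```
# Create a brand new cluster in a directory named clus1.
initdb -D clus1
```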
We’ll also create a directory that will serve as our safe storage location. Let’s call this “archive”.
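For the demo this is just a local directory; in real life it would be a network or cloud storage location:

```
mkdir archive
```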
We need to configure the archive settings we discussed earlier before we can start the server. So let’s add the following to the end of clus1/postgresql.conf:
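Something along these lines; replace /path/to/archive with the absolute path of the archive directory created above:

```
# Archive every completed WAL file by copying it into our archive directory.
archive_mode = on
archive_command = 'cp %p /path/to/archive/%f'

# Switch to a new WAL file every 60 seconds even if the current one isn't full
# (demo-only value; see the note below).
archive_timeout = 60
```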
Our archive command simply copies the WAL file to the archive directory we created earlier.
We have also added the archive_timeout setting. Usually, a WAL file is archived only after enough WAL records have been generated to fill the 16 MB file. This means that on servers with few writes, you may have to wait a long time for a new file to show up in the archive. The setting archive_timeout tells Postgres to switch to a new WAL file every so many seconds, so that the current one gets archived whether it is full or not.
Here we have set this to 60 (seconds), but this is only for the demo! You’d typically never want to keep it this low.
Let’s also make a copy of “clus1”. This is the equivalent of a full backup.
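Since the server has not been started yet, a plain recursive copy gives us a consistent full backup. We’ll call the copy clus2; that is the cluster we will recover into later:

```
# Copy the whole data directory, preserving permissions.
cp -rp clus1 clus2
```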
Now we can start the cluster:
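The log file name clus1.log is simply a choice made for this demo:

```
pg_ctl -D clus1 -l clus1.log start
```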
Let’s add in some data.
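For example (the table name and the target database are arbitrary choices for this demo; only the row counts matter for what follows):

```
psql -d postgres -c "CREATE TABLE demo (n int);"
psql -d postgres -c "INSERT INTO demo (n) SELECT generate_series(1, 10000);"
```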
Note that the time is now 14:05. Let’s check if our archive command is working:
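The archive directory should now contain its first segment file (the exact name will vary):

```
ls archive
# -> 000000010000000000000001
```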
Yes, we have one single archive file. Our last change was at 14:05. Let’s now wait for a few minutes and then make some more changes.
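After waiting, the extra rows can be added to the same table:

```
psql -d postgres -c "INSERT INTO demo (n) SELECT generate_series(10001, 10100);"
```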
So now we have added 100 more rows, at 14:16. Let’s stop the server:
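```
# A clean shutdown of the clus1 server.
pg_ctl -D clus1 stop
```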
and check our archive again:
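```
ls archive
# -> several segment files now, roughly one per minute thanks to archive_timeout
#    (exact names and count will vary)
```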
Looks good. Now we’ll attempt to do a PITR recovery of clus2 up to the time 14:10.
First let’s edit clus2’s postgresql.conf and add these lines at the end:
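clus2 inherited clus1’s configuration when we copied the data directory. The exact edits depend on your setup; a reasonable assumption, used in this sketch, is to stop the restored cluster from archiving into the very directory it is about to restore from:

```
# Assumed edit for this demo: don't let the restored cluster write into the
# archive it is reading from. Uncomment the port line if clus1 may also be running.
archive_mode = off
#port = 5433
```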
In order to replay the WAL files, we have to put the PostgreSQL server for clus2 (which we haven’t started yet) into recovery mode. To do this, create a file called “recovery.conf” in clus2:
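The archive path is the same placeholder used in archive_command above; the date portion of the target is simply the day on which you run the demo:

```
# Fetch each WAL file PostgreSQL asks for from our archive directory.
restore_command = 'cp /path/to/archive/%f %p'

# Stop replaying at 14:10; fill in the date on which you ran the demo.
recovery_target_time = 'YYYY-MM-DD 14:10:00'
```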
This contains the restore_command which does the opposite of the earlier archive_command, namely copying the requested file from the archive directory to the pg_wal directory.
We’ve also set the recovery_target_time to 14:10.
Now we start clus2:
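Starting it is no different from before, except that the server will notice recovery.conf and enter recovery mode:

```
pg_ctl -D clus2 -l clus2.log start
```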
To see what happened, let’s examine the log file:
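clus2.log is the file we pointed -l at above:

```
tail -n 20 clus2.log
# Expect lines showing WAL segments being restored from the archive, recovery
# stopping before the first transaction committed after 14:10, and a note that
# recovery is paused.
```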
The recovery was quick (in real life, it may take hours or days) and the log states that it has stopped before a particular transaction (that has a timestamp of > 14:10). It also says that the recovery is paused and must be manually continued.
Let’s examine the data:
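A read-only query is allowed even while replay is paused:

```
psql -d postgres -c "SELECT count(*) FROM demo;"
# -> 10000
```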
We see that there are only 10000 rows. At 14:16, we’d added 100 more, which haven’t appeared in the table.
This looks good, so let’s resume:
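This is done with a built-in function:

```
# Resume WAL replay; having already reached the target, the server then exits
# recovery and starts accepting normal read-write connections.
psql -d postgres -c "SELECT pg_wal_replay_resume();"
```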
The log file now reports that the recovery is complete and that normal operations have resumed.
And we have successfully recovered the cluster up until a specified time!
Further Reading
Here are a few starting points to discover more about WAL archiving, recovery mode and PITR:
- Docs: Recovery Configuration
- Docs: Continuous Archiving and PITR
- Chapter 9 from “The Internals of PostgreSQL” book
- Tools: WAL-E, WAL-G, Barman
About pgDash
pgDash is a modern, in-depth monitoring solution designed specifically for PostgreSQL deployments. pgDash shows you information and metrics about every aspect of your PostgreSQL database server, collected using the open-source tool pgmetrics. pgDash provides core reporting and visualization functionality, including collecting and displaying PostgreSQL information and providing time-series graphs, detailed reports, alerting, teams and more.
pgDash can monitor replication, locks, queries and more. It can also provide quick diagnostics and extensive alerts. Learn more here or sign up today for a free trial.