Tablespaces is a simple, useful and arguably under-appreciated feature of PostgreSQL. They allow DBAs to accomodate growing databases, tables with data different from others, indexes that need to be super fast and so on. Read on to learn more about how to you can use tablespaces in Postgres.
What is a Tablespace?
Let’s start by initialising a PostgreSQL data directory. Here’s how to do that:
This creates a directory, named “data”, that contains everything needed to start a PostgreSQL server. The contents of this directory look like this:
This is self-contained, including the configuration files. All the data for all the databases, as well as items like WAL files, notification queues and so on live in this directory. Needless to say, these files must be managed only via the PostgreSQL server. This directory is, or rather can be used as, the famous $PGDATA.
To bring up a server to serve off this directory, you can:
By default, when you create objects (like tables, indexes) which need on-disk storage, the Postgres server creates the required files somewhere inside $PGDATA. Turns out, this is actually a two step process, and actually goes like this: By default the Postgres server creates the required files in the default tablespace called pg_default, the location of which is the data directory $PGDATA.
Tablespaces are, in short, the way to tell the Postgres server where to place the physical files for SQL objects.
Tablespaces do not affect the logical SQL namespaces that the objects live in (like databases, schema, table, index) that the application developer sees. You cannot, for example, have 2 tables with the same name in the same schema just because they are in different tablespaces.
The Default Tablespaces
psql has a “\db+” command which lists tablespace in detail. Let’s have a look:
This says there are two tablespaces, owned by the user who did the initdb. But why two? Let’s refer to the fine manual:
Two tablespaces are automatically created when the database cluster is initialized. The pg_global tablespace is used for shared system catalogs. The pg_default tablespace is the default tablespace of the template1 and template0 databases (and, therefore, will be the default tablespace for other databases as well, unless overridden by a TABLESPACE clause in CREATE DATABASE).
The location of the default tablespaces is the same as the data directory, or $PGDATA.
Basically then, by default the server is going to place objects into the pg_default tablespace, which maps to the data directory itself.
Let’s try creating a new tablespace and see what happens – but first, we need to create a directory somewhere in the filesystem:
And then do:
We now have a new tablespace, of 0 size. Internally, within $PGDATA, Postgres stores this as a symlink:
The number 16385 is the OID of the tablespace.
Most “CREATE” SQL commands come with a “TABLESPACE” option using which you can specify the tablespace in which to create that SQL object. Let’s try a few:
Here’s what happened:
- We created a database called “foo” in the tablespace “space2”. The default tablespace for all objects in the database also becomes space2.
- The tables “footab1” and “footab2” are created in space2. You can explicitly specify the tablespace for the table, or use the database’s default.
- The table “footab3” is created in the pg_default tablespace. It is possible to have only some objects in another tablespace.
You can also move existing objects, from existing databases into a different tablespace. Here’s how we can move one of our tables into another tablespace:
You can also move all tables (or indexes) in one tablespace into another:
Affected tables are locked while being relocated.
A common use for tablespaces is to move indexes or tables onto a faster filesystem, for example, a new fast NVMe SSD or a EBS volume with higher IOPS. Informing the PostgreSQL query planner about how fast your new tablespace is lets it do a better job at estimating query execution times.
Typically, you’ll benchmark the sequential and random disk access performance for your tablespaces, and assign them relative numbers. For example, if your new tablespace is twice as fast for sequential and random disk access as compared to your “regular” tablespace, you can:
There is also one more option, effective_io_concurrency, which can speed up bitmap heap scans if set.
Temporary tables and indexes are created by PostgreSQL either when explicitly asked to (“CREATE TEMP TABLE..”) or when it needs to hold large datasets temporarily for completing a query.
It is possible to tell PostgreSQL to place such objects in a separate tablespace. For example, if too many temporary tables are created in normal course, it might be possible to speed up your queries by placing such objects in a tablespace with faster hardware, faster/unjournaled/uncompressed filesystems, or even in-memory filesystems.
Use the option temp_tablespaces to tell Postgres which tablespace to use for creating temporary tables.
You can make clients create objects in a specific tablespace by default. This can be set as an option in a connection string or a client startup script, so that all newly created objects go into a new tablespace.
default_tablespace is the option you want to use for this.
pg_basebackup to back up a PostgreSQL cluster that has multiple
tablespaces needs a couple of extra steps.
If you’re using tarball-format backup, each tablespace comes out as it’s own tarball (with the tablespace OID as the filename). While restoring, this must be restored to the same path (like “/tmp/space2”) that used to be present while restoring. This is probably a bit of a pain because the backup script needs to store this additional information also somewhere alongside the backup.
Here’s how the tarball backup happens:
For plain format backups, it is possible to specify a new location for each tablespace. The data from each tablespace is written out into a new location. Every tablespace (other than pg_default and pg_global) must be mapped to a new location in the command-line, like this:
Setting up a new standby for a primary that already has tablespaces involves bringing over the main data directory and each tablespace directories over to the standby. If you’re using pg_basebackup to do this, then use the plain format backup to also specify appropriate new locations for the tablespaces on the standby.
Creating a tablespace on the primary of a replicated server is bit tricker, because the paths for the new tablespace go over unmodified to the standby. The standby server expects an existing directory at the same location as in the primary, and creates a tablespace at that location. Typically, you’d want to:
- prepare and mount filesystems at both primary and standby, mount points have to be the same
- create empty directories within mount points if needed
- create tablespace at primary
It is not possible to alter the location of a tablespace directly – you can’t for example, change “/tmp/space2” to “/mnt/space2” through any command. However, it is possible to:
- stop the PostgreSQL server process
- update the symlink in
- start the PostgreSQL server process
What all can you use tablespaces for? Here are some issues that can be overcome with tablespaces:
- Database growth: If for whatever reasons you can’t grow the filesystem that your database lives in, you can create new tablespaces in other mounted filesystems, and move existing objects and/or use the new tablespace as future default for objects.
- Compression: Have a table with very compressible data? Try moving it to a tablespace on a ZFS filesystem that has compression enabled.
- Fast indexes: You can move indexes or tables onto another filesystem that is faster/has more provisioned IOPS to speed up important queries. Be sure to adjust the tablespace options to tell the query planner how fast the new tablespace is!
Monitoring Using pgDash
pgDash is an in-depth monitoring solution designed specifically for PostgreSQL deployments. pgDash shows you information and metrics about every aspect of your PostgreSQL database server, collected using the open-source tool pgmetrics.
Among other things, pgDash can display your tablespace metrics and their changes over time:
pgDash provides core reporting and visualization functionality, including collecting and displaying PostgreSQL information and providing time-series graphs and detailed reports. We’re actively working to enhance and expand pgDash to include alerting, baselines, teams, and more.