PostgreSQL comes with the tools needed to move parts of your database off to other disks, other PostgreSQL servers, or even on to other systems like Oracle or MySQL.
Read on to see how you can handle common data distribution use cases with PostgreSQL.
Moving Tables to Other Disks
Want to move a few tables or indexes off to another disk because they’re getting too big? How about moving some indexes to a higher-IOPS EBS volume to speed up some of those queries? Tablespaces provide a way to do this, without having to let the application (or even the application developers!) know.
Initially, all of a Postgres server’s data lies within a single directory (called the data directory). Any SQL objects you create, including databases, live here. This “space” is default tablespace pg_default. You can tell PostgreSQL that it can use another location in the filesystem, say “/mnt/disk2/pgdata” also for creating SQL objects, like this:
Having created the tablespace, you can move tables and indexes into this space:
PostgreSQL will lock the table or index for the duration of the move, and will physically move over the data from the old tablespace to the new. How long the operation will take will depend mainly on the read/write rates of the source and destination disks.
All regular SQL statements work fine regardless of which tablespace the objects belong to. You can mix objects from any tablespace in a transaction, there are no restrictions or reduction in ACID guarantees.
You can read more about tablespaces in this article and in the docs.
Replicating an Entire Server
Need a read replica for your entire server? Basically another server that is in realtime sync with the main server but can only be read from? Be able to run heavy daily analytics jobs without impacting the app? You need streaming replication.
This feature allows you to maintain a hot standby (Postgres jargon for read replica) of a primary (master) server. Streaming replication is an all-or-nothing proposition though – you get the entire database cluster and cannot pick and choose parts of it. This also means its ideal for maintaining a failover candidate, which can be promoted to master on short notice.
Here’s how to configure the PostgreSQL (v10) of an Ubuntu 18.04 to be a hot standby:
First, let’s create a replication slot at the primary:
And then on the slave server, delete the old data and take a fresh backup from the master:
Assuming the user called “alice” can login with the replication protocol on the primary, you now have a consistent snapshot of the master. If we now startup the server on the slave, it should start, catch up and remain in sync with the master:
You can read more about streaming replication here.
Streaming replication is a heavyweight solution, suitable for cases when you definitely want the whole data replicated. If you want only a part of it though, there are other easier options.
Accessing Tables in Other Postgres Servers
Let’s say you need to move a few tables to another server because they have become too big. Or maybe all the objects in a certain schema should be moved to another server as part of a migration. Foreign Data Wrappers (FDW) can be a solution, or an intermediate step in a longer migration plan.
FDW allows you to create virtual tables on a PostgreSQL server, that are proxies of the real ones on another. For example if there is a database “demodb” with the table “demotable” on the server 192.168.0.100, we can do this in another server:
First, the extension “postgres_fdw” is created with the CREATE EXTENSION command (this extension is present by default). Then a foreign server, known locally as “demoserver” is setup with it’s connection options. A user mapping is created to map the local user “alice” to the remove user “alice”. Password and other connection options can also be specified here. Finally, the table “demotable” is imported, and data is queries from it.
No special changes are required on the “source” server for FDW. Applications running locally will be able to execute regular SQL transactions without being aware that “public.demotable” is in fact physically present in another server.
The FDW feature is a generic way of accessing data stored in other systems. (Fun fact: it’s based on a SQL standard). You can use FDW to push and pull data from other systems too.
Pushing Data to Other Systems
Let’s say you have triggers that create records in an audit trail table, but you’d like those records to be pushed to say, a Cassandra cluster. Or you have some source-of-truth data in an Oracle database but the app can talk only to your Postgres server.
There are many existing FDW extensions which you can use to interface with systems like Cassandra or Oracle. You can also write your own to suit your needs.
Replicating Selected Tables
What if you want to replicate only one table to a different server? Or maybe a few tables to different datacenters for improved latency for local apps? You can do this with logical replication.
Logical replication was introduced in v10. It’s a lightweight replication mechanism, and can be used independently of streaming replication. This allows tables in independent servers to be kept in sync logically. The “logically” implies that the effect of replication is similar to what you’d get if you replayed the SQL commands at the destination servers – this allows features like the destination table having more columns that the source table.
Here’s how it works. First, at the source end, create a publication:
The publication can be subscribed to anyone, and they can receive updates to the SQL objects (in this case the table “planets”) being made at the publisher’s end.
Here’s how you can subscribe to the publication:
Logical replication allows for simpler, lightweight replication, and can be used in addition to streaming replication if required. Read more about it in the docs.
About pgDash
pgDash is an in-depth monitoring solution designed specifically for PostgreSQL deployments. pgDash shows you information and metrics about every aspect of your PostgreSQL database server, collected using the open-source tool pgmetrics.
pgDash provides core reporting and visualization functionality, including collecting and displaying PostgreSQL information and providing time-series graphs, detailed reports, alerting and more.