PostgreSQL, while being an modern and versatile RDBMS, is not the easiest of beasts to setup and get going while you’re looking to develop an application. Read on to learn about how you can get started with the latest version of PostgreSQL on the LTS version of Ubuntu.
Ubuntu 18.04 comes with PostgreSQL 10, but we can instead use the APT repository hosted by the PostgreSQL team to install the latest version, PostgreSQL 11.
You can setup the repository using these commands:
And then install the software itself, using:
The installation does a few things:
- It installs the PostgreSQL server, utilities and a command-line client called psql.
- It creates a Linux system user called postgres. All data files are owned by this user, and all processes run as this user.
- It creates one database, also called postgres.
- It creates one PostgreSQL user (not the Linux system user), also called postgres.
You can see this is beginning to get confusing!
Your newly installed PostgreSQL server consists of a set of processes that manage what is known as a “database cluster”. You can see the processes here:
The main process (here with PID 4737) is the main process which further spawns the child processes – this is conventionally called the “postmaster” process.
PostgreSQL’s usage of the term “cluster” predates modern-day fancy distributed clustering jargon – here it just means a set of databases being managed on a single machine by a single postmaster. Read more about clusters here.
A cluster contains databases (for now we have just one, “postgres”) and PostgreSQL users (again, just one for now, also called “postgres”) among other things. Just so you know, the cluster is associated with a whole bunch of data files, all of which live under one single directory – in this case under /var/lib/postgresql/11/main. Did you notice this path in the postmaster command line above?
When your app, or psql, connects to Postgres, it has to do so in the context of a PostgreSQL user. There is always a PostgreSQL user associated with a connection. But, as you might have guessed by now, a PostgreSQL user may or may not correspond to a system user.
System Users and PostgreSQL Users
When any application tries to connect to Postgres, it needs to supply a PostgreSQL user name. Let’s see what happens when you launch a PostgreSQL client like psql:
Here, “alice” is your Linux system user name. psql takes this name and uses it as the Postgres user name. A role (roles are a sort generic name for “user” or “group”, BTW) by that name does not exist, which is what Postgres is complaining about.
We know that there is a role by the name “postgres”, so let’s try that. We can use the “-U” parameter of psql for specifying the username:
OK, we’re getting closer – the role/user “postgres” exists, but “peer authentication” has failed. What is this “peer authentication”?
Peer and Password Authentication
PostgreSQL clients like psql or your app can connect to the PostgreSQL server over one of these IPC mechanisms:
- Unix domain sockets
- TCP sockets
Unlike TCP sockets, Unix domain sockets offer the possibility of validating the client connection’s system user id. The Postgres server can examine an incoming connection over a Unix domain socket and determine the client’s system user ID and then decide whether to grant it access or not.
By default, your sever only listens for connections over unix domain sockets and not TCP/IP.
Let’s see what happens if we try to start psql as the system user postgres:
That worked! (Use “\q”, “\quit” or
^D to exit psql, BTW.)
In peer authentication, if the client connection is made using a Unix domain socket and the client process has the same system user name as the PostgreSQL user it is trying to connect as, then the authentication is considered successful.
PostgreSQL users also can be optionally assigned a password, and you can ask PostgreSQL to validate incoming connections using the password. But how? That is the next piece of the puzzle.
It’s now time to open the (in)famous pg_hba.conf configuration file, located at
HBA stands for host-based authentication. Basically, this file is used to control how PostgreSQL users are authenticated. This file is probably the most non-intutive part of the PostgreSQL learning curve. The reference documentation is here, you should read it later.
The first (non-comment) line here is:
which tells Postgres to accept “local” (unix domain) connections to “all” databases, authenticating as user “postgres” using “peer” authentication. This is why connecting as the system user “postgres” works out of the box.
The order of lines in this file are important, the first matching line wins. Let’s see another line:
This line allows “all” users to login using TCP/IP (“host”) from the localhost (“127.0.0.1/32”) to “all” databases, if they succeed in password authentication using the “md5” method.
There are more password authentication methods (md5, scram-sha-256, gss, ldap, …) than we can cover, so let’s just get back to simpler examples.
But first, we need to make sure PostgreSQL is accepting TCP/IP connections also. For that, we need to edit the main configuration file.
The file /etc/postgresql/11/main/postgresql.conf is the main configuration file for your PostgreSQL cluster. This file contains a lot of settings, and understanding what all of those mean is no easy task at all. For now, let’s see the very first setting:
This line is commented by default, let’s uncomment it to make it read:
This will let PostgreSQL listen for incoming TCP/IP connections on localhost, port 5432 (the default). Save the changes (you’ll have to be “root” to do this), and restart the Postgres server for the changes to take effect:
(Note that some for most settings changes you just have to “reload”, not “restart”, but this requires a “restart”).
Now we can see Postgres listening on port 5432, bound to 127.0.0.1:
Now let’s setup a new user and database for use by an app.
Let’s connect as the superuser “postgres” to make the changes:
We have now created a database called
myapp and a user called
with the password
s3cr3t. The database is empty, and will be owned by the user
myapp_user, which means that by connecting as
myapp_user the client will be
able to do execute most all DDL/DML commands.
Let’s connect to the app database as this app user now:
It worked! You’re now connected to “myapp” (see the prompt), using SSL over a TCP/IP connection to 127.0.0.1. Note that we specified the database name also on the command-line for psql. For historical reasons, if this is omitted, the database name is also assumed to be the same as the system user name (“alice” here), which is not what we want. The PostgreSQL user name is also specified (“-U myapp_user”).
If you need to connect from other machines, you’ll need to edit
to add lines like so:
and reload PostgreSQL (“sudo systemctl reload postgresql”) for changes to take effect.
With this in place, you can now use database connection strings like these in your apps:
This should get you setup with a dedicated database and user for your app. Your app development framework (like Django, Drupal etc.) should be capable of creating objects (like tables, views) and managing the data in this database.
pgDash is a modern, in-depth monitoring solution designed specifically for PostgreSQL deployments. pgDash shows you information and metrics about every aspect of your PostgreSQL database server, collected using the open-source tool pgmetrics. pgDash provides core reporting and visualization functionality, including collecting and displaying PostgreSQL information and providing time-series graphs, detailed reports, alerting, teams and more.
The Diagnostics feature in pgDash examines your PostgreSQL server and databases scanning for potential issues that can impact the health and performance of the deployment. No additional setup is required for Diagnostics - you just need to be actively sending in data to pgDash. Learn more here or signup today for a free trial.