# What's actually inside your Postgres data directory

> PGDATA looks like a folder full of cryptic short names. It is in fact the entire database. A tour, directory by directory, and the reasons a backup company ended up reading every one of them.

URL: https://bkpdb.com/blog/inside-pgdata/
Date: 2026-05-12
Author: Rahul
Section: Engineering

---


When you install Postgres, run `initdb`, and start it up, all of your data lives inside one directory. The Postgres documentation calls it "the data directory." The environment variable points at it: `PGDATA`. On Debian it's at `/var/lib/postgresql/16/main`. On macOS via Homebrew it's `/opt/homebrew/var/postgresql@16`. Inside a Docker image it's wherever the volume is mounted. Wherever you find it, the contents look about the same: a couple of dozen short, cryptic directory names, a handful of config files, and a little file called `PG_VERSION` that's there to keep you honest.

For most application developers, this folder is a black box. You connect on port 5432, you write SQL, you trust that the bytes are safe somewhere. That's how it should be most of the time.

But when you start running backup software, the abstraction leaks. The first time it leaks for you, it usually hurts. A restore "succeeds" but the application can't connect because a role is missing. An index is reachable but rows come back in a different order, and a query plan flips. A tablespace symlink points at a directory that doesn't exist. The dump finished; the restore finished; the database is wrong.

We hit all of these in the first six months of building bkpdb. We got out of them by reading the data directory file by file until each name made sense. This is a tour through what we found, what each folder is for, and the lesson each one taught us.

## A first look

Here's what `ls $PGDATA` looks like on a fresh Postgres 16 install:

```text
$PGDATA/
├── PG_VERSION
├── base/
├── current_logfiles
├── global/
├── pg_commit_ts/
├── pg_dynshmem/
├── pg_hba.conf
├── pg_ident.conf
├── pg_logical/
├── pg_multixact/
├── pg_notify/
├── pg_replslot/
├── pg_serial/
├── pg_snapshots/
├── pg_stat/
├── pg_stat_tmp/
├── pg_subtrans/
├── pg_tblspc/
├── pg_twophase/
├── pg_wal/
├── pg_xact/
├── postgresql.auto.conf
├── postgresql.conf
├── postmaster.opts
└── postmaster.pid
```

Twenty four entries. Some are huge (`base/` and `pg_wal/`); some are routinely empty (`pg_snapshots/`, `pg_replslot/` on a database with no replicas). All of them matter, in the sense that if you delete the wrong one, your database is broken.

Postgres calls this whole thing a "cluster." Not in the distributed-systems sense. In Postgres-speak, a cluster is one running server with one set of system tables, one set of users, and as many databases as you've created inside it. One PGDATA is one cluster.

Now let's open the boxes.

## `base/`: where your tables actually live

If your Postgres holds 200 GB of data, somewhere around 198 GB of it is in `base/`. This is the directory that holds your tables, your indexes, your TOAST chunks, and almost everything you'd think of as "the data."

Inside `base/` is one subdirectory per database, named not by your database's name but by its OID:

```text
base/
├── 1/        (template1)
├── 4/        (template0)
├── 5/        (postgres)
├── 16384/    (your databases)
└── 16386/
```

The big numbers are your actual databases. To find out which is which:

```sql
SELECT oid, datname FROM pg_database;
```

Inside each database directory is another set of numbered files. Each is a "relation": a table, an index, a sequence, or a materialized view. The number is again an OID, this time of the relation. `pg_class` maps OIDs to names. Each relation is split across files of up to 1 GB:

```text
base/16384/
├── 16432       (heap, segment 0)
├── 16432.1     (heap, segment 1)
├── 16432.2
├── 16432.3
├── 16432.4
├── 16432_fsm   (free space map)
└── 16432_vm    (visibility map)
```

The `.N` suffix is the segment number. Each file is a sequence of 8 KB pages. The `_fsm` is the free space map; the `_vm` is the visibility map. Both are derived; both can be rebuilt; both should still be backed up if you care about not paying to rebuild them.

The lesson `base/` taught us was about how backups *don't* see this. `pg_dump` walks the catalogs and emits SQL. It does not copy a single byte of `base/`. The bytes get reconstructed on restore from `INSERT` statements (or, with `--format=custom`, from a Postgres-flavored binary format that's still logical, not physical). That's a feature most of the time. It also means your backup is at the mercy of `pg_dump` understanding everything in your schema, including custom types, including extensions, including PL/Python functions nobody has touched in two years.

In practice: dump and restore into a clean Postgres of the same major version once a quarter, before you have to. The dumps that worked last year may not work this year.

## `global/`: the things shared across all databases

`global/` is the smaller, weirder cousin of `base/`. It holds the catalog tables that aren't per-database: `pg_database` itself, `pg_authid` (the table behind `pg_roles`, where every user and role lives), `pg_tablespace`, and a few others. It also holds the cluster control file, `pg_control`, which we'll come back to.

The directory is small, often a few megabytes. The information in it is critical. Roles live here. Delete `global/` and your databases still exist on disk, but Postgres has no idea who's allowed to log into them.

`pg_dump` doesn't dump `global/`. There's a separate tool, `pg_dumpall`, that does. So if you depend on `pg_dump` for backups and you restore into a fresh cluster, you'll discover that none of your application's roles exist. The first time you discover this is when the application can't log in.

This was the second-most common failure mode in our early customer restores. The fix is to dump roles separately:

```bash
pg_dumpall --roles-only > roles.sql
pg_dump --format=custom mydb > mydb.dump
```

Then on restore:

```bash
psql -f roles.sql
pg_restore -d mydb mydb.dump
```

We do both as part of every backup now. If you're rolling your own backup script, do the same. Roles are not optional infrastructure.

## `pg_wal/`: the part that scares people

The Write-Ahead Log is, depending on your role, the most beautiful or the most terrifying thing about Postgres. Beautiful, because it's the mechanism that makes `COMMIT` mean what it says. Terrifying, because if you fill the disk with WAL files, your database stops accepting writes, and if you delete the wrong WAL files, you can't recover.

`pg_wal/` (called `pg_xlog/` before Postgres 10) holds 16 MB segment files with names like `000000010000000000000037`. The 24-character name encodes the timeline, the log file number, and the segment number. Postgres flushes them to disk on every commit, modulo `synchronous_commit` settings. Once a checkpoint has flushed all the dirty pages in shared buffers down to `base/`, older WAL files can be recycled or removed.

Two settings control how much WAL accumulates: `max_wal_size` (a soft cap that triggers a checkpoint) and `wal_keep_size` (a hard floor that retains WAL even past a checkpoint, for replicas). If you have a replica that's fallen behind, or a replication slot that no consumer is reading, WAL grows without bound and fills your disk. This is the single most common way that healthy-looking Postgres servers go down at 3 a.m.

For backups, `pg_wal/` is what makes point-in-time recovery possible. A base backup plus the stream of WAL files from that point lets you restore to any moment. Without WAL, you have only the base backup, which is the database state as of when the base backup ended.

The agent we ship today doesn't touch WAL. It runs `pg_dump`, which is logical and timestamped. Point-in-time recovery is on the roadmap for v5.

One thing worth knowing: `pg_wal/archive_status/` is bookkeeping for `archive_mode`. If files pile up there with `.ready` suffixes and never become `.done`, your archive command is failing silently. Set up an alert.

## `pg_xact/` and `pg_multixact/`: the commit log

Every transaction in Postgres has a transaction ID (XID), and every XID has a state: in progress, committed, aborted, or sub-committed. That state lives in `pg_xact/`, which used to be called `pg_clog/` (commit log) before Postgres 10. The directory holds tiny files (256 KB each), each containing the state of about a million transactions.

When you read a row, Postgres needs to know whether the transaction that wrote it committed. The visibility check goes to `pg_xact/`. Without it, the database has no idea what's real and what's a half-finished write.

`pg_multixact/` holds the same kind of data for transactions sharing locks on a row. Most workloads don't generate much of it; high-contention `SELECT FOR SHARE` patterns can.

Both directories are vacuumed and truncated as old transactions age out. Long-running transactions hold the cleanup back, and `pg_xact/` (along with a number of other things) grows. That's part of why a forgotten `BEGIN` in a psql session can wreck a database over a weekend.

The lesson, even though we don't take physical backups today: if you copy `base/` while the database is running, you have to copy `pg_xact/` from a coordinated point in time too. That's part of what `pg_basebackup` handles. A naive `cp -r` of PGDATA on a live server will eventually restore a state where `base/` references XIDs that `pg_xact/` doesn't know about, and the database will refuse to start.

## `pg_tblspc/`: symlinks, and the trouble they cause

Tablespaces let you put specific tables on specific filesystems. Fast NVMe for hot data, cheaper rotating disk for cold, two tablespaces, tables assigned to whichever you want.

`pg_tblspc/` doesn't actually hold any data. It holds *symlinks*, one per tablespace, named by OID:

```text
pg_tblspc/
├── 16500 -> /mnt/fast/postgres
└── 16501 -> /mnt/slow/postgres
```

This is a beautiful feature when it works and a nightmare when it doesn't. `pg_dump` knows about tablespaces; on restore, the target server has to have the same tablespace paths available, or you remap with `--tablespaces-to-restore`, or use `--no-tablespaces` and accept that everything goes into the default.

The first customer who showed us a database with a tablespace was the customer who taught us this. Their dump restored into our verifier, the verifier didn't have `/mnt/fast/postgres`, and `pg_restore` failed loudly. We added tablespace remapping the same week.

In practice: if you use tablespaces, document what they're for. If you don't, don't start. The default tablespace is fine for almost every workload, and the operational complexity rarely pays for itself.

## `pg_logical/` and `pg_replslot/`: the replication machinery

Logical replication uses the WAL stream, decodes it into row changes, and ships them to a subscriber. The state lives in two places. `pg_logical/` holds metadata about logical decoding: snapshot files, mapping files, origin progress. `pg_replslot/` holds one subdirectory per replication slot.

If you've ever set up logical replication and then deleted the subscriber without dropping the slot first, you'll have learned what we learned: the slot pins WAL on the publisher, indefinitely, until you either drop the slot or the disk fills up. The disk filling up usually wins.

Physical backups capture slot state; dumps don't. After a logical restore, any logical replication you had set up has to be rebuilt on the restored side. We probably should record whether replication slots exist on a database we're backing up and warn the operator. Adding it to the agent is a four-line patch we haven't got to yet.

## The small directories

These exist for completeness and you can mostly ignore them in day-to-day operations. Knowing what they are saves a moment of panic next time you `ls $PGDATA`.

- **`pg_subtrans/`** is the subtransaction log, written when you use `SAVEPOINT` or PL/pgSQL exception blocks.
- **`pg_serial/`** holds state for serializable transaction snapshots. Empty unless you use `SERIALIZABLE` isolation.
- **`pg_notify/`** holds in-flight `LISTEN` / `NOTIFY` queues.
- **`pg_snapshots/`** holds exported snapshots used by `pg_dump --jobs` to coordinate parallel workers.
- **`pg_stat/`** and **`pg_stat_tmp/`** are the statistics collector's persistent and live copies. As of Postgres 15, stats live in shared memory and `pg_stat_tmp/` is mostly vestigial.
- **`pg_dynshmem/`** holds dynamic shared memory segments for parallel query workers. Transient.
- **`pg_commit_ts/`** is the commit timestamp log. Empty unless you've enabled `track_commit_timestamp`.
- **`pg_twophase/`** holds prepared transactions. Empty unless you use `PREPARE TRANSACTION`.

The pattern: every Postgres feature that needs persistent on-disk state gets its own directory. Most are empty most of the time. The ones that matter, you've already met.

## The configuration files

`postgresql.conf` is the main settings file. `pg_hba.conf` is the host-based authentication file (who can connect from where, and how). `pg_ident.conf` maps OS usernames to Postgres usernames for the few authentication methods that need it.

Two surprises about these files. First, `postgresql.auto.conf` exists and overrides `postgresql.conf`. It's where `ALTER SYSTEM SET` writes. If you've been editing `postgresql.conf` by hand and wondering why your changes don't take effect, check whether someone has been running `ALTER SYSTEM` on the same key.

Second, `pg_hba.conf` is read top to bottom and the first matching line wins. A strict `reject` rule at the bottom does nothing if a permissive line above matches first. We've watched a customer lock themselves out by getting this exactly wrong.

Logical backups don't dump these files. They are settings, not data. If you've spent hours tuning `postgresql.conf` and you want those settings to survive a restore into a fresh host, copy the file separately.

In practice: keep `postgresql.conf` and `pg_hba.conf` in version control alongside your application code. Treat them as code, because they are.

## `pg_control` and the cluster's identity

Inside `global/` is a small file called `pg_control`. It holds the cluster's identity and a bunch of state Postgres needs at startup: the system identifier, the catalog version, the WAL level, the latest checkpoint location. If `pg_control` is missing or corrupted, the cluster will not start.

`pg_controldata` prints it in human-readable form. Run it once on a live Postgres and read the output:

```text
$ pg_controldata $PGDATA
pg_control version number:            1300
Catalog version number:               202307071
Database system identifier:           7387012345678901234
Database cluster state:               in production
Latest checkpoint location:           1/A234B0C8
...
Database block size:                  8192
Bytes per WAL segment:                16777216
Data page checksum version:           1
```

Two lines on that list are load-bearing for backups. The "Database system identifier" is unique to this cluster; if you back up cluster A and try to apply WAL from cluster B, the identifiers won't match and recovery refuses to proceed. The "Data page checksum version" tells you whether checksums were enabled at `initdb` time. If they were, every page on disk has a checksum that gets verified on read. This is one of the cheapest things you can do for your data integrity, and it cannot be turned on after the fact without a long, ugly migration. If you're spinning up a new cluster, run `initdb --data-checksums`. Future you will thank present you.

{{< pullquote attrib="the short version of the whole post" >}}Most backup tools assume the database tells the truth. The database mostly does. The data directory tells you when it isn't.{{< /pullquote >}}

## Why we read all of this, even though we ship logical backups

bkpdb ships logical backups today. We pipe `pg_dump --format=custom` through zstd and `age` encryption straight to object storage. We don't read most of `PGDATA` directly. So why have we spent so much time learning it?

Three reasons.

First, every restore failure we've debugged has come back to something that wasn't in the dump. Roles in `global/`. Locales in `pg_control`. Extensions whose underlying shared libraries weren't installed at the same version. Tablespaces pointing at directories the verifier didn't have. The dump captured the application's view of the database. The data directory was the source of truth about everything else.

Second, our customers ask us what to do when something on their host is misbehaving. Disk filling up: look at `pg_wal/` and `pg_replslot/`. Backups slow on a pre-15 install: check whether `pg_stat_tmp/` is on a slow disk. Worried about corruption: enable `data_checksums` next time. We can't help if we don't know.

Third, physical backups and point-in-time recovery are on the [roadmap](/databases/) for v5. The day we ship them, every assumption we make about the data directory will be load-bearing for restores that happen at the worst moments in our customers' lives. Better to over-prepare now.

## What you can do this evening

If you've never opened your own `PGDATA`, here's a short list. None of these write anything.

1. Find the directory: `psql -c "SHOW data_directory;"`.
2. Run `du -sh $PGDATA/*` (as the postgres user, or with sudo). Note which directories are large. `base/` and `pg_wal/` will dominate. Anything else unexpectedly large is worth investigating.
3. Run `pg_controldata $PGDATA`. Note the system identifier, the WAL level, and whether data page checksums are on.
4. Look at `pg_tblspc/`. If it's empty, you don't have to worry about tablespaces on restore. If it has symlinks, follow them and check the targets exist.
5. Run `ls -la $PGDATA/pg_replslot/`. If there are subdirectories, you have replication slots; check the consumers are still there.
6. Read `pg_hba.conf` top to bottom. Out loud, if you can.

That's an hour, and you'll come out of it knowing things about your own database that you didn't know this morning.

## Closing

The data directory isn't magic. It's a couple of dozen folders, well-named once you know what they're for, holding the state of a system that's been refined for thirty years. Backup software that doesn't look inside is backup software that gets surprised. Operators who don't look inside get surprised at 3 a.m. when something has filled up.

We built [bkpdb](/) because we wanted backups that don't get surprised. That meant reading the directory until we knew what every name was for. We've shared what we found here in case it saves you the same archaeology.

If you operate a Postgres in production and you want backups that are honest about what they are (verified, encrypted on the host, restored once a week into a clean cluster of the same major version), we're building it. There's a beta. Drop us a line at `hello@bkpdb.com` and we'll get you set up.

Stuck? Open an issue on our GitHub or send us mail. We answer.


