WAL - Aesa's Notes

# Technical Reference: Write-Ahead Logging (WAL) **Write-Ahead Logging (WAL)** is the standard method for ensuring data integrity and durability in PostgreSQL. It ensures that changes to data files (tables and indexes) are logged before the changes are actually applied to the data files themselves. For the narrative explanation of this concept, see [[Manuscript/05 - Durability & Transactions/5.1 - WAL & fsync (The Pocket Diary)|5.1 Write-Ahead Logging (The Pocket Diary)]]. ## Core Principles 1. **Safety First**: Ensure that a transaction's operations are permanently recorded on persistent storage before it is confirmed as successful (Atomic Commits). 2. **Sequential I/O**: Writing to the WAL is a sequential operation, which is significantly faster than the random I/O required to update various pages across large data files. 3. **Atomic Commits**: A transaction is considered "committed" only after its WAL records have been successfully flushed to the data files (via `fsync`). ## The Anatomy of the Log ### 1. The LSN (Log Sequence Number) The **LSN (Log Sequence Number)** is the absolute coordinate system of the Postgres universe. It is a 64-bit integer (`uint64`) that represents a specific byte-offset within the infinite stream of WAL data generated since the database cluster was initialized. - Because Postgres only ever appends to the WAL, the LSN is permanently increasing. It never goes backward. - Every single change to the database is stamped with an LSN. - Data pages store the LSN of the last WAL record that modified them in their header (`pd_lsn`). This allows the system to compare the physical Page's LSN against the WAL's LSN during crash recovery to determine if a replay is necessary, guaranteeing **Idempotent Recovery**. ### 2. The XLogRecord Every individual change (a "scribble") is written as an **`XLogRecord`** C-struct. This is the fundamental unit of the WAL. - It contains the Resource Manager ID (RmgrId), which tells Postgres which part of the database system (Heap, B-Tree, etc.) generated the record. - It contains the length, the transaction ID, and the actual change payload. ### 3. Log Segments The WAL is not a single file, but a series of **Log Segments** (typically 16MB each) stored in the `pg_wal` directory. - Segments are named with 24-character hexadecimal IDs. - Postgres recycles these segments after a checkpoint ensure that the `pg_wal` directory doesn't grow indefinitely. ## Maintenance and Operations - **Checkpoint**: The process of flushing all "dirty" data pages from memory to the database files and recording a safe point in the WAL. This allows old WAL segments to be recycled. - **Full Page Writes (FPW)**: To prevent data corruption from **partial page writes** (torn pages), the first change to a page after a checkpoint requires the entire 8KB page to be logged in the WAL. - **The Risk**: Filesystems (like ext4 or XFS) typically write in 4KB blocks. If the power fails between the first and second 4KB write of an 8KB Postgres page, the data file is left with a corrupted, "half-old, half-new" page. - **The Solution**: During recovery, if a page is torn, Postgres overwrites it with the full-page image from the WAL before replaying the remaining scribbles. ## Monitoring - **`pg_current_wal_lsn()`**: Get the current write location in the WAL. - **`pg_walfile_name(lsn)`**: Convert an LSN into the corresponding filename on disk. - **`pg_stat_wal`**: View statistics about WAL generation and writes.