Index - Aesa's Notes

# Learn You a Postgres for Great Good! <img src="assets/lyapfgg.png" width="250" style="float: left; margin: 0 20px 20px 0;" /> What actually happens when you run `SELECT`? Where does the row live on disk? Why does `UPDATE` not update in place? What is Postgres doing when your query is "slow" and `top` shows nothing? You already know how to write SQL. You can `JOIN` three tables, add an index when something is slow, and copy a connection string from a README. That is enough to build things — but it is not enough to understand why they break. The answers live in the architecture of Postgres itself — and once you see them, every tuning decision, every `EXPLAIN` plan, and every mysterious production incident suddenly makes sense. **Learn You a Postgres for Great Good!** is a narrative-driven tour of PostgreSQL internals, inspired by *Learn You a Haskell for Great Good!* by Miran Lipovača. It uses a single, unified data model — the **Elephant Cafe** — to ground every concept in working SQL you can run yourself. ## Who This Book Is For This is not a book about learning SQL from scratch. We will not spend much time explaining what a `SELECT` is, why joins exist, or how to spell `GROUP BY`. We assume you can already talk to Postgres well enough to get an answer back. This book is about what happens after the query leaves your hands. Where does the row live? Why did the planner ignore your index? Why did an `UPDATE` make the table bigger? Why is CPU idle while users are waiting? Why does `COMMIT` survive a crash? Why does "permission denied" sometimes mean three different things? You do not need to know C, kernel internals, or database theory. But you do need curiosity about the machine. If you can already use Postgres and want to understand how it stores, finds, plans, locks, updates, recovers, and protects data, you are in the right place. ## The Lazy Elephant To understand the architecture, you need to accept a single organizing principle: **Postgres is profoundly, deliberately lazy.** Not lazy in the pejorative sense. Lazy in the way a physicist is lazy — every design decision in Postgres is an optimization to **avoid unnecessary work**. Once you internalize what the system considers "expensive," the entire architecture clicks into place: - **Deleting a row?** Postgres doesn't actually remove it from disk. It crosses out the old version ([[Manuscript/02 - Physical Storage & MVCC/2.5 - MVCC (The Sharpie Ledger)|MVCC]]) and leaves the cleanup to a background process. Removing data in place would require locking readers out — too expensive. - **Finding a specific row?** Postgres typically refuses to read the whole table if it can avoid it. It maintains pre-sorted shortcut structures ([[Manuscript/03 - Access Paths & Indexing/3.0 - Indexes (The Mighty Indexes)|Indexes]]) so it can jump directly to the answer. Scanning everything is the strategy of last resort. - **Ensuring crash safety?** Rather than writing data to its final location immediately, Postgres first records a sequential log of what it *intends* to do. This allows the engine to be fast while remaining indestructible—sorting the data into its proper home later, on its own schedule. - **Answering a complex query?** Postgres often doesn't just execute your SQL in the order you wrote it. It evaluates dozens of possible execution strategies and selects the cheapest one. The pattern is remarkably consistent: defer work, batch work, or avoid work entirely. By understanding what Postgres considers "expensive" (random disk reads, lock contention, memory pressure), these architectural quirks become rational, predictable consequences. ## What You Will Learn Every concept is illustrated with SQL you can run against the book's sample database (`docker compose up -d` and you're connected), and every `EXPLAIN` output in the book was generated against real data. The book follows the data from the bottom of the stack to the top: 1. **How data is physically stored** — bytes, tuples, 8KB pages, and why Postgres has an opinion about all of them (Chapters 1–2) 2. **How data is found efficiently** — B-Trees, GIN, BRIN, vector indexes, and the write-amplification trade-off (Chapter 3) 3. **How queries are planned and executed** — the optimizer, scan types, join strategies, and how to read `EXPLAIN` output (Chapter 4) 4. **How data survives crashes** — the Write-Ahead Log, transactions, isolation levels, and the mechanics of XID wraparound (Chapter 5) 5. **How Postgres manages resources** — shared buffers, work memory, autovacuum, and why bloat happens (Chapter 6) 6. **Why queries get slow under concurrency** — wait events, locks, latches, and the "sweat vs. wait" diagnostic framework (Chapter 7) 7. **How Postgres scales beyond one server** — replication, partitioning, cloud storage, and connection pooling (Chapter 8) 8. **How access control actually works** — roles, privileges, row-level security, and debugging "Permission Denied" (Chapter 9) ## Table of Contents ### [[Manuscript/01 - Foundations & Data Modeling/1.0 - Relations & Normalization (The Cafe Layout)|Chapter 1 - Foundations & Data Modeling]] > **The Purpose**: To establish the "Elephant Cafe" world, define the sample data model, and introduce SQL as the primary language of the engine. > **The Takeaway**: Data is a physical asset that must be guarded. Normalization is not about academic purity; it is about ensuring that the engine never has to look in two places for the same truth. - [[Manuscript/01 - Foundations & Data Modeling/1.0 - Relations & Normalization (The Cafe Layout)|1.0 Relations & Normalization (The Cafe Layout)]] - [[Manuscript/01 - Foundations & Data Modeling/1.1 - SQL (The Waiter's Pad)|1.1 SQL (The Waiter's Pad)]] - [[Manuscript/01 - Foundations & Data Modeling/1.2 - Relational Model (An academic detour)|1.2 Relational Model (An academic detour)]] - [[Manuscript/01 - Foundations & Data Modeling/1.3 - Summary (Foundations & Data Modeling)|1.3 Summary (Foundations & Data Modeling)]] ### [[Manuscript/02 - Physical Storage & MVCC/2.0 - Storage Foundations (The Building Blocks of Storage)|Chapter 2 - Physical Storage & MVCC]] > **The Purpose**: To ground the reader in the physical reality of storage, moving from bit-level data types up to the 8KB containers that hold our world. > **The Takeaway**: A "Row" is a logical abstraction; a "Tuple" is the physical suitcase. Postgres lives in 8KB blocks and uses MVCC to ensure that readers and writers never have to wait for each other. - [[Manuscript/02 - Physical Storage & MVCC/2.1 - Data Types (Knicks, knacks, bits, and bobs)|2.1 Data Types (Knicks, knacks, bits, and bobs)]] - [[Manuscript/02 - Physical Storage & MVCC/2.1.1 - System Catalogs (The Engine Queries Itself)|2.1.1 System Catalogs (The Engine Queries Itself)]] - [[Manuscript/02 - Physical Storage & MVCC/2.2 - Tuple (The Suitcase)|2.2 Tuple (The Suitcase)]] - [[Manuscript/02 - Physical Storage & MVCC/2.2.1 - Visibility & System Columns|2.2.1 Visibility & System Columns]] - [[Manuscript/02 - Physical Storage & MVCC/2.2.2 - Storage Optimizations|2.2.2 Storage Optimizations]] - [[Manuscript/02 - Physical Storage & MVCC/2.3 - The Page (The Shipping Container)|2.3 The Page (The Shipping Container)]] - [[Manuscript/02 - Physical Storage & MVCC/2.4 - Relation (The Table)|2.4 Relation (The Table)]] - [[Manuscript/02 - Physical Storage & MVCC/2.5 - MVCC (The Sharpie Ledger)|2.5 MVCC (The Sharpie Ledger)]] - [[Manuscript/02 - Physical Storage & MVCC/2.6 - TOAST (The Separate Trailer)|2.6 TOAST (The Separate Trailer)]] - [[Structures/_Structures|Technical Reference: Structures]] - [[Architecture/MVCC|Technical Reference: Architecture]] - [[Manuscript/02 - Physical Storage & MVCC/2.7 - Summary (Physical Storage & MVCC)|2.7 Summary (Physical Storage & MVCC)]] ### [[Manuscript/03 - Access Paths & Indexing/3.0 - Indexes (The Mighty Indexes)|Chapter 3 - Access Paths & Indexing]] > **The Purpose**: To move from "Scanning Everything" to "Following Shortcuts," providing a comprehensive guide to indexing strategies and their hidden costs. > **The Takeaway**: Performance is the art of avoiding the disk. An index is a specialized cheat sheet that trades expensive write-time labor for lightning-fast read-time results. - [[Manuscript/03 - Access Paths & Indexing/3.1 - B-Tree (The Balanced Bookshelf)|3.1 B-Tree (The Balanced Bookshelf)]] - [[Manuscript/03 - Access Paths & Indexing/3.2 - GIN & GiST (The Word Scavenger)|3.2 GIN & GiST (The Word Scavenger)]] - [[Manuscript/03 - Access Paths & Indexing/3.3 - BRIN (The Industrial Label)|3.3 BRIN (The Industrial Label)]] - [[Manuscript/03 - Access Paths & Indexing/3.4 - HNSW & IVFFlat (The Similarity Map)|3.4 HNSW & IVFFlat (The Similarity Map) — Vector Search]] - [[Manuscript/03 - Access Paths & Indexing/3.5 - Constraints & Triggers (The Integrity Layer and the Chain Reaction)|3.5 Constraints & Triggers (The Integrity Layer and the Chain Reaction)]] - [[Manuscript/03 - Access Paths & Indexing/3.6 - Index Maintenance (The Cost of Fame)|3.6 Index Maintenance (The Cost of Fame)]] - [[Structures/Index|Technical Reference: Index Architecture]] - [[Manuscript/03 - Access Paths & Indexing/3.7 - Summary (Access Paths & Indexing)|3.7 Summary (Access Paths & Indexing)]] ### [[Manuscript/04 - Query Planning & Execution/4.0 - Query Planning & Operations (The Strategy of Execution)|Chapter 4 - Query Planning & Execution]] > **The Purpose**: To explain how a declarative riddle is transformed into an execution plan, highlighting the calculus of the Query Planner. > **The Takeaway**: The Planner is a cost-based optimizer. It selects the most efficient execution strategy based on a rigorous model of resource costs, often rearranging operations to avoid unnecessary work. - [[Manuscript/04 - Query Planning & Execution/4.1 - Query Planner (The Blueprint of Execution)|4.1 Query Planner]] - [[Manuscript/04 - Query Planning & Execution/4.2 - Query Algebra (The Execution Tree)|4.2 Query Algebra]] - [[Manuscript/04 - Query Planning & Execution/4.3 - Scans (The Full Table Walk)|4.3 Scans]] - [[Manuscript/04 - Query Planning & Execution/4.4 - Joins (The Pairing Dance)|4.4 Joins]] - [[Manuscript/04 - Query Planning & Execution/4.5 - Aggregations (The Running Receipt)|4.5 Aggregations]] - [[Manuscript/04 - Query Planning & Execution/4.6 - Memory Operations (Sort Hash and Spill)|4.6 Memory Operations]] - [[Manuscript/04 - Query Planning & Execution/4.7 - Mutation Path (The Write Pipeline)|4.7 Mutation Path]] - [[Manuscript/04 - Query Planning & Execution/4.8 - Parallel & Distributed (The Worker Pool)|4.8 Parallel & Distributed]] - [[Manuscript/04 - Query Planning & Execution/4.9 - Common Table Expressions (The Temporary Station)|4.9 Common Table Expressions]] - [[Manuscript/04 - Query Planning & Execution/4.10 - Sargability (The Art of Not Opening Every Box)|4.10 Sargability]] - [[Manuscript/04 - Query Planning & Execution/4.11 - Summary (Query Planning & Execution)|4.11 Summary (Query Planning & Execution)]] ### [[Manuscript/05 - Durability & Transactions/5.0 - Write-Ahead Log (Safety Without Sweating)|Chapter 5 - Durability & Transactions]] > **The Purpose**: To explain how Postgres survives sudden death — power failures and kernel panics — without losing a byte of confirmed work. > **The Takeaway**: Safety in Postgres is dual-layered: the Write-Ahead Log gives you physical durability, and Transactions give you logical all-or-nothing semantics. Together they let you trust the engine with the kind of data you can't afford to apologize for. - [[Manuscript/05 - Durability & Transactions/5.1 - WAL & fsync (The Durable Ledger)|5.1 WAL & fsync (The Durable Ledger)]] - [[Manuscript/05 - Durability & Transactions/5.2 - Crash Recovery (The Recovery Parade)|5.2 Crash Recovery (The Recovery Parade)]] - [[Manuscript/05 - Durability & Transactions/5.3 - Logical Replication (The Relay Chain)|5.3 Logical Replication (The Relay Chain)]] - [[Manuscript/05 - Durability & Transactions/5.3.1 - Replication Slots (The Reserved Parking Space)|5.3.1 Replication Slots (The Reserved Parking Space)]] - [[Manuscript/05 - Durability & Transactions/5.4 - Transactions (The Atomic Seal)|5.4 Transactions (The Atomic Seal)]] - [[Manuscript/05 - Durability & Transactions/5.5 - Isolation (The Looking Glass Windows)|5.5 Isolation (The Looking Glass Windows)]] - [[Manuscript/05 - Durability & Transactions/5.6 - Transaction ID Wraparound (The Infinite Calendar)|5.6 Transaction ID Wraparound (The Infinite Calendar)]] - [[Architecture/WAL|Technical Reference: WAL Anatomy & LSNs]] - [[Architecture/Transactions|Technical Reference: Transactions]] - [[Manuscript/05 - Durability & Transactions/5.7 - Summary (Durability & Transactions)|5.7 Summary (Durability & Transactions)]] ### [[Manuscript/06 - Resource Management & Processes/6.0 - Memory & Disk (The Hierarchy of Inertia)|Chapter 6 - Resource Management & Processes]] > **The Purpose**: To explain how Postgres consumes physical resources (RAM, CPU, Disk) and how to decide which process gets fed first when contention is inevitable. > **The Takeaway**: Postgres can't make hardware faster — it can only decide what gets to live in which layer of memory. Tuning is the art of choosing what's worth keeping warm and what can afford to wait on the disk. - [[Manuscript/06 - Resource Management & Processes/6.1 - Process Family (The Process Family)|6.1 Process Family (The Process Family)]] - [[Manuscript/06 - Resource Management & Processes/6.1.1 - Connection Mechanics (Long running connections and sessions)|6.1.1 Connection Mechanics]] - [[Manuscript/06 - Resource Management & Processes/6.2 - Shared Buffers (The Page Cache)|6.2 Shared Buffers (The Page Cache)]] - [[Manuscript/06 - Resource Management & Processes/6.3 - Work Mem (Private Working Memory)|6.3 Work Mem (Private Working Memory)]] - [[Manuscript/06 - Resource Management & Processes/6.4 - Vacuum & Freezing (The Housekeepers)|6.4 Vacuum & Freezing (The Housekeepers)]] - [[Manuscript/06 - Resource Management & Processes/6.5 - Tuple Bloat (Garbage Collection)|6.5 Tuple Bloat (Garbage Collection)]] - [[Manuscript/06 - Resource Management & Processes/6.6 - RAM, CPU & Disk (The Physical Machine)|6.6 RAM, CPU & Disk (The Physical Machine)]] - [[Resources/_Resources|Technical Reference: Resources Overview]] - [[Manuscript/06 - Resource Management & Processes/6.7 - Summary (Resource Management & Processes)|6.7 Summary (Resource Management & Processes)]] ### [[Manuscript/07 - Wait Events & Concurrency/7.0 - Why Slow Queries Lie (The Waiting Game)|Chapter 7 - Wait Events & Concurrency]] > **The Purpose**: To demystify the "Slow Query" by moving from vague timing to precise instrumentation, explaining how to use wait events to differentiate between active execution and physical stalls. > **The Takeaway**: A query is never just "slow"—it is either working or waiting. By reading wait events, you can identify exactly which resource (CPU, Disk, or Lock) is the bottleneck, turning mysterious production incidents into solvable engineering problems. - **Part 1: The Diagnostic Model** - [[Manuscript/07 - Wait Events & Concurrency/7.0 - Why Slow Queries Lie (The Waiting Game)|7.0 Why Slow Queries Lie (The Waiting Game)]] - [[Manuscript/07 - Wait Events & Concurrency/7.1 - The Diagnostic Views (Sweat vs. Sigh)|7.1 The Diagnostic Views (Sweat vs. Sigh)]] - [[Manuscript/07 - Wait Events & Concurrency/7.2 - The Investigative Workflow|7.2 The Investigative Workflow]] - **Part 2: The Three Failure Classes** - [[Manuscript/07 - Wait Events & Concurrency/7.3 - CPU Saturation (Active Execution)|7.3 CPU Saturation (Active Execution)]] - [[Manuscript/07 - Wait Events & Concurrency/7.4 - Storage Latency (Physical IO Stalls)|7.4 Storage Latency (Physical IO Stalls)]] - [[Manuscript/07 - Wait Events & Concurrency/7.5 - WAL Pressure (Durability Bottlenecks)|7.5 WAL Pressure (Durability Bottlenecks)]] - [[Manuscript/07 - Wait Events & Concurrency/7.6 - Lock Contention (Concurrency & Blocking)|7.6 Lock Contention (Concurrency & Blocking)]] - **Part 3: Advanced Coordination** - [[Manuscript/07 - Wait Events & Concurrency/7.7 - Cluster Pressure (Background Workers & Replication)|7.7 Cluster Pressure (Background Workers & Replication)]] - [[Manuscript/07 - Wait Events & Concurrency/7.8 - Internal Coordination (Parallelism & Shared Memory)|7.8 Internal Coordination (Parallelism & Shared Memory)]] - **Part 4: Human-Caused Stalls** - [[Manuscript/07 - Wait Events & Concurrency/7.9 - Human-Caused Stalls (Administrative Blocking)|7.9 Human-Caused Stalls (Administrative Blocking)]] - [[Manuscript/07 - Wait Events & Concurrency/7.10 - High-Precision Lab (pg_wait_tracer)|7.10 High-Precision Lab (pg_wait_tracer)]] - [[Workloads/_Workloads|Technical Reference: Workloads Overview]] - [[Manuscript/07 - Wait Events & Concurrency/7.11 - Summary (Wait Events & Concurrency)|7.11 Summary (Wait Events & Concurrency)]] ### [[Manuscript/08 - Scaling & Architectural Coordination/8.0 - The Three Coordination Problems|Chapter 8 - The Scaling Playbook]] > **The Purpose**: To explain how Postgres scales beyond a single server by reducing coordination across storage, writes, and connections. > **The Takeaway**: Scaling is the deliberate reduction of coordination. Storage scaling (Neon) separation inverts the role of pages and logs, write scaling (Partitioning) isolates physical workloads, and connection scaling (Pooling) multiplexes concurrent backends. - [[Manuscript/08 - Scaling & Architectural Coordination/8.0 - The Three Coordination Problems|8.0 The Three Coordination Problems]] - [[Manuscript/08 - Scaling & Architectural Coordination/8.1 - Scaling Storage (Compute-Storage Separation)|8.1 Scaling Storage (Compute-Storage Separation)]] - [[Manuscript/08 - Scaling & Architectural Coordination/8.2 - Scaling Writes (Declarative Partitioning)|8.2 Scaling Writes (Declarative Partitioning)]] - [[Manuscript/08 - Scaling & Architectural Coordination/8.3 - Scaling Reads (Streaming Replication)|8.3 Scaling Reads (Streaming Replication)]] - [[Manuscript/08 - Scaling & Architectural Coordination/8.4 - Scaling Connections (Connection Pooling)|8.4 Scaling Connections (Connection Pooling)]] - [[Manuscript/08 - Scaling & Architectural Coordination/8.5 - Scaling Failure (Consensus & Failover)|8.5 Scaling Failure (Consensus & Failover)]] - [[Architecture/Cluster|Technical Reference: Cluster Architecture]] - [[Manuscript/08 - Scaling & Architectural Coordination/8.6 - Summary (Scaling & Architectural Coordination)|8.6 Summary (Scaling & Architectural Coordination)]] ### [[Manuscript/09 - Identity & Access Control/9.0 - Access Control (The Bouncers and the VIP List)|Chapter 9 - Access Control]] > **The Purpose**: To demystify the PostgreSQL security model (Roles, Privileges, ACLs, Row-Level Security) and map every "Permission Denied" message back to a concrete row in a system catalog. > **The Takeaway**: Security in Postgres is not a separate product bolted on top — it is just rows in `pg_authid`, `pg_class.relacl`, and `pg_policy`. Once you know which catalog to read, the engine stops being mysterious and starts being auditable. - [[Manuscript/09 - Identity & Access Control/9.1 - Roles & Privileges (The Name Tags)|9.1 Roles & Privileges (The Name Tags)]] - [[Manuscript/09 - Identity & Access Control/9.2 - Default Privileges (The Manager's Orders)|9.2 Default Privileges (The Manager's Orders)]] - [[Manuscript/09 - Identity & Access Control/9.3 - Row-Level Security (The VIP List)|9.3 Row-Level Security (The VIP List)]] - [[Manuscript/09 - Identity & Access Control/9.4 - Security Definers (The Manager's Override)|9.4 Security Definers (The Manager's Override)]] - [[Manuscript/09 - Identity & Access Control/9.5 - DDL Audit Logging (The Paper Trail)|9.5 DDL Audit Logging (The Paper Trail)]] - [[Manuscript/09 - Identity & Access Control/9.6 - Summary (Identity & Access Control)|9.6 Summary (Identity & Access Control)]] ### [[Manuscript/10 - Summary & Epilogue/Epilogue|Chapter 10 - Summary & Epilogue]] <div style="page-break-after: always;"></div> %% --- [[Manuscript/01 - Foundations & Data Modeling/1.0 - Relations & Normalization (The Cafe Layout)|Start Reading: Chapter 1 — Relations & Normalization (The Cafe Layout) →]] %%