# 🐘 Learn You a Postgres for Great Good β€” Agent Context This file provides essential context for any AI agent working in this repository. > [!IMPORTANT] > **SQL Accuracy Policy**: All SQL examples in the book **must** be valid against the real `scripts/init.sql` schema. All `EXPLAIN` and `EXPLAIN ANALYZE` outputs **must** be run against the live Docker Compose database and pasted verbatim β€” no fabricated outputs. See the **SQL Accuracy** section below. --- ## What Is This Project? **"Learn You a Postgres for Great Good!"** is a narrative-driven technical book about PostgreSQL internals, inspired by *Learn You a Haskell for Great Good* by Miran Lipovača. The book is written in **Markdown** and published via **Obsidian**. It uses a custom sample data model (the **Elephant Cafe**) to provide allusory support for complex technical concepts. The tone is **dry technical precision with approachable wit**: calm, matter-of-fact, never mystical. Think less "fortune-cookie wisdom" and more *Hitchhiker's Guide to the Galaxy*. **PostgreSQL architecture is always the primary subject.** The Elephant Cafe and its animal characters are secondary allusions that provide physical intuition β€” they must never become the "main character" of a section. When in doubt, call Postgres "Postgres." ### Intended Audience The reader is a **developer or data engineer** who: - Has used PostgreSQL (or any SQL database) in production - Can write a `JOIN` and knows what an index is, roughly - Wants to understand *why* Postgres makes the decisions it does β€” not just *how* to configure it - Does **not** need to know C, operating systems, or database internals to start Write for someone who is **technically capable but architecturally curious**. They can handle a C-struct if you show them where it lives; they just haven't been shown yet. Never condescend. Never skip the "why." ## Repository Structure ``` LearnYouAPostgres/ β”œβ”€β”€ Chapter 1/ # Introduction & Data Model β”œβ”€β”€ Chapter 2/ # Storage: Tuples, Pages, MVCC, TOAST β”œβ”€β”€ Chapter 3/ # Indexes: B-Tree, GIN, BRIN, Vectors β”œβ”€β”€ Chapter 4/ # Query Planning & Execution (Scans, Joins, Aggregations) β”œβ”€β”€ Chapter 5/ # WAL, Crash Recovery, Transactions, Isolation β”œβ”€β”€ Chapter 6/ # Resources: Shared Buffers, Work Mem, Vacuum β”œβ”€β”€ Chapter 7/ # Performance: Wait Events, Locks, CPU vs I/O β”œβ”€β”€ Chapter 8/ # Scaling: Replication, Partitioning, Pooling β”œβ”€β”€ Chapter 9/ # Access Control: Roles, RLS, Security Definers β”œβ”€β”€ Architecture/ # Technical references: WAL, MVCC, Transactions, Cluster β”œβ”€β”€ Workloads/ # Per-wait-event reference cards (IO, IPC, Lock, LWLock, etc.) β”œβ”€β”€ Structures/ # Physical structure references: Tuple, Page, Index β”œβ”€β”€ Operations/ # Query operator references: Scans, Joins, Aggregations β”œβ”€β”€ Resources/ # System resource references: CPU, Disk, Memory, Network β”œβ”€β”€ FEEDBACK/ # Agent-generated audit reports and iteration notes β”‚ └── Narrative_Audit/ # Chapter-by-chapter tone and style audits β”œβ”€β”€ assets/ # All images (arch_*, wl_*, ex_*, str_*, bits_*, etc.) β”œβ”€β”€ scripts/ # init.sql and seed data for the Elephant Cafe data model β”œβ”€β”€ SKILL.md # Image generation skill for wait event metaphors β”œβ”€β”€ agents.md # This file └── Learn You a Postgres for Great Good.md # Master TOC ``` --- ## The Elephant Cafe Data Model The book's sample database is a fictional cafe that serves animals. All SQL examples use this schema. **Core Tables:** - `species` β€” Blueprint for animal categories (e.g., Capybara, Herbivore) - `animals` β€” Individual patrons; Foreign-keyed to `species` - `ingredients` β€” Raw pantry materials - `flavors` β€” Vector embeddings and scent profiles per ingredient - `suppliers` β€” Supply chain - `supply_deliveries` β€” The Infinite Ledger (high-volume, partition candidate) - `orders` + `order_items` β€” The heartbeat of the system; exercises MVCC and Locking - `dishes` + `dish_ingredients` β€” Menu items and their composition - `animal_favorites` β€” Many-to-Many between animals and ingredients The full schema lives in `scripts/init.sql` and is referenced as the **"Architectural Inventory."** --- ## Narrative Style Guidelines This is the most critical section. All writing β€” new chapters, expansions, and refactors β€” must conform to these standards. ### The Core Analogy Map (Load-Bearing Allusions) These are the only permitted allusions. They provide physical intuition for abstract concepts but must never replace technical definitions. Introduce each allusion **once** in its defining chapter, then use the technical term with an occasional parenthetical reminder. | Technical Concept (Primary) | Allusion (Secondary) | Defining Chapter | | :-------------------------- | :----------------------------- | :--------------- | | Database Engine | The Elephant | Introduction | | SQL | The Waiter's Pad | 1.1 | | Relational model (formal) | An academic detour | 1.2 | | Tuple | The Physical Suitcase | 2.2 | | Page (8KB) | The Shipping Container | 2.3 | | Table | The Table | 2.4 | | Index | The Cheat Sheet / Bookshelf | 3.0 | | MVCC | The Sharpie Ledger | 2.5 | | WAL | The Pocket Diary | 5.1 | | Autovacuum | The Housekeepers | 6.4 | | Query Planner | The Query Optimizer | 4.1 | | Shared Buffers | The Warming Rack | 6.2 | | TOAST | The Separate Trailer | 2.6 | | Connection Pooler | The Maitre D' | 8.4 | | LWLock Contention | The Turnstile | 7.0 | | Heavyweight Locks | The Exclusive Key | 7.7 | | Wait Events | The Stopwatch | 7.1 | | Roles / Privileges | Name Tags / Room Keys | 9.1 | | Row-Level Security | The VIP List / Bouncer | 9.3 | | Transaction | The Atomic Seal | 5.4 | | Isolation Levels | Looking Glass Windows | 5.5 | | XID Wraparound | The Infinite Calendar | 5.6 | | Read Replicas | The Many Shouting Elephants | 8.1 | | Partitioning | Table Partitioning | 8.3 | **Metaphor Lifecycle Rule β€” "Introduce Once, Then Drop"**: Each allusion is introduced **once** in its defining chapter (the cute filename serves as the first introduction). After that first introduction, the allusion is **never used again** β€” not even as a parenthetical reminder. Only the technical term exists from that point forward. - βœ… In Chapter 2.2: "A Tuple is the physical realization of a row β€” the **Suitcase** packed into the engine's Heap." - βœ… In Chapter 4.1: "The planner estimates the cost of fetching each tuple." *(no "suitcase")* - ❌ In Chapter 4.1: "The planner estimates the cost of fetching each tuple (the physical suitcase)." *(parenthetical callback β€” never do this)* --- ### The Metaphor Diet: Load-Bearing vs. Superfluous **Only use Load-Bearing Metaphors.** A load-bearing metaphor maps 1:1 to a core architectural concept in the Analogy Map above. These are introduced once in their defining chapter and then retired. **Avoid Superfluous Fluff.** Do not invent new, throwaway analogies (e.g., shoeboxes, hacksaws, mountains, toothpicks, stretchy fabric). If an analogy does not exist in the Core Analogy Map, do not create it. **The "Remove the Metaphor" Test**: If you remove all metaphors from a paragraph, the core technical statement must still be accurate and complete. If it breaks without the metaphor, the metaphor has replaced the technical truth β€” and that is unacceptable. --- ### The Whimsy Budget per Chapter Editing passes have a tendency to over-correct toward dryness. Use this budget as a counterweight: the book is supposed to be *fun to read*. The voice is dry, but the prose should never be lifeless. Per chapter: 1. **Defining chapter only β€” introduce the metaphor exactly once.** A Load-Bearing metaphor is named in a single short paragraph in its defining chapter (per the Analogy Map), following the Expository Sandwich (technical lead β†’ allusion β†’ payoff). After that paragraph, prose is technical-only, with **at most one** closing callback line that gestures back at the metaphor. The image and chapter title may carry the metaphor name; the body prose otherwise stays technical. 2. **Non-defining chapters β€” never name the metaphor.** Other chapters use only the technical term. They are still allowed *one* hyperbolic / absurdist opener line (no Load-Bearing allusion required β€” humor lives in the gap between bureaucratic engine behavior and calm description), and color is reserved for `> [!NOTE]` / `> [!TIP]` callouts. 3. **Named-Cafe examples are sample data, not metaphors.** Babu the Elephant, the orders table walkthrough, the Peanut insert, the Capybara JSON blob β€” these are flavor on top of real SQL. They stay everywhere they ground a concrete example. Do not depersonalize working examples in the name of "tone." 4. **Playful section subheaders are the cheapest whimsy carrier.** Prefer `### The Sequential Scribble` over `### Section 1`, `### The Ticket Gate` over `### The Solution: Connection Pooling`, `### The Hardware Handshake` over `### Calling fsync`. These subheaders carry the chapter's voice without ever appearing in the body prose; they pay for themselves. **The "Introduce Once, Then Ground" Test**: After the metaphor's first appearance in its defining chapter, can the rest of the chapter be read by someone who skipped the introduction and still get the technical content? If yes, the metaphor is doing its job. If the body prose keeps re-naming the metaphor, you are overwriting the reader. **Anti-pattern to watch for**: a "drying pass" that strips the metaphor from the *defining chapter itself* (leaving the chapter image and title pointing at a metaphor that the prose never establishes). If the chapter image is `arch_mvcc_sharpie_v3.png`, the prose must introduce the Sharpie exactly once. --- ### The "Expository Sandwich" (Technical Frame First) Every technical definition must follow this three-part structure. Do not skip any step. ``` 1. Lead with the technical mechanism β†’ "HeapTupleHeaderData is a 23-byte fixed header." 2. Introduce the allusion (optional) β†’ "Think of it as the tuple's passport." 3. Explain the architectural payoff β†’ "This header enables MVCC visibility without locking." ``` **The "Technical Frame First" Principle**: Chapter openings and section leads must always begin with the physical reality β€” what Postgres does, what data structure is involved, what syscall is made. The allusion arrives *after* the ground is established, never before. Never open a section with "In the Elephant Cafe..." β€” open with "Postgres does X because Y." The payoff is the most important part. It answers the reader's implicit question: *"Why does the engine do it this way instead of the obvious way?"* --- ### Voice Calibration β€” The Tone Axis The book lives at a precise point between two failure modes. Both are wrong: | ❌ Too "Zen" (Wrong) | βœ… The Target | ❌ Too "Roleplay" (Wrong) | | :--- | :--- | :--- | | "To destroy X, one must first become X." | "Postgres has a very specific philosophy about data: erasers are strictly forbidden." | "The elephant SCRIBBLES furiously and runs away!" | | Fortune-cookie wisdom | Dry, deadpan, matter-of-fact | Character acting | | Feels mystical | Feels mildly absurd and inevitable | Feels like a children's book | **Rules of thumb:** - State the absurdities of Postgres as plain, undeniable facts. Do not editorialize. - The humor comes from the *gap* between how bureaucratic the engine is and how calmly you describe it. - Never animate the metaphor characters beyond casual allusion. They do not speak, feel, or act. - Assume the reader is a capable engineer. They can handle a C-struct; they just don't know where it lives yet. **Example β€” the right tone:** > "Postgres is incredibly stubborn about its 8KB page limit, but it also understands that occasionally you need to store a 10MB JSON blob detailing the complete life history of a capybara. Instead of panicking or breaking the laws of physics, the engine quietly delegates the problem." --- ### The Feynman Principle β€” Earn Every Technical Term Apply the Feynman Technique to every section: 1. **Start with the naive model** β€” What would a reasonable person assume before knowing better? 2. **Show where it breaks** β€” What failure mode does the naive model produce? 3. **Introduce the real model** β€” The technical depth, now earned. 4. **State the payoff** β€” What does knowing this unlock in a later chapter? **The "What Would Break Without This?" test:** For every major mechanism, be able to answer this in one sentence. If you cannot, the section is not ready. | Concept | Without It | | :--- | :--- | | MVCC | Writers block readers; every UPDATE locks the row for all concurrent transactions. | | WAL | A crash mid-write leaves data in an undefined, unrecoverable state. | | B-Tree Index | Every query scans the entire table regardless of selectivity. | | TOAST | A single large field makes the entire 8KB page unusable. | **The Analogy Stress Test:** Every analogy should explicitly note where it breaks down. Flagging the limit of a metaphor is one of the most trust-building things a technical book can do. Use a `> [!NOTE]` callout for this. --- ### Callout Semantics Obsidian callouts are used with strict semantic intent. Do not use them interchangeably. | Callout | Use for | | :--- | :--- | | `> [!NOTE]` | Director's commentary, absurdist worldbuilding, analogy limits, fun asides | | `> [!TIP]` | Practical optimization hints, performance rules of thumb | | `> [!WARNING]` | Common mistakes, production gotchas, things that cause bugs | | `> [!CAUTION]` | Data-loss risks, destructive operations, security hazards | | `> [!IMPORTANT]` | Load-bearing rules the reader must not skip | The main paragraph body should be kept clean and technical. Jokes, color, and worldbuilding belong inside callouts. --- ## Image Assets Images follow these naming conventions: | Prefix | Category | Examples | | :------- | :------------------------------------ | :---------------------------------------- | | `arch_` | Architectural concept illustrations | `arch_mvcc_sharpie_v2.png`, `arch_tuple_suitcase.png` | | `wl_` | Workload wait event illustrations | `wl_io_datafileread.png`, `wl_lock_relation.png` | | `ex_` | Query execution node illustrations | `ex_seqscan.png`, `ex_hashjoin.png` | | `str_` | Data structure illustrations | `str_idx_btree.png`, `str_idx_gin.png` | | `bits_` | Data type illustrations | `bits_giraffe.png` | | `chap_` | Chapter opener illustrations | `chap_1_blocks.png`, `chap_2_indexes.png` | | `res_` | Resource type icons (small) | `res_cpu.png`, `res_disk.png` | Images are generated using the **Nano Banana Prompting Strategy** defined in `SKILL.md`. Key rules: - Use diverse animal subjects (raccoons, capybaras, frogs, axolotls, crows). **Avoid over-indexing on elephants.** - Art style: "Learn You a Haskell for Great Good" β€” wobbly, organic, thick outlines, flat pastel fills, white background. - No text or labels in images. - Structurally sound anatomy (correct limb counts, exactly two eyes). --- ## Workload Reference Cards The `Workloads/` directory contains one file per Postgres wait event. Each card follows this structure: 1. **Image** β€” Wait event metaphor illustration 2. **Description** β€” Plain-language explanation of what the event means 3. **Operations** β€” Which query operations trigger this event 4. **Resources** β€” Which system resources are involved Categories: `Activity`, `BufferPin`, `Client`, `CPU`, `Extension`, `IO`, `InjectionPoint`, `IPC`, `Lock`, `LWLock`, `Timeout` --- ## Feedback & Audit Files The `FEEDBACK/Narrative_Audit/` directory contains chapter-by-chapter tone audits. These are the primary reference for any refactoring work: - `Summary.md` β€” Master guidelines: Expository Sandwich, Allusory Metaphor strategy, Approachable Absurdism - `Chapter_0_1_Audit.md` β€” Foundations, Tuples, Pages, Storage - `Chapter_2_3_Audit.md` β€” Indexing and the Query Planner - `Chapter_4_5_Audit.md` β€” WAL, Crash Recovery, Resource Management - `Chapter_6_Audit.md` β€” Wait Events and Locking - `Chapter_7_8_Audit.md` β€” Distributed Scaling and Access Control --- ## SQL Accuracy & Validation All SQL in this book is held to a strict accuracy standard. Fabricated or illustrative-only SQL is not acceptable. ### The Ground Truth Schema The canonical schema lives in `scripts/init.sql`. All table names, column names, types, and constraints must be sourced from this file. **Do not invent columns or tables.** The real schema, for quick reference: ```sql -- Custom ENUMs CREATE TYPE diet_category AS ENUM ('Herbivore', 'Carnivore', 'Omnivore'); CREATE TYPE ingredient_category AS ENUM ('Fruit', 'Vegetable', 'Nut', 'Fungus', 'Herb', 'Spice'); CREATE TYPE preference_type AS ENUM ('Loves', 'Hates', 'Allergic'); CREATE TYPE order_status AS ENUM ('Pending', 'Cooking', 'Served', 'Cancelled'); CREATE TYPE scent_primary AS ENUM ('Flowery', 'Fruity', 'Resinous', 'Spicy', 'Putrid', 'Burnt'); -- Core tables (abbreviated) species (id, name TEXT UNIQUE, diet_type diet_category) animals (id, name TEXT, species_id INT β†’ species, created_at TIMESTAMPTZ) ingredients (id, name TEXT UNIQUE, category ingredient_category, base_cost_per_kg NUMERIC) flavors (ingredient_id β†’ ingredients, sweetness_1_to_10, sourness_1_to_10, bitterness_1_to_10, scent_notes scent_primary[], flavor_vector VECTOR(3)) animal_favorites (animal_id β†’ animals, ingredient_id β†’ ingredients, preference preference_type) dishes (id, name TEXT UNIQUE, price NUMERIC, is_active BOOLEAN) dish_ingredients (dish_id β†’ dishes, ingredient_id β†’ ingredients, quantity_grams NUMERIC) orders (id BIGINT, animal_id β†’ animals, order_time TIMESTAMPTZ, status order_status) order_items (order_id β†’ orders, dish_id β†’ dishes, quantity INT) suppliers (id, name TEXT UNIQUE, contact_email TEXT) supply_deliveries (id BIGINT, supplier_id β†’ suppliers, ingredient_id β†’ ingredients, delivery_time TIMESTAMPTZ, quantity_kg NUMERIC) ``` ### Running the Database The local development database is a **Docker Compose** service using `pgvector/pgvector:pg16`. The schema is auto-applied on first boot via `docker-entrypoint-initdb.d/`. ```bash # Start the database docker compose up -d # Connect via psql psql -h localhost -U postgres -d elephant_cafe # Password: password # Or via docker exec docker exec -it elephant_cafe_db psql -U postgres -d elephant_cafe ``` ### Validating SQL Examples Before adding any `SELECT`, `JOIN`, or DML example to the book, the agent **must**: 1. **Start the database**: `docker compose up -d` 2. **Run the query** against `elephant_cafe` to confirm it executes without error 3. **For `EXPLAIN` outputs**: run `EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)` and paste the output verbatim into the chapter β€” never invent plan nodes or cost estimates ```bash # Example: validate a query from the book psql -h localhost -U postgres -d elephant_cafe -c " EXPLAIN (ANALYZE, BUFFERS) SELECT a.name, s.diet_type FROM animals a JOIN species s ON a.species_id = s.id WHERE s.diet_type = 'Herbivore'; " ``` > [!WARNING] > **Never fabricate EXPLAIN output.** Cost estimates, row counts, and plan nodes vary with data volume and table statistics. A fabricated plan teaches the wrong mental model. If the database is not available, note the query as `-- EXPLAIN output pending database validation` and return to validate it before publishing. ### Column Name Traps Some columns have non-obvious names β€” double-check before writing SQL: | Table | Correct Column | Common Mistake | | :---- | :------------- | :------------- | | `animals` | `created_at` | `first_visit` | | `ingredients` | `base_cost_per_kg` | `cost`, `price` | | `flavors` | `flavor_vector VECTOR(3)` | `embedding`, `vector` | | `supply_deliveries` | `quantity_kg`, `delivery_time` | `quantity`, `delivered_at` | | `order_items` | `quantity` (INT) | `amount` | | `orders` | `order_time` | `created_at`, `placed_at` | --- ## Publication Pipeline The book is published via **Obsidian Publish** using `sync_to_obsidian.sh`. All internal links use Obsidian-style wikilinks, for example `[[Manuscript/01 - Foundations & Data Modeling/1.1 - SQL (The Waiter's Pad)|1.1 SQL]]` or `[[Manuscript/02 - Physical Storage & MVCC/2.2 - Tuple (The Suitcase)|2.2 The Suitcase]]`. > [!WARNING] > Never break wikilinks. When renaming or reorganizing files, always audit cross-references. --- ## Key Conventions - All chapter files use **front matter** with `publish: true`, `image:`, and `description:` fields. - Navigation footers use a standard `| ← | ↑ TOC | β†’ |` table pattern. - Diagrams use **Mermaid** (`erDiagram direction LR` for schemas, `graph LR` for flows). - SQL code blocks use `-- comments` to label operations in the Cafe metaphor. - All callouts (`> [!NOTE]`, `> [!TIP]`, etc.) are used semantically β€” see the Callout Semantics section above.