# 🐘 Learn You a Postgres for Great Good β€” Agent Context This file provides essential context for any AI agent working in this repository. > [!IMPORTANT] > **SQL Accuracy Policy**: All SQL examples in the book **must** be valid against the real `scripts/init.sql` schema. All `EXPLAIN` and `EXPLAIN ANALYZE` outputs **must** be run against the live Docker Compose database and pasted verbatim β€” no fabricated outputs. See the **SQL Accuracy** section below. --- ## What Is This Project? **"Learn You a Postgres for Great Good!"** is a narrative-driven technical book about PostgreSQL internals, inspired by *Learn You a Haskell for Great Good* by Miran Lipovača. The book is written in **Markdown** and published via **Obsidian**. It uses a custom sample data model (the **Elephant Cafe**) to frame all technical concepts as a coherent, unified metaphor. The tone is **approachable absurdism**: calm, matter-of-fact, with a dry wit. Think less "fortune-cookie wisdom" and more *Hitchhiker's Guide to the Galaxy*. The engine (Postgres) is always the subject. Animal characters are allusory props, not active roleplay actors. ### Intended Audience The reader is a **developer or data engineer** who: - Has used PostgreSQL (or any SQL database) in production - Can write a `JOIN` and knows what an index is, roughly - Wants to understand *why* Postgres makes the decisions it does β€” not just *how* to configure it - Does **not** need to know C, operating systems, or database internals to start Write for someone who is **technically capable but architecturally curious**. They can handle a C-struct if you show them where it lives; they just haven't been shown yet. Never condescend. Never skip the "why." ## Repository Structure ``` LearnYouAPostgres/ β”œβ”€β”€ Chapter 0/ # Introduction & Data Model β”œβ”€β”€ Chapter 1/ # Storage: Tuples, Pages, MVCC, TOAST β”œβ”€β”€ Chapter 2/ # Indexes: B-Tree, GIN, BRIN, Vectors β”œβ”€β”€ Chapter 3/ # Query Planning & Execution (Scans, Joins, Aggregations) β”œβ”€β”€ Chapter 4/ # WAL, Crash Recovery, Transactions, Isolation β”œβ”€β”€ Chapter 5/ # Resources: Shared Buffers, Work Mem, Vacuum β”œβ”€β”€ Chapter 6/ # Performance: Wait Events, Locks, CPU vs I/O β”œβ”€β”€ Chapter 7/ # Scaling: Replication, Partitioning, Pooling β”œβ”€β”€ Chapter 8/ # Access Control: Roles, RLS, Security Definers β”œβ”€β”€ Architecture/ # Technical references: WAL, MVCC, Transactions, Cluster β”œβ”€β”€ Workloads/ # Per-wait-event reference cards (IO, IPC, Lock, LWLock, etc.) β”œβ”€β”€ Structures/ # Physical structure references: Tuple, Page, Index β”œβ”€β”€ Operations/ # Query operator references: Scans, Joins, Aggregations β”œβ”€β”€ Resources/ # System resource references: CPU, Disk, Memory, Network β”œβ”€β”€ FEEDBACK/ # Agent-generated audit reports and iteration notes β”‚ └── Narrative_Audit/ # Chapter-by-chapter tone and style audits β”œβ”€β”€ assets/ # All images (arch_*, wl_*, ex_*, str_*, bits_*, etc.) β”œβ”€β”€ scripts/ # init.sql and seed data for the Elephant Cafe data model β”œβ”€β”€ SKILL.md # Image generation skill for wait event metaphors β”œβ”€β”€ agents.md # This file └── Learn You a Postgres for Great Good.md # Master TOC ``` --- ## The Elephant Cafe Data Model The book's sample database is a fictional cafe that serves animals. All SQL examples use this schema. **Core Tables:** - `species` β€” Blueprint for animal categories (e.g., Capybara, Herbivore) - `animals` β€” Individual patrons; Foreign-keyed to `species` - `ingredients` β€” Raw pantry materials - `flavors` β€” Vector embeddings and scent profiles per ingredient - `suppliers` β€” Supply chain - `supply_deliveries` β€” The Infinite Ledger (high-volume, partition candidate) - `orders` + `order_items` β€” The heartbeat of the system; exercises MVCC and Locking - `dishes` + `dish_ingredients` β€” Menu items and their composition - `animal_favorites` β€” Many-to-Many between animals and ingredients The full schema lives in `scripts/init.sql` and is referenced as the **"Architectural Inventory."** --- ## Narrative Style Guidelines This is the most critical section. All writing β€” new chapters, expansions, and refactors β€” must conform to these standards. ### The Core Analogy Map | Technical Concept | Elephant Cafe Metaphor | | :---------------------- | :----------------------------- | | Database Engine | The Elephant | | SQL | The Waiter's Pad | | Tuple | The Physical Suitcase | | Page (8KB) | The Shipping Container | | Table | The Table | | Index | The Cheat Sheet / Bookshelf | | MVCC | The Sharpie Ledger | | WAL | The Pocket Diary | | Autovacuum | The Housekeepers / Roomba | | Query Planner | The Head Chef | | Shared Buffers | The Warming Rack | | TOAST | The Separate Trailer | | Connection Pooler | The Maitre D' | | Lock Contention | The Narrow Bridge | | Wait Events | The Stopwatch | | Roles / Privileges | Name Tags / Room Keys | | Row-Level Security | The VIP List / Bouncer | | Transaction | The Pinky Swear | | Isolation Levels | Looking Glass Windows | | XID Wraparound | The Infinite Calendar | | Read Replicas | The Many Shouting Elephants | | Partitioning | Table Partitioning | --- ### The "Allusory" Rule β€” Postgres is Always the Subject **The metaphors are background color, not the foreground subject.** Postgres architecture must be the grammatical subject of every technical sentence. Characters and allusions exist only to make that subject vivid, not to replace it. βœ… **Good**: "Postgres marks the old tuple as dead and writes a fresh version elsewhere β€” what the engine calls MVCC." ❌ **Bad**: "The elephant furiously scribbles on the old suitcase and ignores it forever!" The test: if you removed every metaphor from a paragraph, would the core technical statement still be accurate and complete? If yes, the metaphors are allusory. If no, they are load-bearing β€” and that is wrong. --- ### The "Expository Sandwich" Every technical definition must follow this three-part structure. Do not skip any step. ``` 1. Lead with the technical term β†’ "HeapTupleHeaderData" 2. Introduce the allusion β†’ "Think of it as the suitcase's bureaucratic passport." 3. Explain the architectural payoff β†’ "This header enables MVCC visibility without locking." ``` The payoff is the most important part. It answers the reader's implicit question: *"Why does the engine do it this way instead of the obvious way?"* --- ### Voice Calibration β€” The Tone Axis The book lives at a precise point between two failure modes. Both are wrong: | ❌ Too "Zen" (Wrong) | βœ… The Target | ❌ Too "Roleplay" (Wrong) | | :--- | :--- | :--- | | "To destroy X, one must first become X." | "Postgres has a very specific philosophy about data: erasers are strictly forbidden." | "The elephant SCRIBBLES furiously and runs away!" | | Fortune-cookie wisdom | Dry, deadpan, matter-of-fact | Character acting | | Feels mystical | Feels mildly absurd and inevitable | Feels like a children's book | **Rules of thumb:** - State the absurdities of Postgres as plain, undeniable facts. Do not editorialize. - The humor comes from the *gap* between how bureaucratic the engine is and how calmly you describe it. - Never animate the metaphor characters beyond casual allusion. They do not speak, feel, or act. - Assume the reader is a capable engineer. They can handle a C-struct; they just don't know where it lives yet. **Example β€” the right tone:** > "Postgres is incredibly stubborn about its 8KB page limit, but it also understands that occasionally you need to store a 10MB JSON blob detailing the complete life history of a capybara. Instead of panicking or breaking the laws of physics, the engine quietly delegates the problem." --- ### The Feynman Principle β€” Earn Every Technical Term Apply the Feynman Technique to every section: 1. **Start with the naive model** β€” What would a reasonable person assume before knowing better? 2. **Show where it breaks** β€” What failure mode does the naive model produce? 3. **Introduce the real model** β€” The technical depth, now earned. 4. **State the payoff** β€” What does knowing this unlock in a later chapter? **The "What Would Break Without This?" test:** For every major mechanism, be able to answer this in one sentence. If you cannot, the section is not ready. | Concept | Without It | | :--- | :--- | | MVCC | Writers block readers; every UPDATE locks the row for all concurrent transactions. | | WAL | A crash mid-write leaves data in an undefined, unrecoverable state. | | B-Tree Index | Every query scans the entire table regardless of selectivity. | | TOAST | A single large field makes the entire 8KB page unusable. | **The Analogy Stress Test:** Every analogy should explicitly note where it breaks down. Flagging the limit of a metaphor is one of the most trust-building things a technical book can do. Use a `> [!NOTE]` callout for this. --- ### Callout Semantics Obsidian callouts are used with strict semantic intent. Do not use them interchangeably. | Callout | Use for | | :--- | :--- | | `> [!NOTE]` | Director's commentary, absurdist worldbuilding, analogy limits, fun asides | | `> [!TIP]` | Practical optimization hints, performance rules of thumb | | `> [!WARNING]` | Common mistakes, production gotchas, things that cause bugs | | `> [!CAUTION]` | Data-loss risks, destructive operations, security hazards | | `> [!IMPORTANT]` | Load-bearing rules the reader must not skip | The main paragraph body should be kept clean and technical. Jokes, color, and worldbuilding belong inside callouts. --- ## Image Assets Images follow these naming conventions: | Prefix | Category | Examples | | :------- | :------------------------------------ | :---------------------------------------- | | `arch_` | Architectural concept illustrations | `arch_mvcc_sharpie_v2.png`, `arch_tuple_suitcase.png` | | `wl_` | Workload wait event illustrations | `wl_io_datafileread.png`, `wl_lock_relation.png` | | `ex_` | Query execution node illustrations | `ex_seqscan.png`, `ex_hashjoin.png` | | `str_` | Data structure illustrations | `str_idx_btree.png`, `str_idx_gin.png` | | `bits_` | Data type illustrations | `bits_giraffe.png` | | `chap_` | Chapter opener illustrations | `chap_1_blocks.png`, `chap_2_indexes.png` | | `res_` | Resource type icons (small) | `res_cpu.png`, `res_disk.png` | Images are generated using the **Nano Banana Prompting Strategy** defined in `SKILL.md`. Key rules: - Use diverse animal subjects (raccoons, capybaras, frogs, axolotls, crows). **Avoid over-indexing on elephants.** - Art style: "Learn You a Haskell for Great Good" β€” wobbly, organic, thick outlines, flat pastel fills, white background. - No text or labels in images. - Structurally sound anatomy (correct limb counts, exactly two eyes). --- ## Workload Reference Cards The `Workloads/` directory contains one file per Postgres wait event. Each card follows this structure: 1. **Image** β€” Wait event metaphor illustration 2. **Description** β€” Plain-language explanation of what the event means 3. **Operations** β€” Which query operations trigger this event 4. **Resources** β€” Which system resources are involved Categories: `Activity`, `BufferPin`, `Client`, `CPU`, `Extension`, `IO`, `InjectionPoint`, `IPC`, `Lock`, `LWLock`, `Timeout` --- ## Feedback & Audit Files The `FEEDBACK/Narrative_Audit/` directory contains chapter-by-chapter tone audits. These are the primary reference for any refactoring work: - `Summary.md` β€” Master guidelines: Expository Sandwich, Allusory Metaphor strategy, Approachable Absurdism - `Chapter_0_1_Audit.md` β€” Foundations, Tuples, Pages, Storage - `Chapter_2_3_Audit.md` β€” Indexing and the Query Planner - `Chapter_4_5_Audit.md` β€” WAL, Crash Recovery, Resource Management - `Chapter_6_Audit.md` β€” Wait Events and Locking - `Chapter_7_8_Audit.md` β€” Distributed Scaling and Access Control --- ## SQL Accuracy & Validation All SQL in this book is held to a strict accuracy standard. Fabricated or illustrative-only SQL is not acceptable. ### The Ground Truth Schema The canonical schema lives in `scripts/init.sql`. All table names, column names, types, and constraints must be sourced from this file. **Do not invent columns or tables.** The real schema, for quick reference: ```sql -- Custom ENUMs CREATE TYPE diet_category AS ENUM ('Herbivore', 'Carnivore', 'Omnivore'); CREATE TYPE ingredient_category AS ENUM ('Fruit', 'Vegetable', 'Nut', 'Fungus', 'Herb', 'Spice'); CREATE TYPE preference_type AS ENUM ('Loves', 'Hates', 'Allergic'); CREATE TYPE order_status AS ENUM ('Pending', 'Cooking', 'Served', 'Cancelled'); CREATE TYPE scent_primary AS ENUM ('Flowery', 'Fruity', 'Resinous', 'Spicy', 'Putrid', 'Burnt'); -- Core tables (abbreviated) species (id, name TEXT UNIQUE, diet_type diet_category) animals (id, name TEXT, species_id INT β†’ species, created_at TIMESTAMPTZ) ingredients (id, name TEXT UNIQUE, category ingredient_category, base_cost_per_kg NUMERIC) flavors (ingredient_id β†’ ingredients, sweetness_1_to_10, sourness_1_to_10, bitterness_1_to_10, scent_notes scent_primary[], flavor_vector VECTOR(3)) animal_favorites (animal_id β†’ animals, ingredient_id β†’ ingredients, preference preference_type) dishes (id, name TEXT UNIQUE, price NUMERIC, is_active BOOLEAN) dish_ingredients (dish_id β†’ dishes, ingredient_id β†’ ingredients, quantity_grams NUMERIC) orders (id BIGINT, animal_id β†’ animals, order_time TIMESTAMPTZ, status order_status) order_items (order_id β†’ orders, dish_id β†’ dishes, quantity INT) suppliers (id, name TEXT UNIQUE, contact_email TEXT) supply_deliveries (id BIGINT, supplier_id β†’ suppliers, ingredient_id β†’ ingredients, delivery_time TIMESTAMPTZ, quantity_kg NUMERIC) ``` ### Running the Database The local development database is a **Docker Compose** service using `pgvector/pgvector:pg16`. The schema is auto-applied on first boot via `docker-entrypoint-initdb.d/`. ```bash # Start the database docker compose up -d # Connect via psql psql -h localhost -U postgres -d elephant_cafe # Password: password # Or via docker exec docker exec -it elephant_cafe_db psql -U postgres -d elephant_cafe ``` ### Validating SQL Examples Before adding any `SELECT`, `JOIN`, or DML example to the book, the agent **must**: 1. **Start the database**: `docker compose up -d` 2. **Run the query** against `elephant_cafe` to confirm it executes without error 3. **For `EXPLAIN` outputs**: run `EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)` and paste the output verbatim into the chapter β€” never invent plan nodes or cost estimates ```bash # Example: validate a query from the book psql -h localhost -U postgres -d elephant_cafe -c " EXPLAIN (ANALYZE, BUFFERS) SELECT a.name, s.diet_type FROM animals a JOIN species s ON a.species_id = s.id WHERE s.diet_type = 'Herbivore'; " ``` > [!WARNING] > **Never fabricate EXPLAIN output.** Cost estimates, row counts, and plan nodes vary with data volume and table statistics. A fabricated plan teaches the wrong mental model. If the database is not available, note the query as `-- EXPLAIN output pending database validation` and return to validate it before publishing. ### Column Name Traps Some columns have non-obvious names β€” double-check before writing SQL: | Table | Correct Column | Common Mistake | | :---- | :------------- | :------------- | | `animals` | `created_at` | `first_visit` | | `ingredients` | `base_cost_per_kg` | `cost`, `price` | | `flavors` | `flavor_vector VECTOR(3)` | `embedding`, `vector` | | `supply_deliveries` | `quantity_kg`, `delivery_time` | `quantity`, `delivered_at` | | `order_items` | `quantity` (INT) | `amount` | | `orders` | `order_time` | `created_at`, `placed_at` | --- ## Publication Pipeline The book is published via **Obsidian Publish** using `sync_to_obsidian.sh`. All internal links use Obsidian-style wikilinks: `[[Chapter 1/1.2 - The Physical Suitcase (The Tuple)|1.2 The Tuple]]`. > [!WARNING] > Never break wikilinks. When renaming or reorganizing files, always audit cross-references. --- ## Key Conventions - All chapter files use **front matter** with `publish: true`, `image:`, and `description:` fields. - Navigation footers use a standard `| ← | ↑ TOC | β†’ |` table pattern. - Diagrams use **Mermaid** (`erDiagram direction LR` for schemas, `graph LR` for flows). - SQL code blocks use `-- comments` to label operations in the Cafe metaphor. - All callouts (`> [!NOTE]`, `> [!TIP]`, etc.) are used semantically β€” see the Callout Semantics section above.