# π Learn You a Postgres for Great Good β Agent Context
This file provides essential context for any AI agent working in this repository.
> [!IMPORTANT]
> **SQL Accuracy Policy**: All SQL examples in the book **must** be valid against the real `scripts/init.sql` schema. All `EXPLAIN` and `EXPLAIN ANALYZE` outputs **must** be run against the live Docker Compose database and pasted verbatim β no fabricated outputs. See the **SQL Accuracy** section below.
---
## What Is This Project?
**"Learn You a Postgres for Great Good!"** is a narrative-driven technical book about PostgreSQL internals, inspired by *Learn You a Haskell for Great Good* by Miran LipovaΔa.
The book is written in **Markdown** and published via **Obsidian**. It uses a custom sample data model (the **Elephant Cafe**) to frame all technical concepts as a coherent, unified metaphor.
The tone is **approachable absurdism**: calm, matter-of-fact, with a dry wit. Think less "fortune-cookie wisdom" and more *Hitchhiker's Guide to the Galaxy*. The engine (Postgres) is always the subject. Animal characters are allusory props, not active roleplay actors.
### Intended Audience
The reader is a **developer or data engineer** who:
- Has used PostgreSQL (or any SQL database) in production
- Can write a `JOIN` and knows what an index is, roughly
- Wants to understand *why* Postgres makes the decisions it does β not just *how* to configure it
- Does **not** need to know C, operating systems, or database internals to start
Write for someone who is **technically capable but architecturally curious**. They can handle a C-struct if you show them where it lives; they just haven't been shown yet. Never condescend. Never skip the "why."
## Repository Structure
```
LearnYouAPostgres/
βββ Chapter 0/ # Introduction & Data Model
βββ Chapter 1/ # Storage: Tuples, Pages, MVCC, TOAST
βββ Chapter 2/ # Indexes: B-Tree, GIN, BRIN, Vectors
βββ Chapter 3/ # Query Planning & Execution (Scans, Joins, Aggregations)
βββ Chapter 4/ # WAL, Crash Recovery, Transactions, Isolation
βββ Chapter 5/ # Resources: Shared Buffers, Work Mem, Vacuum
βββ Chapter 6/ # Performance: Wait Events, Locks, CPU vs I/O
βββ Chapter 7/ # Scaling: Replication, Partitioning, Pooling
βββ Chapter 8/ # Access Control: Roles, RLS, Security Definers
βββ Architecture/ # Technical references: WAL, MVCC, Transactions, Cluster
βββ Workloads/ # Per-wait-event reference cards (IO, IPC, Lock, LWLock, etc.)
βββ Structures/ # Physical structure references: Tuple, Page, Index
βββ Operations/ # Query operator references: Scans, Joins, Aggregations
βββ Resources/ # System resource references: CPU, Disk, Memory, Network
βββ FEEDBACK/ # Agent-generated audit reports and iteration notes
β βββ Narrative_Audit/ # Chapter-by-chapter tone and style audits
βββ assets/ # All images (arch_*, wl_*, ex_*, str_*, bits_*, etc.)
βββ scripts/ # init.sql and seed data for the Elephant Cafe data model
βββ SKILL.md # Image generation skill for wait event metaphors
βββ agents.md # This file
βββ Learn You a Postgres for Great Good.md # Master TOC
```
---
## The Elephant Cafe Data Model
The book's sample database is a fictional cafe that serves animals. All SQL examples use this schema.
**Core Tables:**
- `species` β Blueprint for animal categories (e.g., Capybara, Herbivore)
- `animals` β Individual patrons; Foreign-keyed to `species`
- `ingredients` β Raw pantry materials
- `flavors` β Vector embeddings and scent profiles per ingredient
- `suppliers` β Supply chain
- `supply_deliveries` β The Infinite Ledger (high-volume, partition candidate)
- `orders` + `order_items` β The heartbeat of the system; exercises MVCC and Locking
- `dishes` + `dish_ingredients` β Menu items and their composition
- `animal_favorites` β Many-to-Many between animals and ingredients
The full schema lives in `scripts/init.sql` and is referenced as the **"Architectural Inventory."**
---
## Narrative Style Guidelines
This is the most critical section. All writing β new chapters, expansions, and refactors β must conform to these standards.
### The Core Analogy Map
| Technical Concept | Elephant Cafe Metaphor |
| :---------------------- | :----------------------------- |
| Database Engine | The Elephant |
| SQL | The Waiter's Pad |
| Tuple | The Physical Suitcase |
| Page (8KB) | The Shipping Container |
| Table | The Table |
| Index | The Cheat Sheet / Bookshelf |
| MVCC | The Sharpie Ledger |
| WAL | The Pocket Diary |
| Autovacuum | The Housekeepers / Roomba |
| Query Planner | The Head Chef |
| Shared Buffers | The Warming Rack |
| TOAST | The Separate Trailer |
| Connection Pooler | The Maitre D' |
| Lock Contention | The Narrow Bridge |
| Wait Events | The Stopwatch |
| Roles / Privileges | Name Tags / Room Keys |
| Row-Level Security | The VIP List / Bouncer |
| Transaction | The Pinky Swear |
| Isolation Levels | Looking Glass Windows |
| XID Wraparound | The Infinite Calendar |
| Read Replicas | The Many Shouting Elephants |
| Partitioning | Table Partitioning |
---
### The "Allusory" Rule β Postgres is Always the Subject
**The metaphors are background color, not the foreground subject.** Postgres architecture must be the grammatical subject of every technical sentence. Characters and allusions exist only to make that subject vivid, not to replace it.
β
**Good**: "Postgres marks the old tuple as dead and writes a fresh version elsewhere β what the engine calls MVCC."
β **Bad**: "The elephant furiously scribbles on the old suitcase and ignores it forever!"
The test: if you removed every metaphor from a paragraph, would the core technical statement still be accurate and complete? If yes, the metaphors are allusory. If no, they are load-bearing β and that is wrong.
---
### The "Expository Sandwich"
Every technical definition must follow this three-part structure. Do not skip any step.
```
1. Lead with the technical term β "HeapTupleHeaderData"
2. Introduce the allusion β "Think of it as the suitcase's bureaucratic passport."
3. Explain the architectural payoff β "This header enables MVCC visibility without locking."
```
The payoff is the most important part. It answers the reader's implicit question: *"Why does the engine do it this way instead of the obvious way?"*
---
### Voice Calibration β The Tone Axis
The book lives at a precise point between two failure modes. Both are wrong:
| β Too "Zen" (Wrong) | β
The Target | β Too "Roleplay" (Wrong) |
| :--- | :--- | :--- |
| "To destroy X, one must first become X." | "Postgres has a very specific philosophy about data: erasers are strictly forbidden." | "The elephant SCRIBBLES furiously and runs away!" |
| Fortune-cookie wisdom | Dry, deadpan, matter-of-fact | Character acting |
| Feels mystical | Feels mildly absurd and inevitable | Feels like a children's book |
**Rules of thumb:**
- State the absurdities of Postgres as plain, undeniable facts. Do not editorialize.
- The humor comes from the *gap* between how bureaucratic the engine is and how calmly you describe it.
- Never animate the metaphor characters beyond casual allusion. They do not speak, feel, or act.
- Assume the reader is a capable engineer. They can handle a C-struct; they just don't know where it lives yet.
**Example β the right tone:**
> "Postgres is incredibly stubborn about its 8KB page limit, but it also understands that occasionally you need to store a 10MB JSON blob detailing the complete life history of a capybara. Instead of panicking or breaking the laws of physics, the engine quietly delegates the problem."
---
### The Feynman Principle β Earn Every Technical Term
Apply the Feynman Technique to every section:
1. **Start with the naive model** β What would a reasonable person assume before knowing better?
2. **Show where it breaks** β What failure mode does the naive model produce?
3. **Introduce the real model** β The technical depth, now earned.
4. **State the payoff** β What does knowing this unlock in a later chapter?
**The "What Would Break Without This?" test:** For every major mechanism, be able to answer this in one sentence. If you cannot, the section is not ready.
| Concept | Without It |
| :--- | :--- |
| MVCC | Writers block readers; every UPDATE locks the row for all concurrent transactions. |
| WAL | A crash mid-write leaves data in an undefined, unrecoverable state. |
| B-Tree Index | Every query scans the entire table regardless of selectivity. |
| TOAST | A single large field makes the entire 8KB page unusable. |
**The Analogy Stress Test:** Every analogy should explicitly note where it breaks down. Flagging the limit of a metaphor is one of the most trust-building things a technical book can do. Use a `> [!NOTE]` callout for this.
---
### Callout Semantics
Obsidian callouts are used with strict semantic intent. Do not use them interchangeably.
| Callout | Use for |
| :--- | :--- |
| `> [!NOTE]` | Director's commentary, absurdist worldbuilding, analogy limits, fun asides |
| `> [!TIP]` | Practical optimization hints, performance rules of thumb |
| `> [!WARNING]` | Common mistakes, production gotchas, things that cause bugs |
| `> [!CAUTION]` | Data-loss risks, destructive operations, security hazards |
| `> [!IMPORTANT]` | Load-bearing rules the reader must not skip |
The main paragraph body should be kept clean and technical. Jokes, color, and worldbuilding belong inside callouts.
---
## Image Assets
Images follow these naming conventions:
| Prefix | Category | Examples |
| :------- | :------------------------------------ | :---------------------------------------- |
| `arch_` | Architectural concept illustrations | `arch_mvcc_sharpie_v2.png`, `arch_tuple_suitcase.png` |
| `wl_` | Workload wait event illustrations | `wl_io_datafileread.png`, `wl_lock_relation.png` |
| `ex_` | Query execution node illustrations | `ex_seqscan.png`, `ex_hashjoin.png` |
| `str_` | Data structure illustrations | `str_idx_btree.png`, `str_idx_gin.png` |
| `bits_` | Data type illustrations | `bits_giraffe.png` |
| `chap_` | Chapter opener illustrations | `chap_1_blocks.png`, `chap_2_indexes.png` |
| `res_` | Resource type icons (small) | `res_cpu.png`, `res_disk.png` |
Images are generated using the **Nano Banana Prompting Strategy** defined in `SKILL.md`. Key rules:
- Use diverse animal subjects (raccoons, capybaras, frogs, axolotls, crows). **Avoid over-indexing on elephants.**
- Art style: "Learn You a Haskell for Great Good" β wobbly, organic, thick outlines, flat pastel fills, white background.
- No text or labels in images.
- Structurally sound anatomy (correct limb counts, exactly two eyes).
---
## Workload Reference Cards
The `Workloads/` directory contains one file per Postgres wait event. Each card follows this structure:
1. **Image** β Wait event metaphor illustration
2. **Description** β Plain-language explanation of what the event means
3. **Operations** β Which query operations trigger this event
4. **Resources** β Which system resources are involved
Categories: `Activity`, `BufferPin`, `Client`, `CPU`, `Extension`, `IO`, `InjectionPoint`, `IPC`, `Lock`, `LWLock`, `Timeout`
---
## Feedback & Audit Files
The `FEEDBACK/Narrative_Audit/` directory contains chapter-by-chapter tone audits. These are the primary reference for any refactoring work:
- `Summary.md` β Master guidelines: Expository Sandwich, Allusory Metaphor strategy, Approachable Absurdism
- `Chapter_0_1_Audit.md` β Foundations, Tuples, Pages, Storage
- `Chapter_2_3_Audit.md` β Indexing and the Query Planner
- `Chapter_4_5_Audit.md` β WAL, Crash Recovery, Resource Management
- `Chapter_6_Audit.md` β Wait Events and Locking
- `Chapter_7_8_Audit.md` β Distributed Scaling and Access Control
---
## SQL Accuracy & Validation
All SQL in this book is held to a strict accuracy standard. Fabricated or illustrative-only SQL is not acceptable.
### The Ground Truth Schema
The canonical schema lives in `scripts/init.sql`. All table names, column names, types, and constraints must be sourced from this file. **Do not invent columns or tables.**
The real schema, for quick reference:
```sql
-- Custom ENUMs
CREATE TYPE diet_category AS ENUM ('Herbivore', 'Carnivore', 'Omnivore');
CREATE TYPE ingredient_category AS ENUM ('Fruit', 'Vegetable', 'Nut', 'Fungus', 'Herb', 'Spice');
CREATE TYPE preference_type AS ENUM ('Loves', 'Hates', 'Allergic');
CREATE TYPE order_status AS ENUM ('Pending', 'Cooking', 'Served', 'Cancelled');
CREATE TYPE scent_primary AS ENUM ('Flowery', 'Fruity', 'Resinous', 'Spicy', 'Putrid', 'Burnt');
-- Core tables (abbreviated)
species (id, name TEXT UNIQUE, diet_type diet_category)
animals (id, name TEXT, species_id INT β species, created_at TIMESTAMPTZ)
ingredients (id, name TEXT UNIQUE, category ingredient_category, base_cost_per_kg NUMERIC)
flavors (ingredient_id β ingredients, sweetness_1_to_10, sourness_1_to_10, bitterness_1_to_10, scent_notes scent_primary[], flavor_vector VECTOR(3))
animal_favorites (animal_id β animals, ingredient_id β ingredients, preference preference_type)
dishes (id, name TEXT UNIQUE, price NUMERIC, is_active BOOLEAN)
dish_ingredients (dish_id β dishes, ingredient_id β ingredients, quantity_grams NUMERIC)
orders (id BIGINT, animal_id β animals, order_time TIMESTAMPTZ, status order_status)
order_items (order_id β orders, dish_id β dishes, quantity INT)
suppliers (id, name TEXT UNIQUE, contact_email TEXT)
supply_deliveries (id BIGINT, supplier_id β suppliers, ingredient_id β ingredients, delivery_time TIMESTAMPTZ, quantity_kg NUMERIC)
```
### Running the Database
The local development database is a **Docker Compose** service using `pgvector/pgvector:pg16`. The schema is auto-applied on first boot via `docker-entrypoint-initdb.d/`.
```bash
# Start the database
docker compose up -d
# Connect via psql
psql -h localhost -U postgres -d elephant_cafe
# Password: password
# Or via docker exec
docker exec -it elephant_cafe_db psql -U postgres -d elephant_cafe
```
### Validating SQL Examples
Before adding any `SELECT`, `JOIN`, or DML example to the book, the agent **must**:
1. **Start the database**: `docker compose up -d`
2. **Run the query** against `elephant_cafe` to confirm it executes without error
3. **For `EXPLAIN` outputs**: run `EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)` and paste the output verbatim into the chapter β never invent plan nodes or cost estimates
```bash
# Example: validate a query from the book
psql -h localhost -U postgres -d elephant_cafe -c "
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.name, s.diet_type
FROM animals a
JOIN species s ON a.species_id = s.id
WHERE s.diet_type = 'Herbivore';
"
```
> [!WARNING]
> **Never fabricate EXPLAIN output.** Cost estimates, row counts, and plan nodes vary with data volume and table statistics. A fabricated plan teaches the wrong mental model. If the database is not available, note the query as `-- EXPLAIN output pending database validation` and return to validate it before publishing.
### Column Name Traps
Some columns have non-obvious names β double-check before writing SQL:
| Table | Correct Column | Common Mistake |
| :---- | :------------- | :------------- |
| `animals` | `created_at` | `first_visit` |
| `ingredients` | `base_cost_per_kg` | `cost`, `price` |
| `flavors` | `flavor_vector VECTOR(3)` | `embedding`, `vector` |
| `supply_deliveries` | `quantity_kg`, `delivery_time` | `quantity`, `delivered_at` |
| `order_items` | `quantity` (INT) | `amount` |
| `orders` | `order_time` | `created_at`, `placed_at` |
---
## Publication Pipeline
The book is published via **Obsidian Publish** using `sync_to_obsidian.sh`. All internal links use Obsidian-style wikilinks: `[[Chapter 1/1.2 - The Physical Suitcase (The Tuple)|1.2 The Tuple]]`.
> [!WARNING]
> Never break wikilinks. When renaming or reorganizing files, always audit cross-references.
---
## Key Conventions
- All chapter files use **front matter** with `publish: true`, `image:`, and `description:` fields.
- Navigation footers use a standard `| β | β TOC | β |` table pattern.
- Diagrams use **Mermaid** (`erDiagram direction LR` for schemas, `graph LR` for flows).
- SQL code blocks use `-- comments` to label operations in the Cafe metaphor.
- All callouts (`> [!NOTE]`, `> [!TIP]`, etc.) are used semantically β see the Callout Semantics section above.