0.2 - The Relational Model

# 0.2 The Relational Model ![The Relational Model Set](assets/arch_relational_model_set.png) SQL is not the foundation of Postgres. It is a surface dialect — a human-readable layer resting on top of a much older, more rigorous framework called **Relational Algebra**. Understanding the difference between the two changes how you read query plans. The core insight is simple: if you define a database table not as a spreadsheet, but as a **mathematical set of typed tuples**, you gain a clean set of operations that compose predictably. The engine can then rearrange those operations freely, provided the result is algebraically equivalent. Most of the Query Planner's "magic" is just this rule applied recursively. --- ## What a Relation Actually Is A **Relation** is a set of tuples sharing a common attribute header. This sounds like a table, but two important constraints fall out of the mathematical definition that distinguish it from a spreadsheet: **1. No guaranteed ordering.** A set has no inherent sequence. Postgres does not promise that rows come back in the order they were inserted, or in any order at all, unless you explicitly ask for one with `ORDER BY`. This surprises a surprising number of engineers in production. ```sql -- This returns rows in no guaranteed order. SELECT * FROM animals; -- This returns rows in a guaranteed order. SELECT * FROM animals ORDER BY name; ``` **2. No duplicate rows (by default, unenforced).** A true mathematical set prohibits duplicate elements. Postgres does not enforce this constraint automatically — you can insert two identical rows — but a `UNIQUE` or `PRIMARY KEY` constraint brings a table into formal compliance. The decision to not enforce uniqueness by default was deliberate: enforcing it has a cost, and not every table needs it. --- ## The Five Primitive Operations Every `SELECT` statement you have ever written decomposes into exactly five algebraic operations. These five are sufficient to express any query over a relational schema: | Symbol | Operation | What It Does | SQL Equivalent | | :----- | :------------- | :--------------------------------------------------- | :--------------------------------------- | | `σ` | **Selection** | Filter the set — keep only tuples matching predicate | `WHERE diet_type = 'Herbivore'` | | `π` | **Projection** | Trim the header — keep only specified attributes | `SELECT name, diet_type` | | `⋈` | **Join** | Combine two relations on a shared attribute | `JOIN species ON animals.species_id = species.id` | | `∪` | **Union** | Merge two compatible sets, remove duplicates | `UNION` | | `−` | **Difference** | Return tuples in the first set absent from the second | `EXCEPT` | Applied to the Elephant Cafe: ```sql -- A Selection (σ) followed by a Projection (π): -- "Give me only the names of Herbivores." SELECT name -- π (project onto 'name') FROM animals JOIN species ON animals.species_id = species.id -- ⋈ (join) WHERE species.diet_type = 'Herbivore'; -- σ (select) ``` Under the hood, the engine sees this as a composition of `σ ∘ ⋈` applied to the two base relations `animals` and `species`, then projected onto `π(name)`. --- ## Why Composability Matters: The Planner's Freedom Consider a world where this property did not exist. Suppose filtering a table returned something that was *not* a table — maybe a list, or a stream, or some other structure. In that world, you could not take the result of a `WHERE` and feed it into a `JOIN`, because the types would not match. Every operation would need to know about every other operation's output format. The planner could not rearrange anything, because rearranging would break the pipeline. You would be stuck executing queries in exactly the order you wrote them. This is, in fact, how most imperative data processing works. And it is precisely the limitation that relational algebra was designed to eliminate. ### Closure: The One Rule That Makes Everything Else Possible The critical property is **closure**: every relational operation takes a Relation as input and returns a Relation as output. Not a different kind of thing. The same kind of thing — a set of typed tuples with a header. - A `Selection` (`σ`) returns a Relation. - A `Projection` (`π`) returns a Relation. - A `Join` (`⋈`) returns a Relation. - A `Union` (`∪`) returns a Relation. This uniformity means operations are **freely composable**. You can stack them, nest them, and reorder them — and the types never break. The output of any operation is always a valid input to any other. ### What the Planner Does With This Freedom Because the algebra is closed, the **Query Planner** (covered in detail in **[[Chapter 3/3.1 - The Head Chef's Menu (Query Planning)|Chapter 3]]**) can apply algebraic identities to rewrite your query into a cheaper equivalent. The most common rewrite is **predicate pushdown**: moving a filter earlier in the pipeline so it reduces the data volume before an expensive operation. Consider this query: ```sql -- What you wrote: SELECT * FROM orders JOIN animals ON orders.animal_id = animals.id WHERE animals.name = 'Babu'; ``` As written, this says: "Join the entire `orders` table to the entire `animals` table, then filter the result to just Babu." If `orders` has 10 million rows and `animals` has 50,000 rows, that join produces an enormous intermediate result — most of which is immediately thrown away by the `WHERE`. The planner sees the algebraic equivalence: ``` σ(⋈(orders, animals)) = ⋈(orders, σ(animals)) ``` The filter `name = 'Babu'` depends only on `animals`, so the planner can legally push it *before* the join. Now the engine filters `animals` down to a single row first, then joins that one row against `orders`. The result is mathematically identical. The cost is dramatically lower. ```sql -- What the planner actually executes: -- 1. Filter animals to just 'Babu' (1 row) -- 2. Join that 1 row against orders -- The result is identical. The work is not. ``` This is not a special optimization. It is a direct consequence of closure. Every rewrite the planner applies — predicate pushdown, join reordering, projection pruning — is just an algebraic identity that happens to be cheaper to execute. > [!NOTE] > **You are not writing instructions. You are writing a specification.** > SQL is declarative precisely *because* relational algebra is closed. The engine is free to completely ignore the order in which you wrote your clauses and substitute any algebraically equivalent plan. Whether it uses a Sequential Scan, an Index Scan, or a Hash Join is determined by cost estimates, not by your syntax. This is the contract the relational model guarantees. Postgres honors it. --- ## A Note on Normalization The relational model also defines a hierarchy of **Normal Forms** — a set of increasingly strict rules for eliminating redundancy from a schema. The Elephant Cafe schema follows the first three: | Normal Form | Rule | Example Violation | | :---------- | :----------------------------------------------------------- | :--------------------------------------------------------- | | **1NF** | Every attribute is atomic — no nested arrays or comma lists | `diet_type = "Herbivore, Omnivore"` (two values in one cell) | | **2NF** | No partial dependency on a composite key | Storing `species_name` in `animals` alongside `species_id` | | **3NF** | No transitive dependency — non-key attributes depend only on the key | Storing `diet_category` derived from `diet_type` in `animals` | The Cafe enforces 3NF by keeping species attributes in `species` and pointing to them via foreign key. This is not pedantry — it is what allows a single `UPDATE species SET diet_type = 'Plant-Based' WHERE id = 1` to propagate immediately across every related animal record without touching the `animals` table at all. > [!TIP] > Denormalization (intentionally breaking these rules for performance) is a real and sometimes correct choice. We will encounter it in **[[Chapter 5/5.2 - The Private Desk (Work Mem)|Chapter 5]]** when discussing materialized aggregations, and in **[[Chapter 7/7.3 - Table Partitioning|Chapter 7]]** when discussing partition strategies. --- | ← Previous | ↑ Table of Contents | Next → | | :--- | :---: | ---: | | [[Chapter 0/0.1 - The Language of the Cafe (SQL)\|0.1 The Language of the Cafe (SQL)]] | [[Learn You a Postgres for Great Good\|Home]] | [[Chapter 1/1.0 - The Building Blocks of Storage\|Chapter 1 - The Building Blocks of Storage]] |