# 3.2.2 Joins

Once raw suitcases have been fetched from the table heap, they must be **Joined**. In the Query Algebra, this is the responsibility of the **Join Nodes**. Their objective is to cross-reference two distinct streams of data and assemble them into a single **Composite Crate**.
Depending on the scale of the data and the presence of indexes, the planner selects from three primary Join algorithms.
> [!IMPORTANT]
> **The Join Pull**: Remember the **[[Chapter 3/3.2 - The Assembly Line (Query Algebra)|Pull Model]]**! The Sous Chef doesn't just start mashing ingredients. He waits for the station above him to shout, "Give me a partner!" Only then does he reach into his bowls.
## Nested Loop Join: The Iterative Look-up
The **[[Operations/NestedLoop|Nested Loop Join]]** is the base algorithm for joining two relations. It follows a simple iterative logic: for **every single record** in the Outer relation, Postgres performs a scan of the Inner relation to find a match.
If the Outer relation has 100 tuples and the Inner relation has 1,000, Postgres must perform 100 × 1,000 = 100,000 comparisons. This algorithm is computationally expensive ($O(N \times M)$) and is typically selected only when the Outer relation is tiny, or when the Inner relation possesses a high-speed index that replaces the inner scan with an instant Index Scan lookup.
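The core loop can be sketched in a few lines of Python. This is a toy illustration of the idea, not Postgres's actual C implementation; the relations and predicate are invented for the example:

```python
def nested_loop_join(outer, inner, match):
    """For every outer tuple, re-scan the entire inner relation: O(N * M)."""
    result = []
    for o in outer:          # one pass over the Outer relation
        for i in inner:      # a full re-scan of the Inner relation per outer tuple
            if match(o, i):
                result.append((o, i))
    return result

# 100 outer tuples x 1,000 inner tuples = 100,000 match() calls
dishes = [(d, f"dish-{d}") for d in range(100)]
ingredients = [(n % 100, f"ing-{n}") for n in range(1000)]
pairs = nested_loop_join(dishes, ingredients, lambda o, i: o[0] == i[0])
```

Replacing the inner `for` loop with an index lookup is exactly what the planner does when a usable index exists on the Inner relation's join key.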
## Hash Join: The Memory Lookup
For large, unsorted datasets, Postgres utilizes a **[[Operations/HashJoin|Hash Join]]**. This is a two-phased, memory-intensive operation:
1. **The Build**: Postgres takes the smaller relation (the "Inner") and constructs a **Hash Table** in memory. Tuples are organized into buckets based on their join key.
2. **The Probe**: Postgres then streams the larger relation (the "Outer") past the hash table. For every incoming tuple, it computes the same hash and performs a near-instant lookup in the matching bucket.
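The two phases can be sketched in Python (a toy model of the idea, not the engine's implementation; the key-extractor parameters are invented for the example):

```python
from collections import defaultdict

def hash_join(outer, inner, outer_key, inner_key):
    # Phase 1 - The Build: bucket the smaller (Inner) relation by join key
    buckets = defaultdict(list)
    for t in inner:
        buckets[inner_key(t)].append(t)
    # Phase 2 - The Probe: stream the Outer relation past the hash table
    result = []
    for o in outer:
        for i in buckets.get(outer_key(o), []):  # near-instant bucket lookup
            result.append((o, i))
    return result
```

Each side is scanned exactly once, giving expected $O(N + M)$ time; the price is holding the entire Inner relation's hash table in memory.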
> [!IMPORTANT]
> **Multi-batch Hash Joins**: What happens if the Hash Table is larger than **[[Chapter 5/5.3 - The Private Desk (Work Mem)|work_mem]]**? Instead of failing, Postgres partitions both the Inner and Outer relations into **Batches** on disk. It processes one batch in memory at a time, spilling the rest to temporary files. This ensures your join succeeds, but at the cost of high Disk I/O.
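The batching trick can be sketched as follows. The helper name is hypothetical; the point is that both relations are partitioned with the *same* hash function, so matching keys always land in the same batch:

```python
def partition_into_batches(tuples, key, nbatch):
    """Split a relation into nbatch 'batches' by hashing the join key."""
    batches = [[] for _ in range(nbatch)]
    for t in tuples:
        batches[hash(key(t)) % nbatch].append(t)
    return batches

# A tuple with key k on the Outer side can only match tuples in the
# Inner batch hash(k) % nbatch, so each batch pair is joined in memory
# on its own while the remaining batches wait in temporary files.
```

This is why a multi-batch Hash Join shows up in `EXPLAIN (ANALYZE)` as `Batches: N` with temp-file I/O attached.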
## Merge Join: The Parallel March
If both data streams are pre-sorted by the join key, Postgres deploys the **[[Operations/MergeJoin|Merge Join]]**.
Two pointers move down the sorted lists in parallel. Because the data is ordered, Postgres only ever looks forward, unless it encounters a "Target Run" of identical keys (a many-to-many join). In that case, it must temporarily **Mark** its position in the second stream, iterate through all matches, and then **Restore** the pointer to the mark for the next outer row. This algorithm is $O(N + M)$ and requires minimal memory compared to a Hash Join.
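The parallel march, including the Mark/Restore dance for duplicate keys, can be sketched like so (a Python illustration under the assumption that both inputs are lists of `(key, value)` tuples already sorted by key):

```python
def merge_join(outer, inner):
    """Join two lists of (key, value) tuples pre-sorted by key."""
    result = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        if outer[i][0] < inner[j][0]:
            i += 1                       # advance whichever pointer lags
        elif outer[i][0] > inner[j][0]:
            j += 1
        else:
            key = outer[i][0]
            mark = j                     # Mark: remember where the run starts
            while j < len(inner) and inner[j][0] == key:
                result.append((outer[i], inner[j]))
                j += 1
            i += 1
            if i < len(outer) and outer[i][0] == key:
                j = mark                 # Restore: replay the run for the next outer row
    return result
```

Outside of duplicate runs, neither pointer ever moves backward, which is what keeps the algorithm at $O(N + M)$.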
### The Duel of the Chefs
Let's observe how the planner chooses a Join algorithm based on table scale and statistics.
#### 1. Small Result Set Join (Nested Loop)
When joining a single record to its related entities, the planner selects a Nested Loop. The low number of iterations makes the setup cost of other algorithms unnecessary.
```sql
EXPLAIN SELECT dishes.name, ingredients.name
FROM dishes
JOIN dish_ingredients ON dishes.id = dish_ingredients.dish_id
JOIN ingredients ON dish_ingredients.ingredient_id = ingredients.id
WHERE dishes.id = 5;
-- Result (The Pointer Lookup):
-- Nested Loop (cost=0.42..12.31 rows=1 width=64)
--   -> Index Scan using dishes_pkey on dishes (cost=0.14..8.15 rows=1 width=36)
--   -> Index Scan using dish_ingredients_pkey on dish_ingredients (cost=0.14..2.10 rows=1)
```
> [!TIP]
> **Minimal Iteration Overhead**: The cost is low because the Outer relation (Dishes) is filtered to a single row. The Join is effectively reduced to a few pointer dereferences.
#### 2. The Strategy Shift (Hash Join vs. Nested Loop Join)
When joining a large table like `supply_deliveries`, the planner's choice depends on the availability of indexes and the estimated output volume.
**Plan 1: Unindexed Join (Hash Join)**
Without an index on the join key of the large table, a Nested Loop would have to re-scan `supply_deliveries` in full for every outer row. Postgres instead constructs an in-memory Hash Table from the smaller relation and streams the large one past it.
```sql
EXPLAIN SELECT ingredients.name, supply_deliveries.delivery_time
FROM ingredients
JOIN supply_deliveries ON ingredients.id = supply_deliveries.ingredient_id
WHERE ingredients.id < 10;
-- Results (The Hash Table Build):
-- Hash Join (cost=4.00..96.70 rows=10 width=15)
--   Hash Cond: (supply_deliveries.ingredient_id = ingredients.id)
--   -> Seq Scan on supply_deliveries (cost=0.00..88.80 rows=1480)
--   -> Hash (cost=3.00..3.00 rows=10)
--        -> Seq Scan on ingredients (cost=0.00..3.00 rows=10)
```
**Plan 2: Indexed Join (Nested Loop)**
Once an index is added (`CREATE INDEX idx_supply_ingredient ON supply_deliveries(ingredient_id);`), the plan shifts. Postgres can now iterate through `ingredients` and perform a fast index lookup for each match in `supply_deliveries`.
```sql
-- Results (same query as Plan 1):
-- Nested Loop (cost=0.43..85.00 rows=10 width=15)
--   -> Seq Scan on ingredients (cost=0.00..3.00 rows=10)
--   -> Index Scan using idx_supply_ingredient on supply_deliveries (cost=0.43..6.50 rows=10)
```
#### 3. The Sorted Join (Merge Join)
If both input streams are pre-sorted (typically by an Index Scan on a B-Tree), the planner selects a Merge Join for $O(N+M)$ efficiency.
```sql
SET enable_hashjoin = off;  -- steer the planner away from Hash and Nested Loop
SET enable_nestloop = off;  -- so the Merge Join is chosen for demonstration
EXPLAIN SELECT ingredients.id, supply_deliveries.id
FROM ingredients
JOIN supply_deliveries ON ingredients.id = supply_deliveries.ingredient_id
ORDER BY ingredients.id;
-- Result:
-- Merge Join (cost=0.30..150.00 rows=1000 width=8)
--   Merge Cond: (ingredients.id = supply_deliveries.ingredient_id)
--   -> Index Scan using ingredients_pkey on ingredients (...)
--   -> Index Scan using idx_supply_ingredient on supply_deliveries (...)
```
---
| ← Previous | ↑ Table of Contents | Next → |
| :--- | :---: | ---: |
| [[Chapter 3/3.2.1 - Scans\|3.2.1 The Food Runners (Scans)]] | [[Learn You a Postgres for Great Good\|Home]] | [[Chapter 3/3.2.3 - Aggregations\|3.2.3 The Prep Station (Aggregations)]] |