# 3.2.2 The Sous Chefs (Joins) ![The Sous Chefs](assets/arch_sous_chefs_joins.png) Once the raw ingredients have been fetched from the pantry, they must be **Joined**. In the Query Algebra, this is the high-energy work of the **Sous Chefs**. Their job is to take two distinct bowls of ingredients (result sets) and find the precise way to **join** them into a single, complex flavor. Depending on the size and "prepped" state of the ingredients, the Head Chef chooses a different **Join** technique. ## Nested Loop Join: The Manual Hand-Off The **[[Operations/NestedLoop|Nested Loop Join]]** is the simplest way to join ingredients. Imagine a Sous Chef holding one ingredient from Bowl A (the Outer relation). For _every single item_ they hold, they must search through the entire Bowl B (the Inner relation) to find its partner. If Bowl A has 100 items and Bowl B has 100 items, the Sous Chef performs the **join** 10,000 times! It’s like trying to pair every single sock in the laundry by hand—effective, but slow. The Head Chef only orders this when one bowl is tiny. ## Hash Join: The Prep Wall Join When joining large amounts of data, the Head Chef calls for a **[[Operations/HashJoin|Hash Join]]**. This is a two-step assembly process. First, a Sous Chef takes Bowl A and builds a **Mise en Place Wall** (the Hash Table), organizing ingredients into numbered buckets based on their matching ID. Once the wall is ready, the Sous Chef simply walks through Bowl B. For each ingredient, they check the ID, reach into the corresponding bucket on the wall, and instantly **join** it to its match. It’s incredibly fast, provided the kitchen counter (`work_mem`) is large enough to hold the wall! ## Merge Join: The Parallel Prep Join If both bowls are already perfectly sorted (prepped) by their IDs, the Head Chef deploys the elegant **[[Operations/MergeJoin|Merge Join]]**. Two Sous Chefs stand side-by-side with their sorted bowls and walk down their lists in sync. Since the ingredients are already in order, they don't have to search. They just "high-five" whenever their IDs match and **join** the ingredients into a new bowl. It requires almost no counter space, but it demands the high upfront cost of sorting the bowls first! ### The Duel of the Chefs Let's see how the Head Chef chooses his **Join** technique based on the scale of the Cafe's data. #### 1. The Small Batch Join (Nested Loop Join) When joining a single dish to its ingredients, the items are few. The engine just does a quick manual hand-off. ```sql EXPLAIN SELECT dishes.name, ingredients.name FROM dishes JOIN dish_ingredients ON dishes.id = dish_ingredients.dish_id JOIN ingredients ON dish_ingredients.ingredient_id = ingredients.id WHERE dishes.id = 5; -- Result: -- Nested Loop Join (cost=0.15..12.31 rows=1 width=64) -- -> Index Scan using dishes_pkey on dishes d (cost=0.14..8.15 rows=1 width=36) -- ... ``` #### 2. The Strategy Shift (Hash Join vs. Nested Loop Join) When we join all `ingredients` to their `supply_deliveries` to calculate total stock, the Head Chef has a choice to make depending on our prep work. **State 1: Joining with a Wall (Hash Join)** Without a specific index on the `ingredient_id` in a massive `supply_deliveries` table, the engine must build a Hash Wall to make the join feasible. ```sql EXPLAIN SELECT ingredients.name, supply_deliveries.delivery_time FROM ingredients JOIN supply_deliveries ON ingredients.id = supply_deliveries.ingredient_id WHERE ingredients.id < 10; -- Results: -- Hash Join (cost=4.00..96.70 rows=740 width=15) -- Hash Cond: (supply_deliveries.ingredient_id = ingredients.id) -- -> Seq Scan on supply_deliveries (cost=0.00..88.80 rows=1480 width=12) -- -> Hash (cost=3.00..3.00 rows=80 width=11) -- -> Seq Scan on ingredients (cost=0.00..3.00 rows=80 width=11) -- Filter: (id < 10) ``` **State 2: The Fast Hand-off Join (Nested Loop Join)** Once we add an index (`CREATE INDEX idx_supply_ingredient ON supply_deliveries(ingredient_id);`), the plan shifts. The engine can now use a Nested Loop Join because for every ingredient, it can instantly "fetch" the matching deliveries! ```sql -- Results: -- Nested Loop Join (cost=0.43..85.00 rows=100 width=64) -- -> Seq Scan on ingredients (cost=0.00..12.45 rows=10) -- -> Index Scan using idx_supply_ingredient on supply_deliveries (cost=0.43..6.50 rows=10) ``` #### 3. The Sorted Join (Merge Join) If both lists are already prepped (sorted) by ID, the Chefs perform a Merge Join. ```sql SET enable_hashjoin = off; EXPLAIN SELECT ingredients.id, supply_deliveries.id FROM ingredients JOIN supply_deliveries ON ingredients.id = supply_deliveries.ingredient_id ORDER BY ingredients.id; -- Result: -- Merge Join (cost=0.30..150.00 rows=1000 width=8) -- Merge Cond: (ingredients.id = supply_deliveries.ingredient_id) -- -> Index Scan using ingredients_pkey on ingredients (...) -- -> Index Scan using idx_supply_ingredient on supply_deliveries (...) ``` --- [[Chapter 3/3.2.1 - The Pantry Fetchers (Scans)|← 3.2.1 - The Pantry Fetchers (Scans)]] | [[Chapter 3/3.2 - Query Algebra|↑ 3.2 - Query Algebra]] | [[Chapter 3/3.2.3 - The Prep Station (Aggregations)|3.2.3 - The Prep Station (Aggregations) →]]