# 3.4 The Performance Lab (Exercises)

Welcome to the **Performance Lab**.

Understanding the Query Planner's menu is one thing, but standing over the stove during a lunch rush is quite another. In this chapter, we step away from the theory and into the heat of optimization. We will take several common "crimes of exertion" and transform them into masterpieces of laziness.

For each challenge, run the **Naive Solution** against your local Elephant Cafe database, observe the fallout in the `EXPLAIN` plan, and then apply the **Lazy Fix**.

---
## Challenge 1: The Case of the Cloaked Date
**The Request**: "Count all deliveries that occurred in the year 2024."
### The Naive Solution
The most common way to filter by year is to "cloak" the column in a date function. This feels intuitive, but it is a performance disaster.
```sql
-- ❌ The Naive Solution
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT count(*) FROM supply_deliveries
WHERE date_trunc('year', delivery_time) = '2024-01-01';
```
### The Fallout
Because `delivery_time` is wrapped in `date_trunc()`, the engine cannot use the B-Tree index map. It is forced to perform a **Sequential Scan**, reading every single delivery suitcase in the pantry to check its date.
```text
-- Result:
Aggregate  (actual time=0.089..0.089 rows=1 loops=1)
  Buffers: shared hit=8
  ->  Seq Scan on supply_deliveries  (actual time=0.087..0.087 rows=0 loops=1)
        Filter: (date_trunc('year'::text, delivery_time) = '2024-01-01 00:00:00+00'::timestamp with time zone)
        Rows Removed by Filter: 1000
        Buffers: shared hit=8
Planning Time: 0.165 ms
Execution Time: 0.134 ms
```
### The Lazy Fix
To re-enable the index, we must move the logic to the **Values** side of the equation. We give the elephant a start and end date, allowing it to perform a high-speed range scan.
```sql
-- ✅ The Lazy Fix
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT count(*) FROM supply_deliveries
WHERE delivery_time >= '2024-01-01' AND delivery_time < '2025-01-01';
```
**The Reward**: The plan shifts to an **Index Only Scan**, fetching the answer from the map in a fraction of the time without ever opening a filing cabinet.
```text
-- Result:
Aggregate  (actual time=0.007..0.007 rows=1 loops=1)
  Buffers: shared hit=2
  ->  Index Only Scan using idx_supply_delivery_time on supply_deliveries  (actual time=0.005..0.005 rows=0 loops=1)
        Index Cond: ((delivery_time >= '2024-01-01 00:00:00+00'::timestamp with time zone) AND (delivery_time < '2025-01-01 00:00:00+00'::timestamp with time zone))
        Heap Fetches: 0
        Buffers: shared hit=2
Planning Time: 0.052 ms
Execution Time: 0.015 ms
```
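If the rewrite isn't an option — say, an ORM insists on generating the function call — Postgres can also index the expression itself. This is a sketch, not part of the lab schema; the index name is illustrative:

```sql
-- An expression index makes the cloaked predicate sargable again,
-- at the cost of maintaining an extra, single-purpose index
CREATE INDEX idx_supply_delivery_year
    ON supply_deliveries (date_trunc('year', delivery_time));
```

The range rewrite above is still the better habit: one plain index on `delivery_time` serves year, month, and day filters alike, while the expression index answers exactly one question.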
---
## Challenge 2: The Missing Foreign Key Index
**The Request**: "Show every dish ordered by animal #2, with the dish name and quantity."
### The Naive Assumption
You have foreign keys, so joins should be fast, right? Foreign keys enforce **integrity**, but they do not create **indexes**. The `order_items` table has a composite primary key `(order_id, dish_id)`, but no standalone index on `dish_id`.
```sql
-- The query that exposes the gap
EXPLAIN (ANALYZE, BUFFERS)
SELECT d.name, oi.quantity
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN dishes d ON oi.dish_id = d.id
WHERE o.animal_id = 2;
```
### The Fallout
Postgres can use `idx_orders_animal_id` to find animal #2's orders quickly. But `order_items` has no standalone index on `dish_id`, so any join that enters the table through that column has no shortcut. The engine falls back to a **Sequential Scan** of `order_items` inside the **Nested Loop**, repeated once for every matching order.
```text
-- Result:
Nested Loop  (actual time=0.081..0.124 rows=10 loops=1)
  Buffers: shared hit=42
  ->  Nested Loop  (actual time=0.076..0.111 rows=10 loops=1)
        Buffers: shared hit=22
        ->  Index Scan using idx_orders_animal_id on orders o  (actual time=0.016..0.021 rows=10 loops=1)
              Index Cond: (animal_id = 2)
              Buffers: shared hit=2
        ->  Seq Scan on order_items oi  (actual time=0.005..0.008 rows=1 loops=10)
              Filter: (o.id = order_id)
              Rows Removed by Filter: 89
              Buffers: shared hit=20
  ->  Index Scan using dishes_pkey on dishes d  (actual time=0.001..0.001 rows=1 loops=10)
        Index Cond: (id = oi.dish_id)
        Buffers: shared hit=20
Planning Time: 0.145 ms
Execution Time: 0.158 ms
```
### The Lazy Fix
Add the missing index on the foreign key column:
```sql
-- ✅ The Lazy Fix
CREATE INDEX idx_order_items_dish_id ON order_items(dish_id);
```
> [!WARNING]
> **The Foreign Key Index Trap**: Postgres does **not** automatically create indexes on foreign key columns. This is perhaps the most common performance mistake in production databases. Every foreign key that participates in `JOIN` clauses should have an explicit index. Audit your schema with:
> ```sql
> SELECT conrelid::regclass AS table_name,
>        conname            AS fk_name,
>        a.attname          AS column_name
> FROM pg_constraint c
> JOIN pg_attribute a
>   ON a.attnum = ANY (c.conkey) AND a.attrelid = c.conrelid
> WHERE contype = 'f'
>   AND NOT EXISTS (
>         SELECT 1 FROM pg_index i
>         WHERE i.indrelid = c.conrelid
>           AND a.attnum = ANY (i.indkey)
>       );
> ```
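Once an index exists, you can also confirm it is earning its keep. Postgres tracks per-index usage statistics in the `pg_stat_user_indexes` view:

```sql
-- idx_scan counts how many times each index has been used;
-- a long-lived index stuck at 0 is dead weight
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'order_items';
```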
After creating the index, the same query resolves each order's line items through the index instead of repeated sequential scans. On this toy dataset the wall-clock difference is small; on production-sized tables it is dramatic:
```sql
-- Re-run after index creation
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT d.name, oi.quantity
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN dishes d ON oi.dish_id = d.id
WHERE o.animal_id = 2;
```
```text
-- Result:
Nested Loop  (actual time=0.065..0.108 rows=10 loops=1)
  Buffers: shared hit=42
  ->  Nested Loop  (actual time=0.062..0.088 rows=10 loops=1)
        Buffers: shared hit=22
        ->  Index Scan using idx_orders_animal_id on orders o  (actual time=0.012..0.016 rows=10 loops=1)
              Index Cond: (animal_id = 2)
              Buffers: shared hit=2
        ->  Index Scan using idx_order_items_dish_id on order_items oi  (actual time=0.006..0.006 rows=1 loops=10)
              Index Cond: (order_id = o.id)
              Buffers: shared hit=20
  ->  Index Scan using dishes_pkey on dishes d  (actual time=0.001..0.001 rows=1 loops=10)
        Index Cond: (id = oi.dish_id)
        Buffers: shared hit=20
Planning Time: 0.211 ms
Execution Time: 0.134 ms
```
---
## Challenge 3: The Arithmetic Trap
**The Request**: "Show all deliveries that were technically due before New Year's 2024, if we account for a one-day processing buffer."
### The Naive Solution
Math on the column side prevents the elephant from using the "Delivery Time" index. By adding an interval to the column, you force Postgres to calculate a new value for every single row before it can perform the comparison.
```sql
-- ❌ The Naive Solution
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT count(*) FROM supply_deliveries
WHERE delivery_time + interval '1 day' < '2024-01-01';
```
### The Fallout
The engine doesn't know how to perform arithmetic on an index map. It must pull each suitcase, add a day to the date on the tag, and compare the result. This results in a **Sequential Scan**, touching every page in the table.
```text
-- Result:
Aggregate  (actual time=0.085..0.086 rows=1 loops=1)
  Buffers: shared hit=8
  ->  Seq Scan on supply_deliveries  (actual time=0.084..0.084 rows=0 loops=1)
        Filter: ((delivery_time + '1 day'::interval) < '2024-01-01 00:00:00+00'::timestamp with time zone)
        Rows Removed by Filter: 1000
        Buffers: shared hit=8
Planning Time: 0.127 ms
Execution Time: 0.097 ms
```
### The Lazy Fix
Isolate the column. Move the math to the constant value on the right-hand side. By comparing the raw column to a pre-calculated date, you allow the elephant to use the B-Tree index to skip straight to the relevant records.
```sql
-- ✅ The Lazy Fix
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT count(*) FROM supply_deliveries
WHERE delivery_time < '2024-01-01'::timestamptz - interval '1 day';
```
```text
-- Result:
Aggregate  (actual time=0.002..0.002 rows=1 loops=1)
  Buffers: shared hit=2
  ->  Index Only Scan using idx_supply_delivery_time on supply_deliveries  (actual time=0.002..0.002 rows=0 loops=1)
        Index Cond: (delivery_time < '2023-12-31 00:00:00+00'::timestamp with time zone)
        Heap Fetches: 0
        Buffers: shared hit=2
Planning Time: 0.044 ms
Execution Time: 0.010 ms
```
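The same principle governs rolling-window reports, where column-side arithmetic sneaks in constantly. A sketch against the same table — keep the column bare and let the planner fold the right-hand side into a single boundary value:

```sql
-- ❌ Not sargable: per-row arithmetic hides delivery_time from the index
SELECT count(*) FROM supply_deliveries
WHERE delivery_time + interval '7 days' >= now();

-- ✅ Sargable: now() - interval '7 days' collapses to one boundary,
--    and the B-Tree range scan takes over from there
SELECT count(*) FROM supply_deliveries
WHERE delivery_time >= now() - interval '7 days';
```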
---
## Challenge 4: The Multi-Table Join Cascade
**The Request**: "Count all deliveries for ingredients used in 'Dish 5'."
This is the ultimate sargability test. A single non-sargable predicate at the top of a join chain can cause the entire strategy to reverse. By "hiding" the name inside a function, you prevent the elephant from using it as an entry point.
### The Naive Solution
```sql
-- ❌ The Naive Solution
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT count(*)
FROM dishes d
JOIN dish_ingredients di ON d.id = di.dish_id
JOIN ingredients i ON di.ingredient_id = i.id
JOIN supply_deliveries sd ON i.id = sd.ingredient_id
WHERE lower(d.name) = 'dish 5';
```
### The Fallout
Because `lower(d.name)` cannot be searched in a standard B-Tree index, the elephant loses its entry point. It is forced into a **Bottom-Up** strategy: it scans the mapping table (`dish_ingredients`) first, then joins back to `dishes` to check the name for every single row it found. It is wasteful, backwards, and slow.
```text
-- Result:
Aggregate  (actual time=0.203..0.204 rows=1 loops=1)
  Buffers: shared hit=135
  ->  Nested Loop  (actual time=0.146..0.200 rows=100 loops=1)
        Join Filter: (di.ingredient_id = sd.ingredient_id)
        Buffers: shared hit=135
        ->  Nested Loop  (actual time=0.134..0.151 rows=10 loops=1)
              Buffers: shared hit=40
              ->  Nested Loop  (actual time=0.131..0.143 rows=10 loops=1)
                    -- ⚠️ The elephant scans the junction table FIRST
                    ->  Index Only Scan using dish_ingredients_pkey on dish_ingredients di  (actual time=0.008..0.014 rows=90 loops=1)
                    ->  Memoize  (actual time=0.001..0.001 rows=0 loops=90)
                          -- ⚠️ Every ID lookup is filtered by the function
                          ->  Index Scan using dishes_pkey on dishes d  (actual time=0.013..0.013 rows=0 loops=9)
                                Index Cond: (id = di.dish_id)
                                Filter: (lower(name) = 'dish 5'::text)
        ->  Index Only Scan using idx_supply_ingredient on supply_deliveries sd  (actual time=0.002..0.004 rows=10 loops=10)
Planning Time: 0.512 ms
Execution Time: 0.230 ms
```
### The Lazy Fix
A sargable match allows **Predicate Pushdown**. The elephant finds the unique ID for 'Dish 5' instantly using the `dishes_name_key` index and then precisely follows the trail "Top-Down" through the other tables.
```sql
-- ✅ The Lazy Fix
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT count(*)
FROM dishes d
JOIN dish_ingredients di ON d.id = di.dish_id
JOIN ingredients i ON di.ingredient_id = i.id
JOIN supply_deliveries sd ON i.id = sd.ingredient_id
WHERE d.name = 'Dish 5';
```
```text
-- Result:
Aggregate  (actual time=0.032..0.032 rows=1 loops=1)
  Buffers: shared hit=119
  ->  Nested Loop  (actual time=0.009..0.029 rows=100 loops=1)
        Join Filter: (di.ingredient_id = sd.ingredient_id)
        Buffers: shared hit=119
        ->  Nested Loop  (actual time=0.007..0.012 rows=10 loops=1)
              Buffers: shared hit=24
              ->  Nested Loop  (actual time=0.006..0.007 rows=10 loops=1)
                    -- ✅ Surgical entry point!
                    ->  Index Scan using dishes_name_key on dishes d  (actual time=0.005..0.005 rows=1 loops=1)
                          Index Cond: (name = 'Dish 5'::text)
                    -- ✅ Only follows the 1 relevant dish ID
                    ->  Index Only Scan using dish_ingredients_pkey on dish_ingredients di  (actual time=0.001..0.002 rows=10 loops=1)
                          Index Cond: (dish_id = d.id)
        ->  Index Only Scan using idx_supply_ingredient on supply_deliveries sd  (actual time=0.000..0.001 rows=10 loops=10)
Planning Time: 0.114 ms
Execution Time: 0.050 ms
```
**The Lesson**: In a join, the most selective filter is your North Star. If you hide that North Star inside a function, the elephant is forced to wander the map.
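If case-insensitive matching is a genuine requirement rather than an accident, you don't have to abandon the function — give it its own map instead. An **expression index** makes `lower(name)` itself sargable (index name illustrative):

```sql
-- ✅ Restores the surgical entry point for the case-insensitive form
CREATE INDEX idx_dishes_lower_name ON dishes (lower(name));
```

With this in place, `WHERE lower(d.name) = 'dish 5'` becomes an index lookup again and the Top-Down plan returns.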

---
## Challenge 5: The Multicolumn Index
**The Request**: "Show all deliveries from Supplier #3 that arrived in March 2024."
### The Naive Solution
The only relevant index here is `idx_supply_delivery_time` on `delivery_time`; `supplier_id` is a foreign key, and as Challenge 2 established, that alone creates no index. Even with separate single-column indexes on both columns, the planner would have to pick one or attempt a **BitmapAnd** to combine them.
```sql
-- The query that needs a compound shortcut
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT * FROM supply_deliveries
WHERE supplier_id = 3
AND delivery_time >= '2024-03-01'
AND delivery_time < '2024-04-01';
```
### The Fallout
Without a multicolumn index, the planner settles for its least-bad option: use `idx_supply_delivery_time` to find all March deliveries, then discard every row that isn't supplier #3 — the `Filter` line below, where every discarded row is wasted work. Had both columns carried single-column indexes, it could instead build a **BitmapAnd** — two separate Bitmap Index Scans intersected with a bitwise AND — but building two bitmaps in memory is still slower than a single direct lookup.
```text
-- Result:
Index Scan using idx_supply_delivery_time on supply_deliveries  (actual time=0.001..0.001 rows=0 loops=1)
  Index Cond: ((delivery_time >= '2024-03-01 00:00:00+00'::timestamp with time zone) AND (delivery_time < '2024-04-01 00:00:00+00'::timestamp with time zone))
  Filter: (supplier_id = 3)
  Buffers: shared hit=2
Planning Time: 0.077 ms
Execution Time: 0.004 ms
```
### The Lazy Fix
A **Multicolumn Index** gives the engine a single, pre-sorted map that answers both predicates simultaneously:
```sql
-- ✅ The Lazy Fix
CREATE INDEX idx_supply_supplier_time
ON supply_deliveries(supplier_id, delivery_time);
```
> [!IMPORTANT]
> **Column Order Matters**: The leftmost column in a multicolumn index acts as the **primary sort key**. An index on `(supplier_id, delivery_time)` is perfect for "all deliveries from supplier X in date range Y" but **useless** for "all deliveries in March regardless of supplier." The rule: put the **equality** column first, the **range** column second.
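You can verify the column-order rule empirically: `EXPLAIN` both query shapes and watch which one gets an `Index Cond` on the new index:

```sql
-- ✅ Can use idx_supply_supplier_time: equality on the leading column
EXPLAIN SELECT * FROM supply_deliveries
WHERE supplier_id = 3 AND delivery_time >= '2024-03-01';

-- ❌ Cannot use idx_supply_supplier_time: the leading column is
--    unconstrained, so the planner reaches for another plan entirely
EXPLAIN SELECT * FROM supply_deliveries
WHERE delivery_time >= '2024-03-01' AND delivery_time < '2024-04-01';
```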
After creating the index, the same query collapses to a single **Index Scan** with both conditions resolved in one B-Tree traversal:
```sql
-- Re-run after index creation
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT * FROM supply_deliveries
WHERE supplier_id = 3
AND delivery_time >= '2024-03-01'
AND delivery_time < '2024-04-01';
```
```text
-- Result:
Index Scan using idx_supply_supplier_time on supply_deliveries  (actual time=0.008..0.008 rows=0 loops=1)
  Index Cond: ((supplier_id = 3) AND (delivery_time >= '2024-03-01 00:00:00+00'::timestamp with time zone) AND (delivery_time < '2024-04-01 00:00:00+00'::timestamp with time zone))
  Buffers: shared read=2
Planning Time: 0.075 ms
Execution Time: 0.011 ms
```
---
## Challenge 6: The Covering Index (Index-Only Scan)
**The Request**: "For each animal, show their name and how many orders they've placed — as fast as possible."
### The Naive Solution
This is a straightforward aggregation, but the default indexes force the engine to visit the heap for every matching row.
```sql
-- The query
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT a.name, count(o.id)
FROM animals a
JOIN orders o ON a.id = o.animal_id
GROUP BY a.name;
```
### The Fallout
The `idx_orders_animal_id` index contains only `animal_id`. To compute `count(o.id)`, the engine would still have to visit the **heap** (the actual table pages) to retrieve each order's `id` — 200,000 random fetches. Faced with that much random I/O, exactly what the Planner fears most, it abandons the index and reads the whole table with a **Sequential Scan** instead.
```text
-- Result:
HashAggregate  (actual time=36.814..37.170 rows=10000 loops=1)
  Group Key: a.name
  Batches: 1  Memory Usage: 1553kB
  Buffers: shared hit=1619
  ->  Hash Join  (actual time=1.763..22.948 rows=200000 loops=1)
        Hash Cond: (o.animal_id = a.id)
        Buffers: shared hit=1619
        ->  Seq Scan on orders o  (actual time=0.002..5.883 rows=200000 loops=1)
              Buffers: shared hit=1471
        ->  Hash  (actual time=1.757..1.757 rows=20000 loops=1)
              Buckets: 32768  Batches: 1  Memory Usage: 1194kB
              Buffers: shared hit=148
              ->  Seq Scan on animals a  (actual time=0.001..0.748 rows=20000 loops=1)
                    Buffers: shared hit=148
Planning Time: 0.109 ms
Execution Time: 37.325 ms
```
### The Lazy Fix
A **Covering Index** uses the `INCLUDE` clause to attach extra columns to the index leaf pages without affecting the sort order. This allows an **Index-Only Scan** — the engine never needs to touch the heap at all.
```sql
-- ✅ The Lazy Fix
CREATE INDEX idx_orders_animal_covering
ON orders(animal_id) INCLUDE (id);
```
> [!TIP]
> **`INCLUDE` vs. Adding to the Key**: Columns in `INCLUDE` are stored in the leaf pages but are **not part of the B-Tree sort order**. This keeps the index smaller and faster to maintain than a full composite key `(animal_id, id)`. Use `INCLUDE` when you need the column for output or aggregation but never for filtering or ordering.
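The size claim is easy to check on your own database: create the composite-key variant alongside the covering one and measure both (`idx_orders_animal_composite` is an illustrative name for the experiment):

```sql
CREATE INDEX idx_orders_animal_composite ON orders (animal_id, id);

-- pg_relation_size reports the on-disk bytes of each index
SELECT pg_size_pretty(pg_relation_size('idx_orders_animal_covering'))  AS covering,
       pg_size_pretty(pg_relation_size('idx_orders_animal_composite')) AS composite;

DROP INDEX idx_orders_animal_composite;  -- experiment over, clean up
```

For two small integer columns the difference may be modest; it grows with wider included columns, since `INCLUDE` keeps them out of the B-Tree's internal pages.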
After creating the covering index:
```sql
-- Re-run after index creation
EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT a.name, count(o.id)
FROM animals a
JOIN orders o ON a.id = o.animal_id
GROUP BY a.name;
```
The plan should now show an **Index Only Scan** — the ultimate expression of the Lazy Elephant: answering the question without ever opening a single suitcase. Keep an eye on the `Heap Fetches` counter, though; it reveals how often the elephant still had to open one anyway.
```text
-- Result:
HashAggregate  (actual time=56.455..56.888 rows=10000 loops=1)
  Group Key: a.name
  Batches: 1  Memory Usage: 1553kB
  Buffers: shared hit=100249 read=765
  ->  Hash Join  (actual time=1.930..37.644 rows=200000 loops=1)
        Hash Cond: (o.animal_id = a.id)
        Buffers: shared hit=100249 read=765
        ->  Index Only Scan using idx_orders_animal_covering on orders o  (actual time=0.004..20.290 rows=200000 loops=1)
              Heap Fetches: 100040
              Buffers: shared hit=100045 read=765
        ->  Hash  (actual time=1.920..1.921 rows=20000 loops=1)
              Buckets: 32768  Batches: 1  Memory Usage: 1194kB
              Buffers: shared hit=204
              ->  Index Scan using animals_pkey on animals a  (actual time=0.004..1.050 rows=20000 loops=1)
                    Buffers: shared hit=204
Planning Time: 0.159 ms
Execution Time: 57.065 ms
```
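Look closely: `Heap Fetches: 100040`, and the runtime actually got *worse*. An index-only scan is only truly heap-free when the **visibility map** marks a page as all-visible, and on a freshly loaded or heavily updated table it isn't yet. `VACUUM` brings the map up to date:

```sql
-- Update the visibility map so the index-only scan can skip the heap
VACUUM (ANALYZE) orders;
```

Re-run the query afterwards and `Heap Fetches` should drop to 0, taking the execution time down with it. This is the hidden contract of the covering index: it only pays off when autovacuum (or you) keeps the visibility map current.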
---
| ← Previous | ↑ Table of Contents | Next → |
| :--- | :---: | ---: |
| [[Chapter 3/3.3 - The Art of Not Opening Every Box (Sargability)\|3.3 The Art of Not Opening Every Box (Sargability)]] | [[Learn You a Postgres for Great Good\|Home]] | [[Chapter 3/3.5 - The Masterclass Lab (Advanced Operations)\|3.5 The Masterclass Lab (Advanced Operations)]] |