# 3.2.1 Scans

Before any processing can occur, Postgres must fetch the raw data from physical storage. In the Query Algebra, this is the responsibility of the **Scan Nodes**.
Scan Nodes are the only operators in the pipeline that directly interact with the **[[Chapter 1/1.3 - The Shipping Container (The Page)|Pages]]** stored on disk or in the **[[Chapter 5/5.2 - The Warming Rack (Shared Buffers)|Shared Buffers]]**. Their primary objective is to locate the correct tuples and stream them upward into the plan tree.
## The Universal Scan Interface (`ExecScan`)
Despite the vast difference between walking a table heap and traversing a B-Tree, every Scan Node in Postgres shares a common internal blueprint. This is handled by the **`ExecScan`** logic.
Think of it as a standardized conveyor belt. No matter how the scan node finds a suitcase—whether by checking every shelf or using a GPS shortcut—it delivers that suitcase to the node above it using the exact same movement. This abstraction allows the engine to handle filtering (`Filter`) and projection (`Projection`) generically, regardless of the underlying access method.
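
The shape of that conveyor belt can be sketched in a few lines of Python. This is a toy model, not the actual C implementation: `exec_scan`, `heap_access`, and the dictionary rows are hypothetical names chosen for illustration. The only point is that the access method is a swappable callable, while the filter and projection steps stay the same.

```python
from typing import Callable, Iterable, Iterator

Row = dict  # toy stand-in for a tuple slot


def exec_scan(
    access_method: Callable[[], Iterable[Row]],
    qual: Callable[[Row], bool],
    projection: list[str],
) -> Iterator[Row]:
    """Toy scan loop: fetch a row, filter it, project it, hand it upward."""
    for row in access_method():          # how rows are found is pluggable
        if not qual(row):                # generic Filter step
            continue
        yield {col: row[col] for col in projection}  # generic Projection step


# Hypothetical data and access method (a plain walk over a list).
ANIMALS = [
    {"id": 1, "name": "Babar", "species_id": 1},
    {"id": 2, "name": "Caplin", "species_id": 5},
    {"id": 3, "name": "Dumbo", "species_id": 1},
]


def heap_access() -> Iterable[Row]:
    return iter(ANIMALS)


result = list(exec_scan(heap_access, lambda r: r["species_id"] == 1, ["name"]))
print(result)  # [{'name': 'Babar'}, {'name': 'Dumbo'}]
```

Swapping `heap_access` for any other callable that yields rows leaves the rest of the loop untouched, which is the property the abstraction is meant to provide.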
> [!NOTE]
> **The Table Access Method (Table AM)**: In modern Postgres (v12+), the storage itself is "pluggable" via the **Table AM API**. Framed as the **Pantry Blueprint**, this C struct of function pointers (`TableAmRoutine`) defines exactly how to `scan_begin`, `scan_getnextslot`, and `scan_end`. This abstraction is why Postgres can support the standard heap, columnar storage, or experimental formats such as zheap without changing how the Query Algebra operates.
## Sequential Scan: Linear Page Access
The **[[Operations/SeqScan|Sequential Scan]]** is the base case of data retrieval. Postgres walks the entire table heap in physical order, page by page, and checks every tuple it finds, so no qualifying row can be missed.
> [!TIP]
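> **Synchronized Sequential Scans**: If multiple queries are performing a sequential scan on the same large table, Postgres tries to avoid redundant I/O. The second query "hitches a ride" with the first: it joins the scan at the current page and, once it reaches the end of the table, loops back to the beginning to pick up the pages it missed. Each page is then read from disk roughly once for both queries. This behavior is controlled by the `synchronize_seqscans` setting and only applies to tables larger than a quarter of `shared_buffers`.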
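
The wrap-around order described above can be illustrated with a small sketch. The function name `synchronized_scan_order` and the page counts are hypothetical; the real logic lives inside the heap access method and is more involved.

```python
def synchronized_scan_order(total_pages: int, start_page: int) -> list[int]:
    """Page visit order for a scan that joins at start_page and wraps around."""
    if total_pages <= 0:
        return []
    start = start_page % total_pages
    # Ride along to the end of the table, then loop back for the missed pages.
    return list(range(start, total_pages)) + list(range(0, start))


order = synchronized_scan_order(total_pages=8, start_page=5)
print(order)  # [5, 6, 7, 0, 1, 2, 3, 4]
```

Every page is still visited exactly once, but not starting from page 0. One consequence is that a query without `ORDER BY` may return rows in a different order from one run to the next.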
## Index Scan: Non-Linear Point Access
The **[[Operations/IndexScan|Index Scan]]** utilizes the maps created in Chapter 2 to bypass the linear scan. Postgres traverses the B-Tree to find the exact coordinates of the required tuples and performs a direct, non-linear fetch from the table heap.
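
As a rough illustration of "find the coordinates, then fetch", the sketch below uses a sorted list with binary search as a stand-in for the B-Tree. A real B-Tree is a multi-level, page-based structure; `HEAP`, `INDEX`, and `index_scan` are hypothetical names for this toy model only.

```python
from bisect import bisect_left

# Hypothetical heap: page number -> list of rows (offset = position in the list).
HEAP = {
    0: [{"id": 7, "name": "Saffron"}, {"id": 3, "name": "Basil"}],
    1: [{"id": 12, "name": "Thyme"}, {"id": 1, "name": "Mint"}],
}

# Hypothetical index: sorted (key, (page, offset)) pairs, standing in for B-Tree leaves.
INDEX = sorted(
    (row["id"], (page, offset))
    for page, rows in HEAP.items()
    for offset, row in enumerate(rows)
)
KEYS = [key for key, _ in INDEX]


def index_scan(key: int) -> list[dict]:
    """Binary-search the index, then fetch each matching tuple from the heap."""
    matches = []
    pos = bisect_left(KEYS, key)
    while pos < len(INDEX) and INDEX[pos][0] == key:
        page, offset = INDEX[pos][1]
        matches.append(HEAP[page][offset])  # direct, non-linear heap fetch
        pos += 1
    return matches


print(index_scan(12))  # [{'id': 12, 'name': 'Thyme'}]
```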
## Bitmap Scan: The Reservation Map
What if a single Index Scan isn't enough? Suppose the server needs to fetch 50 orders for "Saffron Salmon" scattered across the dining room.
An **[[Operations/IndexScan|Index Scan]]** would require running to the kitchen for Table 12, then Table 4, then Table 22—50 individual laps (Random I/O). A **[[Operations/SeqScan|Sequential Scan]]** would require walking past every table in the restaurant just to find those 50.
The **Bitmap Scan** is the "Smart Route":
1. **[[Operations/BitmapIndexScan|Bitmap Index Scan]] (The Mapper)**: Postgres first scans the index and builds a **Bitmap**: a compact in-memory map that marks which tuples on which pages (tables) match. Think of this as the Maître d' marking a floor plan with a highlighter.
2. **[[Operations/BitmapAndBitmapOr|Bitmap And / Or]] (The Combiner)**: If the query has multiple filters (e.g., `Species: Capybara` AND `Diet: Herbivore`), Postgres builds one map per condition and combines them cheaply using bitwise AND and OR.
3. **[[Operations/BitmapHeapScan|Bitmap Heap Scan]] (The Runner)**: The meticulous database engine then visits the highlighted pages in their physical order on the disk. By fetching the containers sequentially, it avoids back-and-forth movement and maximizes efficiency.
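
The three steps above can be sketched with Python sets standing in for the bitmap. This is a simplified model under stated assumptions: `SPECIES_INDEX` and both functions are hypothetical, and a real bitmap is a compact per-page structure rather than a set of pairs.

```python
# Hypothetical index on species_id: key -> list of (page, offset) pointers.
SPECIES_INDEX = {
    1: [(4, 0), (0, 2), (9, 1)],
    5: [(0, 1), (7, 3)],
}


def bitmap_index_scan(index: dict, key) -> set:
    """Step 1: collect the pointers matching key into a 'bitmap'."""
    return set(index.get(key, []))


def bitmap_heap_scan(bitmap: set) -> list:
    """Step 3: group pointers by page and visit pages in physical order."""
    pages = {}
    for page, offset in bitmap:
        pages.setdefault(page, []).append(offset)
    return [(page, sorted(pages[page])) for page in sorted(pages)]


bitmap_a = bitmap_index_scan(SPECIES_INDEX, 1)
bitmap_b = bitmap_index_scan(SPECIES_INDEX, 5)
combined = bitmap_a | bitmap_b  # Step 2: BitmapOr (use & for BitmapAnd)

visits = bitmap_heap_scan(combined)
print(visits)  # [(0, [1, 2]), (4, [0]), (7, [3]), (9, [1])]
```

Page 0 holds a match for each species, yet it is visited only once, and the pages are read in ascending order rather than in index order.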
### EXPLAIN: The Smart Lap
When you see a Bitmap Scan in your execution plan, you are seeing Postgres transition from "Point Searching" to "Bulk Retrieval."
> [!NOTE]
> **The Small Table Paradox**: For the small tables in our Elephant Cafe, the planner almost always prefers a **Sequential Scan**. The overhead of opening the Index map is often greater than just glancing at the few pages of the whole table. To see these specialized plans, we must occasionally "force" the meticulous database engine to use the map with `SET enable_seqscan = off`, which does not forbid sequential scans but makes them look extremely expensive to the planner.
```sql
-- Searching for all animals in two specific species
-- (Seq Scan disabled so the planner picks an index-based plan for demonstration)
SET enable_seqscan = off;
EXPLAIN SELECT * FROM animals WHERE species_id = 1 OR species_id = 5;
-- Result:
-- Bitmap Heap Scan on animals  (cost=8.39..19.08 rows=11 width=48)
--   Recheck Cond: ((species_id = 1) OR (species_id = 5))
--   ->  BitmapOr  (cost=8.39..8.39 rows=11 width=0)
--         ->  Bitmap Index Scan on idx_animals_species_id  (cost=0.00..4.19 rows=5 width=0)
--               Index Cond: (species_id = 1)
--         ->  Bitmap Index Scan on idx_animals_species_id  (cost=0.00..4.19 rows=5 width=0)
--               Index Cond: (species_id = 5)
```
> [!NOTE]
> **The Recheck Condition**: Notice the **`Recheck Cond`** in the plan. The bitmap's size is capped by **`work_mem`**, so it can become "Lossy" if the set of matches is too large to fit. When this happens, the bitmap marks whole pages instead of individual rows, and Postgres must "Recheck" the condition for every row on those pages to make sure it actually matches your query. The `Recheck Cond` line is always printed in the plan; the recheck itself is only needed for lossy pages (or for index types that return inexact matches).
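
A minimal sketch of the difference between an exact and a lossy page entry follows. `PAGE_3`, `cond`, and `scan_page` are hypothetical names, and the model ignores how the real bitmap decides which pages to degrade.

```python
# Hypothetical heap page with four rows.
PAGE_3 = [
    {"id": 10, "species_id": 1},
    {"id": 11, "species_id": 2},
    {"id": 12, "species_id": 5},
    {"id": 13, "species_id": 3},
]


def cond(row: dict) -> bool:
    """The query condition: species_id = 1 OR species_id = 5."""
    return row["species_id"] in (1, 5)


def scan_page(rows: list, exact_offsets=None) -> list:
    """exact_offsets=None models a lossy entry: only the page number is known."""
    if exact_offsets is None:
        return [row for row in rows if cond(row)]  # recheck every row on the page
    return [rows[i] for i in exact_offsets]        # exact pointers, no recheck


exact = scan_page(PAGE_3, exact_offsets=[0, 2])
lossy = scan_page(PAGE_3)
print(exact == lossy)  # True
```

Both paths return the same rows. The lossy path simply spends extra CPU evaluating the condition against every row on the page.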
### The High-Stakes Dinner Rush
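To understand the Planner's decisions, we must look at the **Estimated Cost**. The planner assigns a "sweat value" to every operation. In `cost=0.00..1.40`, the first number is the estimated startup cost (work done before the first row can be returned) and the second is the estimated total cost, both in arbitrary planner units rather than milliseconds.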
**The Sequential Fetch (The Default Walk):**
```sql
EXPLAIN SELECT * FROM ingredients WHERE category = 'Herb';
-- Result:
-- Seq Scan on ingredients  (cost=0.00..1.40 rows=8 width=54)
--   Filter: (category = 'Herb'::ingredient_category)
```
**The Index Fetch (The Point Search):**
```sql
-- Forced Index Scan
SET enable_seqscan = off;
EXPLAIN SELECT * FROM ingredients WHERE id = 12;
-- Result:
-- Index Scan using ingredients_pkey on ingredients  (cost=0.15..8.17 rows=1 width=54)
--   Index Cond: (id = 12)
```
**The Index-Only Fetch (Heap Avoidance):**
If the index map already contains every column the query asks for, Postgres can often avoid visiting the table heap entirely.
```sql
-- Forced Index Only Scan
SET enable_seqscan = off;
SET enable_bitmapscan = off;
EXPLAIN SELECT id FROM ingredients WHERE id < 100;
-- Result:
-- Index Only Scan using ingredients_pkey on ingredients  (cost=0.12..8.14 rows=1 width=4)
--   Index Cond: (id < 100)
```
> [!IMPORTANT]
> **The Visibility Map (VM)**: An Index-Only Scan does not guarantee that the heap is skipped. Because indexes do not store **MVCC (Visibility)** information, the meticulous database engine must consult the **Visibility Map**. If the VM marks the page as all-visible, the row is returned straight from the index. Otherwise, the engine must visit the heap to check that row's visibility.
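
The per-row decision can be sketched as follows. All names and data here are hypothetical, and the model is heavily simplified. The `heap_fetches` counter plays the role of the `Heap Fetches` figure that `EXPLAIN ANALYZE` reports for an Index Only Scan.

```python
# Hypothetical index entries: (key, heap page) pairs in key order.
INDEX_ENTRIES = [(1, 0), (2, 0), (3, 1), (4, 1)]
# Hypothetical visibility map: page 1 was modified recently, so it is not all-visible.
VISIBILITY_MAP = {0: True, 1: False}
# Hypothetical heap visibility per key: key 4 is a dead tuple.
HEAP_VISIBLE = {1: True, 2: True, 3: True, 4: False}


def index_only_scan(max_key: int) -> tuple:
    """Return (visible keys below max_key, number of heap visits needed)."""
    keys, heap_fetches = [], 0
    for key, page in INDEX_ENTRIES:
        if key >= max_key:
            break
        if VISIBILITY_MAP[page]:
            keys.append(key)       # answered from the index alone
        else:
            heap_fetches += 1      # page not all-visible: check the heap tuple
            if HEAP_VISIBLE[key]:
                keys.append(key)
    return keys, heap_fetches


print(index_only_scan(100))  # ([1, 2, 3], 2)
```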
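Notice that for our tiny `ingredients` table, the **Seq Scan (cost=1.40)** is actually "cheaper" than the **Index Scan (cost=8.17)**, even though the Index Scan is looking for a single row! The two queries are not identical, but the gap shows how the planner reasons. Postgres is smart: it knows that for a small Cafe, it doesn't need a map to find the saffron.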
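
The `1.40` figure can be reproduced by hand with the planner's default cost constants. The sketch below assumes that `ingredients` occupies 1 page and holds 32 rows. Neither number is stated in this chapter; they are chosen because they are consistent with the output above.

```python
# PostgreSQL default planner cost constants.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(pages: int, rows: int, quals: int = 1) -> float:
    """Estimated total cost of a Seq Scan with `quals` simple filter operators."""
    io_cost = pages * SEQ_PAGE_COST
    cpu_cost = rows * (CPU_TUPLE_COST + quals * CPU_OPERATOR_COST)
    return round(io_cost + cpu_cost, 2)


# Assumption: 1 heap page, 32 rows, one filter (category = 'Herb').
print(seq_scan_cost(pages=1, rows=32))  # 1.4
```

With a single page to read, the I/O term is only `1.0`. The Index Scan has to pay for descending the index and then a random heap page fetch, which is why it comes out more expensive on a table this small.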