# 6.2 The Deep Pantry (Disk Wait)

![The Deep Pantry](assets/arch_wait_pantry.png)

This is likely the most common wait event you will encounter in any production system. It is **The Sigh of I/O**—the literal pause required to bridge the latency gap between memory and storage.

## The Block Latency Gap

As we saw in the **[[Chapter 5/5.6 - The Physical Machine (RAM, CPU & Disk)|Hardware Era Reference]]**, data not found in the **[[Chapter 5/5.2 - The Warming Rack (Shared Buffers)|Shared Buffer Pool]]** must be fetched from a block device. When a process initiates a read, it yields the CPU and is placed in a sleep queue until the drive controller signals that the data is ready. In terms of modern processor cycles, even a fast 100 µs NVMe read is an eternity—hundreds of thousands of potential instructions lost to the speed of physics.

> [!IMPORTANT]
> **Measuring the Gap: `track_io_timing`**
> By default, Postgres does not time every single I/O operation, because calling the system clock for each one can be expensive on some hardware. To see I/O time in your execution plans, you must enable this parameter:
>
> ```sql
> SET track_io_timing = on;
> ```
>
> Without this, `EXPLAIN ANALYZE` will show you *that* I/O happened, but not how many milliseconds were lost to it.

## Random vs. Sequential I/O Patterns

Postgres experiences disk access in two primary patterns:

1. **Sequential (Adjacent Containers)**: The process reads adjacent blocks in a single predictable stream. The OS kernel often detects this and pre-fetches the next blocks into the **Linux Page Cache** before Postgres even asks for them. Sequential I/O is the "happy path" of database storage.
2. **Random (Scattered Containers)**: The process jumps between disparate blocks (often during an Index Scan). This is far more taxing on the hardware and the kernel, as pre-fetching is often impossible. This is the #1 consumer of wait time in aging or unoptimized systems.
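The planner models this sequential/random asymmetry with two cost parameters. A quick way to inspect them (the values in the comments are the stock defaults; the SSD tuning shown is a common adjustment, not a universal rule):

```sql
-- The planner's model of the sequential/random gap.
-- Stock defaults: seq_page_cost = 1.0, random_page_cost = 4.0.
SHOW seq_page_cost;
SHOW random_page_cost;

-- On NVMe storage the 4x random-read penalty is often pessimistic;
-- many operators lower it so index scans are not unfairly penalized.
-- SET random_page_cost = 1.1;  -- illustrative value; benchmark first
```

When `random_page_cost` is closer to `seq_page_cost`, the planner becomes more willing to choose index scans over sequential scans for mid-selectivity queries.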
## How to Spot It

In `pg_stat_activity`, you will see these events labeled:

- **[[Workloads/IO/DataFileRead|IO:DataFileRead]]**: The backend process is waiting for a storage device to fulfill a block read request.
- **[[Workloads/IO/DataFileWrite|IO:DataFileWrite]]**: The backend process is waiting to flush a modified page to storage.
- **[[Workloads/IO/WALWrite|IO:WALWrite]]**: The backend process is waiting for the kernel to confirm a WAL write is durable (`fsync`).

## Reading the Evidence: EXPLAIN (ANALYZE, BUFFERS)

The most honest line in any query plan is the **Buffers** report. It tells you exactly how many trips to the Deep Pantry were made.

```sql
-- Run EXPLAIN with BUFFERS to see the I/O story
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE animal_id = 42;
```

The `Buffers` line in the output is the most actionable part of the plan:

```text
Index Scan using idx_orders_animal_id on orders (cost=0.14..12.50 rows=5 width=24) (actual time=0.182..0.195 rows=5 loops=1)
  Index Cond: (animal_id = 42)
  Buffers: shared hit=4 read=2
Planning:
  Buffers: shared hit=88
Planning Time: 0.2 ms
Execution Time: 0.3 ms
```

- **`shared hit=4`**: The page was found in the **[[Chapter 5/5.2 - The Warming Rack (Shared Buffers)|Shared Buffer Pool]]** (RAM). No physical I/O was required.
- **`shared read=2`**: The page was not in memory and required a physical block read from storage.
- **`shared dirtied=1`** (not shown above; appears for queries that modify pages): The process modified a page in memory. It is now "dirty" and must eventually be flushed by the **[[Chapter 5/5.4 - The Housekeepers (Vacuum & Freezing)|Background Writer]]**.
- **`shared written=1`** (not shown above): The process was forced to physically write the page to storage itself (often because of a Buffer Eviction). This is slow.

A query with all `shared hit` and zero `shared read` is a query living entirely in memory—the ultimate performance goal. A query with a high `shared read` count is one bottlenecked by the latency of your storage hardware.
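The wait events listed under "How to Spot It" can also be caught in the act with a live sample of `pg_stat_activity`. A minimal sketch, using only standard columns of that view:

```sql
-- Which backends are stuck waiting on storage right now?
SELECT pid,
       wait_event_type,
       wait_event,
       now() - query_start AS query_age,
       left(query, 60)     AS query_snippet
FROM pg_stat_activity
WHERE wait_event_type = 'IO'
  AND wait_event IN ('DataFileRead', 'DataFileWrite', 'WALWrite');
```

Wait events are instantaneous snapshots, not counters, so a single run only shows what is waiting at that moment; sample repeatedly (or with a periodic job) to see which waits dominate over time.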
## Resolution Paths

If your database is I/O-bound, there are two levers:

1. **Faster storage hardware**: NVMe SSDs have roughly 100× lower latency than HDDs. Reducing raw read latency directly reduces `DataFileRead` wait times.
2. **Increase `shared_buffers`**: The more of your active working set fits in the buffer pool, the fewer reads reach the storage hardware at all—and the faster your queries become.

```sql
-- How large is the current buffer pool?
SHOW shared_buffers;

-- See which tables are currently warm in the buffer pool
-- (requires the pg_buffercache extension)
SELECT c.relname, count(*) AS buffers_in_cache
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
GROUP BY c.relname
ORDER BY buffers_in_cache DESC
LIMIT 10;
```

---

| ← Previous | ↑ Table of Contents | Next → |
| :--- | :---: | ---: |
| [[Chapter 6/6.1.3 - The Background Rhythm (Activity & Extensions)\|6.1.3 Background Rhythm & Activity]] | [[Learn You a Postgres for Great Good\|Home]] | [[Chapter 6/6.2.1 - The Pocket Diary (WAL IO)\|6.2.1 The Pocket Diary (WAL IO)]] |