# 6.4 The Sweat (CPU-Bound Workloads)

![The Sweat](assets/arch_cpu_sweat_v2.png)

You open `pg_stat_activity` to find a bottleneck and see... nothing. There are no `IO` waits, no `Lock` contention, no `LWLock` synchronization barriers. The `wait_event` column is entirely **NULL** for your target pid. Yet the query has been running for minutes.

You switch to the server's OS monitoring (`top` or `htop`), and the reality is clear: **100% CPU utilization**. Your process is not waiting. It is executing.

## Instruction-Bound Workloads (CPU)

When a workload is **CPU-bound**, the execution plan is simply asking the Postgres engine to perform a massive volume of logical comparisons, arithmetic, or data transformations. Every microsecond of the process's time is spent inside the **Execution Loop**.

Common causes of high CPU utilization include:

- **Massive Sorts**: Asking the engine to sort 10 million unindexed rows. Each row must be compared against others by the sort algorithm (`quicksort` in memory, or an `external merge` when the data spills to disk). Locale-aware string comparisons are particularly expensive.
- **Nested Loop Joins**: Without an appropriate index, the planner may fall back to a Nested Loop. If Table A has 1,000 rows and Table B has 1,000 rows, the engine must execute its inner loop 1,000,000 times.
- **Expression Evaluation**: Computing millions of regex matches (`~`), executing complex PostGIS spatial functions, or deeply parsing large `JSONB` documents.

In this state, the engine is delivering exactly what was requested; the bottleneck is the sheer volume of instructions required to fulfill the plan.

## The Cost of Parallelism

Modern Postgres can alleviate CPU bottlenecks with **Parallel Query**. Instead of a single process handling the entire task, the "Leader" process spawns multiple "Background Workers." However, parallelism is not free. It introduces new CPU costs:

1. **Worker Spawning**: The overhead of the OS creating new processes.
2. **DSM Mapping**: Workers must map a **Dynamic Shared Memory (DSM)** segment to communicate with the Leader.
3. **Tuple Serialization**: Results must be passed back from the workers to the Leader, which can become a new bottleneck if the result set is large.

If you see 100% CPU across many cores, the engine is effectively parallelizing the load. If you see 100% on a single core while the others sit idle, the query is likely executing a non-parallelizable plan node.

## Tuning for CPU Efficiency

When your workload is CPU-bound, the typical hardware upgrades (like faster disks) will not help. You have three primary levers:

1. **Eliminate the Work (Indexing)**: The most efficient instruction is the one never executed. A B-Tree index lets the engine skip the sorting loop entirely because the index already maintains its keys in sorted order.
2. **Change the Algorithm (Query Rewrite)**: Can you provide a join condition that allows a **Hash Join** instead of a Nested Loop?
3. **Scale the Hardware**: If you truly need to process millions of complex expressions per second, you simply need more cores and higher clock speeds.

The sign of high-level Postgres mastery is looking at a 100% CPU spike and knowing immediately whether it represents an analytics engine doing necessary heavy lifting, or a missing index forcing a process to iterate over the same data millions of times.

---

| [[Chapter 6/6.3.2 - Microscopic Traffic (Latches & LockManager)\|6.3.2 Microscopic Traffic (Latches & LockManager)]] | [[Learn You a Postgres for Great Good\|Home]] | [[Chapter 6/6.5 - The Kitchen Chaos (Concurrency & Performance Reasoning)\|6.5 The Kitchen Chaos (Concurrency & Performance Reasoning)]] |
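The diagnostic workflow this section describes (a NULL `wait_event` on an active backend, plus checking whether the plan parallelizes) can be run from `psql`. A minimal sketch, assuming Postgres 9.6 or later (where the `wait_event` columns exist); `big_table` is a hypothetical table name standing in for your own:

```sql
-- Find active backends that are burning CPU rather than waiting:
-- state = 'active' with a NULL wait_event means the process is executing,
-- not blocked on IO, locks, or LWLocks.
SELECT pid,
       now() - query_start AS runtime,
       wait_event_type,
       wait_event,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND wait_event IS NULL
  AND pid <> pg_backend_pid();

-- Check whether the suspect query parallelizes. In the output, a Gather
-- node with "Workers Launched: N" means the load is spread across cores;
-- a plan with no Gather node will pin a single core instead.
EXPLAIN (ANALYZE, VERBOSE)
SELECT count(*) FROM big_table;
```

Both statements are read-only diagnostics, though note that `EXPLAIN ANALYZE` actually executes the query, so run it against the offending statement only when you can afford one more pass.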