# 6.4 The Sweat (CPU-Bound Workloads)

You query `pg_stat_activity` to find a bottleneck and see... nothing.
There is no `IO` wait, no `Lock` contention, no `LWLock` synchronization stall. The `wait_event` column is **NULL** for your target PID. Yet the query has been running for minutes.
You switch to the server's OS monitoring (`top` or `htop`), and the reality is clear: **100% CPU utilization**.
Your process is not waiting. It is executing.
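A minimal version of that first diagnostic looks like this (the columns are standard `pg_stat_activity` fields):

```sql
-- Find long-running active backends and what they are waiting on.
-- An active backend with a NULL wait_event_type is on-CPU:
-- it is not blocked on IO, a Lock, or an LWLock.
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS runtime,
       query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC;
```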
## Instruction-Bound Workloads (CPU)
When a workload is **CPU-Bound**, the execution plan is simply asking the Postgres engine to perform a massive volume of logical comparisons, arithmetic, or data transformations. Every microsecond of the process's time is spent inside the **Execution Loop**.
Common causes of high CPU utilization include:
- **Massive Sorts**: Asking the engine to sort 10 million unindexed rows. Each row must be compared against others by the sort algorithm (`quicksort` in memory, or an `external merge` sort once `work_mem` is exceeded). String comparisons are particularly expensive, especially under locale-aware collations.
- **Nested Loop Joins**: Without an appropriate index, the planner may fall back to a Nested Loop. If Table A has 1,000 rows and Table B has 1,000, the engine must execute its inner loop 1,000,000 times.
- **Expression Evaluation**: Computing millions of regex matches (`~`), executing complex PostGIS spatial functions, or deeply parsing large `JSONB` documents.
In this state, the engine is delivering exactly what was requested; the bottleneck is the sheer volume of instructions required to fulfill the plan.
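The way to see which of these your plan is doing is `EXPLAIN (ANALYZE, BUFFERS)`. A sketch against a hypothetical `events` table with no index on `created_at`:

```sql
-- Hypothetical schema: events(id, created_at, payload), no index on created_at.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events ORDER BY created_at;

-- A CPU-bound plan will show something like:
--   Sort  (Sort Method: quicksort  or  external merge)
--     ->  Seq Scan on events
-- Every row fed into that Sort node costs comparison instructions.
```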
## The Cost of Parallelism
Modern Postgres can alleviate CPU bottlenecks by using **Parallel Query**. Instead of a single process handling the entire task, the "Leader" process spawns multiple "Background Workers."
However, parallelism is not free. It introduces new CPU costs:
1. **Worker Spawning**: The overhead of the OS creating new processes.
2. **DSM Mapping**: Workers must map a **Dynamic Shared Memory (DSM)** segment to communicate with the Leader.
3. **Tuple Serialization**: Results must be passed back from workers to the Leader, which can become a new bottleneck if the result set is large.
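In a plan, parallelism shows up as a **Gather** node sitting above the parallel-aware nodes; the `max_parallel_workers_per_gather` setting caps how many workers the Leader may request. A sketch (table and column names are hypothetical):

```sql
SET max_parallel_workers_per_gather = 2;

EXPLAIN (ANALYZE)
SELECT count(*) FROM events WHERE length(payload) > 1024;

-- A parallelized plan will look something like:
--   Finalize Aggregate
--     ->  Gather  (Workers Planned: 2, Workers Launched: 2)
--           ->  Partial Aggregate
--                 ->  Parallel Seq Scan on events
```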
If you see 100% CPU across many cores, the engine is effectively parallelizing the load. If you see 100% on a single core while others are idle, the query is likely executing a non-parallelizable plan node.
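You can confirm the single-core-versus-many-cores question from inside the database as well. Since Postgres 13, `pg_stat_activity` exposes a `leader_pid` column on parallel worker backends:

```sql
-- Count active parallel workers per leader process.
-- If this returns no rows while your query runs hot on one core,
-- the plan is not being parallelized.
SELECT leader_pid, count(*) AS workers
FROM pg_stat_activity
WHERE leader_pid IS NOT NULL
GROUP BY leader_pid;
```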
## Tuning for CPU Efficiency
When your workload is CPU-bound, typical hardware upgrades (like faster disks) will not help. You have three primary levers:
1. **Eliminate the Work (Indexing)**: The most efficient instruction is the one never executed. A B-Tree index lets the engine skip the sort loop entirely, because the index itself already stores keys in order.
2. **Change the Algorithm (Query Rewrite)**: Can you provide a join condition that allows a **Hash Join** instead of a Nested Loop?
3. **Scale the Hardware**: If you truly need to process millions of complex expressions per second, you simply need more cores and higher clock speeds.
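Lever 1 in practice, continuing the hypothetical `events` table from above: adding the index removes the Sort node from the plan.

```sql
CREATE INDEX events_created_at_idx ON events (created_at);

EXPLAIN SELECT * FROM events ORDER BY created_at LIMIT 100;

-- The plan becomes something like:
--   Limit
--     ->  Index Scan using events_created_at_idx on events
-- No Sort node: the engine reads rows already in key order.
```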
The sign of high-level Postgres mastery is looking at a 100% CPU spike and knowing immediately whether it represents an analytics engine doing necessary, heavy lifting, or a missing index forcing a process to loop over the same rows millions of times.
---
| [[Chapter 6/6.3.2 - Microscopic Traffic (Latches & LockManager)\|6.3.2 Microscopic Traffic (Latches & LockManager)]] | [[Learn You a Postgres for Great Good\|Home]] | [[Chapter 6/6.5 - The Kitchen Chaos (Concurrency & Performance Reasoning)\|6.5 The Kitchen Chaos (Concurrency & Performance Reasoning)]] |