# Chapter 7 Technical Audit: Distributed Architecture
**Status**: Solid baseline, but missing a few critical mechanical linkages and advanced Postgres features.
While the structural flow of Chapter 7 is good, the technical framing can be hardened. A reader coming away from this chapter should not just understand *what* these distributed tools are, but exactly *how* they fail under load.
Here are the primary technical gaps and opportunities to improve the "Feynman Principle" execution:
## 1. The ProcArray Contention (Chapter 7.4)
- **Current Framing**: States that connecting/disconnecting requires a lock on the `ProcArray`.
- **The Missing Link**: You need to tie this back to Chapter 1's MVCC. *Every single query* requires a snapshot to determine which tuples are visible. To build that snapshot, the query must read the `ProcArray` to see what other transactions are active. Therefore, high connection churn (which exclusively locks the `ProcArray` to add/remove PIDs) doesn't just hurt new connections—it paralyzes *every active query* in the system trying to build an MVCC snapshot.
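It may help the reader to see this contention directly. A diagnostic sketch (requires a live Postgres; uses the standard `pg_stat_activity` view, and the grouping/aliasing is ours):

```sql
-- Backends currently waiting on lightweight locks. Under heavy
-- connection churn, ProcArrayLock appears here because snapshot
-- construction and connect/disconnect serialize on the same lock.
SELECT wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
GROUP BY wait_event
ORDER BY count(*) DESC;
```

If `ProcArrayLock` dominates this output on a busy system, the fix is almost always a connection pooler, not more connections.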
## 2. Archiving vs. Dropping Partitions (Chapter 7.3)
- **Current Framing**: Mentions that range partitioning allows for instant deletion of old data by using `DROP TABLE`.
- **The Missing Link**: `DROP TABLE` is destructive. The true superpower of Postgres partitioning is `DETACH PARTITION CONCURRENTLY` (available since Postgres 14). This removes the table from the routing logic without acquiring an `ACCESS EXCLUSIVE` lock on the parent, allowing you to quietly archive the detached table to cold storage (e.g., via `pg_dump`) without interrupting live inserts. This is the professional lifecycle management pattern.
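A short lifecycle sketch would make the pattern concrete (table and partition names here are hypothetical; `DETACH ... CONCURRENTLY` requires Postgres 14+):

```sql
-- Remove January's partition from query routing without taking an
-- ACCESS EXCLUSIVE lock on the parent; live inserts keep flowing.
ALTER TABLE events DETACH PARTITION events_2023_01 CONCURRENTLY;

-- The detached partition survives as a standalone table, so it can
-- be archived to cold storage before removal, e.g.:
--   pg_dump -t events_2023_01 mydb > events_2023_01.sql
DROP TABLE events_2023_01;
```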
## 3. The Unexplained LSN Metrics (Chapter 7.1)
- **Current Framing**: The SQL query in 7.1 selects `sent_lsn`, `write_lsn`, `flush_lsn`, and `replay_lsn`. However, the text only defines `sent_lsn` and `replay_lsn`.
- **The Missing Link**: The gap between these four states is the literal definition of the **Durability Gradient** introduced in 7.1.1.
- `sent`: Primary put it on the wire.
   - `write`: Replica wrote it to the OS page cache; not yet durable. (Corresponds to `remote_write`)
   - `flush`: Replica flushed it to disk via fsync; now durable. (Corresponds to `on`)
- `replay`: Replica applied it to the page. (Corresponds to `remote_apply`)
These should be explicitly defined and linked to the `synchronous_commit` settings in the next subchapter.
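The gradient could also be shown as byte-level lag. A sketch using the real `pg_stat_replication` columns and the built-in `pg_wal_lsn_diff` function (the column aliases are ours):

```sql
-- Bytes of WAL sitting between each durability state, per replica:
SELECT application_name,
       pg_wal_lsn_diff(sent_lsn,  write_lsn)  AS in_flight_bytes,  -- on the wire / not yet in OS cache
       pg_wal_lsn_diff(write_lsn, flush_lsn)  AS unflushed_bytes,  -- in RAM, not yet fsynced
       pg_wal_lsn_diff(flush_lsn, replay_lsn) AS unapplied_bytes   -- durable, not yet query-visible
FROM pg_stat_replication;
```

(The view also exposes `write_lag`, `flush_lag`, and `replay_lag` as time intervals, which may be friendlier for monitoring dashboards.)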
## 4. The HA Data-Loss Trap (Chapter 7.5)
- **Current Framing**: Explains how Patroni promotes the replica with the highest LSN during a failover.
- **The Missing Link**: There is no explicit warning connecting failover to asynchronous replication. `synchronous_commit = on` is the default, but it only guarantees *local* durability unless `synchronous_standby_names` is set; out of the box, replication is asynchronous. If the primary dies violently (e.g., hardware failure), the replicas *will not* have the last few milliseconds of acknowledged commits, so promoting a replica in an async cluster silently discards them. This is the critical tradeoff between 7.1.1 and 7.5 that must be stated.
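A one-line check on the primary makes the trap concrete: if every standby reports `async`, a violent failover risks losing acknowledged commits. (The `ALTER SYSTEM` line is a hypothetical mitigation; the standby names are illustrative.)

```sql
-- 'async' means the primary acknowledges commits before the replica
-- has the WAL; promoting such a replica can lose data.
SELECT application_name, sync_state
FROM pg_stat_replication;

-- Possible mitigation: require at least one standby to confirm the
-- flush before COMMIT returns (trades latency for durability).
-- ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica1, replica2)';
```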
## Recommendation
I recommend editing `7.1`, `7.3`, `7.4`, and `7.5` to inject these specific mechanical truths. This will elevate the chapter from a "conceptual overview" to a "production survival guide."