# 5.5 The Great Overflow (Tuple bloat)

When the staff is updating and deleting records faster than the housekeepers can clean, the database enters a state of **Bloat**. This is a physical crisis: even though your "live" data remains the same size, the Filing Cabinet (the data file on disk) begins to swell with "shredded paper"—uncollected dead tuples that have not yet been recycled.
## The Mechanic of Table Bloat
In **[[Chapter 1/1.5 - The Sharpie Ledger (MVCC)|MVCC]]**, deletes and updates leave "ghost" records behind.
Under normal circumstances, **Autovacuum** sweeps these records and marks the space as "Free" in the **Free Space Map (FSM)**. Future inserts will find these empty holes and fill them. The file size remains stable.
However, if a transaction stays open for too long (a **Long-Running Transaction**), Autovacuum cannot clean the rows that were deleted *after* that transaction started. Postgres must keep those dead rows around just in case the long-running guest needs to see them.
The scraps pile up. If new data arrives while the floor is covered in un-sweepable scraps, the database engine has no choice but to add **New Pages** to the end of the file. The database is now physically larger than it needs to be.
## The Librarian's Dilemma: Index Bloat
While `VACUUM` can eventually clean the heap, it struggles with the **[[Chapter 2/2.1 - The Balanced Bookshelf (The B-Tree)|Index (The Librarian's Map)]]**.
When a row is deleted, the pointer in the B-Tree doesn't just evaporate. The page in the index might become 90% "empty," but the B-Tree structure remains. Because an index is a sorted tree, the Librarian cannot easily "merge" partially empty pages without a massive, expensive re-shuffling of the entire bookshelf.
This is **Index Bloat**. A bloated index makes every search slower because the Runner has to traverse more levels of the tree and read more pages just to find the same amount of data.
> [!WARNING]
> **The Vacuum Trap**: Standard `VACUUM` does **not** return disk space to the Operating System. It only makes the space available for future Postgres data. To truly shrink a 100GB file back down to 10GB, you must perform a structural reclamation.
## Reclaiming the Space (The Nuclear Options)
If the bloat is so catastrophic that it threatens to fill the entire disk, you have three mechanical choices:
1. **`REINDEX`**: This throws away a bloated index and builds a fresh, compact one from scratch. It is the best way to fix "The Librarian's Dilemma."
2. **`VACUUM FULL`**: The ultimate structural repair. Postgres creates a brand-new, perfectly compact data file and moves every live suitcase into it. This requires a **[[Chapter 6/6.3.1 - The Private Booth (Heavyweight Locks)|AccessExclusiveLock]]**—the Cafe is closed to all patrons until the move is finished.
3. **Internal `pg_repack`**: A community tool that performs a `VACUUM FULL` in the background without locking the table for long periods. It is the "Professional Mover" of the Postgres world.
## The Diagnostics: Checking the Bins
To find out if your housekeepers are falling behind, you must look at the **Waste Audit**:
```sql
-- How much shredded paper is on the floor?
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'supply_deliveries';
```
If **`n_dead_tup`** (dead tuples) is significantly higher than **`n_live_tup`**, Postgres is drowning in its own trash!
## The Solution: Tuning the Cleaners
The database engine is very conservative. By default, it only starts cleaning when 20% of a table is dead. For a table with a billion rows, that’s 200 million dead rows before the vacuum even turns on!
To maintain a healthy database, you must adjust the **Thresholds of Cleanliness**:
```sql
-- Telling the Housekeeper to be more aggressive on this table
ALTER TABLE supply_deliveries SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_cost_limit = 1000
);
```
By lowering the `scale_factor` to 1% (0.01), we ensure the housekeeper starts sweeping much earlier. By increasing the `cost_limit`, we give them a bigger broom!
## The Nuclear Option (VACUUM FULL)
If the bloat is so bad that you absolutely *must* reclaim the disk space immediately, you can order a **`VACUUM FULL`**.
Be warned: this is the **Nuclear Option**. Postgres will lock the entire table heap, move every single suitcase into a brand-new cabinet, and throw the old cabinet in the river. No one can order tea, no one can pay their bill, and the Cafe is effectively closed for the duration of the move. Great engineers use `VACUUM FULL` only in the direst of emergencies!
It’s more expensive and requires more "staging area" (disk space), but it ensures the Cafe stays open while the trash is removed!
## Summary: Postgres's Menu (The Tuning Matrix)
Tuning the database engine's appetite is about balancing speed and safety. Use this matrix to guide your next configuration:
| Setting | Role | The "Lazy" Wisdom |
| :------------------------- | :---------------------- | :-------------------------------------------------------- |
| `shared_buffers` | The Warming Rack | Set to ~25% of RAM. Leave the rest for the Building Manager (OS). |
| `work_mem` | The Private Desk | Set low (4-16MB) by default. Set high for specific messy sessions. |
| `maintenance_work_mem` | The Housekeeping Budget | Set high (256MB-1GB) so the Vacuum Crew has a bigger broom. |
| `max_wal_size` | The Pocket Size | Larger keeps the checkpointer from panicking; smaller saves disk. |
| `autovacuum_scale_factor` | The Mess Threshold | Lower it for busy tables so the cleaners start earlier. |
| `vacuum_cost_limit` | The Cleaning Speed | Increase it to let the cleaners work faster at the cost of I/O. |
By mastering the Warming Rack and the Housekeeper's Budget, you ensure the Elephant Cafe is lean, fast, and always ready for the next lunch rush.
---
Now that Postgres has a full stomach and a clean engine, it needs to worry about the **Crowds**. In the next chapter, we'll explore **The Waiting Game (Workloads & Locking)**, where we see what happens when too many elephants try to use the same suitcase at once!
---
| ← Previous | ↑ Table of Contents | Next → |
| :--- | :---: | ---: |
| [[Chapter 5/5.4 - The Housekeepers (Vacuum & Freezing)\|5.4 The Housekeepers (Vacuum & Freezing)]] | [[Learn You a Postgres for Great Good\|Home]] | [[Chapter 5/5.6 - The Physical Machine (RAM, CPU & Disk)\|5.6 The Physical Machine (RAM, CPU & Disk)]] |