7.3 - Table Partitioning

# 7.3 Splitting the Table (Partitioning) ![Splitting the Table](assets/arch_partitioning_diverse.png) This is **Write Contention**. The solution is to split that one massive table into many smaller, manageable pieces. We call this **Partitioning**. > [!TIP] > **The Rule of Thumb**: Partitioning is for **"Management"** (easy deletes, skipping big chunks, and cleaning up vacuum bloat), not for **"Search Speed."** A standard B-Tree index is already incredibly fast; partitioning only helps if you can "Prune" or if the table is so big it’s breaking the physics of the disk. > [!IMPORTANT] > **Partitioning vs. Sharding**: > - **Partitioning** is an **Internal** split. You still have one elephant and one database, but the giant cabinet has been replaced by a row of smaller drawers. > - **Sharding** is an **External** split. You have many different elephants in many different cafes, each holding a piece of the menu. ## The Mini-Table Strategy Instead of one "Orders" table, the elephant creates a "Parent" table that acts as a signpost, and dozens of "Child" tables (Partitions). Now, elephants working on 2023 data never even see the elephants working on 2024 data. The contention disappears, and the engine feels empty again. ### Partition Pruning (The Label Check) The best part of the drawer strategy is **Partition Pruning**. If you search for "Orders from June 2024," the elephant looks at the labels on the drawers. He sees the "2023" drawer and doesn't even touch the handle. He skips it entirely! In your **[[Operations/_Postgres Operations|Service Receipt]]**, you will see this as **"Subplans Removed"**. The elephant only walks to the drawers he needs, saving immense energy and time. ## Why It’s Difficult to Do Well If partitioning was easy, everyone would do it for every table. But it is a double-edged sword (and a very sharp one at that). ### 1. The Partition Key Dilemma You must choose a **Partition Key**—the rule for how data is split. - If you split by **Date**, but your users always search by **Customer ID**, the Maitre D' (Query Planner) has to send a staff member to _every single partition_ to find the data. This is called **Fan-out**, and it's devastating for performance. Imagine a Library where books are organized by **Year**. If a reader asks for "The Great Gatsby" (searching by Title), the Librarian has to ask 100 different staff members to check their 100 different rooms. The **Overhead** of asking 100 people is 100x the bureaucracy of asking one person to check one room. This is the **Fan-out Penalty**. - If you split by **Hash** (randomly spreading rows), you solve the write contention perfectly, but you can no longer perform efficient "Range Queries" (e.g., "Show me everything from June"). ### 2. The Maintenance Burden Partitions are not magic; they are real tables. You have to create them, index them, and manage them. If you forget to create the "2025" partition before January 1st, the elephant will suddenly have nowhere to put the data and will start yelling errors. It’s a very rude way to start the new year! ### 3. The Cross-Partition Join Joining two large partitioned tables is like trying to coordinate two different armies across a hundred different kitchens. Unless the partition keys match perfectly (Co-location), the performance will plummet into a combinatorial abyss. Partitioning is a powerful tool for scaling **Writes** and managing **Data Lifecycle** (e.g., deleting old data by simply dropping a partition). But like any powerful tool, it requires a master's touch to avoid turning your organized database into a fragmented maze. It’s a game of chess, played with elephants! ### Shattering the Monolith Let's perform the migration for our `supply_deliveries` table. We are moving from a single massive table to a partitioned one: ```sql -- 1. Look before you leap: Create the new Parent CREATE TABLE supply_deliveries_partitioned ( id BIGINT GENERATED ALWAYS AS IDENTITY, supplier_id INT NOT NULL, ingredient_id INT NOT NULL, delivery_time TIMESTAMPTZ NOT NULL, quantity_kg NUMERIC(8,2) NOT NULL, PRIMARY KEY (id, delivery_time) ) PARTITION BY RANGE (delivery_time); -- 2. Create the rooms for the different years CREATE TABLE supply_deliveries_2024 PARTITION OF supply_deliveries_partitioned FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'); -- 3. The Swap (In a Transaction!) BEGIN; ALTER TABLE supply_deliveries RENAME TO supply_deliveries_old; ALTER TABLE supply_deliveries_partitioned RENAME TO supply_deliveries; COMMIT; ``` Now, the application keeps sending data to `supply_deliveries`, but the elephant secretly routes it to the correct "Year Room." The aisle is no longer cramped! --- | ← Previous | ↑ Table of Contents | Next → | | :--- | :---: | ---: | | [[Chapter 7/7.2 - The Infinite Archive (Cloud Storage)\|7.2 The Infinite Archive (Cloud Storage)]] | [[Learn You a Postgres for Great Good\|Home]] | [[Chapter 7/7.4 - The Crowded Hallway (Connection Pooling)\|7.4 The Crowded Hallway (Connection Pooling)]] |