_Distributed - Aesa's Notes

# Distributed Operations

![[assets/arch_distributed_query.png|256]]

**Distributed Operations** (and Parallel Operations) handle the coordination of data across process or network boundaries. They enable the engine to use multiple CPUs or multiple servers to solve a single problem.

### The Architectural Role

In a parallel or distributed query, data must be gathered from multiple workers and "shipped" to a coordinator node. These operations handle:

1. **Serialization**: Packing tuples into a format that can be sent over a shared memory queue or network socket.
2. **Redistribution**: Ensuring that data reaches the correct node for a join or aggregation (e.g., Sharding).
3. **Synchronization**: Waiting for all workers to finish their work before the next step can proceed.

### In the Explain Plan

These nodes indicate that the "Single Elephant" has delegated work to a team. Look for `Workers Planned` and `Gather` nodes.

```text
-> Gather  (cost=1000.00..5678.00 rows=10000 width=24)
     Workers Planned: 2
     -> Parallel Seq Scan on large_table  (cost=1.00..345.00 rows=5000 width=24)
```

---

![[Operations/Distributed/Broadcast]]

---

![[Operations/Distributed/Redistribute]]

---

![[Operations/Distributed/Gather]]
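
---

The three responsibilities listed under "The Architectural Role" (serialization, redistribution, synchronization) can be illustrated with a minimal Python sketch. This is not how any particular engine implements them: the function names (`serialize`, `redistribute`, `worker_scan`, `gather`), the `pickle` wire format, the CRC32 hash, and the `value >= 10` filter are all assumptions chosen for illustration, and threads stand in for worker processes or remote nodes.

```python
import pickle
import zlib
from concurrent.futures import ThreadPoolExecutor


def serialize(rows):
    """Serialization: pack a batch of tuples into bytes for a queue or socket."""
    return pickle.dumps(rows)


def deserialize(payload):
    """Unpack a batch of tuples on the receiving side."""
    return pickle.loads(payload)


def target_node(key, num_nodes):
    """Pick a destination node from a stable hash of the join/group key."""
    return zlib.crc32(repr(key).encode()) % num_nodes


def redistribute(rows, key_index, num_nodes):
    """Redistribution: one bucket per node, so equal keys land on the same node."""
    buckets = [[] for _ in range(num_nodes)]
    for row in rows:
        buckets[target_node(row[key_index], num_nodes)].append(row)
    return buckets


def worker_scan(chunk):
    """Hypothetical worker: filter its chunk, then ship the result as bytes."""
    return serialize([row for row in chunk if row[1] >= 10])


def gather(chunks):
    """Synchronization: run one worker per chunk and wait for all of them."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # list() blocks until every worker has returned its payload.
        payloads = list(pool.map(worker_scan, chunks))
    rows = []
    for payload in payloads:
        rows.extend(deserialize(payload))
    return rows


if __name__ == "__main__":
    table = [(i, i * 5) for i in range(6)]      # (id, value) rows
    print(gather([table[:3], table[3:]]))       # coordinator collects worker output
    print(redistribute(table, key_index=0, num_nodes=3))
```

Here `gather` plays the coordinator: it cannot proceed until both workers have finished, which mirrors the role of the `Gather` node above a `Parallel Seq Scan`. `redistribute` uses a hash of the key rather than Python's built-in `hash()` so that the key-to-node mapping stays the same across processes.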