Gather - Aesa's Notes

> [!NOTE] Distributed Gather
> <table>
> <tr>
> <td width="25%"><img src="assets/ex_gather.png"></td>
> <td>The entry point for parallel query execution. This node coordinates multiple 'Parallel Workers', collecting their individual results and merging them back into a single stream for the leader process to handle.</td>
> </tr>
> </table>
>
> ```sql
> -- Forcing a parallel aggregation and gather
> SET max_parallel_workers_per_gather = 2;
> SET min_parallel_table_scan_size = 0;
> SET parallel_setup_cost = 0;
>
> EXPLAIN (ANALYZE, COSTS, BUFFERS, VERBOSE)
> SELECT count(*) FROM animals;
> ```
>
> ```text
> ->  Gather (cost=252.17..252.18 rows=2 width=8) (actual time=2.347..3.628 rows=3 loops=1)
>       Output: (PARTIAL count(*))
>       Workers Planned: 2
>       Workers Launched: 2
>       Buffers: shared hit=148
>       ->  Partial Aggregate (...)
> ```
>
> <table>
> <tr>
> <td rowspan="5" width="25%"><img src="assets/ex_gather_motion.svg"></td>
> <td><b>Performance</b></td><td>Can introduce a serialization bottleneck if a large volume of data must be shipped through the coordinator.</td>
> </tr>
> <tr><td><b>Factors</b></td><td>Volume of data, number of workers, and IPC (Inter-Process Communication) overhead.</td></tr>
> <tr><td><b>Cost</b></td><td><code>setup_cost + communication_cost * data_size</code></td></tr>
> <tr><td><b>Operates on</b></td><td><a href="Structures/Result Set">Result Set</a></td></tr>
> <tr><td><b>Workloads</b></td><td><a href="Workloads/IPC/Parallel/ExecuteGather">IPC: ExecuteGather</a>, <a href="Workloads/IPC/Parallel/ParallelFinish">IPC: ParallelFinish</a>, <a href="Workloads/LWLock/Parallel/ParallelQueryDSA">LWLock: ParallelQueryDSA</a>, <a href="Workloads/LWLock/Buffers/SharedTupleStore">LWLock: SharedTupleStore</a></td></tr>
> <tr><td colspan="3"><b>Description</b>: Collects data from multiple nodes or parallel processes and consolidates it into a single node. This is often the final step in a parallel query.</td></tr>
> </table>
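
The Cost row, `setup_cost + communication_cost * data_size`, can be made concrete with a small sketch. It assumes that `setup_cost` corresponds to PostgreSQL's `parallel_setup_cost` setting (default 1000), that `communication_cost` corresponds to `parallel_tuple_cost` (default 0.1), and that `data_size` is the number of rows passed up to the leader. The function name `gather_overhead` is hypothetical and only illustrates the arithmetic; it is not a reproduction of the planner's code.

```python
# Hypothetical helper mirroring the note's formula:
#   setup_cost + communication_cost * data_size
# Assumption: setup_cost ~ parallel_setup_cost, communication_cost ~
# parallel_tuple_cost, data_size ~ rows shipped from workers to the leader.

def gather_overhead(rows, parallel_setup_cost=1000.0, parallel_tuple_cost=0.1):
    """Estimated cost a Gather node adds on top of its child plan."""
    return parallel_setup_cost + parallel_tuple_cost * rows


# Two partial-aggregate rows with the setup cost forced to 0,
# as in the example session above.
small = gather_overhead(rows=2, parallel_setup_cost=0.0)

# One million rows shipped to the leader with default settings.
large = gather_overhead(rows=1_000_000)

print(f"small gather overhead: {small:.2f}")
print(f"large gather overhead: {large:.2f}")
```

Under these assumptions the per-row term dominates once many rows cross the Gather, which is why aggregating inside the workers (the Partial Aggregate in the plan above) keeps the node cheap: only one partial result per process has to be shipped.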