Broadcast - Aesa's Notes

> [!NOTE] Broadcast
> <table>
> <tr>
> <td width="25%"><img src="assets/ex_broadcast.png"></td>
> <td>A distributed optimization where the engine copies an entire (usually small) relation to every worker node in the cluster. This allows large distributed tables to be joined with small reference tables without moving the large data set.</td>
> </tr>
> </table>
>
> > [!NOTE]
> > **Distributed Feature**: Broadcast nodes are found in distributed extensions like Citus. They do not appear in standard single-node PostgreSQL plans.
>
> ```sql
> -- Conceptual plan for a distributed Join
> -- (Table 'species' is broadcast to all nodes)
> EXPLAIN SELECT * FROM animals a
> JOIN species s ON a.species_id = s.id;
> ```
>
> ```text
> Custom Scan (Citus Adaptive) (cost=1.00..1.00 rows=0 width=0)
>   -> Distributed Subplan 1
>     -> Broadcast (cost=1.00..2.05 rows=5 width=15)
>       -> Seq Scan on species s (...)
>   -> Task Executor
>     -> Index Scan using idx_animals_species_id on animals_102001 a (...)
> ```
>
> <table>
> <tr>
> <td rowspan="5" width="25%"><img src="assets/ex_broadcast_motion.svg"></td>
> <td><b>Performance</b></td><td>Highly network-intensive; primarily used when the "broadcast" table is small enough to fit in the memory of every worker node.</td>
> </tr>
> <tr><td><b>Factors</b></td><td>Size of the dataset being broadcast, network bandwidth, and the total number of nodes in the cluster.</td></tr>
> <tr><td><b>Cost</b></td><td><code>network_transfer_cost * size_of_data * number_of_nodes</code></td></tr>
> <tr><td><b>Operates on</b></td><td><a href="Structures/Result Set">Result Set</a></td></tr>
> <tr><td><b>Workloads</b></td><td><a href="Workloads/IPC/MessageQueue/MessageQueueSend">IPC: MessageQueueSend</a>, <a href="Workloads/IPC/MessageQueue/MessageQueueReceive">IPC: MessageQueueReceive</a>, <a href="Workloads/LWLock/Parallel/ParallelQueryDSA">LWLock: ParallelQueryDSA</a></td></tr>
> <tr><td colspan="3"><b>Description</b>: Broadcast operations distribute data from one node to all other nodes in a distributed database system. This is often used in join operations where a small table is sent to all nodes to join with a larger, distributed table.</td></tr>
> </table>
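
The cost formula and the broadcast join described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not how Citus or any other engine implements the operation: the function names `broadcast_cost` and `broadcast_join`, the unit transfer cost of `1.0`, the four-node cluster, and the use of `rows * width` from the sample plan as the data size are all hypothetical choices made for the example.

```python
from collections import defaultdict


def broadcast_cost(network_transfer_cost, size_of_data, number_of_nodes):
    """Cost formula from the table: every node receives a full copy of the data."""
    return network_transfer_cost * size_of_data * number_of_nodes


def broadcast_join(small_rows, shards, small_key, large_key):
    """Inner-join each shard of the large table against a full copy of the small table.

    `shards` is a list of row lists, one per (hypothetical) worker node.
    """
    results = []
    for shard in shards:
        # "Broadcast": each worker builds a hash table from its own full copy
        # of the small relation.
        lookup = defaultdict(list)
        for row in small_rows:
            lookup[row[small_key]].append(row)
        # Rows of the large table are probed locally and never leave their shard.
        for big in shard:
            for small in lookup.get(big[large_key], []):
                joined = dict(big)
                joined.update({f"s_{k}": v for k, v in small.items()})
                results.append(joined)
    return results


if __name__ == "__main__":
    # Assumed inputs: 5 rows * 15 bytes (from the sample plan), 4 worker nodes.
    print(broadcast_cost(1.0, 5 * 15, 4))

    species = [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}]
    animal_shards = [
        [{"animal": "Tom", "species_id": 1}],
        [{"animal": "Rex", "species_id": 2}, {"animal": "Ghost", "species_id": 3}],
    ]
    for row in broadcast_join(species, animal_shards, "id", "species_id"):
        print(row)
```

Because the cost grows linearly with both the data size and the node count, doubling either one doubles the estimate, which is why the technique is reserved for small relations.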