SampleScan - Aesa's Notes

> [!NOTE] Sample Scan
> <table>
> <tr>
> <td width="25%"><img src="assets/ex_samplescan.png"></td>
> <td>Implements the <code>TABLESAMPLE</code> clause. Instead of returning every row of the table, the engine uses a sampling method to select a random subset: <code>SYSTEM</code> picks whole pages, <code>BERNOULLI</code> picks individual rows. With <code>SYSTEM</code> this can cut I/O substantially for approximate, statistical queries.</td>
> </tr>
> </table>
>
> ```sql
> -- Sample roughly 10% of the animals table using the SYSTEM method
> EXPLAIN (ANALYZE, COSTS, BUFFERS, VERBOSE)
> SELECT * FROM animals TABLESAMPLE SYSTEM (10);
> ```
>
> ```text
> Sample Scan on public.animals (cost=1.00..80.00 rows=2000 width=27) (actual time=1.004..1.083 rows=1768 loops=1)
>   Output: id, name, species_id, created_at
>   Sampling: system ('10'::real)
>   Buffers: shared hit=13
> ```
>
> <table>
> <tr>
> <td rowspan="5" width="25%"><img src="assets/ex_named_tuplestore_scan.svg"></td>
> <td><b>Performance</b></td><td>Fast relative to a full scan. <code>SYSTEM</code> is typically much faster than <code>BERNOULLI</code> because it samples at the block level and reads only the selected pages, whereas <code>BERNOULLI</code> samples at the row level and still visits every page.</td>
> </tr>
> <tr><td><b>Factors</b></td><td>Sample percentage, sampling method, and physical page layout.</td></tr>
> <tr><td><b>Cost</b></td><td><code>sampling cost * sample size</code></td></tr>
> <tr><td><b>Operates on</b></td><td><a href="Structures/Tuple">Tuple</a></td></tr>
> <tr><td><b>Workloads</b></td><td><a href="Workloads/IO/DataFile/DataFileRead">IO: DataFileRead</a>, <a href="Workloads/IO/DataFile/DataFilePrefetch">IO: DataFilePrefetch</a>, <a href="Workloads/LWLock/Buffers/BufferContent">LWLock: BufferContent</a></td></tr>
> <tr><td colspan="3"><b>Description</b>: Scans a random sample of rows.</td></tr>
> </table>
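
To make the block-level versus row-level distinction concrete, here is a minimal Python simulation of the two sampling methods. It is a toy model, not PostgreSQL's implementation: the function names (`make_table`, `system_sample`, `bernoulli_sample`) are hypothetical, and the table shape (20,000 rows at 136 rows per page) is an assumption loosely inferred from the plan above (an estimate of 2000 rows at 10%, and 1768 rows returned from 13 buffers).

```python
import random


def make_table(n_rows, rows_per_page):
    """Split row ids 0..n_rows-1 into fixed-size pages (a toy heap file)."""
    rows = list(range(n_rows))
    return [rows[i:i + rows_per_page] for i in range(0, n_rows, rows_per_page)]


def system_sample(pages, percent, rng):
    """Block-level sampling: each page is kept or skipped as a whole."""
    p = percent / 100.0
    sampled, pages_read = [], 0
    for page in pages:
        if rng.random() < p:
            pages_read += 1          # only selected pages are read
            sampled.extend(page)     # every row on a selected page is returned
    return sampled, pages_read


def bernoulli_sample(pages, percent, rng):
    """Row-level sampling: every page is read, each row is kept independently."""
    p = percent / 100.0
    sampled, pages_read = [], 0
    for page in pages:
        pages_read += 1              # the whole table is still scanned
        for row in page:
            if rng.random() < p:
                sampled.append(row)
    return sampled, pages_read


# Assumed table shape (illustrative only, see the note above).
table = make_table(20_000, 136)
rng = random.Random(42)  # fixed seed so repeated runs give the same sample

sys_rows, sys_pages = system_sample(table, 10, rng)
ber_rows, ber_pages = bernoulli_sample(table, 10, rng)
print(f"SYSTEM:    {len(sys_rows)} rows from {sys_pages} of {len(table)} pages")
print(f"BERNOULLI: {len(ber_rows)} rows from {ber_pages} of {len(table)} pages")
```

In this model both methods return roughly 10% of the rows on average, but only `system_sample` reduces the number of pages touched. The trade-off is that its rows arrive in page-sized clusters, so the sample is less uniform when neighbouring rows are correlated.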