GiST - Aesa's Notes

- **Description**: "Generalized Search Tree". It's a height-balanced tree framework that allows for the implementation of arbitrary indexing schemes. Unlike B-Tree, keys in a GiST index can overlap, which is handled by a **Penalty function** during insertion. - **The Framework Interface**: To implement a new type (using `CREATE OPERATOR CLASS`), you must provide: - `consistent`: Can this page contain matches for the query? - `union`: Combine multiple internal keys into one. - `compress/decompress`: Optimize key storage on disk. - `penalty`: How much would it 'cost' to insert this new key here? - `picksplit`: How to divide a sibling page when it overflows. - **Supported Operators**: Varies significantly (e.g., `&&`, `@>`, `<@`, `~=`). - **Data Structure**: - **GiST Page Header (`GISTPageOpaqueData`)**: Similar to B-Tree, GiST pages use an opaque tail for metadata. Flags include `F_LEAF` to distinguish between internal bounding boxes and leaf TIDs. - **Overlapping Keys**: Unlike B-Tree, GiST allows keys to overlap. If a query matches multiple internal nodes, Postgres must explore all of them. - **Minimum Bounding Rectangles (MBR)**: For spatial data, internal nodes store the "union" of all child objects as an MBR. - **Nearest-Neighbor (k-NN) Implementation**: - **Distance Prioritization**: When using the distance operator (`<->`), Postgres uses a **Priority Queue** to store pending page visits. It calculates the distance to the *bounding box* of a page; if that distance is greater than the current candidate's distance, the entire branch is deprioritized. - **Lossy Distance**: For complex shapes, GiST calculates the distance to the MBR first (fast) and only performs the exact, expensive calculation on the leaf record. - **Metaphor**: A multi-layered, magical sieve. You throw a bunch of shapes at it, and it catches exactly the ones that "overlap" or are "nearby" your target. If the sieve gets too full, the `picksplit` logic ensures the chaos is divided into two new, slightly-less-chaotic sieves.