# pg_wait_tracer — Development Roadmap
This document tracks completed work and planned features. Each phase builds
on the previous one. Items marked **DONE** are merged to master.
## Completed Phases
### Phase 1–8: Core Tracer (DONE, tag v0.2)
CLI redesign, 6 diagnostic views, multi-window support, auto-discovery.
- Hardware watchpoint on `my_wait_event_info` per backend
- BPF ring buffer with raw event streaming
- Accumulator with 6 views: time_model, system_event, session_event,
histogram, query_event, active
- Multi-window time comparisons (`--window 5s,1m,5m`)
- Auto-detect single PG instance
- TUI and text output modes (auto-detect TTY)
### Phase A: BPF Ring Buffer Dual-Path (DONE)
Raw event streaming alongside aggregated maps. Foundation for trace recording.
### Phase B: Accumulator + 6 Views (DONE)
In-memory accumulation of ring buffer events into the 6 diagnostic views.
### Phase C: Event File Writer + Reader + Replay (DONE)
- Columnar LZ4-compressed trace files (4096 events/block, ~36x compression)
- Block index footer for time-range binary search
- Hourly rotation, configurable retention (default 24h)
- Offline replay mode (`--replay`)
### Phase D: Daemon Mode (DONE)
- Persistent monitoring with `--daemon` flag
- Automatic PostgreSQL restart detection (postmaster health check)
- BPF destroy+reload on PG restart (rodata is immutable after load)
- PGDATA inference from `/proc/<pid>/cwd`
### Phase E.0: Investigation Client — Foundation (DONE)
Investigation client for trace file analysis. (Rust TUI client removed in
Sprint 8; replaced by `pgwt-server --dump` for CLI and `pgwt` web client.)
- **Trace file reader** (Rust): reads `.trace.lz4` files (header, footer,
block index, LZ4 decompression, columnar decode)
- **Streaming reader** for `current.trace` (no footer — scans blocks
sequentially, handles truncated blocks gracefully)
- **4 table views**: Overview (time_model), Top Events, Top Sessions,
Top Queries
- **Wait event name resolution**: full PG18 event name tables ported to Rust
- **`--dump` mode**: non-interactive summary to stdout
- **`--generate-test`**: synthetic test data generator with realistic
OLTP workload patterns (10 sessions, 6 roles, time phases, burst/idle)
### Phase E.1: AAS Stacked Bar Chart (DONE)
The centerpiece visual — Average Active Sessions over time.
- **AAS bucketing**: per-bucket, per-wait-class accumulation with
  overlap-aware duration splitting (see the sketch after this list)
- **Half-block rendering**: `'▄'` technique for 2x vertical resolution,
works on any terminal
- **Pixel-perfect rendering**: via `ratatui-image` (iTerm2, Sixel, Kitty
graphics protocols), auto-detected at startup
- **Transparent background**: RGBA images blend with terminal theme
- **Chart height**: 20 rows (40 virtual rows in half-block mode)
- **Color palette**: 11 wait classes with distinct colors
(CPU=green, IO=blue, Lock=red, LwLock=pink, IPC=cyan, etc.)
- **Decorations**: auto-scaled Y-axis, HH:MM:SS X-axis, inline legend
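The overlap-aware splitting mentioned above gives every time bucket the fraction
of a wait interval that falls inside it. A minimal C sketch of the idea, using
hypothetical names (`NUM_WAIT_CLASSES`, the `aas` accumulator, `chart_start_ns`)
that are not taken from the actual source:

```c
#include <stdint.h>

#define NUM_WAIT_CLASSES 11   /* hypothetical constant: the 11 wait classes */

/* Split one wait interval [start_ns, end_ns) across fixed-width time buckets.
 * Each bucket receives (overlapped time / bucket width), i.e. its share of
 * Average Active Sessions contributed by this interval. */
static void
accumulate_interval(double aas[][NUM_WAIT_CLASSES], int num_buckets,
                    uint64_t chart_start_ns, uint64_t bucket_ns,
                    uint64_t start_ns, uint64_t end_ns, int wait_class)
{
    for (int b = 0; b < num_buckets; b++) {
        uint64_t b_start = chart_start_ns + (uint64_t) b * bucket_ns;
        uint64_t b_end   = b_start + bucket_ns;
        uint64_t lo = start_ns > b_start ? start_ns : b_start;   /* overlap start */
        uint64_t hi = end_ns   < b_end   ? end_ns   : b_end;     /* overlap end */

        if (hi > lo)
            aas[b][wait_class] += (double) (hi - lo) / (double) bucket_ns;
    }
}
```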
---
## Phase F: Web Investigation Client (DONE)
**Goal**: Oracle ASH / RDS Performance Insights-class interactive
investigation tool with a rich web UI, running on the DBA's laptop and
connecting to the DB server over SSH.
### Architecture
```
[DBA laptop] [DB server]
pgwt (Go binary) pgwt-server (C binary)
├─ spawns: ssh user@host pgwt-server ├─ reads trace files (reuses event_reader.c)
├─ localhost HTTP server (net/http) ├─ computes aggregates (reuses compute.c,
├─ WebSocket bridge: browser ↔ SSH │ map_reader.c, wait_event.c)
├─ static assets (//go:embed) └─ JSON lines on stdin/stdout
└─ auto-opens browser
```
**Why this architecture**:
- Like git/rsync/mosh — spawns real `ssh`, inherits user's `~/.ssh/config`,
agent, ProxyJump, known_hosts. Zero auth code.
- Server side is pure C — reuses existing trace reader and compute code,
no new runtime dependencies on DB server
- Client side is Go — single static binary, stdlib HTTP/embed, trivial
cross-compilation (macOS/Linux/Windows)
- Web frontend (ECharts + JS) gives full interactive charts, mouse
  interaction, tooltips — all the UX patterns a terminal can't provide
**Usage**:
```bash
pgwt root@db-server
# Browser opens http://localhost:8384 automatically
```
### F.1: Server side — `pgwt-server` (C) — DONE
Binary: `src/server.c` (+ `src/compute.c`)
Reuses existing modules directly:
- `event_reader.c` — trace file reading, LZ4 decompression, block index
- `replay.c` — event iteration, time range filtering
- `wait_event.c` — event name resolution (PG17/PG18 tables)
- `map_reader.c` — accumulator logic
**Protocol** (JSON lines over stdin/stdout):
```
--> {"id":1,"cmd":"info"}
<-- {"id":1,"from_ns":1709000000,"to_ns":1709036000,"num_events":284000,"num_cpus":8}
--> {"id":2,"cmd":"aas","from":1709000000,"to":1709003600,"buckets":120,"filters":{}}
<-- {"id":2,"buckets":[{"t":1709000000,"cpu":2.1,"io":0.8,"lock":0.3,...},...],
"max_aas":4.2}
--> {"id":3,"cmd":"top_events","from":1709000000,"to":1709003600,"filters":{"class":"IO"}}
<-- {"id":3,"rows":[{"event_id":167772161,"name":"IO:DataFileRead","class":"IO",
"count":4521,"total_ms":892.3,"avg_us":197.4,"max_us":12400,"pct":34.2,
"aas":0.25},...], "db_time_ms":2608.1}
--> {"id":4,"cmd":"top_sessions","from":...,"to":...,"filters":{}}
<-- {"id":4,"rows":[{"pid":1234,"db_time_ms":450.2,"cpu_pct":62.1,
"top_wait":"IO:DataFileRead"},...]}
--> {"id":5,"cmd":"top_queries","from":...,"to":...,"filters":{}}
<-- {"id":5,"rows":[{"query_id":123456789,"count":891,"total_ms":1200.5,"pct":46.0,
"top_wait":"Lock:transactionid"},...]}
--> {"id":6,"cmd":"time_model","from":...,"to":...,"filters":{}}
<-- {"id":6,"db_time_ms":2608.1,"idle_time_ms":45000,"aas":2.61,
"classes":[{"name":"CPU","ms":1200,"pct":46.0,"aas":1.2},...],"wall_ms":1000}
```
**Filters** (all optional, AND logic):
```json
{"class":"IO","event_id":167772161,"pid":1234,"query_id":123456789}
```
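All present keys combine with AND semantics; a missing key acts as a wildcard.
A sketch of how the server side might evaluate one event against a filter; the
struct layouts are illustrative only, not the actual `compute.c` types:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layouts; the real event and filter structs live in the C server. */
typedef struct {
    uint8_t  wait_class;
    uint32_t event_id;
    int32_t  pid;
    uint64_t query_id;
} event_t;

typedef struct {
    bool has_class, has_event_id, has_pid, has_query_id;   /* key present in JSON? */
    uint8_t  wait_class;
    uint32_t event_id;
    int32_t  pid;
    uint64_t query_id;
} filter_t;

/* AND logic: every key that is present must match; absent keys are wildcards. */
static bool
event_matches(const event_t *e, const filter_t *f)
{
    if (f->has_class    && e->wait_class != f->wait_class) return false;
    if (f->has_event_id && e->event_id   != f->event_id)   return false;
    if (f->has_pid      && e->pid        != f->pid)        return false;
    if (f->has_query_id && e->query_id   != f->query_id)   return false;
    return true;
}
```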
**Build**: compiled alongside the daemon by the existing Makefile and linked
against the same object files. No new dependencies.
### F.2: Client side — `pgwt` (Go) — DONE
Directory: `web/` (Go module)
```
web/
├── main.go # CLI args, spawn ssh, HTTP server, open browser
├── bridge.go # WebSocket ↔ SSH stdin/stdout bridge
├── static/ # Embedded web assets
│ ├── index.html # Single-page app
│ ├── app.js # Main application logic (chart + tables + drill-down)
│ └── style.css # Layout and styling
├── go.mod
└── go.sum
```
**Go dependencies** (minimal):
- `gorilla/websocket` — WebSocket support
- Everything else is stdlib (`net/http`, `os/exec`, `embed`, `encoding/json`)
**SSH spawning**:
```go
cmd := exec.Command("ssh", host, "pgwt-server", traceDir)
stdin, _ := cmd.StdinPipe() // send requests
stdout, _ := cmd.StdoutPipe() // read responses
```
**WebSocket bridge**:
- Browser sends JSON request via WebSocket
- Go forwards to SSH stdin
- Go reads JSON response from SSH stdout
- Go forwards to browser via WebSocket
- Requests/responses matched by `"id"` field for concurrency
### F.3: Web frontend (HTML + JS + ECharts) — DONE
**Layout**:
```
┌──────────────────────────────────────────────┐
│ pgwt [Connected] 14:00–14:30 │ header
├──────────────────────────────────────────────┤
│ ▓▓▓▓▓▓▓▓▓▓████████▓▓▓▓▓▓░░░░░░░░░░░▓▓▓▓▓▓ │ AAS stacked
│ ▓▓▓▓▓▓████████████▓▓▓▓░░░░░░░░░░░░░░▓▓▓▓▓▓ │ area chart
│ ▓▓▓▓████████████▓▓▓▓░░░░░░░░░░░░░░░░░▓▓▓▓▓ │ (ECharts)
├──────────────────────────────────────────────┤
│ [Overview] [Events] [Sessions] [Queries] │ tabs
├──────────────────────────────────────────────┤
│ ● IO > IO:DataFileRead ✕ │ breadcrumbs
├──────────────────────────────────────────────┤
│ DB Time: 2.3s Wall: 2m 47s AAS: 1.38 │ summary bar
├──────────────────────────────────────────────┤
│ # Wait Event Count Total %DB AAS │ data table
│ 1 ● IO:DataFileRead 4521 892ms ██ 34% │ (click = drill)
│ 2 ● Lock:transaction 1203 450ms █ 17% │
│ ... │
└──────────────────────────────────────────────┘
```
**Chart features**:
- Stacked area chart with 11 wait class colors (ECharts)
- Tooltip with per-class AAS breakdown and percentages
- Interactive drag-to-zoom
- Responsive resize
**Table features**:
- Click row to drill down (adds filter, pivots to next view)
- Breadcrumb trail with colored dots: `● IO > ● IO:DataFileRead > PID 1234`
- Color-coded dots next to wait class/event names in all tables
- Percentage bars (visual bar behind `%DB`, `CPU%`, `Wait%` columns)
- Summary header above overview table (DB Time, Wall, AAS, Idle, CPUs)
- Sortable columns (click header)
**Connectivity**:
- WebSocket auto-reconnect with exponential backoff (2s → 16s max)
- Connection status indicator (Connecting / Connected / Reconnecting)
### F.4: Implementation steps (all DONE)
| Step | What | Status |
|------|------|--------|
| 1 | `pgwt-server` — info + aas commands | DONE |
| 2 | `pgwt` Go skeleton — SSH + HTTP + WS bridge | DONE |
| 3 | AAS stacked area chart in browser | DONE |
| 4 | `pgwt-server` — all 6 commands | DONE |
| 5 | Data tables + drill-down + breadcrumbs | DONE |
| 6 | Color dots, percentage bars, summary header | DONE |
| 7 | Auto-reconnect, tooltip %, style polish | DONE |
---
## Planned: Phase G — Daemon Enhancements
### G.1: Query Text Capture
**Goal**: Show the actual SQL text alongside `query_id` in the investigation
client, without requiring pg_stat_statements lookups.
**How it works**: The daemon reads `PgBackendStatus.st_activity` from shared
memory via `process_vm_readv()` when it first sees a new `query_id`. This
gives the actual running SQL (not normalized — real parameter values), which
is more useful for debugging than `pg_stat_statements`' normalized form.
#### Daemon side
New module: `src/query_text.c`
1. **Discover `st_activity` offset** in `PgBackendStatus`:
- DWARF debug symbols (preferred): parse `readelf` output
- Known offset table: PG17/18 x86_64 fallback
- Same approach as existing `st_query_id` offset discovery
2. **Discover `BackendStatusArray` base address**:
- ELF symbol resolution (already in `src/discovery.c`)
- `BackendStatusArray` is a global pointer to an array of
`PgBackendStatus` structs in shared memory
3. **Track seen query_ids**: hash set in daemon memory
4. **Capture logic** (on each timer tick):
- For each backend with a `query_id` not in the seen set:
- `process_vm_readv()` to read `st_activity` (1024 bytes,
       `PGBE_ACTIVITY_SIZE`, null-terminated); see the sketch after this list
- Write `{query_id, text, timestamp}` to sidecar file
- Add `query_id` to seen set
5. **Sidecar file**: `query_texts.jsonl` in trace directory
```json
{"query_id":1234567890,"text":"SELECT * FROM accounts WHERE aid = 42837","ts":1709012345000000000}
```
- One line per unique `query_id`
- Rotated alongside trace files (same retention policy)
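The capture read in step 4 reduces to a single `process_vm_readv()` at the
address spelled out under "Offset discovery" below. A minimal sketch, with
illustrative parameter names rather than the final `query_text.c` API:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

#define PGBE_ACTIVITY_SIZE 1024   /* default pgstat_track_activity_query_size */

/* Read st_activity for one backend out of PostgreSQL shared memory.
 * base            = remote address of BackendStatusArray[0]
 * entry_size      = sizeof(PgBackendStatus) on the target PG version
 * activity_offset = st_activity offset from DWARF or the known-offset table */
static ssize_t
read_backend_activity(pid_t pg_pid, uintptr_t base, size_t entry_size,
                      size_t activity_offset, int backend_index,
                      char out[PGBE_ACTIVITY_SIZE])
{
    uintptr_t addr = base + (uintptr_t) backend_index * entry_size + activity_offset;
    struct iovec local  = { .iov_base = out, .iov_len = PGBE_ACTIVITY_SIZE };
    struct iovec remote = { .iov_base = (void *) addr, .iov_len = PGBE_ACTIVITY_SIZE };

    ssize_t n = process_vm_readv(pg_pid, &local, 1, &remote, 1, 0);
    if (n > 0)
        out[PGBE_ACTIVITY_SIZE - 1] = '\0';   /* st_activity is NUL-terminated; belt and braces */
    return n;
}
```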
#### Server side
`pgwt-server` loads `query_texts.jsonl` and includes the text in the `top_queries` response:
```json
{"query_id":123,"text":"SELECT * FROM accounts WHERE...","count":891,...}
```
#### Offset discovery
- Remote address: `BackendStatusArray + backend_index * sizeof(PgBackendStatus) + st_activity_offset`
- Backend index = `MyBackendId - 1`
- `st_activity` is `char[1024]` — null-terminated C string
### G.2: Plan Identifier Capture (PG18+)
**Goal**: Capture the execution plan hash per-backend to detect plan changes
correlated with wait event spikes (plan regression detection).
**Background**: PostgreSQL 18 added `int64 st_plan_id` to `PgBackendStatus`
in shared memory (right after `st_query_id`). Core PG does **not** compute
it automatically — it stays 0 unless a `planner_hook` extension populates it.
There is no `compute_plan_id` GUC. On PG17, `st_plan_id` does not exist.
#### PG version matrix
| PG Version | `st_query_id` | `st_plan_id` | Notes |
|------------|:---:|:---:|---|
| 14–16 | Exists but unreliable | N/A | `my_wait_event_info` not writable |
| 17 | `uint64` | Does not exist | Full wait tracing works |
| 18+ | `int64` | `int64` (new) | Plan ID requires planner_hook extension |
#### Daemon side
1. **Discover `st_plan_id` offset** in `PgBackendStatus`:
- DWARF: same approach as `st_query_id` / `st_activity`
- Known offset: immediately after `st_query_id` (offset + 8)
- PG17: skip (field doesn't exist)
2. **Read alongside `st_query_id`**: extend existing `process_vm_readv`
   call by 8 bytes; essentially free, no extra syscall (see the sketch below)
3. **Store in trace events**: add `plan_id` column to columnar block format
- Bump trace file version (v1 -> v2)
- Reader handles both versions gracefully
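Widening the existing read by 8 bytes means one `process_vm_readv()` returns
both identifiers. A sketch with illustrative names (the entry address and
offset come from the discovery steps above):

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Read st_query_id and st_plan_id for one backend in a single call.
 * entry_addr      = remote address of this backend's PgBackendStatus
 * query_id_offset = discovered st_query_id offset (st_plan_id follows at +8)
 * On PG17 (no st_plan_id) only the first 8 bytes are meaningful. */
static int
read_query_and_plan_id(pid_t pg_pid, uintptr_t entry_addr, size_t query_id_offset,
                       uint64_t *query_id, int64_t *plan_id)
{
    uint64_t buf[2] = {0, 0};    /* [0] = st_query_id, [1] = st_plan_id */
    struct iovec local  = { .iov_base = buf, .iov_len = sizeof(buf) };
    struct iovec remote = { .iov_base = (void *) (entry_addr + query_id_offset),
                            .iov_len  = sizeof(buf) };

    if (process_vm_readv(pg_pid, &local, 1, &remote, 1, 0) != (ssize_t) sizeof(buf))
        return -1;

    *query_id = buf[0];
    *plan_id  = (int64_t) buf[1];
    return 0;
}
```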
#### Server + client side
1. Show `plan_id` column in queries view (when non-zero)
2. Plan change detection: highlight when same `query_id` has multiple
`plan_id` values over time
3. Investigation flow: wait time spike -> drill to query -> see `plan_id`
changed at the spike -> plan regression confirmed
#### Requirements for non-zero plan_id
Users must install a `planner_hook` extension:
- **pg_stat_sql_plans** — computes plan hash, stores plan-level stats
- **pg_store_plans** — stores actual plan text indexed by (queryid, planid)
- Custom extension using `planner_hook` + `result->planId = hash`
Without such an extension, `st_plan_id` is always 0 and the feature is
silently inactive. Document this clearly.
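For reference, the rough shape of such a custom extension, assuming PG18
headers; `hash_plan()` is only a placeholder for whatever plan-tree hashing the
extension chooses (the `result->planId = hash` step mentioned above):

```c
#include "postgres.h"

#include "fmgr.h"
#include "optimizer/planner.h"

PG_MODULE_MAGIC;

static planner_hook_type prev_planner_hook = NULL;

/* Placeholder: a real extension would hash the plan tree (as pg_stat_sql_plans
 * or pg_store_plans do); reusing queryId here is only for illustration. */
static int64
hash_plan(PlannedStmt *stmt)
{
    return (int64) stmt->queryId;
}

static PlannedStmt *
example_planner(Query *parse, const char *query_string,
                int cursorOptions, ParamListInfo boundParams)
{
    PlannedStmt *result = prev_planner_hook
        ? prev_planner_hook(parse, query_string, cursorOptions, boundParams)
        : standard_planner(parse, query_string, cursorOptions, boundParams);

    result->planId = hash_plan(result);   /* PG18+: surfaces as st_plan_id */
    return result;
}

void
_PG_init(void)
{
    prev_planner_hook = planner_hook;
    planner_hook = example_planner;
}
```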
---
## Planned: Phase H — Live Mode
**Goal**: Connect the web client to a running daemon for real-time streaming.
- `pgwt-server` connects to daemon via Unix socket on the DB server
- Pushes live AAS updates to the web client over the SSH channel
- Active view works only in live mode (requires BPF state_map)
- Dual mode: historical (trace files) + live (daemon) in same web UI
---
## Legacy: Rust TUI Client (Removed)
The Rust TUI client (`client/` directory, Phases E.0–E.1) was removed in
Sprint 8. Its functionality is covered by:
- `pgwt-server --dump` — CLI text summary (time model, top events, sessions, queries)
- `pgwt` — web investigation client with full drill-down and AAS charts
---
## Architecture Notes
### Hardware watchpoint approach
pg_wait_tracer uses CPU debug registers to set a per-backend hardware
watchpoint on the word that `my_wait_event_info` points to. PostgreSQL 17+
writes wait events through `*my_wait_event_info` (pointer dereference), so
every transition lands on the watched address. This gives exact, non-sampled
event transitions at nanosecond precision with ~5% TPS overhead.
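One standard way to get such a watchpoint from user space is a
`PERF_TYPE_BREAKPOINT` perf event, which the kernel backs with a debug
register. A minimal sketch of opening one, not necessarily the exact code in
the tracer:

```c
#define _GNU_SOURCE
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a hardware write-watchpoint on a 4-byte word in one backend's address
 * space; `addr` would be the location my_wait_event_info points to. */
static int
open_wait_watchpoint(pid_t backend_pid, unsigned long addr)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type          = PERF_TYPE_BREAKPOINT;
    attr.size          = sizeof(attr);
    attr.bp_type       = HW_BREAKPOINT_W;       /* fire on writes only */
    attr.bp_addr       = addr;
    attr.bp_len        = HW_BREAKPOINT_LEN_4;   /* wait_event_info is a uint32 */
    attr.sample_period = 1;                      /* every write, never sampled */

    /* pid = backend, cpu = -1: follow this task on whichever CPU runs it */
    return syscall(SYS_perf_event_open, &attr, backend_pid, -1, -1, 0);
}
```

A BPF program attached to the returned descriptor (for example via libbpf's
`bpf_program__attach_perf_event()`) can then run on every write and push the
transition into the ring buffer.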
### BPF constraints
- BPF rodata is immutable after skeleton load — must destroy+reload on
  PG restart (daemon mode); see the reload sketch below
- Hardware watchpoints limited to ~4 per CPU — kill stale processes
between test runs
- `pgwt_daemon` struct is ~27MB — heap-allocated
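The destroy+reload cycle follows the usual libbpf skeleton lifecycle: rodata
values can only be set between `open()` and `load()`. A sketch using a
hypothetical skeleton name (`pgwt_bpf`) and a hypothetical rodata field:

```c
#include <stdint.h>

#include "pgwt_bpf.skel.h"   /* hypothetical skeleton header; the real name differs */

static struct pgwt_bpf *skel;

/* On PostgreSQL restart: rodata constants (e.g. discovered offsets) can only be
 * set between open() and load(), so the skeleton is torn down and rebuilt
 * rather than patched in place. */
static int
reload_bpf(uint64_t query_id_offset)
{
    pgwt_bpf__destroy(skel);                 /* detach and free old programs/maps */

    skel = pgwt_bpf__open();
    if (!skel)
        return -1;
    skel->rodata->query_id_offset = query_id_offset;   /* immutable once loaded */

    if (pgwt_bpf__load(skel) || pgwt_bpf__attach(skel)) {
        pgwt_bpf__destroy(skel);
        skel = NULL;
        return -1;
    }
    return 0;
}
```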
### Trace file design
- Columnar blocks (4096 events), LZ4 compressed (~36x ratio)
- Block index footer for binary search by timestamp (layout sketched below)
- `current.trace` has no footer while being written — streaming reader
scans blocks sequentially
- Hourly rotation to `YYYY-MM-DD_HH.trace.lz4`
- Configurable retention (default 24h)
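A sketch of what one block-index entry might carry to support that binary
search; the field names are illustrative, not the actual on-disk format:

```c
#include <stdint.h>

#define EVENTS_PER_BLOCK 4096

/* Hypothetical layout: one footer entry per block, enough to binary-search a
 * time range and seek straight to the compressed block that contains it. */
typedef struct {
    uint64_t first_ts_ns;      /* timestamp of the first event in the block */
    uint64_t last_ts_ns;       /* timestamp of the last event in the block */
    uint64_t file_offset;      /* byte offset of the LZ4-compressed block */
    uint32_t compressed_len;   /* compressed size on disk */
    uint32_t num_events;       /* <= EVENTS_PER_BLOCK (last block may be short) */
} block_index_entry_t;
```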
### Client-server protocol
- **Transport**: SSH exec channel (stdin/stdout of `pgwt-server`)
- **Format**: JSON lines (newline-delimited JSON)
- **Request**: `{"id":1,"cmd":"aas","from":...,"to":...,"filters":{}}`
- **Response**: `{"id":1,"buckets":[...],"max_aas":4.2}`
- **Matching**: `id` field for concurrent request/response pairing
- **Same model as**: git over SSH, rsync --server, ParaView pvserver
### UX reference sources
- **Oracle EM Performance Hub / ASH Analytics**: stacked area chart,
two-tier time slider, click-to-filter, dimensional pivot dropdown,
filter tags bar, CPU cores line, load map treemap view
- **AWS RDS Performance Insights**: "slice by" dropdown, mini wait bars
per SQL row, Max vCPU line, top 25 items per dimension tab
- **PASH-Viewer**: time range selection on chart, Top SQL drill-down,
execution plan display
- **pg_ash**: ASCII stacked bar charts, Unicode block characters for
rendering, "Other" rollup in top-N, per-query wait profile,
semantic color mapping (green=CPU, blue=IO, red=Lock)