Observability

topics exposes three operational endpoints (liveness, readiness, and metrics), plus an inline performance block on every JSON response. The probes are designed for load-balancer and Kubernetes health checks; crucially, the readiness probe gates traffic during WAL replay on boot. The metrics endpoint exposes a full catalog (process/aggregate gauges, per-topic gauges, WAL counters, and an fsync-latency histogram, described below), not just a topic count.

Liveness — health

GET /v0/health

Always returns 200 while the process can serve a request. Use it for a load balancer’s “is this process up” check. A root alias, GET /healthz, is available for proxies that probe the server root.


```shell
curl localhost:4000/v0/health
```

```json
{ "status": "ok", "version": "0.1.0", "uptime_ms": 84012 }
```

| Field | Type | Meaning |
|---|---|---|
| `status` | string | Always `"ok"` on a `200`. |
| `version` | string | The running build version. |
| `uptime_ms` | `u64` | Milliseconds since the process started. |

Liveness answers “is the process running,” not “is it ready to serve.” A booting server replaying its WAL returns 200 from /v0/health but 503 from /v0/ready — use the right probe for the right question.
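To make the distinction concrete, here is a minimal client-side sketch that folds both probe results into one verdict. probe_verdict is a hypothetical helper, not part of topics; it relies only on the status codes described in this section, and a None status stands for a request that failed outright (connection refused, timeout).

```python
from typing import Optional


def probe_verdict(health_status: Optional[int], ready_status: Optional[int]) -> str:
    """Combine GET /v0/health and GET /v0/ready status codes into one verdict.

    Hypothetical helper: None means the request itself failed.
    """
    if health_status != 200:
        # Liveness failed: the process cannot serve even a trivial request.
        return "down"
    if ready_status == 200:
        # Alive and ready: safe to route traffic here.
        return "serving"
    # Alive but not ready, e.g. booting (WAL replay) or draining (both 503).
    return "not_ready"
```

A booting server therefore reports "not_ready" rather than "down", which is exactly why liveness and readiness should feed different decisions.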

Readiness — ready

GET /v0/ready

Returns 200 only when the server is actually serving. During boot, topics replays the WAL to rebuild in-memory state; until that completes the server is not ready, and /v0/ready returns 503 so a load balancer or Kubernetes keeps traffic away. Root alias: GET /readyz.


```shell
curl -i localhost:4000/v0/ready
```

```
# → 200 OK when serving
{ "status": "ready", "wal_replay_complete": true, "topics": 42 }
```

While replaying the WAL, it returns 503 not_ready with a Retry-After header and a replay progress fraction:


```
# → 503 Service Unavailable during WAL replay
{ "error": {
    "code": "not_ready",
    "message": "WAL replay in progress",
    "detail": { "replay_progress": 0.62 }
} }
```

| State | HTTP | `error.code` | Body |
|---|---|---|---|
| Serving | `200` | (none) | `{ status, wal_replay_complete: true, topics }` |
| Booting (WAL replay) | `503` | `not_ready` | `Retry-After` + `error.detail.replay_progress` (0.0–1.0) |
| Draining (shutdown) | `503` | `shutting_down` | `Retry-After` |
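The state table maps naturally onto a small classifier. The sketch below is hypothetical client code (classify_ready and ReadyState are illustrative names, not part of topics). It assumes the response body has already been decoded from JSON, and it parses only the delta-seconds form of Retry-After, not the HTTP-date form.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class ReadyState:
    state: str                        # "serving", "booting", "draining" or "unknown"
    retry_after_s: Optional[float]    # Retry-After in seconds, if present
    replay_progress: Optional[float]  # 0.0-1.0 while booting, else None


def classify_ready(status: int, headers: Mapping[str, str],
                   body: Mapping[str, Any]) -> ReadyState:
    """Map one GET /v0/ready response onto the states in the table."""
    # HTTP header names are case-insensitive; normalise before lookup.
    lower = {k.lower(): v for k, v in headers.items()}
    raw = lower.get("retry-after", "").strip()
    # Only the delta-seconds form is parsed; an HTTP-date yields None.
    retry_after_s = float(raw) if raw.isdigit() else None

    if status == 200:
        return ReadyState("serving", None, None)

    error = body.get("error") or {}
    code = error.get("code")
    if status == 503 and code == "not_ready":
        detail = error.get("detail") or {}
        return ReadyState("booting", retry_after_s, detail.get("replay_progress"))
    if status == 503 and code == "shutting_down":
        return ReadyState("draining", retry_after_s, None)
    # Anything else is outside the documented table.
    return ReadyState("unknown", retry_after_s, None)
```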

Wire /v0/ready to your Kubernetes readinessProbe (not the livenessProbe). On restart, replaying a large WAL can take a fraction of a second to roughly a second; gating on readiness keeps the pod out of the Service endpoints until replay finishes, so requests are never served against a half-rebuilt state. Use /v0/health for the livenessProbe so a slow replay does not trigger a restart loop.
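Outside Kubernetes, for example in a deploy script or an integration test, the same gating can be done by polling /v0/ready until it returns 200. This is a minimal sketch under stated assumptions: wait_until_ready is a hypothetical helper, and fetch is a caller-supplied function that performs one GET /v0/ready and returns (status, headers). The sleep and clock parameters are injectable so the loop can be exercised without a server.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical: performs one GET /v0/ready, returns (status, headers).
Fetch = Callable[[], Tuple[int, Mapping[str, str]]]


def wait_until_ready(fetch: Fetch, timeout_s: float = 30.0,
                     default_delay_s: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll /v0/ready until it returns 200 (True) or the timeout expires (False)."""
    deadline = clock() + timeout_s
    while True:
        status, headers = fetch()
        if status == 200:
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Honour Retry-After (delta-seconds form) when the server sends it.
        lower = {k.lower(): v for k, v in headers.items()}
        raw = lower.get("retry-after", "").strip()
        delay = float(raw) if raw.isdigit() else default_delay_s
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
```

A real fetch could wrap urllib.request.urlopen; note that urlopen raises urllib.error.HTTPError for a 503, so the wrapper should catch it and return the exception's code and headers instead.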

Probe wiring

Kubernetes


```yaml
livenessProbe:
  httpGet:
    path: /v0/health
    port: 4000
readinessProbe:
  httpGet:
    path: /v0/ready
    port: 4000
  # /v0/ready returns 503 during WAL replay; readiness keeps the pod
  # out of rotation until replay completes.
```

By default the probes skip auth so a load balancer can poll liveness/readiness without a key. Set TOPICS_PROBE_AUTH=true to require auth on the health/ready/metrics endpoints too.

Metrics

GET /v0/metrics

Returns metrics for scraping: Prometheus text exposition (text/plain; version=0.0.4) by default, or a JSON snapshot if you send Accept: application/json. It returns 200 regardless of readiness, even while the server is not ready, because the metrics describe the recovering process.


```shell
# Prometheus text (default)
curl localhost:4000/v0/metrics

# JSON snapshot
curl -H 'accept: application/json' localhost:4000/v0/metrics
```

Auth. Unlike /v0/health and /v0/ready, the metrics endpoint exposes operational state and is auth-gated by default when keys are configured: it requires a key with the read scope (a full-access key suffices). In dev mode (no keys) it is open.
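Content negotiation from Python needs nothing beyond the standard library. This sketch assumes the host and port from the curl examples and, as written, works only in dev mode (no keys): the header used to present a key is not described in this section, so the sketch omits it. metrics_request and fetch_metrics_json are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000"  # assumption: same host/port as the curl examples


def metrics_request(as_json: bool = False) -> urllib.request.Request:
    """Build a GET /v0/metrics request; ask for the JSON snapshot only when as_json is set."""
    req = urllib.request.Request(f"{BASE_URL}/v0/metrics")
    if as_json:
        # Without this header the server returns Prometheus text exposition.
        req.add_header("Accept", "application/json")
    return req


def fetch_metrics_json() -> dict:
    """Fetch and decode the JSON snapshot (performs a network call)."""
    with urllib.request.urlopen(metrics_request(as_json=True), timeout=5) as resp:
        return json.load(resp)
```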

/v0/metrics emits a full catalog:

- Process/aggregate gauges: topics_topics, topics_topics_by_class{class=…}, topics_routers, topics_records_live, topics_bytes_live, topics_queue_topics, topics_queue_leases_in_flight, topics_sse_connections, topics_watch_sessions, topics_ready, topics_recovery_progress, topics_uptime_ms.
- Per-topic gauges, labelled {topic=…} and bounded by topics_topic_metrics_truncated: topics_topic_head_seq / _earliest_seq / _records_live / _bytes_live / _queue_ready / _queue_in_flight.
- WAL counters and gauges: topics_wal_frames_total / _batches_total / _fsyncs_total / _bytes_written_total / _rotations_total / _queue_depth / _queue_depth_peak / _submit_full_total / _read_only.
- An fsync-latency histogram: topics_wal_fsync_latency_us.

There are no per-topic append/read/eviction/tombstone counters and no scheduler-throttle metric.
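For ad-hoc checks without a Prometheus server, the text exposition can be parsed line by line. This is a deliberately minimal sketch (parse_exposition is a hypothetical helper, and the sample values in the test data are made up): it ignores timestamps and does not handle escaped quotes inside label values. For anything beyond quick checks, the official Python client's prometheus_client.parser.text_string_to_metric_families is the more robust choice.

```python
import re
from typing import Dict, Tuple

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
# One label pair inside the braces: name="value" (no escape handling).
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

# Key: (metric name, sorted tuple of (label, value) pairs).
Key = Tuple[str, Tuple[Tuple[str, str], ...]]


def parse_exposition(text: str) -> Dict[Key, float]:
    """Parse Prometheus text exposition into {(name, labels): value}."""
    samples: Dict[Key, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = _SAMPLE.match(line)
        if not m:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = m.groups()
        labels = tuple(sorted(_LABEL.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)  # float() accepts "+Inf" and "NaN"
    return samples
```

Keying on the sorted label tuple makes per-topic lookups such as ("topics_topic_head_seq", (("topic", "orders"),)) independent of label order in the scrape.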


```
# Accept: application/json (the JSON snapshot mirrors the same series in one object)
{ "topics": 42, "topics_memory": 3, "topics_disk": 30, "topics_fsync": 9, "routers": 5,
  "records_live": 1843201, "bytes_live": 734003200, "queue_topics": 2,
  "queue_leases_in_flight": 286, "sse_connections": 41, "watch_sessions": 44,
  "ready": true, "replay_progress": 1.0, "uptime_ms": 360123,
  "wal": { "fsyncs": 88241, "frames": 1843290, "batches": 90011,
           "bytes_written": 812340992, "rotations": 12, "queue_depth": 0,
           "queue_depth_peak": 1280, "submit_full_total": 0, "read_only": 0,
           "fsync_count": 88241, "fsync_micros_total": 441205000 } }
```

The per-response performance block below complements the scrape — it surfaces per-call latency, fsync cost, scan counts, and cold-read counts inline.
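The JSON snapshot shown above carries enough to derive a few summary figures. The sketch below (snapshot_summary is a hypothetical helper) computes the mean fsync latency since process start as fsync_micros_total / fsync_count, checks that the per-class topic counts add up to topics, and flags whether the WAL submit queue was ever full, assuming submit_full_total counts submissions that found the queue full, as its name suggests.

```python
from typing import Any, Dict, Optional


def snapshot_summary(snap: Dict[str, Any]) -> Dict[str, Any]:
    """Derive summary figures from a GET /v0/metrics JSON snapshot."""
    wal = snap["wal"]
    count = wal.get("fsync_count", 0)
    # Mean since process start: total fsync microseconds / number of fsyncs, in ms.
    mean_fsync_ms: Optional[float] = (
        wal["fsync_micros_total"] / count / 1000.0 if count else None
    )
    by_class = snap["topics_memory"] + snap["topics_disk"] + snap["topics_fsync"]
    return {
        "mean_fsync_ms": mean_fsync_ms,
        "class_counts_consistent": by_class == snap["topics"],
        # Assumption: submit_full_total counts submissions that hit a full queue.
        "wal_queue_ever_full": wal.get("submit_full_total", 0) > 0,
    }
```

For the example snapshot, the mean works out to 441205000 / 88241 = 5000 µs, i.e. 5.0 ms per fsync, and 3 + 30 + 9 = 42 topics. Because the mean covers the whole uptime it hides recent spikes; the topics_wal_fsync_latency_us histogram is the better source for percentiles.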

The performance block (inline observability)

Every JSON response (and most errors) carries a performance object, so observability lives in the response itself, not a side channel. This is the primary way to observe per-call cost in topics today.

```json
"performance": {
  "server_total_ms": 0.41,
  "wal_append_ms": 0.12,
  "fsync_ms": 0.0,
  "records_scanned": 128,
  "throttle_wait_ms": 0.0
}
```

Fields are best-effort and additive — a client must tolerate any subset; a field is omitted when it does not apply to that call.

| Field | When present | Meaning |
|---|---|---|
| `server_total_ms` | always | Total server-side handling time for the request. |
| `wal_append_ms` | writes | Time to serialize and enqueue the WAL frame(s). |
| `fsync_ms` | durable writes | Time parked on the group-commit fsync. `0.0` on non-durable (`disk`/`memory`) topics. |
| `records_scanned` | reads | Records examined (including filtered/deleted/own-node ones the cursor advanced past). |
| `throttle_wait_ms` | under pressure | Time parked behind the elastic scheduler before the call ran. |
| `cold_segments_read` | cold-tier reads | Number of cold-tier segments touched. Present only when a read reached cold storage. |
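Since any subset of these fields may appear, a client should parse the block leniently. A minimal sketch, assuming the response body is already decoded JSON: every field defaults to None when absent, and fields added in later versions are ignored rather than rejected. Performance is a hypothetical client-side type, not something topics ships.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class Performance:
    server_total_ms: Optional[float] = None
    wal_append_ms: Optional[float] = None
    fsync_ms: Optional[float] = None
    records_scanned: Optional[int] = None
    throttle_wait_ms: Optional[float] = None
    cold_segments_read: Optional[int] = None

    @classmethod
    def from_response(cls, body: Mapping[str, Any]) -> "Performance":
        """Build from a decoded response; tolerate missing and unknown fields."""
        perf = body.get("performance") or {}
        known = {f.name for f in fields(cls)}
        # Keep only fields this client knows; the block is additive, so
        # unknown keys are skipped instead of raising TypeError.
        return cls(**{k: v for k, v in perf.items() if k in known})
```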

A few patterns these enable without any external metrics:

- Durability cost: a non-zero fsync_ms on a write is the cost of the fsync commit class. If fsync_ms is the bulk of server_total_ms, the write is fsync-bound (a hardware floor set by the disk, not a server bug).
- Read efficiency: records_scanned far exceeding the number of records returned means many filtered, deleted, or own-node records sit in the scanned range; the cursor still advanced past them.
- Backpressure: a non-zero throttle_wait_ms means the elastic scheduler paced the call under CPU pressure. The work was deferred, never dropped.
- Cold reads: the presence of cold_segments_read confirms that a read reached the cold tier. By the hard tiering invariant, this slows only that historical read, never writes or live delivery.
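These patterns can be turned into simple per-call checks. The thresholds are assumptions, not values from topics: "the bulk of server_total_ms" is taken as at least 50%, and "far exceeding" as more than 10 times the records returned. diagnose is a hypothetical helper, and records_returned must come from the caller because it is not part of the performance block.

```python
from typing import Any, List, Mapping, Optional


def diagnose(perf: Mapping[str, Any], records_returned: Optional[int] = None,
             fsync_share: float = 0.5, scan_ratio: float = 10.0) -> List[str]:
    """Apply the four patterns to one performance block (hypothetical thresholds)."""
    notes: List[str] = []
    total = perf.get("server_total_ms") or 0.0
    fsync = perf.get("fsync_ms") or 0.0
    # Durability cost: fsync dominates total handling time.
    if fsync > 0 and total > 0 and fsync / total >= fsync_share:
        notes.append("fsync-bound")
    # Read efficiency: many more records scanned than returned.
    scanned = perf.get("records_scanned")
    if scanned is not None and records_returned is not None:
        if scanned > scan_ratio * max(records_returned, 1):
            notes.append("scan-heavy")
    # Backpressure: the elastic scheduler delayed the call.
    if (perf.get("throttle_wait_ms") or 0.0) > 0:
        notes.append("throttled")
    # Cold reads: the field is present only when cold storage was touched.
    if "cold_segments_read" in perf:
        notes.append("cold-read")
    return notes
```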

429 throttled responses carry a Retry-After header; a CPU-pressure throttle adds error.detail.retry_after_ms, and a resource-cap throttle adds error.detail.limit naming the cap that was hit. Branch on error.code and respect Retry-After. See Errors.
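A client's retry delay for a 429 can combine both hints. This is a sketch of one reasonable policy, not documented client behavior: it waits for the longer of Retry-After (delta-seconds form only) and error.detail.retry_after_ms, so it never retries earlier than Retry-After allows, and falls back to a default when neither is present. retry_delay_s is a hypothetical name.

```python
from typing import Any, List, Mapping


def retry_delay_s(headers: Mapping[str, str], body: Mapping[str, Any],
                  default_s: float = 1.0) -> float:
    """Seconds to wait before retrying a 429 response (hypothetical policy)."""
    hints: List[float] = []
    # Retry-After header, delta-seconds form; header names are case-insensitive.
    lower = {k.lower(): v for k, v in headers.items()}
    raw = lower.get("retry-after", "").strip()
    if raw.isdigit():
        hints.append(float(raw))
    # CPU-pressure throttles add error.detail.retry_after_ms.
    detail = (body.get("error") or {}).get("detail") or {}
    ms = detail.get("retry_after_ms")
    if isinstance(ms, (int, float)):
        hints.append(ms / 1000.0)
    # Never retry sooner than the longest hint the server gave.
    return max(hints) if hints else default_s
```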