
Observability

topics exposes three operational endpoints — liveness, readiness, and metrics — plus an inline performance block on every JSON response. The probes are designed for load-balancer and Kubernetes health checks, and crucially the readiness probe gates traffic during WAL replay on boot. The metrics surface is a full catalog — process/aggregate gauges, per-topic gauges, real WAL counters, and a fsync-latency histogram (see below) — not a topic-count stub.

Liveness — health

GET /v0/health

Always returns 200 while the process can serve a request. Use it for a load balancer’s “is this process up” check. A root alias, GET /healthz, exists for proxies that probe the server root.

curl localhost:4000/v0/health
{ "status": "ok", "version": "0.1.0", "uptime_ms": 84012 }
| Field | Type | Meaning |
| --- | --- | --- |
| status | string | Always "ok" on a 200. |
| version | string | The running build version. |
| uptime_ms | u64 | Milliseconds since the process started. |

Liveness answers “is the process running,” not “is it ready to serve.” A booting server replaying its WAL returns 200 from /v0/health but 503 from /v0/ready — use the right probe for the right question.

Readiness — ready

GET /v0/ready

Returns 200 only when the server is actually serving. During boot, topics replays the WAL to rebuild in-memory state; until that completes the server is not ready, and /v0/ready returns 503 so a load balancer or Kubernetes keeps traffic away. Root alias: GET /readyz.

curl -i localhost:4000/v0/ready
# → 200 OK when serving
{ "status": "ready", "wal_replay_complete": true, "topics": 42 }

While replaying the WAL, it returns 503 not_ready with a Retry-After header and a replay progress fraction:

# → 503 Service Unavailable during WAL replay
{
  "error": {
    "code": "not_ready",
    "message": "WAL replay in progress",
    "detail": { "replay_progress": 0.62 }
  }
}

| State | HTTP | error.code | Body |
| --- | --- | --- | --- |
| Serving | 200 | | { status, wal_replay_complete: true, topics } |
| Booting (WAL replay) | 503 | not_ready | Retry-After + error.detail.replay_progress (0.0–1.0) |
| Draining (shutdown) | 503 | shutting_down | Retry-After |
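
The three readiness states can be told apart from the status code and error.code alone. The following is a minimal sketch of a client-side classifier, not part of topics itself: the function name classify_ready and the returned state labels are hypothetical, and only the field names come from the responses documented here.

```python
import json


def classify_ready(status_code, body):
    """Map a /v0/ready response to (state, replay_progress).

    The state labels are illustrative names chosen for this sketch.
    replay_progress is only populated while the server is booting.
    """
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    if status_code == 200:
        return ("serving", None)

    error = payload.get("error")
    if not isinstance(error, dict):
        error = {}
    code = error.get("code")

    if status_code == 503 and code == "not_ready":
        detail = error.get("detail")
        progress = detail.get("replay_progress") if isinstance(detail, dict) else None
        return ("booting", progress)
    if status_code == 503 and code == "shutting_down":
        return ("draining", None)

    # Anything else is outside the documented table.
    return ("unknown", None)
```

A deploy script could poll until the state is "serving" and log the progress fraction while the state is "booting".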

Wire /v0/ready to your Kubernetes readinessProbe (not the livenessProbe). On restart, replaying a large WAL can take a fraction of a second to roughly a second; gating on readiness keeps the pod out of the Service endpoints until replay finishes, so requests are never served against a half-rebuilt state. Use /v0/health for the livenessProbe so a slow replay does not trigger a restart loop.

Probe wiring

livenessProbe:
  httpGet:
    path: /v0/health
    port: 4000
readinessProbe:
  httpGet:
    path: /v0/ready
    port: 4000
  # /v0/ready returns 503 during WAL replay; readiness keeps the pod
  # out of rotation until replay completes.

By default the probes skip auth so a load balancer can poll liveness/readiness without a key. Set TOPICS_PROBE_AUTH=true to require auth on the health/ready/metrics endpoints too.

Metrics

GET /v0/metrics

Returns metrics for scraping: Prometheus text exposition (text/plain; version=0.0.4) by default, or a JSON snapshot if you send Accept: application/json. It always returns 200, even when the server is not ready, because the metrics describe the recovering process.

# Prometheus text (default)
curl localhost:4000/v0/metrics

# JSON snapshot
curl -H 'accept: application/json' localhost:4000/v0/metrics

Auth. Unlike /v0/health and /v0/ready, the metrics endpoint exposes operational state and is auth-gated by default when keys are configured: it requires a key with the read scope (a full-access key suffices). In dev mode (no keys) it is open.

/v0/metrics emits a full catalog:

  • Process/aggregate gauges: topics_topics, topics_topics_by_class{class=…}, topics_routers, topics_records_live, topics_bytes_live, topics_queue_topics, topics_queue_leases_in_flight, topics_sse_connections, topics_watch_sessions, topics_ready, topics_recovery_progress, topics_uptime_ms.
  • Per-topic gauges, labelled {topic=…} and bounded by topics_topic_metrics_truncated: topics_topic_head_seq / _earliest_seq / _records_live / _bytes_live / _queue_ready / _queue_in_flight.
  • Real WAL counters: topics_wal_frames_total / _batches_total / _fsyncs_total / _bytes_written_total / _rotations_total / _queue_depth / _queue_depth_peak / _submit_full_total / _read_only.
  • A fsync-latency histogram: topics_wal_fsync_latency_us.

There are no per-topic append/read/eviction/tombstone counters and no scheduler-throttle metric.

# With Accept: application/json, the JSON snapshot mirrors the same series in one object
{
  "topics": 42,
  "topics_memory": 3,
  "topics_disk": 30,
  "topics_fsync": 9,
  "routers": 5,
  "records_live": 1843201,
  "bytes_live": 734003200,
  "queue_topics": 2,
  "queue_leases_in_flight": 286,
  "sse_connections": 41,
  "watch_sessions": 44,
  "ready": true,
  "replay_progress": 1.0,
  "uptime_ms": 360123,
  "wal": {
    "fsyncs": 88241,
    "frames": 1843290,
    "batches": 90011,
    "bytes_written": 812340992,
    "rotations": 12,
    "queue_depth": 0,
    "queue_depth_peak": 1280,
    "submit_full_total": 0,
    "read_only": 0,
    "fsync_count": 88241,
    "fsync_micros_total": 441205000
  }
}
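
A snapshot like this can be reduced to a derived figure without a metrics backend. The sketch below computes a mean fsync latency, assuming that wal.fsync_micros_total is the cumulative time spent in fsync (in microseconds) and wal.fsync_count is the number of fsyncs. Those meanings are inferred from the field names, not stated by this page, and the helper name mean_fsync_latency_us is hypothetical.

```python
def mean_fsync_latency_us(snapshot):
    """Return the mean fsync latency in microseconds, or None if unknown.

    Assumption: fsync_micros_total / fsync_count is a meaningful average.
    The field semantics are inferred from their names.
    """
    wal = snapshot.get("wal")
    if not isinstance(wal, dict):
        return None
    count = wal.get("fsync_count")
    total = wal.get("fsync_micros_total")
    if not count or total is None:
        return None  # no fsyncs recorded yet, or fields absent
    return total / count


# Subset of the example snapshot shown above.
snapshot = {"wal": {"fsync_count": 88241, "fsync_micros_total": 441205000}}
print(mean_fsync_latency_us(snapshot))  # 5000.0
```

With the example values, the result is 5000 µs (5 ms) per fsync. Because both inputs are cumulative, the difference between two snapshots gives the mean over that interval rather than over the whole process lifetime.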

The per-response performance block below complements the scrape — it surfaces per-call latency, fsync cost, scan counts, and cold-read counts inline.

The performance block (inline observability)

Every JSON response (and most errors) carries a performance object, so observability lives in the response itself, not a side channel. This is the primary way to observe per-call cost in topics today.

"performance": {
  "server_total_ms": 0.41,
  "wal_append_ms": 0.12,
  "fsync_ms": 0.0,
  "records_scanned": 128,
  "throttle_wait_ms": 0.0
}

Fields are best-effort and additive — a client must tolerate any subset; a field is omitted when it does not apply to that call.

| Field | When present | Meaning |
| --- | --- | --- |
| server_total_ms | always | Total server-side handling time for the request. |
| wal_append_ms | writes | Time to serialize and enqueue the WAL frame(s). |
| fsync_ms | durable writes | Time parked on the group-commit fsync. 0.0 on non-durable (disk/memory) topics. |
| records_scanned | reads | Records examined (including filtered/deleted/own-node ones the cursor advanced past). |
| throttle_wait_ms | under pressure | Time parked behind the elastic scheduler before the call ran. |
| cold_segments_read | cold-tier reads | Number of cold-tier segments touched. Present only when a read reached cold storage. |

A few patterns these enable without any external metrics:

  • Durability cost: a non-zero fsync_ms on a write is the cost of the fsync commit class. If fsync_ms is the bulk of server_total_ms, you are fsync-bound (a hardware floor on the disk, not a server bug).
  • Read efficiency: records_scanned far exceeding the records returned means a lot of filtered/deleted/own-node records sit in the scanned range; the cursor still advanced past them.
  • Backpressure: a non-zero throttle_wait_ms means the elastic scheduler paced the call under CPU pressure. The work was deferred, never dropped.
  • Cold reads: a present cold_segments_read confirms a read reached the cold tier. By the hard tiering invariant, this slows only that historical read, never writes or live delivery.
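
The four patterns above can be checked mechanically on the client. This is a minimal sketch rather than an official helper: the function name diagnose, the note labels, and both thresholds (fsync_share and scan_ratio) are hypothetical choices for illustration. It tolerates any subset of fields, as the performance block requires.

```python
def diagnose(performance, records_returned=None, fsync_share=0.5, scan_ratio=10.0):
    """Return a list of notes derived from one performance block.

    fsync_share and scan_ratio are illustrative thresholds; tune them to
    your workload. Missing fields are simply skipped.
    """
    notes = []

    total = performance.get("server_total_ms")
    fsync = performance.get("fsync_ms")
    if total and fsync and fsync / total >= fsync_share:
        notes.append("fsync-bound")

    scanned = performance.get("records_scanned")
    if scanned is not None and records_returned is not None:
        # Compare against at least one record so an empty result still works.
        if scanned > scan_ratio * max(records_returned, 1):
            notes.append("inefficient-scan")

    if performance.get("throttle_wait_ms"):
        notes.append("throttled")

    # Presence, not value, signals that the read reached the cold tier.
    if "cold_segments_read" in performance:
        notes.append("cold-read")

    return notes
```

For the sample block shown earlier (fsync_ms and throttle_wait_ms both 0.0, 128 records scanned) with 128 records returned, the function returns an empty list.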

429 throttled responses carry a Retry-After header; a CPU-pressure throttle adds error.detail.retry_after_ms, and a resource-cap throttle adds error.detail.limit naming the cap that was hit. Branch on error.code and respect Retry-After. See Errors.
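
One way to respect both hints is to wait for the longer of the two. The sketch below is a hedged illustration: the function name retry_delay_seconds and the default fallback are hypothetical, it reads Retry-After only in its delta-seconds form (an HTTP-date value falls through to the other hint or the default), and taking the maximum is a conservative choice made here, not a rule stated by topics.

```python
def retry_delay_seconds(headers, body, default=1.0):
    """Pick a wait time in seconds from a throttled response.

    headers: mapping of response header names to values.
    body: the decoded JSON response body (a dict), possibly empty.
    """
    hints = []

    # Header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                hints.append(float(value))
            except (TypeError, ValueError):
                pass  # e.g. an HTTP-date; not handled in this sketch

    error = body.get("error") if isinstance(body, dict) else None
    detail = error.get("detail") if isinstance(error, dict) else None
    retry_ms = detail.get("retry_after_ms") if isinstance(detail, dict) else None
    if isinstance(retry_ms, (int, float)) and not isinstance(retry_ms, bool):
        hints.append(retry_ms / 1000.0)

    hints = [h for h in hints if h >= 0]
    # Never retry earlier than any hint the server gave.
    return max(hints) if hints else default
```

A caller would sleep for the returned number of seconds before retrying, ideally with a cap on the total number of attempts.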
