
Errors

Every non-2xx response carries one fixed JSON shape, and every error.code is a stable machine-readable string you can branch on. This page is the canonical mapping of HTTP status to error.code, plus the two backpressure signals (429, 503) and the performance block that rides on success responses too.

The error envelope

Every non-2xx response — except in-stream SSE errors, which are delivered as an event: error frame — carries this exact shape:

{ "error": { "code": "topic_not_found", "message": "topic \"orders\" does not exist", "detail": { "topic": "orders" } } }

| Field | Type | Meaning |
|---|---|---|
| error.code | string | Stable, machine-readable snake_case string. Branch on this. |
| error.message | string | Human-readable. It may change between versions, so never parse it. |
| error.detail | object | Optional structured context (e.g. the offending topic name, a limit, a retry_after_ms). May be absent. |

Success responses carry bare data — there is no {"status":"ok"} envelope. The presence of a top-level error key is the only success/failure discriminator.
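To illustrate branching on the envelope, here is a minimal Python sketch of a client-side response parser. ApiError, parse_response, and the success bodies used in the test are hypothetical names for illustration, not part of the API. The sketch relies only on the rule stated above: a top-level error key is the sole success/failure discriminator.

```python
import json
from typing import Any, Dict, Optional


class ApiError(Exception):
    """Raised when a response body carries the top-level "error" envelope."""

    def __init__(self, status: int, code: str, message: str,
                 detail: Optional[Dict[str, Any]]) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code              # stable snake_case string: branch on this
        self.message = message        # human-readable: log it, never parse it
        self.detail = detail or {}    # optional structured context; may be absent


def parse_response(status: int, body: str) -> Any:
    """Return bare success data, or raise ApiError for an error envelope."""
    payload = json.loads(body)
    # A top-level "error" key is the only success/failure discriminator;
    # success bodies are bare data with no {"status": "ok"} wrapper.
    if isinstance(payload, dict) and "error" in payload:
        err = payload["error"]
        raise ApiError(status, err["code"], err.get("message", ""), err.get("detail"))
    return payload
```

A caller would then branch on e.code (for example topic_not_found), read context such as the topic name from e.detail, and log e.message without parsing it.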

Tombstones and gaps are NOT errors. Involuntary cap-eviction and TTL crossings surface as in-band 200 payload signals — a tombstone object in a diff response, or an event: tombstone frame in SSE. Data loss is always explicit, but never an HTTP error. A voluntary delete is silent (no tombstone at all).
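To make the in-band distinction concrete, the sketch below parses an SSE stream, raises on event: error frames, and collects event: tombstone frames as ordinary data. It assumes every frame's data is JSON. The "record" event name and the tombstone fields used in the test are hypothetical, since this page does not specify them, and the parser handles only the event and data fields of the SSE format.

```python
import json
from typing import Any, Iterable, Iterator, List, Tuple


class StreamError(Exception):
    """Raised when the stream delivers an `event: error` frame."""

    def __init__(self, payload: Any) -> None:
        super().__init__(f"in-stream error: {payload!r}")
        self.payload = payload


def iter_sse_frames(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Minimal SSE parser: yield (event, data) per blank-line-terminated frame."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # a frame with no data lines is not dispatched
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
    # A trailing frame without its terminating blank line is dropped.


def consume(lines: Iterable[str]) -> Tuple[List[Any], List[Any]]:
    """Split a stream into data frames and tombstones; raise on error frames."""
    records: List[Any] = []
    tombstones: List[Any] = []
    for event, payload_text in iter_sse_frames(lines):
        payload = json.loads(payload_text)  # assumption: frame data is JSON
        if event == "error":
            raise StreamError(payload)
        if event == "tombstone":
            tombstones.append(payload)  # explicit data loss, not an error
        else:
            records.append(payload)
    return records, tombstones
```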

Status codes

The complete status → error.code mapping. Clients branch on error.code, not the prose message.

| Status | Meaning | error.code |
|---|---|---|
| 200 | OK (read, idempotent write / create / delete) | n/a |
| 201 | Created (topic/router created on this call) | n/a |
| 400 | Malformed request (bad JSON, bad type, value out of range) | invalid_request, batch_too_large, record_too_large |
| 401 | Missing or invalid bearer token | unauthorized |
| 403 | Authenticated, but the key lacks the required scope, or the topic/router name is outside its prefix allowlist | forbidden |
| 404 | Topic/router does not exist (and was not auto-created) | topic_not_found, router_not_found |
| 405 | Wrong method for the path | method_not_allowed |
| 406 | Accept is not text/event-stream on an SSE GET | not_acceptable |
| 409 | Conflict: router cycle, config conflict, queue op on a non-queue topic | router_cycle, topic_exists_incompatible, topic_not_empty, not_a_queue |
| 413 | Body exceeds the server hard limit (rejected pre-parse) | payload_too_large |
| 415 | Wrong or missing Content-Type | unsupported_media_type |
| 422 | Semantically invalid, e.g. a write to a full discard:"reject" topic | topic_full |
| 429 | Elastic throttle under CPU pressure, or a resource cap reached | throttled |
| 500 | Internal error (a bug) | internal |
| 503 | Not ready (WAL replay on boot) or shutting down | not_ready, shutting_down |
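For contract tests, or for logging a status/error.code pair a client does not recognize, the mapping above can be carried as data. A minimal Python sketch: ERROR_CODES, is_documented, and status_for are hypothetical helper names, and the table itself remains the authoritative source.

```python
from typing import Dict, FrozenSet, Optional

# Status -> documented error.code values, transcribed from the status table.
ERROR_CODES: Dict[int, FrozenSet[str]] = {
    400: frozenset({"invalid_request", "batch_too_large", "record_too_large"}),
    401: frozenset({"unauthorized"}),
    403: frozenset({"forbidden"}),
    404: frozenset({"topic_not_found", "router_not_found"}),
    405: frozenset({"method_not_allowed"}),
    406: frozenset({"not_acceptable"}),
    409: frozenset({"router_cycle", "topic_exists_incompatible",
                    "topic_not_empty", "not_a_queue"}),
    413: frozenset({"payload_too_large"}),
    415: frozenset({"unsupported_media_type"}),
    422: frozenset({"topic_full"}),
    429: frozenset({"throttled"}),
    500: frozenset({"internal"}),
    503: frozenset({"not_ready", "shutting_down"}),
}


def is_documented(status: int, code: str) -> bool:
    """True if `code` is a documented error.code for this HTTP status."""
    return code in ERROR_CODES.get(status, frozenset())


def status_for(code: str) -> Optional[int]:
    """Reverse lookup: the status a given error.code is documented under."""
    for status, codes in ERROR_CODES.items():
        if code in codes:
            return status
    return None
```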

A few codes are worth calling out:

  • 401 vs 403: 401 means the bearer token is missing or invalid; 403 means the token authenticates but lacks the required scope, or the request names a topic/router outside the key's prefix allowlist (enforced on the path and on relevant names in the request body). Watch sessions add one rule: a session created with auth enabled is bound to the creating key, so the SSE GET must present that same bearer, either in the header or via the dev-only ?token= parameter. A different key, or no bearer at all, gets 401; a leaked wid alone is not a credential. Only an unauthenticated (dev-mode) session opens on the wid alone.
  • 409 not_a_queue: a queue endpoint (claim/ack/nack/extend/work) was called on a plain "log" topic.
  • 409 topic_exists_incompatible: a PUT tried to change a topic’s immutable type (between log and queue).
  • 409 router_cycle: creating the router would introduce a directed cycle; error.detail.cycle lists the offending path, e.g. ["A","B","A"].
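A client opening a watch stream can avoid both 406 and 401 by setting the Accept and bearer headers up front. This Python sketch only builds the request with the standard library. The full watch URL is passed in because this page does not give the watch path, build_watch_request is a hypothetical name, and it uses the Authorization header (assumed to follow the usual Bearer scheme) rather than the dev-only ?token= parameter.

```python
import urllib.request
from typing import Optional


def build_watch_request(url: str, token: Optional[str]) -> urllib.request.Request:
    """Build (but do not send) the SSE GET for a watch session."""
    # Any Accept other than text/event-stream on an SSE GET is 406 not_acceptable.
    headers = {"Accept": "text/event-stream"}
    if token is not None:
        # Must be the same key that created the session: a different key,
        # or no bearer on an auth-enabled session, is 401.
        headers["Authorization"] = f"Bearer {token}"
    # token=None is only appropriate for an unauthenticated (dev-mode) session.
    return urllib.request.Request(url, headers=headers, method="GET")
```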

429 — throttle and resource caps

429 throttled is the single backpressure signal, raised in two situations. Both carry a Retry-After header; the error.detail distinguishes them so a client can react correctly.

CPU-pressure throttle — the elastic scheduler is shedding load. The detail carries a suggested wait:

{ "error": { "code": "throttled", "message": "throttled under CPU pressure", "detail": { "retry_after_ms": 1500 } } }

Resource cap reached — a configurable resource limit (max topics, routers, watch sessions, SSE connections, in-flight requests per key, or total retained bytes) would be exceeded. The detail names the cap:

{ "error": { "code": "throttled", "message": "max topics reached", "detail": { "limit": "max_topics", "max": 100000 } } }

| error.detail field | When | Meaning |
|---|---|---|
| retry_after_ms | CPU-pressure throttle | Suggested wait before retrying (ms). |
| limit | Resource cap | Which cap was hit (e.g. "max_topics", "max_total_bytes"). |
| max | Resource cap | The configured ceiling for that cap. |

Because both situations reuse the same 429 throttled signal, a client that already backs off on 429 needs no change. Bulk writers that prefer to push through CPU pressure may set "disable_backpressure": true in the write body (a trusted-loader opt-out): the server then admits the write but may queue it, trading latency for not failing. Resource caps are not bypassable this way.
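The two 429 situations can be told apart from error.detail alone. A minimal sketch, assuming the Retry-After header uses the delta-seconds form (the HTTP-date form is not parsed here). classify_429 and parse_retry_after are hypothetical names, and preferring retry_after_ms over the header on a CPU throttle is a design choice of this sketch, not documented server behavior.

```python
from typing import Mapping, Optional, Tuple


def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Parse a delta-seconds Retry-After value; the HTTP-date form is not handled."""
    if value is None:
        return None
    try:
        return float(max(0, int(value.strip())))
    except ValueError:
        return None


def classify_429(detail: Mapping[str, object],
                 retry_after: Optional[str]) -> Tuple[str, Optional[float]]:
    """Return (kind, wait_seconds) for a 429 throttled response."""
    header_wait = parse_retry_after(retry_after)
    if "limit" in detail:
        # Resource cap: disable_backpressure does not bypass it.
        return "resource_cap", header_wait
    ms = detail.get("retry_after_ms")
    if isinstance(ms, (int, float)):
        # CPU-pressure throttle: use the server's suggested wait.
        return "cpu_throttle", ms / 1000.0
    return "cpu_throttle", header_wait
```

How to treat resource_cap is left to the caller: waiting alone will not free a cap such as max_topics, so surfacing it (or releasing resources) is often more useful than a blind retry loop.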

503 — not ready / shutting down

503 is the lifecycle backpressure signal, raised by the readiness gate and by ordinary endpoints during boot or drain. It always carries a Retry-After header.

  • not_ready: boot-time WAL replay is in progress. The detail carries replay_progress, a fraction from 0.0 to 1.0:

    { "error": { "code": "not_ready", "message": "WAL replay in progress", "detail": { "replay_progress": 0.62 } } }
  • shutting_down: the server received SIGINT/SIGTERM and is draining in-flight work before writing a final snapshot and exiting.

Gate traffic on /v0/ready so that a node returning 503 during replay is taken out of rotation until it flips to 200. See Health & Metrics and Recovery.
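A deploy script or sidecar that cannot rely on a load balancer might poll the readiness gate itself. A minimal sketch, assuming a caller-supplied fetch function that performs GET /v0/ready and returns the status plus the raw Retry-After header. wait_until_ready, its defaults, and the fetch/sleep injection are hypothetical; the injection keeps the sketch transport-agnostic and testable.

```python
import time
from typing import Callable, Optional, Tuple


def wait_until_ready(fetch: Callable[[], Tuple[int, Optional[str]]],
                     sleep: Callable[[float], None] = time.sleep,
                     default_wait_s: float = 1.0,
                     max_attempts: int = 60) -> bool:
    """Poll the readiness gate until it returns 200; False if attempts run out."""
    for _ in range(max_attempts):
        status, retry_after = fetch()
        if status == 200:
            return True
        if status != 503:
            raise RuntimeError(f"unexpected readiness status {status}")
        # A 503 always carries Retry-After; fall back to a default if it is
        # missing or not in delta-seconds form.
        try:
            wait = (float(max(0, int(retry_after)))
                    if retry_after is not None else default_wait_s)
        except ValueError:
            wait = default_wait_s
        sleep(wait)
    return False
```

A shutting_down node exits rather than flipping to 200, which is one reason the sketch bounds the wait with max_attempts.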

The performance block

Every JSON response, including most error responses, carries a performance object, so per-request observability lives in the response rather than in a side channel:

"performance": { "server_total_ms": 0.41, "wal_append_ms": 0.12, "fsync_ms": 0.0, "records_scanned": 128, "throttle_wait_ms": 0.0 }

Fields are best-effort and additive — clients must tolerate any subset, and each field is omitted when it does not apply.

| Field | Always? | Meaning |
|---|---|---|
| server_total_ms | yes | Total server-side handling time for the request (ms). |
| wal_append_ms | when relevant | Time spent enqueuing the WAL frame (ms). Memory topics also go through the WAL write path, as disk topics do, so this is not necessarily 0 for them; only the WAL-less in-memory test engine always reports 0. |
| fsync_ms | when relevant | Time the ack was held for the group fsync (ms). 0 for non-fsync topics (the fast path). |
| records_scanned | on reads | Records examined to build the response, including filtered or skipped ones. |
| throttle_wait_ms | when throttled | Time parked behind the elastic scheduler before handling (ms). |
| cold_segments_read | on cold reads | Cold-tier segments touched to satisfy a read (omitted when none). |

fsync_ms is the clearest signal of which durability class a topic is using: it is 0.0 on memory and disk topics (whose ack is not fsync-gated) and a real value on fsync topics. cold_segments_read > 0 tells you a read fell through to the cold tier.
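Both signals can be read tolerantly, since any field may be omitted. A Python sketch: summarize_performance and its output keys are hypothetical, and treating fsync_ms > 0 as "fsync-gated" applies the heuristic described above to a single response rather than looking up the topic's configuration.

```python
from typing import Any, Dict, Mapping


def summarize_performance(perf: Mapping[str, Any]) -> Dict[str, Any]:
    """Read a performance block tolerantly; any field may be missing."""
    fsync_ms = float(perf.get("fsync_ms") or 0.0)
    cold_segments = int(perf.get("cold_segments_read") or 0)
    return {
        "total_ms": perf.get("server_total_ms"),  # None if absent
        # fsync_ms stays 0.0 unless the ack was held for the group fsync.
        "fsync_gated": fsync_ms > 0.0,
        # cold_segments_read is omitted (treated as 0) when no cold segment was read.
        "cold_read": cold_segments > 0,
        "throttle_wait_ms": float(perf.get("throttle_wait_ms") or 0.0),
    }
```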

See also

  • API Conventions — auth, content types, the $-metadata convention, idempotency.
  • Health & Metrics — the 503 readiness gate in detail.
  • Tombstones — why involuntary loss is a 200 signal, not an error.
  • Security — scopes and prefix allowlist behind 401/403.