Errors
Every non-2xx response carries one fixed JSON shape, and every error.code is a stable
machine-readable string you can branch on. This page is the canonical mapping of HTTP
status to error.code, plus the two backpressure signals (429, 503) and the
performance block that rides on success responses too.
The error envelope
Every non-2xx response — except in-stream SSE errors, which are delivered as
an event: error frame — carries this exact shape:
{
  "error": {
    "code": "topic_not_found",
    "message": "topic \"orders\" does not exist",
    "detail": { "topic": "orders" }
  }
}

| Field | Type | Meaning |
|---|---|---|
| error.code | string | Stable, machine-readable snake_case string. Branch on this. |
| error.message | string | Human-readable; may change between versions, never parse it. |
| error.detail | object | Optional structured context (e.g. the offending topic name, a limit, a retry_after_ms). May be absent. |
Success responses carry bare data — there is no {"status":"ok"} envelope. The
presence of a top-level error key is the only success/failure discriminator.
Tombstones and gaps are NOT errors. Involuntary cap-eviction and TTL crossings
surface as in-band 200 payload signals — a tombstone object in a
diff response, or an event: tombstone frame in SSE.
Involuntary data loss is always explicit, but never an HTTP error. A
voluntary delete is silent (no tombstone at all).
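The discriminator rule above can be sketched as a small parser. This is a minimal illustration, not a client library: `ApiError` and `parse_body` are hypothetical names, and the sketch assumes the response body has already been read as a JSON string.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional, Tuple


@dataclass
class ApiError:
    code: str                                   # stable; safe to branch on
    message: str                                # human-readable; never parsed
    detail: dict = field(default_factory=dict)  # optional structured context


def parse_body(raw: str) -> Tuple[Any, Optional[ApiError]]:
    """Return (data, None) on success or (None, ApiError) on failure."""
    body = json.loads(raw)
    # A top-level "error" key is the only success/failure discriminator.
    if isinstance(body, dict) and "error" in body:
        err = body["error"]
        return None, ApiError(
            code=err["code"],
            message=err.get("message", ""),
            detail=err.get("detail") or {},  # detail may be absent
        )
    # Success responses carry bare data, so hand it back untouched.
    return body, None
```

A caller would then branch on `err.code` (for example `"topic_not_found"`) and treat `err.message` as display-only text.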
Status codes
The complete status → error.code mapping. Clients branch on error.code, not the prose
message.
| Code | Meaning | error.code |
|---|---|---|
| 200 | OK (read, idempotent write / create / delete) | — |
| 201 | Created (topic/router created on this call) | — |
| 400 | Malformed request (bad JSON, bad type, value out of range) | invalid_request, batch_too_large, record_too_large |
| 401 | Missing or invalid bearer token | unauthorized |
| 403 | Authenticated, but the key lacks the required scope, or the topic/router name is outside its prefix allowlist | forbidden |
| 404 | Topic/router does not exist (and was not auto-created) | topic_not_found, router_not_found |
| 405 | Wrong method for the path | method_not_allowed |
| 406 | Accept not text/event-stream on an SSE GET | not_acceptable |
| 409 | Conflict: router cycle, config conflict, queue op on a non-queue topic | router_cycle, topic_exists_incompatible, topic_not_empty, not_a_queue |
| 413 | Body exceeds the server hard limit (rejected pre-parse) | payload_too_large |
| 415 | Wrong or missing Content-Type | unsupported_media_type |
| 422 | Semantically invalid — e.g. a write to a full discard:"reject" topic | topic_full |
| 429 | Elastic throttle under CPU pressure, or a resource cap reached | throttled |
| 500 | Internal error (a bug) | internal |
| 503 | Not ready (WAL replay on boot) or shutting down | not_ready, shutting_down |
A few codes are worth calling out:
- 401 vs 403 — 401 means no/invalid token; 403 means the token authenticates but lacks the required scope, or names a topic/router outside its prefix allowlist (enforced on the path and relevant request-body names). On watch: when the session was created with auth enabled it is bound to the creating key, so the SSE GET must present that same bearer (header or the dev-only ?token=) — a wrong key or no bearer at all is 401 (a leaked wid alone is not a credential). Only an unauthenticated (dev-mode) session opens on the wid alone.
- 409 not_a_queue — a queue endpoint (claim/ack/nack/extend/work) was called on a plain "log" topic.
- 409 topic_exists_incompatible — a PUT tried to change a topic’s immutable type (log↔queue).
- 409 router_cycle — creating the router would introduce a directed cycle; error.detail.cycle lists the offending path, e.g. ["A","B","A"].
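As a rough illustration of branching on error.code rather than on the status or message, the sketch below groups the codes from the table into coarse client actions. The error.code strings come from the table; the three action categories and the helper names (`action_for`, `describe_cycle`) are assumptions made for this example, not part of the API.

```python
from typing import Optional

# error.code values are copied from the status table; grouping them into
# client actions is an assumption made for this sketch.
RETRY_LATER = {"throttled", "not_ready", "shutting_down"}   # 429 / 503
FIX_CREDENTIALS = {"unauthorized", "forbidden"}             # 401 / 403


def action_for(code: str) -> str:
    """Map an error.code to a coarse client action (hypothetical categories)."""
    if code in RETRY_LATER:
        return "retry_later"
    if code in FIX_CREDENTIALS:
        return "fix_credentials"
    # Every other code in the table needs a changed request or operator
    # attention; unknown codes also land here so new codes fail safe.
    return "do_not_retry"


def describe_cycle(detail: Optional[dict]) -> str:
    """Render error.detail.cycle from a 409 router_cycle as 'A -> B -> A'."""
    cycle = (detail or {}).get("cycle", [])
    return " -> ".join(cycle)
```

Whether a given code is worth retrying depends on the application (a full discard:"reject" topic may drain later, for instance), so treat the grouping as a starting point.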
429 — throttle and resource caps
429 throttled is the single signal for load backpressure (503 covers lifecycle), and it is
raised in two situations. Both carry a Retry-After header; error.detail distinguishes
them so a client can react correctly.
CPU-pressure throttle — the elastic scheduler is shedding load. The detail carries a suggested wait:
{
  "error": {
    "code": "throttled",
    "message": "throttled under CPU pressure",
    "detail": { "retry_after_ms": 1500 }
  }
}

Resource cap reached — a configurable resource limit (max topics, routers, watch sessions, SSE connections, in-flight requests per key, or total retained bytes) would be exceeded. The detail names the cap:

{
  "error": {
    "code": "throttled",
    "message": "max topics reached",
    "detail": { "limit": "max_topics", "max": 100000 }
  }
}

| error.detail field | When | Meaning |
|---|---|---|
| retry_after_ms | CPU-pressure throttle | Suggested wait before retrying (ms). |
| limit | Resource cap | Which cap was hit (e.g. "max_topics", "max_total_bytes"). |
| max | Resource cap | The configured ceiling for that cap. |
Because both situations reuse the same 429 throttled signal, a client that already
backs off on 429 needs no change. Bulk writers that prefer to push through CPU pressure
may set "disable_backpressure": true in the write body (a trusted-loader opt-out): the
server then admits the write but may queue it, trading latency for not failing. Resource
caps are not bypassable this way.
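One way a client might tell the two 429 situations apart is sketched below. It assumes the Retry-After header uses the delay-seconds form (the HTTP-date form is not handled), and `plan_for_429`, `retry_after_seconds`, and `DEFAULT_WAIT_S` are hypothetical names introduced for illustration.

```python
from typing import Mapping, Optional, Tuple

DEFAULT_WAIT_S = 1.0  # hypothetical fallback when the server gives no hint


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Read Retry-After, assuming the delay-seconds form (an assumption)."""
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            try:
                return float(value)
            except ValueError:
                return None  # HTTP-date form is not handled in this sketch
    return None


def plan_for_429(headers: Mapping[str, str],
                 detail: Optional[dict]) -> Tuple[str, float]:
    """Return (kind, wait_seconds) for a 429 throttled response."""
    detail = detail or {}
    header_wait = retry_after_seconds(headers)
    fallback = header_wait if header_wait is not None else DEFAULT_WAIT_S
    if "limit" in detail:
        # Resource cap: detail.limit names the cap that was hit.
        return "resource_cap:" + str(detail["limit"]), fallback
    if "retry_after_ms" in detail:
        # CPU-pressure throttle: prefer the millisecond hint in the body.
        return "cpu_pressure", detail["retry_after_ms"] / 1000.0
    return "throttled", fallback
```

A resource cap may not clear just by waiting, so a caller could reasonably surface the `resource_cap:*` case to an operator instead of retrying indefinitely.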
503 — not ready / shutting down
503 is the lifecycle backpressure signal, raised by the
readiness gate and by ordinary endpoints
during boot or drain. It always carries a Retry-After header.
- not_ready — boot-time WAL replay is in progress. The detail carries replay_progress (0.0–1.0):

  { "error": { "code": "not_ready", "message": "WAL replay in progress", "detail": { "replay_progress": 0.62 } } }

- shutting_down — the server received SIGINT/SIGTERM and is draining in-flight work before writing a final snapshot and exiting.
Route traffic on /v0/ready so a 503-during-replay node is taken out of rotation until
it flips to 200. See Health & Metrics and
Recovery.
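The readiness behaviour described in this section could be consumed by a polling loop along these lines. The sketch takes the HTTP call as an injected `probe` callable returning `(status, parsed_body_or_None)`, so nothing about the real client or the 200 body of /v0/ready is assumed; `wait_until_ready` and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional, Tuple

Probe = Callable[[], Tuple[int, Optional[dict]]]


def wait_until_ready(probe: Probe,
                     interval_s: float = 1.0,
                     max_attempts: int = 30,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a readiness probe until it returns 200 or we give up."""
    for _ in range(max_attempts):
        status, body = probe()
        if status == 200:
            return True
        err = (body or {}).get("error", {})
        if err.get("code") == "shutting_down":
            # A draining node is about to exit, so stop waiting on it.
            return False
        # not_ready: replay_progress (0.0-1.0) is informational only; this
        # sketch keeps a fixed interval and does not honor Retry-After.
        sleep(interval_s)
    return False
```

Injecting `probe` and `sleep` keeps the loop testable without a server; a real caller would wrap a GET on /v0/ready in `probe`.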
The performance block
Every JSON response — and most errors — includes a performance object, so per-request
observability lives in the response rather than a side channel:

"performance": {
  "server_total_ms": 0.41,
  "wal_append_ms": 0.12,
  "fsync_ms": 0.0,
  "records_scanned": 128,
  "throttle_wait_ms": 0.0
}

Fields are best-effort and additive — clients must tolerate any subset, and each field is omitted when it does not apply.
| Field | Always? | Meaning |
|---|---|---|
| server_total_ms | yes | Total server-side handling time for the request (ms). |
| wal_append_ms | when relevant | Time spent enqueuing the WAL frame. A memory topic still enters the WAL write path like disk, so this is not always 0 for memory (only the WAL-less in-memory test engine reports 0). |
| fsync_ms | when relevant | Time the ack was held for the group fsync. 0 for non-fsync topics (the fast path). |
| records_scanned | on reads | Records examined to build the response (includes filtered/skipped ones). |
| throttle_wait_ms | when throttled | Time parked behind the elastic scheduler before handling. |
| cold_segments_read | on cold reads | Cold-tier segments touched to satisfy a read (omitted when none). |
fsync_ms is the clearest signal of which durability class a
topic is using: it is 0.0 on memory and disk topics (whose ack is not fsync-gated) and
a real value on fsync topics. cold_segments_read > 0 tells you a read fell through to
the cold tier.
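Because any subset of fields may be present, a reader of the performance block should default every lookup. The helper below is a minimal sketch of that; `summarize_performance` and the keys of its return value are hypothetical names, while the field names it reads come from the table above.

```python
from typing import Optional


def summarize_performance(perf: Optional[dict]) -> dict:
    """Derive a few signals from a performance block, tolerating any subset."""
    perf = perf or {}
    return {
        # May be None if the block is missing entirely (e.g. on some errors).
        "server_total_ms": perf.get("server_total_ms"),
        # A non-zero fsync_ms means the ack was held for a group fsync.
        "fsync_gated": perf.get("fsync_ms", 0.0) > 0,
        # cold_segments_read is omitted when no cold segment was touched.
        "hit_cold_tier": perf.get("cold_segments_read", 0) > 0,
        # throttle_wait_ms appears only when the request was parked.
        "was_throttled": perf.get("throttle_wait_ms", 0.0) > 0,
    }
```

Applied to the example block above, this reports `fsync_gated`, `hit_cold_tier`, and `was_throttled` all as false.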
See also
- API Conventions — auth, content types, the $-metadata convention, idempotency.
- Health & Metrics — the 503 readiness gate in detail.
- Tombstones — why involuntary loss is a 200 signal, not an error.
- Security — scopes and prefix allowlist behind 401/403.