Mental Model
topics has five concepts, and they are the whole product: a Topic is a named
append-only log; a Record is one immutable event in it; seq is the cursor; a
Router forwards records between topics; and a Tombstone is the explicit “you
missed data” signal. One operation completes the picture — a Delete is a permanent,
point-in-time removal. Learn these and you understand the entire /v0 surface —
everything else is operational detail.
One rule underpins them all, and it is worth memorizing before you read on:
Involuntary, capacity-driven loss you didn’t ask for (cap eviction, TTL expiry) always produces a tombstone. Voluntary removal you did ask for (a permanent delete, your own node’s events) is silently filtered.
Everything below exists to keep that distinction crisp.
The $-prefixed convention
Before the concepts, one naming rule that runs through every JSON example on this site.
Server-computed fields are $-prefixed — $seq, $ts, $node, $tag —
so they can never collide with the user-controlled data namespace. (SSE distinguishes
payload kinds by the event name, e.g. event: record / event: tombstone, not a $type
field — there is no $type key on the wire.) On write you set
node and tag as plain top-level keys; on read the server echoes them back as the
canonical, immutable $node and $tag. The data and meta objects keep the same key
both ways (pure passthrough). All times are integer milliseconds since the Unix epoch
($ts) or integer millisecond durations (any *_ms field).
$node, $tag, and meta are omitted from a response when absent (absence, not
null). data is always present, though it may itself be JSON null.
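The write-to-read key mapping described above can be sketched in a few lines. This is only an illustration of the naming rule, not the server's code: the function name `to_read_shape` and its arguments are hypothetical, and the seq and timestamp are passed in rather than computed.

```python
from typing import Any


def to_read_shape(written: dict[str, Any], seq: int, ts_ms: int) -> dict[str, Any]:
    """Map a client-written record to the shape a read would return (sketch)."""
    if "data" not in written:
        raise ValueError("data is required on write")
    # Server-computed fields are $-prefixed.
    out: dict[str, Any] = {"$seq": seq, "$ts": ts_ms}
    # node and tag are plain keys on write and $-prefixed on read;
    # they are omitted (not null) when absent.
    if "node" in written:
        out["$node"] = written["node"]
    if "tag" in written:
        out["$tag"] = written["tag"]
    # meta keeps the same key both ways and is omitted when absent.
    if "meta" in written:
        out["meta"] = written["meta"]
    # data is always present, even when it is JSON null (None in Python).
    out["data"] = written["data"]
    return out
```

Note that a record written with `"data": null` still reads back with a `data` key, while an unset `tag` produces no `$tag` key at all.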
Topic
A topic is an append-only log of records ordered by a monotonic seq, plus a small
config and a pair of derived watermarks. Think inbox/outbox. It is the unit of naming,
durability, retention, and priority — almost every decision in topics is a per-topic
decision.
Identity & naming
A topic is addressed by name in the path (/v0/topics/:topic); the name is the
identity. Names match ^[A-Za-z0-9][A-Za-z0-9._:-]{0,254}$ — 1–255 bytes, starting
alphanumeric, allowing . _ : -. The : is for namespacing (chat:general). Names
are case-sensitive and byte-exact, with no Unicode normalization. On disk, files are keyed
by a numeric topic id rather than the name, so a topic name can never become a path-traversal
vector.
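A client can pre-check names against the same pattern before sending a request. The sketch below is a convenience under the assumption that the server applies the regex exactly as written above; `is_valid_topic_name` is a hypothetical helper name.

```python
import re

# Same character rules as the documented pattern. fullmatch() anchors both
# ends, so the ^ and $ are not needed (and a trailing newline cannot slip in).
TOPIC_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._:-]{0,254}")


def is_valid_topic_name(name: str) -> bool:
    """Return True if name is a syntactically valid topic name."""
    # Only ASCII characters are allowed, so character count equals byte count.
    return TOPIC_NAME.fullmatch(name) is not None
```

For example, `chat:general` passes, while an empty string, a name starting with `.`, or a 256-character name does not.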
Creation & lifecycle
A topic is created lazily on first write (turbopuffer-style ergonomics), or explicitly
with PUT /v0/topics/:topic when you want to set config up front. You can opt
out of lazy creation per-write with create: false, which returns 404 topic_not_found
against a missing topic (the Redis NOMKSTREAM lesson against typo-topics).
Deleting a topic tears down all of its records, its tag index, its dedupe
state, and any routers that reference it as source or dest. It is irreversible. A
later lazy-create makes a new, empty topic whose seq restarts at the base — and a stale
consumer pointed at the old topic detects the rewind exactly, via a reason:"recreated"
tombstone.
Config & durability
Topic config is { ttl_ms, cap_records, cap_bytes, discard, durable, durability, priority, auto_priority, auto_create, idempotency_window_ms, dedupe_node }. Every default is safe:
all caps and TTL are off (ttl_ms, cap_records, cap_bytes all 0), so an
out-of-the-box topic loses nothing: silent loss must be a deliberate choice you opt into.
The load-bearing config knob is the durability commit class, resolved from the topic’s
current config (the topic type is immutable, but durability/config can be updated in place):
| Class | Where it lands | Ack timing | Survives a crash? |
|---|---|---|---|
| memory | WAL, group-committed (same path as disk) | immediate; not fsync-gated; fsync_ms is 0 | Best-effort: records MAY survive OR be lost on restart (no guarantee; config always persists). |
| disk | WAL, group-committed (no per-write fsync) | on WAL-frame enqueue, not fsync-gated; fsync_ms is 0 | Yes, minus the un-fsynced tail. |
| fsync | WAL, fsync-gated | after the group fsync (real fsync_ms) | Yes, any crash: an acked write is recovered by WAL replay. |
These are not server-wide modes: a memory cache topic, a disk pub/sub feed, and an
fsync queue coexist in one process, each buying exactly the guarantee it needs. The
durable bool is a shorthand alias (true ⇒ fsync, false ⇒ disk); reach
memory only by setting durability: "memory" explicitly. See
Durability for the full treatment.
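The alias rule can be sketched as a small resolver. Two things here are assumptions, not documented behavior: that an explicit `durability` wins when both keys are set, and that the function returns `None` when neither is set (the text above does not state the server default). `resolve_durability` is a hypothetical name.

```python
from typing import Optional

CLASSES = ("memory", "disk", "fsync")


def resolve_durability(config: dict) -> Optional[str]:
    """Resolve the commit class from a topic config dict (sketch)."""
    explicit = config.get("durability")
    if explicit is not None:
        # Assumption: an explicit class takes precedence over the bool alias.
        if explicit not in CLASSES:
            raise ValueError(f"unknown durability class: {explicit!r}")
        return explicit
    durable = config.get("durable")
    if durable is None:
        # Neither key set: the default is not stated here, so report "unknown".
        return None
    # The bool is shorthand: true => fsync, false => disk. It never yields memory.
    return "fsync" if durable else "disk"
```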
State & watermarks
GET /v0/topics/:topic returns the live state: head_seq (highest assigned seq; 0 if
never written), earliest_seq (lowest live seq; head_seq + 1 when empty), next_seq,
count, bytes, config, and effective_priority. The two watermarks are central to
the loss model and are covered under seq and Tombstone below.
Record
A record is one immutable event in a topic. Once it is assigned a seq, its fields
never change — neither deletion nor eviction mutates a record; a record is either
present and unchanged, or removed and gone.
Fields
| Field | Write key | Read key | Type | Origin | Required |
|---|---|---|---|---|---|
| Sequence | — | $seq | u64 | server | assigned at commit |
| Timestamp | — | $ts | u64 ms | server | assigned at commit |
| Origin node | node | $node | string | client | optional |
| Tag | tag | $tag | string | client | optional |
| Meta | meta | meta | object | client | optional |
| Data | data | data | arbitrary JSON | client | required |
A record returned by a read looks like this:
{ "$seq": 4096, "$ts": 1748470000123, "$node": "api-fra-1", "$tag": "order-7731",
"meta": { "content-type": "application/json" },
"data": { "sku": "AEROPRESS-GO", "qty": 1, "total": 3499 } }

data is opaque: the product treats it as bytes and never inspects it. tag is the
match key for deletion. node is the origin id used for loop-prevention. meta is
small opaque metadata/headers.
Size limits
These are hard limits enforced at write; a violation is a 400:
| Limit | Default |
|---|---|
| data + meta (canonical bytes) | 1 MiB |
| tag length | 256 bytes |
| node length | 128 bytes |
| meta total | 16 KiB, ≤ 64 keys |
| Records per write request | 10,000 |
| Total write body | 64 MiB |
A single record larger than the entire topic cap is a permanent 400 record_too_large (not
retryable) — distinct from a transient 422 topic_full.
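A client-side pre-check against the per-record defaults might look like the sketch below. The exact "canonical bytes" encoding is not specified here, so the code assumes compact, key-sorted UTF-8 JSON as a stand-in; real server sizes may differ slightly. `check_record` and `encoded_size` are hypothetical names.

```python
import json

MAX_DATA_PLUS_META = 1024 * 1024   # 1 MiB
MAX_TAG_BYTES = 256
MAX_NODE_BYTES = 128
MAX_META_BYTES = 16 * 1024         # 16 KiB
MAX_META_KEYS = 64


def encoded_size(value) -> int:
    # Assumption: compact, key-sorted UTF-8 JSON approximates "canonical bytes".
    text = json.dumps(value, separators=(",", ":"), sort_keys=True, ensure_ascii=False)
    return len(text.encode("utf-8"))


def check_record(record: dict) -> list[str]:
    """Return a list of limit violations for one record (empty if none)."""
    problems: list[str] = []
    if "data" not in record:
        problems.append("data is required")
    meta = record.get("meta")
    meta_bytes = encoded_size(meta) if meta is not None else 0
    if encoded_size(record.get("data")) + meta_bytes > MAX_DATA_PLUS_META:
        problems.append("data + meta exceeds 1 MiB")
    if meta is not None:
        if meta_bytes > MAX_META_BYTES:
            problems.append("meta exceeds 16 KiB")
        if len(meta) > MAX_META_KEYS:
            problems.append("meta has more than 64 keys")
    for key, cap in (("tag", MAX_TAG_BYTES), ("node", MAX_NODE_BYTES)):
        value = record.get(key)
        if value is not None and len(value.encode("utf-8")) > cap:
            problems.append(f"{key} exceeds {cap} bytes")
    return problems
```

Lengths are measured in encoded bytes rather than characters, since the limits above are stated in bytes.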
How it composes
A record is the atom that flows through every other concept: a write assigns it a seq and
publishes it; a router copies it into another topic preserving $node; a
delete can remove it by seq or tag; and a read either delivers it or skips
it (silently, or with a tombstone standing in for a lost range).
seq
A seq is a per-topic monotonic u64 assigned by the server at commit. It is two things
at once: the order of the log and the cursor you read from. There is no opaque
cursor token for topic reads — the monotonic seq is the cursor, and you own your
position.
Assignment contract
Each topic has its own u64 counter starting at seq_base (default 1; 0 is reserved to
mean “no records”). On commit of a write of N records, the server atomically assigns
next_seq … next_seq + N − 1 and returns them in write order. A single write request is
atomic: all N records commit with contiguous seqs, or none do. Assignment happens at
commit, after WAL ordering, so seq order equals durable commit order equals delivery
order.
seq is strictly increasing and gap-free at assignment. But what a consumer
observes can have holes (4097, 4098, 4101, …): eviction, TTL, deletion, and node-filtering
all remove records from the visible set without the assigner ever skipping a number. You
MUST NOT assume received seqs are contiguous; you MAY assume they are strictly increasing.
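The assignment contract, and the one property a consumer may rely on, can be modeled as follows. This is a toy in-memory model for illustration only; `SeqAssigner` and `is_strictly_increasing` are hypothetical names and nothing here touches a WAL.

```python
class SeqAssigner:
    """Toy model of a per-topic counter: contiguous, gap-free assignment."""

    def __init__(self, seq_base: int = 1) -> None:
        # 0 is reserved to mean "no records", so the default base is 1.
        self.next_seq = seq_base

    def commit(self, n: int) -> list[int]:
        """Assign next_seq .. next_seq + n - 1 to a write of n records."""
        if n < 1:
            raise ValueError("a write must contain at least one record")
        first = self.next_seq
        self.next_seq += n
        return list(range(first, first + n))


def is_strictly_increasing(seqs: list[int]) -> bool:
    """The only ordering property a consumer may assume about received seqs."""
    return all(a < b for a, b in zip(seqs, seqs[1:]))
```

A consumer that checks `is_strictly_increasing` on what it receives stays correct even when eviction, deletion, or node filtering has punched holes in the visible set.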
Cursors
A cursor is a plain seq, interpreted as an exclusive lower bound: a read of
from_seq returns records with $seq > from_seq. from_seq = 0 means “from the
beginning of what’s retained”; a tail/only-new cursor is from_seq = head_seq at
subscription (the Redis $). Both diff reads and SSE also return an
explicit next_from_seq for convenience and batch boundaries.
Advancing your stored from_seq is the ack — there is no separate acknowledgement
call for cursor reads. And because skipped records (deleted, expired, node-filtered) still
advance the cursor, the reliable “no more right now” signal is the caught_up flag, not
records.length.
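A drain loop built on these rules might look like the sketch below. `read_page` stands in for whatever performs the actual diff read; its return shape (`records`, `next_from_seq`, `caught_up`) mirrors the fields named above, but the callable itself and the in-memory fake reader are hypothetical, written only so the loop can be exercised.

```python
from typing import Callable


def drain(read_page: Callable[[int], dict], from_seq: int) -> tuple[list[dict], int]:
    """Read until caught_up; return (delivered records, new cursor)."""
    delivered: list[dict] = []
    while True:
        page = read_page(from_seq)
        delivered.extend(page["records"])
        # Persisting next_from_seq is the ack; there is no separate ack call.
        from_seq = page["next_from_seq"]
        # An empty page is NOT the stop signal: skipped seqs can yield zero
        # records while more data still lies ahead. Only caught_up ends the loop.
        if page["caught_up"]:
            return delivered, from_seq


def make_fake_reader(live: dict[int, object], head_seq: int, window: int = 2):
    """In-memory stand-in for a diff read that scans `window` seqs per call."""
    def read_page(from_seq: int) -> dict:
        upper = min(from_seq + window, head_seq)
        records = [{"$seq": s, "data": live[s]}
                   for s in range(from_seq + 1, upper + 1) if s in live]
        return {"records": records, "next_from_seq": upper,
                "caught_up": upper >= head_seq}
    return read_page
```

With live seqs 1, 2, 5, 6 and a head of 6, the second page is empty (seqs 3 and 4 are gone) yet `caught_up` is still false, which is exactly the case where testing `records.length` would stop too early.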
How it composes
seq ties everything together. Reads consume by seq. Deletes target a seq range
(before_seq). Routers reassign a fresh dst seq to each forwarded copy. SSE resume
encodes the per-topic cursor map in the id: field. And the two topic watermarks —
earliest_seq (first live seq) and evict_floor (the tombstone trigger) — are both seq
values that bound what you can still read. See
Ordering & Cursors.
Router
A router is a server-side forwarding rule source → dest: every record committed to
source is copied into dest. Routers fan out, and because the origin $node rides
through every forward untouched, N symmetric nodes can mirror to each other without echo or
loops.
Forward mechanics & ordering
Forwarding is async (off the source write/ack path) and derived. When record r
commits to source at seq s, a background per-router worker appends a forwarded copy to
dest, which assigns it a fresh dst.$seq from its own counter — unrelated to s. The
copy is derived: it is not separately WAL-logged, so one source append is one WAL
write regardless of fan-out, and the copies are re-derived on recovery by replaying from a
durable per-router cursor. A derived dest is single-source (a second router with a
different source into one dest is rejected 409 topic_exists_incompatible,
error.detail.reason: "router_dest_fan_in"). Delivery is at-least-once with per-source
FIFO: records from a single source arrive in dest in source commit order.
At-least-once means a crash between “appended to dest” and “advanced the router cursor”
can re-forward, producing duplicates in dest. Exactly-once is not offered.
Consumers must be idempotent — dedupe on $seq or a job-level key in meta.
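One way to make a consumer idempotent is to remember a job-level key and skip repeats. The sketch below assumes the producer puts such a key in `meta` under the name `job_id`; that key name, the class name, and the unbounded in-memory set are all illustrative choices, and a real consumer would persist and bound the seen-set.

```python
from typing import Callable


class IdempotentHandler:
    """Run a handler at most once per job-level key found in record meta."""

    def __init__(self, handle: Callable[[dict], None], key: str = "job_id") -> None:
        self._handle = handle
        self._key = key          # hypothetical meta key chosen by the producer
        self._seen: set = set()  # in-memory only; a real consumer would persist this

    def process(self, record: dict) -> bool:
        """Return True if the record was handled, False if it was a duplicate."""
        job_key = record.get("meta", {}).get(self._key)
        if job_key is None:
            raise ValueError(f"record has no meta.{self._key} to dedupe on")
        if job_key in self._seen:
            return False
        self._handle(record)
        # Mark as seen only after the handler succeeds, so a failure is retried.
        self._seen.add(job_key)
        return True
```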
What carries through
A forwarded copy preserves $node (verbatim — this is what makes loop-prevention work
across the route), $tag, meta, and data. Only $seq and $ts are reassigned by
dest. Because forwarding is async and the copies are derived (not WAL-logged), the source
ack never waits on the destination; the destination topic’s durability class governs only
how/whether the re-derived copy is retained and recovered — a memory dest keeps a
best-effort copy (may survive or be lost), an fsync dest fsyncs it. Deletes and node
filters are per-topic and do not propagate: a delete on source does not reach copies
already forwarded to dest.
Cycle control
Two complementary layers keep fan-out safe. Content-level node loop-prevention stops a
node from consuming its own events. Topology-level cycle control stops a record from
being forwarded around a cycle forever: creating a router that would introduce a directed
cycle is rejected at creation with 409 router_cycle. For intentional mirrors (A↔B), set
allow_cycle: true and the route uses a bounded hop-cap to guarantee forwarding
terminates. See Routers and the multi-master guide.
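The topology-level check amounts to asking whether a new edge would close a directed cycle. The server's actual algorithm is not described here; the sketch below is one standard way to answer that question (a depth-first reachability search), with `would_create_cycle` as a hypothetical name and routes modeled as plain `(source, dest)` pairs.

```python
def would_create_cycle(routes: list[tuple[str, str]], source: str, dest: str) -> bool:
    """True if adding source -> dest to the existing routes closes a directed cycle."""
    if source == dest:
        return True
    adjacency: dict[str, list[str]] = {}
    for s, d in routes:
        adjacency.setdefault(s, []).append(d)
    # The new edge closes a cycle exactly when source is already reachable from dest.
    stack, seen = [dest], set()
    while stack:
        node = stack.pop()
        if node == source:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency.get(node, []))
    return False
```

In this model, a creation request for which the function returns true would be the case that is rejected unless the route opts into `allow_cycle: true`.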
A full discard:"reject" (or otherwise erroring) destination is treated as backpressure:
the router does not advance its cursor and the record stays available in source, so a
chronically-failing dest lags behind its source until it recovers. Size a durable
dest ≥ source, or give dest discard:"old".
Tombstone
A tombstone is the explicit “you missed data” signal. If live records you wanted were
evicted (cap) or expired (TTL) before you read them, the read returns an in-band
tombstone with the exact missed range — at HTTP 200, never as a silent skip and never as
an HTTP error. It is the single mechanism for all non-silent loss.
The dual watermark
A tombstone exists because each topic tracks two floors, which decouples involuntary loss from voluntary deletion:
- earliest_seq: the seq of the first currently-live record (not evicted, not expired, not deleted). It is monotonically non-decreasing and is advanced by eviction, TTL, and deletion. It is head_seq + 1 when the topic is empty.
- evict_floor: advanced only by involuntary loss (cap eviction and TTL expiry). It is the sole tombstone trigger. A voluntary delete advances earliest_seq but never evict_floor.
The invariant evict_floor <= earliest_seq holds always, and it produces the whole
behavior:
- A cursor below earliest_seq but at/above evict_floor fell into a purely-deleted gap (voluntary) → the read is silent (tombstone: null); the cursor advances past the deleted seqs.
- A cursor below evict_floor lost live records to cap/TTL (involuntary) → a tombstone is emitted.
Precisely: a read for cursor from_seq emits a tombstone iff from_seq + 1 < evict_floor.
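The two-floor rule can be written as a three-way classification. This is a sketch of the decision only, using the condition stated above; the function name and the three result labels are hypothetical.

```python
def classify_read(from_seq: int, evict_floor: int, earliest_seq: int) -> str:
    """Classify what a read at cursor from_seq encounters first.

    Returns "tombstone" (involuntary loss), "silent_gap" (purely deleted),
    or "live" (the next wanted seq is at or above the first live record).
    """
    if evict_floor > earliest_seq:
        raise ValueError("invariant violated: evict_floor <= earliest_seq")
    wanted = from_seq + 1  # the cursor is an exclusive lower bound
    if wanted < evict_floor:
        return "tombstone"
    if wanted < earliest_seq:
        return "silent_gap"
    return "live"
```

Checking `evict_floor` first matters: a cursor that is below both floors has lost live records, so it must see the tombstone even if part of the gap was also deleted voluntarily.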
Shape
A tombstone is a small gap marker carrying the exact missed range. In a diff it rides in
the tombstone field of the response:
{ "gap_from": 478501, "gap_to": 479100,
"reason": "cap", "missed_estimate": 600,
"earliest_seq": 479101, "head_seq": 480234 }

gap_from is what you asked for next; gap_to is the last seq before the first live
record; the range is inclusive on both ends. missed_estimate is the approximate dropped
count (eviction is segment-granular). In a diff, reason is one of cap | ttl | mixed |
recreated | source_trim, where source_trim marks a derived-router dest reflecting
source-side eviction; SSE adds from_seq_too_old at connect time. The reason is
best-effort; the [gap_from, gap_to] range is authoritative. The diff tombstone field is
null when there is no gap and holds at most one tombstone per response; in SSE the
tombstone is a framed event: tombstone carrying {topic, reason, gap_from, gap_to, earliest_seq, head_seq},
whose id: already advances your cursor past the gap so the resume is correct.
The four loss/removal kinds
| Kind | Detectable? | Mechanism | Consumer sees |
|---|---|---|---|
| Cap eviction (involuntary) | Yes, never silent | advances evict_floor | tombstone reason:"cap" |
| TTL expiry (involuntary) | Yes, never silent | advances evict_floor by clock | tombstone reason:"ttl" |
| Permanent deletion (voluntary) | No, intentionally silent | advances earliest_seq only | seqs simply absent |
| Node loop-prevention (voluntary) | No, intentionally silent | own-node records dropped | own seqs absent |
See Tombstones for the complete contract.
Delete
A delete is a permanent, point-in-time removal of records by seq range and/or tag
match. It is the one operation that removes data on your command — and it is deliberately
silent, so it never trips the gap alarm that tombstones own.
Five defining properties
- Permanent. Deleted records are gone for good; there is no un-delete in /v0. To “resurrect,” write a new record.
- Effective immediately. Deleted records vanish from every read surface at once: diff, topic state count/bytes, and SSE. A reader’s cursor simply advances past the deleted seqs.
- Asynchronous, no compaction / no reclaim. Records are logically gone instantly (the work runs off the call path), but a deleted record stays on disk, just marked; there is no compaction and no per-record disk reclaim. On disk a delete flips an in-place delete-flag byte in segment files (the WAL stays append-only); the only space released is a whole segment dropped when a delete clears it entirely.
- Silent. A delete never produces a tombstone. It advances earliest_seq but not evict_floor, so reading across a purely-deleted gap returns tombstone: null.
- Point-in-time. A match-only delete is bounded by the current head at call time; future records with the same tag are never affected. It is not a standing filter.
Request grammar
At least one of before_seq or match is required (else 400 invalid_request):
- before_seq (u64): delete records with $seq < before_seq (snapshot/compaction by seq).
- match: ["tag", "Eq", "X"] for an exact match, or ["tag", "Glob", "X*"] for a trailing-prefix match (a single trailing *, no general globbing). A bare string "X" is shorthand for ["tag", "Eq", "X"]. Records with no tag are never matched.
Combining the two ANDs them — match + before_seq deletes records that match the tag
and sit below the seq (e.g. publish v2 of a message, then delete its prior versions
while keeping the new one). A match delete is backed by a per-topic tag index
(tag → live seqs), so it is a point lookup or a bounded prefix range scan — never a full
log scan.
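The selection rules above can be modeled as a predicate over one record. This sketch evaluates the grammar record by record for clarity; it does not model the tag index, and it assumes a Glob pattern must end in exactly one `*`. `parse_match` and `selects` are hypothetical names.

```python
from typing import Optional, Union


def parse_match(match: Union[str, list]) -> tuple[str, str]:
    """Normalize a match clause to (op, pattern); a bare string means Eq."""
    if isinstance(match, str):
        return ("Eq", match)
    field, op, pattern = match
    if field != "tag" or op not in ("Eq", "Glob"):
        raise ValueError("match must be ['tag', 'Eq'|'Glob', pattern]")
    # Assumption: Glob allows exactly one '*', and only as the last character.
    if op == "Glob" and (not pattern.endswith("*") or "*" in pattern[:-1]):
        raise ValueError("Glob supports a single trailing '*' only")
    return (op, pattern)


def selects(record: dict, before_seq: Optional[int] = None,
            match: Union[str, list, None] = None) -> bool:
    """True if a delete with these arguments would remove this record."""
    if before_seq is None and match is None:
        raise ValueError("at least one of before_seq or match is required")
    # When both are given they are ANDed.
    if before_seq is not None and not record["$seq"] < before_seq:
        return False
    if match is not None:
        tag = record.get("$tag")
        if tag is None:
            return False  # records with no tag are never matched
        op, pattern = parse_match(match)
        return tag == pattern if op == "Eq" else tag.startswith(pattern[:-1])
    return True
```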
# Cancel one job by exact tag.
curl -X POST $TOPICS/v0/topics/transcode/delete \
-H 'content-type: application/json' \
-d '{ "match": ["tag", "Eq", "transcode-9001"] }'

{ "topic": "transcode", "deleted": 1, "earliest_seq": 2, "head_seq": 2, "count": 1,
"performance": { "server_total_ms": 0.12 } }

How it composes
A delete is committed and WAL-logged, so it survives a restart. It interacts cleanly with
everything else: it advances earliest_seq (never evict_floor), so the
tombstone machinery stays untouched; it does not propagate through
routers (delete dest separately if you need to); and it is the very same delete
path a queue ack reuses to permanently remove a completed job. See
Deletion.
How the pieces fit
Reading is a single pipeline, applied per candidate seq from your cursor, identically for
diff and SSE:
- Live-floor gate: skip if below the earliest live record. If from_seq + 1 < evict_floor, emit a tombstone; a purely-deleted gap is skipped silently.
- TTL: skip if expired (involuntary → feeds the tombstone via evict_floor).
- Deleted: skip if the slot was deleted (voluntary → silent).
- Node filter: skip if $node is in the reader’s node set (voluntary → silent).
- Deliver the surviving record, respecting batch limit / SSE flow.
Skipped seqs of any kind still advance next_from_seq. That is the whole engine: a topic
holds records ordered by seq; reads walk forward from a cursor; involuntary loss tombstones
and voluntary removal is silent; routers fan records out preserving origin; and durability
is a per-topic choice. Five concepts, one operation, one invariant.
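As a closing illustration, the read pipeline can be modeled against an in-memory topic. This is a simplified sketch, not the engine: the `Slot` and `Topic` types are hypothetical, the tombstone omits `reason` and `missed_estimate`, and the TTL step is left out on the assumption that expiry has already been folded into `evict_floor` before the read runs.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Slot:
    seq: int
    data: Any
    node: Optional[str] = None
    deleted: bool = False


@dataclass
class Topic:
    slots: dict[int, Slot]   # seq -> slot still held in storage
    head_seq: int
    earliest_seq: int
    evict_floor: int


def read(topic: Topic, from_seq: int, reader_nodes: frozenset = frozenset(),
         limit: int = 100) -> dict:
    cursor, tombstone = from_seq, None
    # Live-floor gate: jump to the first live record, tombstoning only
    # when the skipped range includes involuntary loss.
    if cursor + 1 < topic.earliest_seq:
        if cursor + 1 < topic.evict_floor:
            tombstone = {"gap_from": cursor + 1, "gap_to": topic.earliest_seq - 1,
                         "earliest_seq": topic.earliest_seq, "head_seq": topic.head_seq}
        cursor = topic.earliest_seq - 1
    records: list[dict] = []
    while cursor < topic.head_seq and len(records) < limit:
        cursor += 1  # skipped seqs of any kind still advance the cursor
        slot = topic.slots.get(cursor)
        if slot is None or slot.deleted:
            continue  # voluntary removal: silent
        if slot.node is not None and slot.node in reader_nodes:
            continue  # node filter: silent
        records.append({"$seq": slot.seq, "data": slot.data})
    return {"records": records, "tombstone": tombstone,
            "next_from_seq": cursor, "caught_up": cursor >= topic.head_seq}
```

Reading from cursor 0 on a topic whose `evict_floor` is 3 and `earliest_seq` is 5 yields one tombstone covering seqs 1 through 4, while reading from cursor 3 on the same topic crosses only the deleted part of the gap and returns `tombstone: None`.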