Permanent Deletion
A delete removes records that exist in a topic right now, by seq range and/or by
tag match. It is permanent, effective immediately on every reader, asynchronous (with no
compaction / no reclaim — deleted records stay on disk, just marked),
silent (never a tombstone), and point-in-time (a future record with the same tag is
never touched). This page specifies all five properties, the request grammar, the three
patterns it enables, the per-topic tag index that makes it cheap, and the one watermark
interaction that keeps deletion from ever tripping the gap alarm.
The endpoint is POST /v0/topics/:topic/delete; the field-by-field reference lives in
Deletion API. This page is the why and the contract.
The five properties
A delete is a one-shot operation against the records present at call time — not a standing filter that keeps matching future writes. Five properties define it:
- Permanent. Deleted records are gone for good. There is no un-delete in /v0. To bring data back, write a new record. A delete is committed and WAL-logged, so it survives restart: a recovered topic does not resurrect deleted records.
- Effective immediately. Deleted records become invisible to all reads at once: diff, topic state (count/bytes), and live SSE. A reader’s cursor simply advances past the deleted seqs. A lagging consumer cannot “miss a deletion,” because the record is gone for every reader simultaneously.
- Asynchronous, no compaction / no reclaim. Records are logically gone instantly: the payload and tag are freed and count/bytes drop immediately (the work runs off the call path). A deleted record nonetheless stays on disk, just marked: there is no compaction and no per-record disk reclaim. In memory, front-of-log slots pop as the prefix becomes fully dead; on disk, a record sealed into a segment has its delete-flag byte flipped in place (the WAL stays append-only: a Delete frame is appended, never mutated). The only physical space released is a whole segment, dropped when a delete clears it entirely. Logical correctness never waits on any of this.
- Silent. A delete never produces a tombstone. Tombstones are reserved for involuntary capacity loss (cap eviction, TTL expiry). Reading across a purely-deleted gap returns tombstone: null.
- Point-in-time. A match-only delete is bounded by the current head at call time (the bound is head_seq + 1). Future records, even with a matching tag, are never deleted. Revoking a kicked user’s chat history removes only what exists at that instant, not a message an in-flight producer sends a moment later.
The defining distinction in topics: involuntary loss you did not ask for (cap / TTL) always tombstones; voluntary removal you requested (a delete, your own node’s records) is silently filtered. Deletion is voluntary, so it is silent — mixing the two would make the gap alarm useless. See Explicit Loss & Tombstones.
Request grammar — before_seq vs match
A delete targets records by seq range (before_seq), by tag (match), or both
(ANDed together). At least one of the two is required, or the call is 400 invalid_request.
| Field | Type | Meaning |
|---|---|---|
| before_seq | u64 | Delete every record with $seq < before_seq. |
| match | predicate | ["tag","Eq","X"] exact, or ["tag","Glob","X*"] trailing-prefix. A bare string is shorthand: "X*" (trailing *) ⇒ Glob, otherwise ⇒ Eq. |
The match predicate is deliberately narrow — there is no general globbing or regex:
- ["tag","Eq","X"]: exact, byte-for-byte equality. Resolves to a point lookup in the tag index.
- ["tag","Glob","X*"]: trailing-prefix only. The pattern must end with a single *, which is stripped to a literal prefix; a record matches iff its $tag has that prefix. The trailing * is the only wildcard. This keeps a tag delete a point lookup or a bounded prefix range scan over the tag index, never a full-log scan, never a regex engine.
- "X": a bare string is shorthand for ["tag","Eq","X"]; a bare string ending in * (e.g. "user-1042:*") is shorthand for the corresponding Glob prefix.
Records with no tag are never matched by a match (they can only be removed by
before_seq).
A Glob value must end with a trailing *, or the call is rejected 400 invalid_request. That trailing * is the only wildcard: it is stripped to a literal
prefix, and any other character — including a * elsewhere in the pattern — is matched
byte-for-byte. There is no [char], ?, or interior wildcard.
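The grammar above can be sketched as a small normalizer. This is only an illustration under stated assumptions, not the server's parser: normalize_match and tag_matches are hypothetical names, and raising ValueError stands in for a 400 invalid_request response.

```python
from typing import Optional, Tuple

# Hypothetical helpers; the real server-side parser is not shown on this page.
Predicate = Tuple[str, str]  # (op, value): op is "Eq" or "Glob", value is the tag or prefix


def normalize_match(match) -> Predicate:
    """Turn a `match` field into ("Eq", tag) or ("Glob", literal_prefix)."""
    if isinstance(match, str):
        # Bare-string shorthand: a trailing "*" means Glob, anything else means Eq.
        if match.endswith("*"):
            return ("Glob", match[:-1])
        return ("Eq", match)
    if isinstance(match, (list, tuple)) and len(match) == 3 and match[0] == "tag":
        _, op, value = match
        if op == "Eq" and isinstance(value, str):
            return ("Eq", value)
        if op == "Glob" and isinstance(value, str):
            if not value.endswith("*"):
                raise ValueError("invalid_request: Glob must end with '*'")
            # Only the final "*" is a wildcard; any earlier "*" stays literal.
            return ("Glob", value[:-1])
    raise ValueError("invalid_request: unsupported match predicate")


def tag_matches(pred: Predicate, tag: Optional[str]) -> bool:
    """Apply a normalized predicate to a record's tag."""
    if tag is None:
        return False  # untagged records are never matched
    op, value = pred
    return tag == value if op == "Eq" else tag.startswith(value)
```

For example, normalize_match("user-1042:*") yields a Glob on the literal prefix "user-1042:", while ["tag","Glob","a*b*"] yields the prefix "a*b" with its interior * kept as an ordinary character.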
Semantics by combination
The two fields combine into three precise behaviors:
| Body | Effect |
|---|---|
| before_seq only | Delete every record with $seq < before_seq. The seq-snapshot / compaction path. |
| match only | Delete every existing record whose tag matches, bounded by head_seq + 1 at call time. Point-in-time. |
| match + before_seq | Delete records that satisfy both: tag matches and $seq < before_seq. |
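
The three rows can be read as one selection rule. The sketch below spells it out with a linear scan purely for clarity (the tag index exists so that the real path avoids such a scan). Record, select_for_delete, and tag_pred are hypothetical names, and tag_pred stands in for an already-parsed match predicate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    seq: int
    tag: Optional[str] = None


def select_for_delete(records, head_seq, before_seq=None, tag_pred=None):
    """Return the seqs a delete would remove from the records live at call time.

    `head_seq` is the topic head captured when the call arrives.
    """
    if before_seq is None and tag_pred is None:
        raise ValueError("invalid_request: need before_seq and/or match")
    # A match-only delete is bounded by head_seq + 1 at call time.
    bound = before_seq if before_seq is not None else head_seq + 1
    doomed = []
    for r in records:
        if r.seq >= bound:
            continue  # at or past the bound: never touched
        if tag_pred is not None and (r.tag is None or not tag_pred(r.tag)):
            continue  # untagged or non-matching records survive a match delete
        doomed.append(r.seq)
    return doomed
```

Because the bound is fixed from the head at call time, a record appended afterwards with the same tag falls outside it, which is the point-in-time property.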
The three patterns
These three shapes cover the practical uses of point-in-time deletion.
Snapshot / compaction
Snapshot / drop-prefix by seq. You have read and persisted everything below some seq
elsewhere; drop it from the topic. before_seq only is an O(1) marker delete — it never touches
the tag index. (This is logical removal, not physical compaction — see the no-reclaim note
above; disk space is only released when a whole segment is cleared.)
# Drop everything the durable consumer has already checkpointed past.
curl -X POST $TOPICS/v0/topics/orders/delete \
  -H 'content-type: application/json' \
  -d '{ "before_seq": 480001 }'

{ "topic": "orders", "deleted": 14, "earliest_seq": 480001, "head_seq": 480234,
  "count": 234, "bytes": 478820, "performance": { "server_total_ms": 0.12 } }

The per-topic tag index
A match delete must not scan the whole log. Each topic keeps a tag index mapping
each live tag to its live seqs in ascending order — conceptually a
BTreeMap<String, Vec<u64>>:
- Exact Eq "X" → point lookup of the entry for "X".
- Prefix Glob "X*" → range scan over keys in ["X", next-key).
The index is populated on append (for tagged records), and pruned on delete and on
front reclaim. A tag delete therefore touches only the matching seqs — it frees their
payloads and prunes their index entries, nothing more. Since $tag is bounded to 256 bytes
the per-key cost is bounded. Measured: an exact tag match resolves in ~267 ns; a prefix
match over 100 tags in ~67.8 µs.
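As a rough illustration of why a tag delete stays cheap, here is a toy index in which a sorted key list plus a dict stands in for the ordered map. The class and method names are hypothetical and the real in-memory layout may differ. The point is only that Eq is a single lookup and Glob walks a contiguous run of sorted keys.

```python
import bisect


class TagIndex:
    """Toy per-topic tag index: tag -> ascending live seqs (illustrative only)."""

    def __init__(self):
        self._keys = []  # sorted list of live tags
        self._seqs = {}  # tag -> ascending list of live seqs

    def on_append(self, seq, tag):
        if tag is None:
            return  # untagged records are not indexed
        if tag not in self._seqs:
            bisect.insort(self._keys, tag)
            self._seqs[tag] = []
        self._seqs[tag].append(seq)  # seqs arrive in ascending order

    def delete_eq(self, tag, bound):
        """Point lookup: remove and return this tag's seqs below `bound`."""
        seqs = self._seqs.get(tag)
        if seqs is None:
            return []
        cut = bisect.bisect_left(seqs, bound)
        removed, remaining = seqs[:cut], seqs[cut:]
        if remaining:
            self._seqs[tag] = remaining
        else:
            # Prune the entry once no live seq is left under this tag.
            del self._seqs[tag]
            self._keys.pop(bisect.bisect_left(self._keys, tag))
        return removed

    def delete_prefix(self, prefix, bound):
        """Range scan: remove seqs below `bound` for every tag with `prefix`."""
        i = bisect.bisect_left(self._keys, prefix)
        matching = []
        # Keys sharing a prefix are contiguous in sorted order.
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            matching.append(self._keys[i])
            i += 1
        removed = []
        for key in matching:
            removed.extend(self.delete_eq(key, bound))
        return sorted(removed)
```

Both methods take the bound (head_seq + 1 for a match-only delete) so that seqs at or above it are left in the index untouched.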
A before_seq delete does not use the tag index at all. It is applied in O(1) via a
delete_below marker (the maximum before_seq ever applied); subsequent reads start at
max(from_seq + 1, base_seq) and skip any remaining deleted, expired, or node-filtered
slots as they go.
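A minimal sketch of the marker approach follows. It assumes base_seq is the seq of the first slot still held in memory and that a slot counts as deleted when its seq is below delete_below. The Topic class and its other fields are hypothetical, and expiry and node filtering are left out.

```python
class Topic:
    """Toy in-memory topic; only delete_below and base_seq come from the text."""

    def __init__(self):
        self.base_seq = 1      # assumed: seq of the first slot still held
        self.payloads = []     # payloads[i] holds seq base_seq + i
        self.delete_below = 0  # maximum before_seq ever applied

    @property
    def head_seq(self):
        return self.base_seq + len(self.payloads) - 1

    def append(self, payload):
        self.payloads.append(payload)
        return self.head_seq

    def delete_before(self, before_seq):
        # O(1): only the marker moves, no record is visited.
        self.delete_below = max(self.delete_below, before_seq)

    def read(self, from_seq):
        """Return (seq, payload) pairs for live records after cursor from_seq."""
        start = max(from_seq + 1, self.base_seq)
        out = []
        for seq in range(start, self.head_seq + 1):
            if seq < self.delete_below:
                continue  # logically deleted: skipped silently
            out.append((seq, self.payloads[seq - self.base_seq]))
        return out
```

Taking the maximum means a later call with a smaller before_seq is a no-op, so the marker never moves backwards.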
How a delete moves the watermarks
This is the load-bearing interaction. A topic has a dual watermark:
- earliest_seq: the seq of the first currently-live record (not evicted, not TTL-expired, not deleted). A delete that removes a prefix advances earliest_seq.
- evict_floor: advanced only by involuntary loss (cap eviction and TTL expiry). A delete never touches it.
Because evict_floor <= earliest_seq always holds, and a delete moves only the upper of
the two:
- A cursor that falls below earliest_seq but stays at or above evict_floor fell into a purely-deleted gap → the read is silent (tombstone: null), and the cursor advances past the deleted seqs.
- A cursor below evict_floor lost live records to cap/TTL → a tombstone fires.
That single inequality is why a delete can never be mistaken for capacity loss. The
read pipeline skips deleted slots silently while still
advancing the cursor, and the tombstone is computed solely from from_seq vs
evict_floor — entirely independent of any deletes.
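The decision can be sketched in a few lines. This takes the wording above literally (tombstone iff from_seq is below evict_floor); the exact off-by-one convention belongs to Explicit Loss & Tombstones, and Watermarks, classify_read, and the three labels are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Watermarks:
    evict_floor: int   # advanced only by cap eviction / TTL expiry
    earliest_seq: int  # first currently-live seq; deletes advance this one

    def on_delete_prefix(self, new_earliest):
        # A delete moves only the upper watermark.
        self.earliest_seq = max(self.earliest_seq, new_earliest)

    def on_evict(self, new_floor):
        # Involuntary loss moves the floor; earliest_seq never trails it.
        self.evict_floor = max(self.evict_floor, new_floor)
        self.earliest_seq = max(self.earliest_seq, self.evict_floor)


def classify_read(from_seq, wm):
    """Return "tombstone", "silent_gap", or "contiguous" for a cursor."""
    if from_seq < wm.evict_floor:
        return "tombstone"   # records were lost to cap/TTL
    if from_seq < wm.earliest_seq:
        return "silent_gap"  # purely deleted: tombstone stays null
    return "contiguous"
```

Note that classify_read never consults any delete state: a delete changes only earliest_seq, so it cannot turn a silent read into a tombstone.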
Deletion does not propagate through routers. A delete on src removes
records from src, but a copy may already have been forwarded to dst. To remove it in
dst too, issue a delete on dst — which is exactly why the default router preserves the
$tag (preserve_tag: true), so the same match works on both sides.
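One way a client might apply this is to send the same match to every topic that may hold a copy. The sketch below uses only the Python standard library. The base URL, the function names, and the injectable send parameter are assumptions made for illustration; only the endpoint path and the content-type header come from this page.

```python
import json
import urllib.request
from urllib.parse import quote


def build_delete_request(base_url, topic, match=None, before_seq=None):
    """Build (but do not send) a POST /v0/topics/:topic/delete request."""
    body = {}
    if match is not None:
        body["match"] = match
    if before_seq is not None:
        body["before_seq"] = before_seq
    if not body:
        raise ValueError("need match and/or before_seq")
    return urllib.request.Request(
        url=f"{base_url}/v0/topics/{quote(topic, safe='')}/delete",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def delete_everywhere(base_url, topics, match, send=urllib.request.urlopen):
    """Issue the same tag delete on each topic and collect the JSON responses."""
    results = []
    for topic in topics:
        with send(build_delete_request(base_url, topic, match=match)) as resp:
            results.append(json.load(resp))
    return results
```

A call such as delete_everywhere(base, ["src", "dst"], "user-1042:*") relies on the router having preserved the tag, so the identical predicate selects the forwarded copies in dst.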
Response
The response reports the count removed plus the post-delete topic state. count and bytes
already exclude the deleted records.
{ "topic": "orders",
  "deleted": 14,
  "earliest_seq": 480001,
  "head_seq": 480234,
  "count": 234,
  "bytes": 478820,
  "performance": { "server_total_ms": 0.12 } }

| Field | Meaning |
|---|---|
| deleted | Count of records removed by this call. |
| earliest_seq | New first live seq (advanced past any deleted prefix). |
| head_seq / count / bytes | Topic state after the delete. |
See also
- Deletion API: the endpoint, full field tables, and errors.
- Explicit Loss & Tombstones: the dual watermark and why deletes are silent.
- Ordering & Cursors: how skipped (deleted) seqs still advance the cursor.
- Routers API: why deletes don’t propagate, and preserve_tag.