
Permanent Deletion

A delete removes records that exist in a topic right now, by seq range and/or by tag match. It is permanent, effective immediately on every reader, asynchronous (with no compaction / no reclaim — deleted records stay on disk, just marked), silent (never a tombstone), and point-in-time (a future record with the same tag is never touched). This page specifies all five properties, the request grammar, the three patterns it enables, the per-topic tag index that makes it cheap, and the one watermark interaction that keeps deletion from ever tripping the gap alarm.

The endpoint is POST /v0/topics/:topic/delete; the field-by-field reference lives in Deletion API. This page is the why and the contract.

The five properties

A delete is a one-shot operation against the records present at call time — not a standing filter that keeps matching future writes. Five properties define it:

  • Permanent. Deleted records are gone for good. There is no un-delete in /v0. To bring data back, write a new record. A delete is committed and WAL-logged, so it survives restart — a recovered topic does not resurrect deleted records.
  • Effective immediately. The delete takes effect for all reads at once: diff, topic state (count / bytes), and live SSE. A reader’s cursor simply advances past the deleted seqs. A lagging consumer cannot “miss a deletion,” because the record is gone for every reader simultaneously.
  • Asynchronous, no compaction / no reclaim. Records are logically gone instantly: the payload and tag are freed, and count / bytes drop immediately (the work runs off the call path). A deleted record nevertheless stays on disk, just marked; there is no compaction and no per-record disk reclaim. In memory, front-of-log slots pop as the prefix becomes fully dead. On disk, a record sealed into a segment has its delete-flag byte flipped in place, while the WAL stays append-only (a Delete frame is appended, never mutated). The only physical space released is a whole segment dropped when a delete clears it entirely. Logical correctness never waits on any of this.
  • Silent. A delete never produces a tombstone. Tombstones are reserved for involuntary capacity loss (cap eviction, TTL expiry). Reading across a purely-deleted gap returns tombstone: null.
  • Point-in-time. A match-only delete is bounded by the current head at call time (the bound is head_seq + 1). Future records — even with a matching tag — are never deleted. Revoking a kicked user’s chat history removes only what exists at that instant, not a message an in-flight producer sends a moment later.

The defining distinction in topics: involuntary loss you did not ask for (cap / TTL) always tombstones; voluntary removal you requested (a delete, your own node’s records) is silently filtered. Deletion is voluntary, so it is silent — mixing the two would make the gap alarm useless. See Explicit Loss & Tombstones.

Request grammar — before_seq vs match

A delete targets records by seq range (before_seq), by tag (match), or both (ANDed together). At least one of the two is required, or the call is 400 invalid_request.

| Field | Type | Meaning |
|---|---|---|
| before_seq | u64 | Delete every record with $seq < before_seq. |
| match | predicate | ["tag","Eq","X"] exact, or ["tag","Glob","X*"] trailing-prefix. A bare string is shorthand: "X*" (trailing *) ⇒ Glob, otherwise ⇒ Eq. |

The match predicate is deliberately narrow — there is no general globbing or regex:

  • ["tag","Eq","X"] — exact, byte-for-byte equality. Resolves to a point lookup in the tag index.
  • ["tag","Glob","X*"]: trailing-prefix only. The pattern must end with a single *, which is stripped to a literal prefix; a record matches iff its $tag has that prefix. The trailing * is the only wildcard. This keeps a tag delete a point lookup or a bounded prefix range scan over the tag index, never a full-log scan and never a regex engine.
  • "X" — a bare string is shorthand for ["tag","Eq","X"]; a bare string ending in * (e.g. "user-1042:*") is shorthand for the corresponding Glob prefix.

Records with no tag are never matched by a match (they can only be removed by before_seq).

A Glob value must end with a trailing *, or the call is rejected 400 invalid_request. That trailing * is the only wildcard: it is stripped to a literal prefix, and any other character — including a * elsewhere in the pattern — is matched byte-for-byte. There is no [char], ?, or interior wildcard.
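
The grammar rules above can be modelled in a few lines. This is a minimal sketch, not the server's parser: `normalize_match`, `tag_matches`, and `InvalidRequest` are hypothetical names introduced here, and `InvalidRequest` merely stands in for the 400 invalid_request response.

```python
from typing import List, Optional, Tuple, Union


class InvalidRequest(ValueError):
    """Hypothetical stand-in for the 400 invalid_request response."""


def normalize_match(match: Union[str, List[str]]) -> Tuple[str, str]:
    """Return ("Eq", tag) or ("Glob", prefix), with the trailing * stripped."""
    if isinstance(match, str):
        # Bare-string shorthand: a trailing * means Glob, anything else means Eq.
        op = "Glob" if match.endswith("*") else "Eq"
        value = match
    else:
        if len(match) != 3 or match[0] != "tag" or match[1] not in ("Eq", "Glob"):
            raise InvalidRequest("match must be ['tag', 'Eq' | 'Glob', value]")
        _, op, value = match
    if op == "Glob":
        if not value.endswith("*"):
            raise InvalidRequest("a Glob value must end with a trailing *")
        # Strip only the final *; every other character stays literal.
        return ("Glob", value[:-1])
    return ("Eq", value)


def tag_matches(pred: Tuple[str, str], tag: Optional[str]) -> bool:
    """Untagged records never match; Eq is exact, Glob is a literal prefix."""
    if tag is None:
        return False
    op, literal = pred
    return tag == literal if op == "Eq" else tag.startswith(literal)
```

Note that an explicit `["tag","Eq","X*"]` stays an exact match on the literal string `X*`; only the bare-string shorthand infers Glob from a trailing `*`.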

Semantics by combination

The two fields combine into three precise behaviors:

| Body | Effect |
|---|---|
| before_seq only | Delete every record with $seq < before_seq. The seq-snapshot / compaction path. |
| match only | Delete every existing record whose tag matches, bounded by head_seq + 1 at call time. Point-in-time. |
| match + before_seq | Delete records that satisfy both: tag matches and $seq < before_seq. |
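
The three combinations can be stated as one selection rule. The sketch below is a semantic model only: it scans a small in-memory dict, which the real service avoids by using the tag index. `select_for_delete` and its parameters are hypothetical names, and the tag predicate is passed in as a plain callable.

```python
from typing import Callable, Dict, List, Optional


def select_for_delete(
    live: Dict[int, Optional[str]],  # seq -> tag (None when untagged)
    head_seq: int,                   # head of the topic at call time
    before_seq: Optional[int] = None,
    match: Optional[Callable[[str], bool]] = None,
) -> List[int]:
    """Return the seqs a delete would remove, in ascending order."""
    if before_seq is None and match is None:
        raise ValueError("invalid_request: need before_seq and/or match")
    # Every delete is point-in-time: nothing at or above head_seq + 1 exists yet.
    bound = head_seq + 1
    if before_seq is not None:
        bound = min(bound, before_seq)
    hits = []
    for seq in sorted(live):
        if seq >= bound:
            continue
        tag = live[seq]
        # With a match present, untagged records are never selected.
        if match is not None and (tag is None or not match(tag)):
            continue
        hits.append(seq)
    return hits
```

Calling it with `head_seq` lower than the highest key in `live` simulates a record that arrives after the delete was issued: it is left alone even when its tag matches.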

The three patterns

These three shapes cover the practical uses of point-in-time deletion.

Snapshot / drop-prefix by seq. You have read and persisted everything below some seq elsewhere; drop it from the topic. before_seq only is an O(1) marker delete — it never touches the tag index. (This is logical removal, not physical compaction — see the no-reclaim note above; disk space is only released when a whole segment is cleared.)

# Drop everything the durable consumer has already checkpointed past.
curl -X POST $TOPICS/v0/topics/orders/delete \
  -H 'content-type: application/json' \
  -d '{ "before_seq": 480001 }'
{ "topic": "orders", "deleted": 14, "earliest_seq": 480001, "head_seq": 480234, "count": 234, "bytes": 478820, "performance": { "server_total_ms": 0.12 } }
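
For callers that build the request in code rather than with curl, the body shapes from the grammar section can be assembled as follows. This is a client-side sketch under assumptions: `build_delete_request` is a hypothetical helper, the base URL and the `chat` topic and `user-1042:*` tag used in the usage note are made up for illustration, and the request is only constructed, never sent.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Union


def build_delete_request(
    base_url: str,
    topic: str,
    before_seq: Optional[int] = None,
    match: Optional[Union[str, List[str]]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v0/topics/:topic/delete request."""
    if before_seq is None and match is None:
        # The server would answer 400 invalid_request; fail early instead.
        raise ValueError("need before_seq and/or match")
    body = {}
    if before_seq is not None:
        body["before_seq"] = before_seq
    if match is not None:
        body["match"] = match
    topic_path = urllib.parse.quote(topic, safe="")
    return urllib.request.Request(
        url=f"{base_url}/v0/topics/{topic_path}/delete",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
```

`build_delete_request(url, "orders", before_seq=480001)` reproduces the curl call above; `build_delete_request(url, "chat", match="user-1042:*")` is a match-only body, and passing both arguments yields the ANDed form.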

The per-topic tag index

A match delete must not scan the whole log. Each topic keeps a tag index mapping each live tag to its live seqs in ascending order — conceptually a BTreeMap<String, Vec<u64>>:

  • Exact Eq "X" → point lookup of the entry for "X".
  • Prefix Glob "X*" → range scan over keys in ["X", next-key).

The index is populated on append (for tagged records), and pruned on delete and on front reclaim. A tag delete therefore touches only the matching seqs — it frees their payloads and prunes their index entries, nothing more. Since $tag is bounded to 256 bytes the per-key cost is bounded. Measured: an exact tag match resolves in ~267 ns; a prefix match over 100 tags in ~67.8 µs.
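
To make the two lookup paths concrete, here is a small model of such an index. It is a sketch, not the service's data structure: Python has no BTreeMap, so a sorted key list plus `bisect` stands in for ordered-map key order, and `TagIndex` with its method names is hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Set


class TagIndex:
    """Maps each live tag to its live seqs in ascending order."""

    def __init__(self) -> None:
        self._seqs: Dict[str, List[int]] = {}
        self._keys: List[str] = []  # sorted tags, emulating BTreeMap key order

    def on_append(self, seq: int, tag: Optional[str]) -> None:
        if tag is None:
            return  # untagged records are never indexed
        if tag not in self._seqs:
            bisect.insort(self._keys, tag)
            self._seqs[tag] = []
        self._seqs[tag].append(seq)  # appends arrive in ascending seq order

    def lookup_eq(self, tag: str) -> List[int]:
        return list(self._seqs.get(tag, []))

    def lookup_prefix(self, prefix: str) -> List[int]:
        # All keys sharing a prefix are contiguous in sorted order,
        # so start at the first key >= prefix and stop at the first miss.
        out: List[int] = []
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.extend(self._seqs[self._keys[i]])
            i += 1
        return sorted(out)

    def prune(self, tag: str, dead: Set[int]) -> None:
        remaining = [s for s in self._seqs.get(tag, []) if s not in dead]
        if remaining:
            self._seqs[tag] = remaining
        elif tag in self._seqs:
            del self._seqs[tag]
            self._keys.pop(bisect.bisect_left(self._keys, tag))
```

The prefix scan visits only keys that share the prefix, which is why its cost tracks the number of matching tags rather than the size of the log.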

A before_seq delete does not use the tag index at all. It is applied in O(1) via a delete_below marker (the maximum before_seq ever applied); subsequent reads start at max(from_seq + 1, base_seq) and skip any remaining deleted, expired, or node-filtered slots as they go.
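
The marker approach can be sketched as below. This is an illustrative model under assumptions, not the actual storage layout: `TopicModel`, the `dead` set (standing in for individually deleted, expired, or node-filtered slots), and the method names are hypothetical.

```python
from typing import List, Set


class TopicModel:
    def __init__(self, base_seq: int, head_seq: int) -> None:
        self.base_seq = base_seq     # first seq still held in the log
        self.head_seq = head_seq     # last seq written
        self.delete_below = 0        # maximum before_seq ever applied
        self.dead: Set[int] = set()  # slots removed one by one (e.g. by match)

    def delete_before(self, before_seq: int) -> None:
        # O(1): only the marker moves, no slot is visited.
        self.delete_below = max(self.delete_below, before_seq)

    def read(self, from_seq: int) -> List[int]:
        """Return the seqs visible to a reader whose last seen seq is from_seq."""
        start = max(from_seq + 1, self.base_seq)
        visible = []
        for seq in range(start, self.head_seq + 1):
            if seq < self.delete_below or seq in self.dead:
                continue  # skipped silently; the cursor still advances
            visible.append(seq)
        return visible
```

Because the marker only ever grows, applying a smaller `before_seq` after a larger one is a no-op in this model.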

How a delete moves the watermarks

This is the load-bearing interaction. A topic has a dual watermark:

  • earliest_seq — the seq of the first currently-live record (not evicted, not TTL-expired, not deleted). A delete that removes a prefix advances earliest_seq.
  • evict_floor — advanced only by involuntary loss: cap eviction and TTL expiry. A delete never touches it.

Because evict_floor <= earliest_seq always holds, and a delete moves only the upper of the two:

  • A cursor that falls below earliest_seq but stays at or above evict_floor fell into a purely-deleted gap → the read is silent (tombstone: null), the cursor advances past the deleted seqs.
  • A cursor below evict_floor lost live records to cap/TTL → a tombstone fires.

That single inequality is why a delete can never be mistaken for capacity loss. The read pipeline skips deleted slots silently while still advancing the cursor, and the tombstone is computed solely from from_seq vs evict_floor — entirely independent of any deletes.
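
The two cases reduce to a pair of comparisons. In this sketch `cursor` means the position the reader is about to read from; how that position is derived from from_seq is not specified on this page, so it is taken as an input. `Gap` and `classify_gap` are hypothetical names.

```python
from enum import Enum


class Gap(Enum):
    NONE = "none"            # cursor is at or past the first live record
    SILENT = "silent"        # purely-deleted gap, tombstone: null
    TOMBSTONE = "tombstone"  # records were lost to cap eviction or TTL expiry


def classify_gap(cursor: int, evict_floor: int, earliest_seq: int) -> Gap:
    if evict_floor > earliest_seq:
        raise ValueError("invariant violated: evict_floor <= earliest_seq")
    # The tombstone decision looks only at evict_floor, never at deletes.
    if cursor < evict_floor:
        return Gap.TOMBSTONE
    # Below earliest_seq but not below evict_floor: only deletes intervened.
    if cursor < earliest_seq:
        return Gap.SILENT
    return Gap.NONE
```

A delete can raise `earliest_seq` as far as it likes without changing the first comparison, which is the sense in which deletion never trips the gap alarm.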

Deletion does not propagate through routers. A delete on src removes records from src, but a copy may already have been forwarded to dst. To remove it in dst too, issue a delete on dst — which is exactly why the default router preserves the $tag (preserve_tag: true), so the same match works on both sides.

Response

The response reports the count removed plus the post-delete topic state. count and bytes already exclude the deleted records.

{ "topic": "orders", "deleted": 14, "earliest_seq": 480001, "head_seq": 480234, "count": 234, "bytes": 478820, "performance": { "server_total_ms": 0.12 } }

| Field | Meaning |
|---|---|
| deleted | Count of records removed by this call. |
| earliest_seq | New first live seq (advanced past any deleted prefix). |
| head_seq / count / bytes | Topic state after the delete. |
