Durability Classes
Durability in topics is a per-topic decision, not a server-wide mode. Each topic picks one
of four commit classes — ephemeral, memory, disk, or fsync — and they form a
weak → strong spectrum that trades write latency against crash survival.
Where “ok” lands
Every write either stays resident-only in RAM or travels the disk path — from your process, to the kernel, to the physical disk. The classes differ in where the record lands and how far topics waits before it acks:
- ephemeral publishes resident-only records and skips the WAL / segment path. The topic config persists, records are intentionally gone after restart, and seqs remain monotonic.
- memory uses the WAL path but makes no durability promise. Fastest disk-like class; after a restart its records may survive or be gone.
- disk acks once the write is in the group-committed write-ahead log and on its way to the kernel. It survives a process crash, minus the most recent un-fsynced tail. The default.
- fsync waits for the data to be fsynced to the platter before acking, so it survives any crash, including power loss.
Pick the weakest class a topic can tolerate and it pays the least latency; reach for fsync
only where an acknowledgment must be a promise you can never take back. Because the choice is
per topic, a throwaway cache (memory), a pub/sub feed (disk), and a financial ledger
(fsync) coexist with a RAM-only live feed (ephemeral) in one process without taxing each
other.
The class is resolved from the topic’s current config (durability_class()) and reported on
every topic-state and topic-create response. The topic type is immutable, but durability/config can
be updated in place — the resolved class always reflects the current config.
The four classes
The defining promise across all four: an acked write is published under the class you selected, and topics never claims a stronger crash guarantee than that class provides. A write that fails to commit publishes nothing. What differs between classes is where the record lands, when the ack fires, and what a crash costs.
ephemeral
Resident-only records. A durability:"ephemeral" topic publishes from RAM and intentionally
skips the record WAL, snapshots, and HOT segments. It is fully queryable while the process is
running, including getState, getDifference, SSE, and router destinations, but records are
empty after restart by design.
The topic config always persists as a control frame, and checkpoints preserve the published
head without payloads. That means post-restart writes keep moving forward and do not reuse seqs,
even though the old resident records are gone. The ack is never fsync-gated, so fsync_ms is
0.
Reach for ephemeral for RAM-only live fan-out where the wire contract and monotonic seqs
matter, but replay across restart does not. It is reachable only by setting
durability:"ephemeral" explicitly.
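The restart behavior described above can be pictured with a small toy model. This is only a sketch: the class and method names (`EphemeralTopic`, `checkpoint`, `restart`) are hypothetical and not part of topics, and it assumes the head was checkpointed before the restart.

```python
from dataclasses import dataclass, field


@dataclass
class EphemeralTopic:
    """Toy model: payloads live only in RAM, the published head is checkpointed."""
    records: dict[int, bytes] = field(default_factory=dict)  # seq -> payload (RAM only)
    head: int = 0                # last published seq
    checkpointed_head: int = 0   # persisted head, stored without payloads

    def publish(self, payload: bytes) -> int:
        self.head += 1
        self.records[self.head] = payload
        return self.head

    def checkpoint(self) -> None:
        # Persist only the head; payloads are never written.
        self.checkpointed_head = self.head

    def restart(self) -> "EphemeralTopic":
        # Records are intentionally gone; the head resumes from the checkpoint.
        return EphemeralTopic(head=self.checkpointed_head,
                              checkpointed_head=self.checkpointed_head)


topic = EphemeralTopic()
for payload in (b"a", b"b", b"c"):
    topic.publish(payload)
topic.checkpoint()

restarted = topic.restart()
records_after_restart = dict(restarted.records)  # empty by design
next_seq = restarted.publish(b"d")               # continues after the old head
```

In this model the first post-restart write gets seq 4, not 1, even though seqs 1 to 3 are no longer readable.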
memory
“disk-like but best-effort.” A memory topic takes the same group-committed WAL write
and recovery path as disk and is fully queryable (getState / getDifference / SSE) —
but it carries NO durability guarantee. The ack is never fsync-gated, so fsync_ms is
0 (the fastest path).
After a restart its records MAY survive OR be lost — recovery is gradual / best-effort: it does not block readiness and does not guarantee completeness or emptiness. The topic config always persists (it is a control frame in the WAL). The one hard bound is no-fabrication / no-future-seq: a recovered record is always one that was actually written, and the head never hands out a seq past what was acked (a best-effort restart may legitimately regress the head if the un-fsynced tail was lost — there is no durable seq reservation).
Effectively disk minus the durability promise. Reach for memory for caches and scratch
state where occasional loss is an acceptable trade for the lowest disk-like latency. It is
reachable only by setting durability:"memory" explicitly.
disk
Records are written to the WAL and group-committed — no per-write fsync — so fsync_ms
is 0 (the fast path). The write is acked as soon as its frame is enqueued to its topic’s
WAL-shard writer (the WAL is sharded); the ack is not
fsync-gated. The shard writer then group-commits and fdatasyncs the batch shortly after.
A crash loses the un-fsynced tail — the frames that were enqueued but not yet
group-fsynced. Everything older is recovered by WAL replay. This is the default and the
pub/sub workhorse: durable enough that a clean shutdown loses nothing, fast enough that the
common feed isn’t paying for a guarantee it doesn’t want. disk is today’s durable:false.
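To make the "un-fsynced tail" concrete, here is a minimal in-memory model of one WAL shard. It is a sketch under stated assumptions: `ToyWalShard` and its methods are hypothetical names, and real group commit is timed and asynchronous rather than called by hand.

```python
class ToyWalShard:
    """Toy model of one WAL shard: frames are enqueued, then group-synced."""

    def __init__(self) -> None:
        self.frames: list[str] = []  # enqueued frames, in order
        self.synced = 0              # frames covered by the last group fsync

    def append(self, frame: str, durability: str = "disk") -> int:
        self.frames.append(frame)
        if durability == "fsync":
            # fsync-gated: the ack is held until the frame is synced.
            self.group_commit()
        # For disk, the ack fires here, before any sync has happened.
        return len(self.frames)

    def group_commit(self) -> None:
        # One sync covers every frame enqueued so far.
        self.synced = len(self.frames)

    def crash_and_recover(self) -> list[str]:
        # Only synced frames are replayed; the tail is lost.
        return self.frames[:self.synced]


shard = ToyWalShard()
shard.append("d1")
shard.append("d2")
shard.group_commit()   # the shard writer syncs the batch shortly after
shard.append("d3")     # acked, but not yet covered by a sync
recovered = shard.crash_and_recover()

gated = ToyWalShard()
gated.append("f1", durability="fsync")
recovered_gated = gated.crash_and_recover()
```

In the model, `d3` was acked and is still lost by the crash, which is exactly the loss the disk class permits; the fsync-gated write is always in the recovered set.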
fsync
The ack is fsync-gated: it is held until the WAL frame is durably synced, so the response
carries a real fsync_ms. The write survives any crash — an acked write is always
recovered by WAL replay.
This is the class for job queues, financial events, and anything where an acknowledgment is a
promise you can never take back. It is today’s durable:true. The per-write cost is the disk’s
fsync latency, but adaptive group commit amortizes one fsync across a whole batch of
concurrent durable writers, so the per-event cost approaches a sequential disk append rather
than one fsync per record. See WAL & group commit.
fsync durability is bounded by the disk’s fdatasync latency. On a laptop’s APFS NVMe
that floor is roughly 5 ms; server-grade NVMe is typically 50–500 µs (roughly 10–100×
faster). This is a hardware property, not a design cost; group commit hides it under
concurrency, but a single serial durable writer pays the floor per write. See
Performance.
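The arithmetic behind those figures is simple enough to sketch. The functions below are illustrative only: they assume one fsync is shared evenly across a batch and ignore append and queueing costs, and the batch size of 64 is an arbitrary example rather than a measured value.

```python
def serial_writes_per_second(fsync_latency_us: float) -> float:
    """Upper bound for one writer that waits for each ack before the next write."""
    return 1_000_000 / fsync_latency_us


def amortized_cost_us(fsync_latency_us: float, batch_size: int) -> float:
    """Per-write share of one fsync when a batch of writers shares it."""
    return fsync_latency_us / batch_size


laptop_serial = serial_writes_per_second(5_000)    # 5 ms floor
server_serial_slow = serial_writes_per_second(500)  # 500 µs floor
server_serial_fast = serial_writes_per_second(50)   # 50 µs floor
laptop_batched_us = amortized_cost_us(5_000, 64)    # 64 concurrent durable writers
```

Under these assumptions a single serial writer on the 5 ms floor tops out near 200 writes per second, while a batch of 64 concurrent writers pays about 78 µs each for the same fsync.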
Comparison
| Class | WAL? | Ack fires | Survives a crash? | fsync_ms |
|---|---|---|---|---|
| ephemeral | records: no; config/control frames: yes | immediately, not fsync-gated | records: no; config persists and seqs do not reuse after checkpointed heads | 0 |
| memory | yes, group-committed (same path as disk) | immediately, not fsync-gated | best-effort: records MAY survive OR be lost (no guarantee); config always persists | 0 |
| disk | yes, group-committed (no per-write fsync) | on WAL-frame enqueue, not fsync-gated | yes, minus the un-fsynced tail | 0 |
| fsync | yes, fsync-gated | after the group fsync | yes, any crash (an acked write is always recovered) | real value |
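One way to read the table is as a decision procedure that returns the weakest class meeting a topic's requirements. The helper below is a hypothetical sketch, not part of topics; the three requirement flags are names invented for illustration.

```python
def weakest_class(*,
                  acked_survives_any_crash: bool = False,
                  survives_clean_restart: bool = False,
                  best_effort_survival: bool = False) -> str:
    """Return the weakest commit class that satisfies the stated requirements."""
    if acked_survives_any_crash:
        return "fsync"      # ack is held until the group fsync
    if survives_clean_restart:
        return "disk"       # recovered by WAL replay, minus the un-fsynced tail
    if best_effort_survival:
        return "memory"     # WAL path, but no durability promise
    return "ephemeral"      # resident-only records, gone after restart


ledger = weakest_class(acked_survives_any_crash=True)
feed = weakest_class(survives_clean_restart=True)
cache = weakest_class(best_effort_survival=True)
live = weakest_class()
```

Note that this helper falls through to ephemeral when nothing is required, whereas a topic created without an explicit class is disk.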
Choosing and configuring a class
Set the class with the durability field at create time:
# An fsync-gated ledger: every ack is a promise the write survives power loss.
curl -X PUT $TOPICS/v0/topics/payments \
  -H 'content-type: application/json' \
  -d '{ "durability": "fsync", "cap_records": 0, "ttl_ms": 0 }'

{ "topic": "payments", "created": true,
  "config": { "ttl_ms": 0, "cap_records": 0, "cap_bytes": 0, "discard": "old",
              "durable": true, "durability": "fsync", "auto_create": true,
              "idempotency_window_ms": 120000, "dedupe_node": true },
  "performance": { "server_total_ms": 0.22 } }

The class defaults to disk when neither durability nor durable is set, so an
auto-created topic (one materialized lazily on first write) is disk. To get ephemeral or
memory, you must say so explicitly.
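For clients that are not shell scripts, the same create call can be built with Python's standard library. This is a sketch that mirrors the curl example: the base URL and the topic name `live-prices` are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

TOPICS = "http://localhost:8080"  # hypothetical base URL; stands in for $TOPICS


def build_create_request(topic: str, durability: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that creates a topic with an explicit class."""
    body = json.dumps(
        {"durability": durability, "cap_records": 0, "ttl_ms": 0}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{TOPICS}/v0/topics/{topic}",
        data=body,
        method="PUT",
        headers={"content-type": "application/json"},
    )


# ephemeral must be requested explicitly; it is never a default.
req = build_create_request("live-prices", "ephemeral")
sent_body = json.loads(req.data)
# To actually send it against a running server: urllib.request.urlopen(req)
```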
The durable bool
durable is a shorthand alias for the common disk / fsync cases:
- durable: true ⇒ fsync
- durable: false ⇒ disk
- ephemeral is reachable only via durability:"ephemeral"
- memory is reachable only via durability:"memory"
An explicit durability always wins over durable. On the way out, durable is normalized
to durable == (durability == "fsync"), so the boolean reports whether the topic is
fsync-gated regardless of how the topic was configured.
Internally, is_durable() is simply class == "fsync".
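The resolution rules in this section can be summarized in a few lines. This is a sketch of the documented behavior, not the engine's code: `resolve_class` and `normalized_durable` are hypothetical names, and rejecting an unknown class with `ValueError` is an assumption.

```python
from typing import Optional

CLASSES = ("ephemeral", "memory", "disk", "fsync")


def resolve_class(durability: Optional[str] = None,
                  durable: Optional[bool] = None) -> str:
    """Resolve the commit class from the two config inputs."""
    if durability is not None:
        # An explicit durability always wins over durable.
        if durability not in CLASSES:
            raise ValueError(f"unknown durability class: {durability!r}")
        return durability
    if durable is not None:
        return "fsync" if durable else "disk"
    return "disk"  # default when neither field is set


def normalized_durable(resolved: str) -> bool:
    """The durable bool reported on the way out."""
    return resolved == "fsync"


default_class = resolve_class()
from_bool = resolve_class(durable=True)
explicit_wins = resolve_class(durability="memory", durable=True)
reported = normalized_durable(explicit_wins)
```

The last case shows the normalization: a topic configured with durable: true and durability:"memory" resolves to memory and reports durable as false.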
Durability only governs persistence across a crash or restart. It is independent of
retention: ttl_ms and the cap_records/cap_bytes caps still apply to every class, and
involuntary eviction or expiry of live records always surfaces as a
tombstone — even on ephemeral and memory topics, for as long
as those topics live.
Routers and dead-letter honor the destination class
Router forwarding is async (off the source write/ack path) and derived — the forwarded copy is not separately WAL-logged, so one source append is one WAL write regardless of fan-out, and copies are re-derived on recovery by replaying from a durable per-router cursor. The destination topic’s commit class governs how/whether that re-derived copy (and a dead-lettered job) is retained and recovered — not the source’s:
- An fsync destination retains/recovers the copy under fsync semantics.
- A disk destination retains it group-committed, recovered minus the un-fsynced tail.
- A memory destination keeps a best-effort copy that may survive or be lost on a restart (the dest config always persists), exactly like a direct write to that topic.
- An ephemeral destination keeps the copy resident-only while the process is running and loses it on restart by design.
So if you route an fsync source into a memory or ephemeral destination, the forwarded
copies follow that weaker destination contract while the source recovers in full. The
durability of the copy follows where it lands, not where it came from. Size and class your
destinations deliberately. See Routers and the
Multi-master guide.
Router forwarding is at-least-once via the durable per-router cursor. A persistently failing
or full discard:"reject" destination is held as backpressure (the record stays available
in source, the cursor does not advance), so a stuck destination lags until it recovers.
Queue lease durability is best-effort (leases_durable defaults false); a transient WAL
error on a lease append degrades to the queue’s baseline at-least-once rather than losing or
duplicating work. See the maturity notes in the Introduction.
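The cursor-and-backpressure behavior of router forwarding described above can be sketched with a toy model. All names here (`ToyDestination`, `ToyRouter`, `pump`) are hypothetical, and the model keeps the cursor in memory where the real one is durable.

```python
class ToyDestination:
    """Toy destination that rejects writes once it is full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.records: list[str] = []

    def accept(self, record: str) -> bool:
        if len(self.records) >= self.capacity:
            return False  # full: behaves like discard:"reject"
        self.records.append(record)
        return True


class ToyRouter:
    """Forwards source records in order; the cursor moves only on success."""

    def __init__(self, source: list[str], dest: ToyDestination) -> None:
        self.source = source
        self.dest = dest
        self.cursor = 0  # position of the next record to forward

    def pump(self) -> int:
        forwarded = 0
        while self.cursor < len(self.source):
            if not self.dest.accept(self.source[self.cursor]):
                break  # backpressure: hold the cursor and retry later
            self.cursor += 1
            forwarded += 1
        return forwarded


dest = ToyDestination(capacity=2)
router = ToyRouter(["r1", "r2", "r3"], dest)
first_pass = router.pump()     # the destination fills up after two records
cursor_while_stuck = router.cursor
dest.capacity = 3              # the destination recovers
second_pass = router.pump()    # forwarding resumes from the held cursor
```

Because the cursor advances only after the destination accepts a record, a stuck destination delays delivery without skipping anything; a crash between an accept and the cursor update would redeliver, which is where the at-least-once contract comes from.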
How a crash actually recovers
On restart the engine loads the latest snapshot, replays the WAL forward from the snapshot’s
checkpoint, truncates a torn tail (any frame whose length runs past EOF or whose XXH3-64
checksum fails to verify), and reclaims orphaned segments. An acked durable write is, by
construction, a complete, checksum-valid WAL frame — so it is never lost. The only data a
crash can cost is what a class explicitly does not promise: an ephemeral topic’s resident
records, a memory topic’s records, or a disk topic’s un-fsynced tail. Those losses surface
to consumers as ordinary eviction-style gaps. See Recovery.
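Torn-tail truncation can be illustrated with a minimal frame scanner. This is a sketch under loud assumptions: the frame layout (4-byte length, 4-byte checksum, payload) is hypothetical, and `zlib.crc32` stands in for XXH3-64 because the latter is not in Python's standard library.

```python
import struct
import zlib

# Hypothetical frame header: payload length, then CRC32 of the payload.
# The real WAL uses an XXH3-64 checksum; CRC32 is a stand-in here.
HEADER = struct.Struct("<II")


def encode_frame(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def valid_prefix(wal: bytes) -> tuple[list[bytes], int]:
    """Return the intact payloads and the byte offset where the log should be cut."""
    payloads: list[bytes] = []
    offset = 0
    while offset + HEADER.size <= len(wal):
        length, checksum = HEADER.unpack_from(wal, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(wal):
            break  # torn: the declared length runs past EOF
        payload = wal[start:end]
        if zlib.crc32(payload) != checksum:
            break  # torn or corrupt: the checksum does not verify
        payloads.append(payload)
        offset = end
    return payloads, offset


# A crash cut the last frame short by two bytes.
wal = encode_frame(b"one") + encode_frame(b"two") + encode_frame(b"three")[:-2]
payloads, good_len = valid_prefix(wal)
truncated = wal[:good_len]

# A frame whose payload was damaged fails its checksum instead.
damaged = bytearray(encode_frame(b"one") + encode_frame(b"two"))
damaged[-1] ^= 0xFF
damaged_payloads, damaged_len = valid_prefix(bytes(damaged))
```

Replay stops at the first frame that fails either check and everything after it is discarded, so a frame that was completely written and verifies is never dropped by truncation.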
See also
- Tombstones: how lost data (including a crash-dropped tail) appears to a consumer.
- WAL & group commit: how disk and fsync writes are committed off-lock.
- Recovery: snapshot load, WAL replay, and torn-tail truncation.
- Configure a topic: the full config field table including durability.