Using Aeron Cluster as a source of truth

Most teams reaching for a consensus system are looking for coordination: leader election, distributed locks, configuration management. Aeron Cluster handles all of that. But there's a less obvious use case we've been running in production: using the cluster log itself as the primary source of truth for application state.

This post is about what that looks like, what it buys you, and where it bites back.

What Aeron Cluster Actually Is

Aeron is a high-performance messaging library built for low-latency, high-throughput scenarios. It is used primarily in financial systems and anywhere else microseconds matter. Aeron Cluster adds Raft-based consensus on top, giving you a replicated state machine across a set of nodes.

The core abstraction: you have a cluster of nodes. One is the leader. All writes go through the leader, get appended to a replicated log, and are applied to state in log order. Followers replay the log. If the leader fails, a new one is elected and the log continues from the last committed position.
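The flow described above can be sketched in a few lines of plain Java. This is a minimal model and not the Aeron Cluster API: `CommandLog`, `Adjust` and `Balances` are hypothetical names invented for illustration, and an in-memory list stands in for the replicated log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical command: adjust an account balance by a signed amount.
final class Adjust {
    final String account;
    final long delta;

    Adjust(String account, long delta) {
        this.account = account;
        this.delta = delta;
    }
}

// Deterministic state machine: its state depends only on the commands
// applied and the order in which they were applied.
final class Balances {
    private final Map<String, Long> balances = new HashMap<>();

    void apply(Adjust cmd) {
        balances.merge(cmd.account, cmd.delta, Long::sum);
    }

    long balanceOf(String account) {
        return balances.getOrDefault(account, 0L);
    }
}

// Stand-in for the replicated log: an append-only list of committed commands.
final class CommandLog {
    private final List<Adjust> entries = new ArrayList<>();

    // Appends a command and returns the log position just after it.
    int append(Adjust cmd) {
        entries.add(cmd);
        return entries.size();
    }

    // Applies every entry from fromPosition to the end of the log, in log order.
    void replay(int fromPosition, Balances target) {
        for (int i = fromPosition; i < entries.size(); i++) {
            target.apply(entries.get(i));
        }
    }
}
```

Two `Balances` instances that replay the same `CommandLog` from position 0 end up with identical balances. That is the property a follower relies on when it replays the leader's log.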

The consensus model is Raft-based. What differs is that replication runs over Aeron's transport, so the messaging layer adds very little latency.

The Unconventional Part: Log as Database

The typical pattern with a consensus system is: use it for coordination, use a separate database for state. Leader election via Raft, data in Postgres or Cassandra. That separation makes sense in most architectures.

What we explored was collapsing that separation. The Aeron Cluster log IS the database. Every state change is a command appended to the log. State is reconstructed by replaying from the beginning, or from a snapshot plus the delta since that snapshot. There is no external storage system. The cluster nodes are the storage.
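To make "a snapshot plus the delta since that snapshot" concrete, here is a small sketch in plain Java. It does not use Aeron's API. `Put`, `Snapshot` and the helper methods are hypothetical, and the log is modelled as an in-memory list indexed by position.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SnapshotReplayDemo {
    // Hypothetical log entry: set a key to a value.
    static final class Put {
        final String key;
        final long value;

        Put(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // A snapshot is a copy of the state plus the log position it covers.
    static final class Snapshot {
        final int logPosition;
        final Map<String, Long> state;

        Snapshot(int logPosition, Map<String, Long> state) {
            this.logPosition = logPosition;
            this.state = Collections.unmodifiableMap(new HashMap<>(state));
        }
    }

    // Applies log entries from position `from` onwards on top of a copy of `start`.
    static Map<String, Long> replay(Map<String, Long> start, List<Put> log, int from) {
        Map<String, Long> state = new HashMap<>(start);
        for (int i = from; i < log.size(); i++) {
            Put p = log.get(i);
            state.put(p.key, p.value);
        }
        return state;
    }

    // Builds a snapshot covering the first `upTo` entries of the log.
    static Snapshot takeSnapshot(List<Put> log, int upTo) {
        Map<String, Long> state = replay(new HashMap<>(), log.subList(0, upTo), 0);
        return new Snapshot(upTo, state);
    }

    // Recovery: load the snapshot, then replay only the entries after it.
    static Map<String, Long> recover(Snapshot snapshot, List<Put> log) {
        return replay(snapshot.state, log, snapshot.logPosition);
    }
}
```

Recovering from a snapshot must give exactly the same state as replaying the whole log from position 0. If it does not, either the snapshot or the apply logic is not deterministic.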

This is the event sourcing pattern taken to its logical extreme, with a consensus layer ensuring every node agrees on the sequence of events.

What You Get

  • Linearisability by default: every read and write is sequenced through the cluster log, so reads are never stale and nodes cannot commit diverging histories
  • Full audit history: the log is your state and your change history simultaneously; you can replay to any point the retained log still covers
  • Deterministic recovery: a node that falls behind simply replays the log to catch up, no complex reconciliation needed
  • Extremely low write latency on the hot path: Aeron's transport is purpose-built for this

The linearisability point is the one that matters most for us. We were operating in a domain where consistency was non-negotiable. Using a separate database always meant carefully choreographing writes to keep the database and the cluster state in sync. Collapsing them removed an entire class of consistency bugs.

What You Give Up

Reads have to go through the leader. There's no read scaling: you can't distribute read load across followers without accepting stale reads, which defeats the point. If your workload is read-heavy, this is a serious constraint.

Log compaction and snapshotting are your responsibility. Aeron Cluster supports snapshotting, but you have to implement the serialisation and deserialisation of your state machine. For complex state this is non-trivial. Get it wrong and recovery becomes a debugging exercise in production.
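As an illustration of the serialisation work involved, here is a minimal round-trip codec for a small piece of state. It is a sketch under assumptions: the state is a map of instrument to position, the class and field names are hypothetical, and it uses plain `java.io` streams rather than Aeron's own snapshot mechanism.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

final class PositionsSnapshotCodec {
    // Versioning the format lets a newer build reject or migrate old snapshots.
    static final int FORMAT_VERSION = 1;

    static final class Decoded {
        final long logPosition;
        final Map<String, Long> positions;

        Decoded(long logPosition, Map<String, Long> positions) {
            this.logPosition = logPosition;
            this.positions = positions;
        }
    }

    static byte[] encode(long logPosition, Map<String, Long> positions) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(FORMAT_VERSION);
            out.writeLong(logPosition);
            // Sorted iteration so that identical state always yields identical bytes.
            Map<String, Long> sorted = new TreeMap<>(positions);
            out.writeInt(sorted.size());
            for (Map.Entry<String, Long> entry : sorted.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Decoded decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            if (version != FORMAT_VERSION) {
                throw new IOException("Unsupported snapshot version: " + version);
            }
            long logPosition = in.readLong();
            int count = in.readInt();
            Map<String, Long> positions = new TreeMap<>();
            for (int i = 0; i < count; i++) {
                String instrument = in.readUTF();
                long position = in.readLong();
                positions.put(instrument, position);
            }
            return new Decoded(logPosition, positions);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A round-trip test (encode, decode, compare with the original state) is cheap to write and catches the most common mistake: a field added to the state machine but forgotten in the snapshot.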

The operational complexity is real. Aeron Cluster is not Postgres: there's no ecosystem of backup tools, ORMs, or monitoring integrations. Everything that a database gives you for free, you're building or operating yourself.

Cluster membership changes also require care. Adding a node to a running cluster, handling network partitions, managing disk usage as the log grows: all of it needs deliberate design.

When It Makes Sense

We found this model works well for state that is write-heavy, consistency-critical, and bounded in size. Financial positions, order books, game state: things where the value of the data is in its correctness and recoverability, not in its queryability.

If you need arbitrary queries, secondary indexes, or high read throughput, a traditional database is the right tool. Aeron Cluster as a database is not a general-purpose solution; it's a specialised pattern for a narrow set of requirements that happen to align well with certain domains.

The Bottom Line

Running Aeron Cluster as your source of truth is unconventional, and deliberately so. It trades general-purpose flexibility for tight consistency guarantees and excellent write performance. The operational overhead is front-loaded: there's a lot to build and understand before it runs reliably in production.

But for the right workload, the simplicity you gain by eliminating the synchronisation layer between your messaging system and your database is significant. Fewer moving parts, one fewer thing that can get out of sync. For us, that trade was worth it.