In event-driven architectures (EDAs), the Outbox Pattern is becoming increasingly popular because it ensures consistency between published events and database records. Although the pattern is a good fit in many scenarios, it comes with a significant downside.
The problem at hand
First things first. Why do we need the Outbox Pattern in the first place? Let's consider the simple example of a Customer Service, which needs to store a new Customer in a database and also needs to publish an event to an Event Stream like Apache Kafka or Apache Pulsar, that a new Customer was added.
This sounds like a straightforward scenario — until we want to ensure that a new Customer which was emitted to the Event Stream was also always created in the database, and vice versa. This is a well-known problem in EDAs, as we cannot span a single transaction over a database and Event Streams (theoretically we could, but most modern Event Streams do not support complex two-phase commit protocols like XA — for good reasons).
Let's consider the following examples which show the problem:
- We store a new customer in the database. Our service dies before we can forward the event to our Event Stream. The new Customer is only reflected in the database.
- We send the event to our Event Stream first, but the database write fails. We cannot take the event back anymore. The new Customer is only reflected in the Event Stream.
- We start a database transaction and store the new Customer in the database. Sending the event to the Event Stream succeeds. Unfortunately, the final database commit fails. Again, the new Customer is only reflected in the Event Stream.
In all scenarios, we leave our system in an inconsistent state. And even if this list of examples is not complete, the result stays the same.
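The first scenario can be reproduced in a few lines. This is only a minimal sketch: a plain dict stands in for the database, a plain list stands in for the Event Stream, and the function name and the `crash` flag are hypothetical helpers used to simulate the service dying between the two writes.

```python
class CrashBeforePublish(Exception):
    """Simulates the service dying between the two writes."""


def create_customer_naively(db, stream, customer_id, name, crash=False):
    # Write 1: the database (a dict stands in for it in this sketch).
    db[customer_id] = {"name": name}
    if crash:
        # The process dies here, so nothing below this line runs.
        raise CrashBeforePublish(customer_id)
    # Write 2: the Event Stream (a list stands in for it in this sketch).
    stream.append(
        {"type": "CustomerAdded", "customer_id": customer_id, "name": name}
    )


db, stream = {}, []
try:
    create_customer_naively(db, stream, "c-1", "Ada", crash=True)
except CrashBeforePublish:
    pass

# The customer now exists in the database, but no event was ever published.
print("in database:", "c-1" in db, "| events published:", len(stream))
```

Swapping the order of the two writes does not help: it only moves the inconsistency to the other side.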
The Outbox Pattern, FTW
The Outbox Pattern (often implemented as Transaction Log Tailing Pattern 1) is a simple way to overcome this problem. It uses a single database transaction to ensure that the database update and the publishing to the Event Stream either both succeed or are both not executed.
This works by introducing an Outbox Table, which stores events that are later forwarded to the Event Stream. Observing the Outbox can be achieved by manually implementing a scheduler or by using a Change-Data-Capture (CDC) mechanism like Debezium 2.
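A minimal sketch of the idea, assuming SQLite (Python's built-in `sqlite3` module) as the database and a manually triggered polling relay instead of CDC. The table layout, the column names, and the functions `init_schema`, `create_customer`, and `relay_once` are illustrative assumptions, not a prescribed schema.

```python
import json
import sqlite3
import uuid


def init_schema(conn):
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " event_id TEXT NOT NULL UNIQUE,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )


def create_customer(conn, customer_id, name):
    event = {
        "event_id": str(uuid.uuid4()),
        "type": "CustomerAdded",
        "customer_id": customer_id,
        "name": name,
    }
    # One transaction: "with conn" commits on success and rolls back on error,
    # so the customer row and the outbox row are stored together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO customers (id, name) VALUES (?, ?)", (customer_id, name)
        )
        conn.execute(
            "INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
            (event["event_id"], json.dumps(event)),
        )


def relay_once(conn, publish):
    """Forward unpublished outbox rows in insertion order; return how many."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        publish(json.loads(payload))  # may raise; the row then stays unpublished
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Here `publish` is any callable that hands the event to the Event Stream, for example a thin wrapper around a Kafka or Pulsar producer. Note that every business write now costs an extra insert and a later update on the same database.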
I often see this misunderstood in my daily work with teams: please be aware that the Outbox Pattern usually guarantees only at-least-once delivery. It does not ensure exactly-once delivery, as (3) could fail, and an event could be picked up again and sent multiple times. It is therefore recommended to include an idempotency ID in each event, which the consumer can use to filter out duplicates.
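On the consumer side, duplicate filtering could look roughly like the following. This is a sketch under assumptions: the field name `event_id` and the class `DeduplicatingConsumer` are hypothetical, and the in-memory set would have to be replaced by durable storage in a real consumer.

```python
class DeduplicatingConsumer:
    """Wraps a handler and skips events whose ID was already processed."""

    def __init__(self, handler):
        self._handler = handler
        # In-memory only for this sketch; a real consumer would persist the
        # seen IDs, ideally together with the effect of the handler.
        self._seen = set()

    def on_event(self, event):
        """Return True if the event was handled, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self._seen:
            return False
        self._handler(event)
        # Mark as seen only after the handler succeeded, so a failed attempt
        # is retried on redelivery.
        self._seen.add(event_id)
        return True
```

Delivering the same event twice then triggers the handler only once, which turns at-least-once delivery into effectively-once processing for this consumer.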
The downsides of the Outbox Pattern
As shown above, this pattern works well and solves the initially described problem. Unfortunately, it comes with a significant downside. It turns the database into the system's bottleneck — something we wanted to avoid with EDAs in the first place. Combined with widespread use, this can become a significant problem for the resilience and elasticity of our overall system.
Start to Listen to Yourself
There is another pattern which is not so well known, but solves the same issue in a much more efficient way in many cases. We tend to call it the "listen to yourself" pattern 3. In the context of our initial example, it would look like this.
The Customer Service only forwards the event to the Event Stream and subsequently listens to its own events to store them in the database. That's not rocket science. Still, it ensures that the corresponding Customer is always also created in the database if an event was successfully emitted to the Event Stream.
By preventing the database from becoming our bottleneck, we gain better performance and can utilize the Event Stream's full potential. The pattern does come with drawbacks: (i) events and database writes must be idempotent to avoid duplicates, and (ii) eventual consistency increases. Even so, the performance, resilience, and elasticity advantages usually outweigh them.
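A minimal sketch of "listen to yourself", assuming an in-memory stand-in for the Event Stream and SQLite as the database. `InMemoryStream`, `request_new_customer`, and `apply_own_events` are hypothetical names; a real service would use a Kafka or Pulsar client with committed offsets instead.

```python
import sqlite3


class InMemoryStream:
    """Stand-in for a topic: an append-only list plus one consumer offset."""

    def __init__(self):
        self.events = []
        self.offset = 0

    def publish(self, event):
        self.events.append(event)

    def poll(self):
        # Everything after the last committed offset, possibly redelivered.
        return self.events[self.offset:]

    def commit(self, count):
        self.offset += count


def open_customer_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def request_new_customer(stream, customer_id, name):
    # The only write on the request path is the publish to the stream.
    stream.publish(
        {"type": "CustomerAdded", "customer_id": customer_id, "name": name}
    )


def apply_own_events(stream, conn):
    """Consume the service's own events and store them idempotently."""
    batch = stream.poll()
    for event in batch:
        with conn:
            # INSERT OR IGNORE makes a redelivered event a no-op.
            conn.execute(
                "INSERT OR IGNORE INTO customers (id, name) VALUES (?, ?)",
                (event["customer_id"], event["name"]),
            )
    # Commit the offset only after the database writes succeeded.
    stream.commit(len(batch))
    return len(batch)
```

Between `request_new_customer` and `apply_own_events` the customer exists only in the stream, which is the increase in eventual consistency mentioned above. If the consumer fails before committing its offset, the events are simply applied again without creating duplicates.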
Or, let others listen to you
Taking this idea further could even lead us to Event Sourcing 4, where basically the events act as a source of truth. This philosophy requires no duplication to the database anymore, and everyone interested in customer changes could simply stream-process the Customer Event Store owned by the Customer Service.
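To illustrate what "events as the source of truth" means, the current customer state can be derived by replaying the events. This is only a sketch: the event type `CustomerRenamed` and the function `replay_customers` are hypothetical and serve purely as an illustration.

```python
def replay_customers(events):
    """Fold an ordered event log into the current state of all customers."""
    customers = {}
    for event in events:
        if event["type"] == "CustomerAdded":
            customers[event["customer_id"]] = {"name": event["name"]}
        elif event["type"] == "CustomerRenamed":  # hypothetical event type
            customers[event["customer_id"]]["name"] = event["name"]
    return customers


event_log = [
    {"type": "CustomerAdded", "customer_id": "c-1", "name": "Ada"},
    {"type": "CustomerAdded", "customer_id": "c-2", "name": "Bob"},
    {"type": "CustomerRenamed", "customer_id": "c-1", "name": "Ada L."},
]
state = replay_customers(event_log)
```

Any interested service can run the same kind of fold over the Customer events to build its own view, without a second copy of the data being maintained by the Customer Service.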
In conclusion, both approaches are good alternatives to the Outbox Pattern. So think twice about whether you really need to introduce an Outbox for the problem at hand, or whether one of the proposed alternatives solves the same issue in a much simpler and more performant way.
PS: If you would like to listen not only to yourself (pun intended) but also to the experience we have in building scaled event-driven, reactive and cloud-native systems, feel free to reach out. 👋
1. Transaction Log Tailing Pattern: https://microservices.io/patterns/data/transaction-log-tailing.html
2. The Outbox Pattern with CDC and Debezium: https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
3. Listen to Yourself Pattern: https://medium.com/@odedia/listen-to-yourself-design-pattern-for-event-driven-microservices-16f97e3ed066
4. Martin Fowler, Event Sourcing: https://martinfowler.com/eaaDev/EventSourcing.html