How to Solve the Dual Write Problem in Distributed Systems

Chateau in Auvers at Sunset, painted by Vincent van Gogh
One of the sneakiest pitfalls in distributed systems is the dual write problem.
This happens when you try to perform two separate writes — one to your database and one to your message broker — without any atomic guarantees.
Sounds harmless, until you realise how quickly things can fall apart.
The Problem
Let’s say we have this flow:
1. A client calls the payment service to transfer $100.
2. The service writes the transaction to its database.
3. It then attempts to publish an event to a message broker (e.g., Kafka).
4. The credit service consumes the event.
5. The credit service updates the customer’s credit offer in its database.
Now imagine the payment service successfully writes to the database in step 2, but then fails to publish the event in step 3 because Kafka is having a bad day.
The result? The payment is recorded in the database, but the credit service never hears about it. That’s a silent failure — no error for the user, but your systems are now out of sync.
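To make that failure window concrete, here’s a minimal sketch of the naive dual write in Python. The in-memory sqlite3 database and the publish_event stub are illustrative stand-ins, not code from a real payment service:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")

def publish_event(event):
    # Stand-in for a Kafka producer; today the broker is down.
    raise ConnectionError("broker unavailable")

def transfer(amount):
    # Write 1: the local database commit succeeds...
    conn.execute("INSERT INTO transactions (amount) VALUES (?)", (amount,))
    conn.commit()
    # Write 2: ...then the publish fails, and nothing rolls the commit back.
    publish_event({"type": "PaymentCompleted", "amount": amount})

try:
    transfer(100)
except ConnectionError:
    pass  # the service may just log and move on; the user sees no error

# The payment exists locally, but no downstream service ever heard about it.
print(conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0])  # prints 1
```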
Ways to Combat the Dual Write Problem
Thankfully, there are patterns to make this safer. Three common ones are:
1. Transactional Outbox
Here’s the trick: instead of writing to the database and the broker separately, you keep it all in one database transaction.
You do this by having two tables:
- The main entity table (e.g., transactions)
- An outbox table (e.g., transaction_events)
When you process the payment, you insert a row into the entity table and a row into the outbox table in the same transaction. If one fails, they both fail — atomicity guaranteed.
Then, a separate outbox processor (could be a background job or microservice) periodically polls the outbox table for events with a “pending” status, and publishes them to Kafka (or whatever broker you use).
This way, your payment service’s main logic isn’t blocked by broker issues, and you have a reliable retry mechanism for event publishing.
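Here’s a minimal sketch of the pattern, assuming a SQL database. Python’s built-in sqlite3 stands in for your real database, and the publish callback stands in for a Kafka producer; the table names follow the example above:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE transaction_events (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
);
""")

def record_payment(amount):
    # Both inserts share one transaction: either both commit or neither does.
    with conn:
        conn.execute("INSERT INTO transactions (amount) VALUES (?)", (amount,))
        conn.execute(
            "INSERT INTO transaction_events (payload) VALUES (?)",
            (json.dumps({"type": "PaymentCompleted", "amount": amount}),),
        )

def process_outbox(publish):
    # The outbox processor: poll pending events, publish, then mark them sent.
    rows = conn.execute(
        "SELECT id, payload FROM transaction_events WHERE status = 'pending'"
    ).fetchall()
    for event_id, payload in rows:
        publish(payload)  # if this raises, the row stays 'pending' and is retried
        with conn:
            conn.execute(
                "UPDATE transaction_events SET status = 'sent' WHERE id = ?",
                (event_id,),
            )

record_payment(100)
process_outbox(lambda payload: print("published:", payload))
```

One consequence worth noting: if the processor crashes after publishing but before marking the row as sent, the event goes out again on the next poll. The outbox pattern gives you at-least-once delivery, so consumers should be idempotent.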
2. Event Sourcing
Event sourcing flips the whole approach: instead of just storing the current state, you store every single change as an event.
For example, instead of having a balance column that says “$500”, you have an event log like:
- Deposited $200
- Deposited $300
The current state ($500) is rebuilt by replaying these events.
Now, when a payment happens, it’s recorded in the event log database. A separate component periodically polls this log for new events with “pending” status and publishes them to Kafka.
Because the event log is the source of truth, even if publishing fails temporarily, no data is lost — you just retry later.
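Here’s a minimal sketch of the core idea, with an in-memory list standing in for a durable, append-only event store (the event shapes are illustrative):

```python
events = []  # stand-in for a durable, append-only event store

def append(event):
    events.append(event)

def current_balance():
    # State is derived, never stored: replay every event from the beginning.
    balance = 0
    for event in events:
        if event["type"] == "Deposited":
            balance += event["amount"]
        elif event["type"] == "Withdrawn":
            balance -= event["amount"]
    return balance

append({"type": "Deposited", "amount": 200})
append({"type": "Deposited", "amount": 300})
print(current_balance())  # 500
```

In practice you’d also snapshot the state periodically so reads don’t replay the whole log from the beginning every time.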
3. Listen to Yourself
This one leans on eventual consistency.
You write to your database first, then have an asynchronous processor (could be a change data capture tool or a scheduled job) that reads from your database and writes to your message broker.
The trade-off is that there’s a delay before other services see the update, but you avoid the risk of losing data completely.
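As a rough sketch, assuming each row carries a monotonically increasing id the job can use as a cursor (a CDC tool such as Debezium would tail the database’s transaction log instead of polling):

```python
import time

def relay_new_rows(conn, publish, last_seen_id):
    # Read anything committed since the last pass and republish it.
    rows = conn.execute(
        "SELECT id, amount FROM transactions WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    for row_id, amount in rows:
        publish({"type": "PaymentCompleted", "amount": amount})
        last_seen_id = row_id  # persist this cursor in practice to survive restarts
    return last_seen_id

def relay_forever(conn, publish, interval_seconds=1.0):
    last_seen_id = 0
    while True:
        last_seen_id = relay_new_rows(conn, publish, last_seen_id)
        time.sleep(interval_seconds)  # this sleep is the eventual-consistency lag
```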
Choosing the Right Approach
- If you need strong guarantees that the database and broker stay in sync, go with Transactional Outbox.
- If your system is naturally event-driven and you can rebuild state from events, Event Sourcing is a great fit.
- If you’re okay with eventual consistency and want a simpler implementation, Listen to Yourself can work well.
The key lesson is this: never trust two separate writes to succeed independently in a distributed system. Something will fail eventually, and your architecture needs to expect and handle that.
💎 Random Nugget
“For what shall it profit a man, if he shall gain the whole world, and lose his own soul?” (Mark 8:36) — John Mark