System Design Interview: Partial Failure & Durability

May 8, 2026

After talking about idempotency, I thought I’d share another good follow-up question:

“What happens if your service crashes in the middle of a request?”

Let’s say:

transfer record inserted ✅
ledger update started ❌
database connection drops
service dies

Now you have partial data.

And in payment systems, partial success is usually worse than failure.

What if your service crashes in the middle of a request?

The first answer most people give #

👉 Use database transactions (ACID).

Yes — that’s correct. And it’s the first layer.

Wrap related operations inside a transaction so either everything commits or everything rolls back. No partial writes, no inconsistent state.

But real systems taught me something important:

Durability isn’t only an application problem.

The layer most people miss #

Even if your code is perfect, databases can still fail:

primary node crashes
network partitions happen
infrastructure restarts unexpectedly

That’s why production systems rely heavily on database failover.

Postgres High Availability setups — EnterpriseDB (which I’ve personally used to build a local cloud database environment), Patroni, and others — or managed services like AWS RDS Multi-AZ can promote a replica to primary within milliseconds.

From the service perspective, the database almost never truly “goes down”.

Reliable systems are layered #

Reliable systems usually come from multiple layers working together:

Idempotency → retries are safe
Transactions → no partial writes
Database failover → infrastructure survives failure

You don’t build systems assuming things won’t fail. You build systems assuming they definitely will.