When running Kafka brokers in a real production project, you will inevitably face situations where they must be durable enough to meet business requirements.
What does that really mean? Let's compare two failure scenarios. Assume a simple cluster state as in the following diagram: a producer keeps sending events to a partition with two replicas, one leader and one follower. Then the follower fails for some reason.
In the first scenario, you want to bring the brokers back to availability as soon as possible, so that business continuity stays high. This fits cases where missing some events has no critical impact on the business, for instance social network notifications.
For this scenario, you need to configure the brokers to satisfy at least two conditions:
+ A replica can be promoted to leader even if it is out of sync. Kafka supports this with the broker property unclean.leader.election.enable.
+ The business feature keeps running even when only one replica is available, so the minimum number of in-sync replicas is one.
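The two conditions above map directly onto Kafka's broker (or per-topic) properties. A minimal availability-first sketch, assuming defaults elsewhere, might look like:

```properties
# Availability-first profile (broker- or topic-level settings).
# Allow an out-of-sync replica to be elected leader; this restores
# availability quickly but may lose events the old leader had acknowledged.
unclean.leader.election.enable=true
# Keep accepting writes while only one replica (the leader itself) is in sync.
min.insync.replicas=1
```

These can also be set per topic, so an availability-first notification topic and a durability-first payment topic can coexist on the same cluster.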
In the second scenario, the feature requires durability over availability: it should not run at all if doing so could cause data integrity or data consistency issues. It enters a harmful state whenever fewer replicas are available for reliable processing than expected.
With that expectation, we need to force-stop the business feature as soon as a particular threshold is crossed.
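That threshold is expressed with the same two properties, inverted. A durability-first sketch (paired with acks=all on the producer, which is what makes min.insync.replicas take effect) might look like:

```properties
# Durability-first profile: refuse progress rather than risk losing data.
# Never elect an out-of-sync replica as leader.
unclean.leader.election.enable=false
# Require at least two in-sync replicas before acknowledging a write;
# with producer acks=all, produce requests fail once the ISR shrinks below this.
min.insync.replicas=2
```

With this profile, losing the follower in our diagram stops writes to the partition until it recovers, which is exactly the forced stop described above.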
The takeaway: you need to choose the appropriate configuration for each feature in your system, and each choice comes with its own tradeoff.
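To make the tradeoff concrete, here is a deliberately simplified model of the broker's write-acceptance rule for acks=all producers: a write is accepted only while the in-sync replica set is at least min.insync.replicas (the real broker rejects the request with NotEnoughReplicasException). The function name and the scenario values are illustrative.

```python
def accepts_write(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """Simplified model: would the partition leader accept an acks=all write?"""
    return in_sync_replicas >= min_insync_replicas


# Our diagram's failure: the follower is down, leaving one in-sync replica.
# Availability-first profile (min.insync.replicas=1): writes keep flowing.
print(accepts_write(in_sync_replicas=1, min_insync_replicas=1))   # True

# Durability-first profile (min.insync.replicas=2): the same failure
# stops the feature, protecting consistency at the cost of availability.
print(accepts_write(in_sync_replicas=1, min_insync_replicas=2))   # False
```

The same broker event (one replica lost) thus produces opposite behavior depending only on configuration, which is why the choice has to be made per feature.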