Tradeoffs in quorum-based replication

Isaac_Haseley · February 13, 2024, 8:50pm

From this page: “for the larger value of r, we focus more on availability and compromise consistency.” Is this backwards?

Higher values of r mean waiting to hear back from more replicas before returning a read result. This improves consistency by making sure more replicas have the latest write before returning any data.
Lower values of r mean waiting for fewer replicas before returning a read result. This improves availability since you don’t have to wait as long, but risks returning stale data if not all replicas have the latest write.

Course: Grokking Modern System Design Interview for Engineers & Managers - Learn Interactively
Lesson: Versioning Data and Achieving Configurability

Ali_Hassan · March 5, 2024, 4:42am

Hi Issac,

for the larger value of r, the system ensures that more replicas must respond for a read operation to be considered successful. This increases the likelihood of availability, as the system can still respond even if some replicas are temporarily unavailable or inconsistent.

However, this might compromise consistency since it’s possible for different replicas to have different versions of the data due to asynchrony in updates.

Thank you.

Isaac_Haseley · March 5, 2024, 7:44pm

Hey @Ali_Hassan!

for larger values of r, … the system can still respond even if some replicas are temporarily unavailable or inconsistent

Do you mean lower values of r? For larger values, the system won’t respond if replicas are down.

However, [larger values of r] might compromise consistency since it’s possible for different replicas to have different versions of the data due to asynchrony in updates.

Do you mean lower values of r? For larger values, a read operation ensures that many nodes are in sync, and therefore have the same versions of the data, before returning successful.

Ali_Hassan · April 4, 2024, 10:20am

Hi Isaac,

I apologize if my previous response caused any confusion. Let me clarify it:

Lower Values of r:
When r is low, the system can still respond even if some replicas are temporarily unavailable or inconsistent. However, this approach may compromise consistency. Different replicas might have varying versions of the data due to asynchrony in updates.

Larger Values of r:
For larger values of r, the system won’t respond if replicas are down. In this scenario, a read operation ensures that many nodes are in sync and therefore have the same versions of the data before returning a successful response.

Thank you.

Isaac_Haseley · April 4, 2024, 10:13pm

Hey @Ali_Hassan!

Thanks, that checks out! So to the original question, is the original statement from the lesson backwards?

“for the larger value of r, we focus more on availability and compromise consistency.”

Ali_Hassan · April 17, 2024, 12:06pm

Hi Isaac,

No, the original statement from the lesson is not backward. In the context of distributed systems, ‘r’ refers to the minimum number of nodes required for a successful read operation.

Availability:

Larger r (More Replicas): Increases availability. If one replica fails, reads and writes can still be performed on other replicas. The system remains accessible with some redundancy.
Smaller r (Fewer Replicas): Decreases availability. If the single replica (r=1) fails, the entire system becomes unavailable for both reads and writes.

Consistency:

Larger r (More Replicas): Reduces consistency. Updates (writes) need to be propagated to all replicas, which takes time. Users might see slightly outdated data on some replicas until updates are fully synchronized.
Smaller r (Fewer Replicas): Improves consistency. Updates are faster to propagate to the single replica (or a smaller number). Users are more likely to see the latest data.

Thank you.

Isaac_Haseley · April 17, 2024, 10:33pm

Hey @Ali_Hassan!

Thanks for sticking with me, admittedly I’m still confused here:

Availability:

“If the single replica (r=1) fails, the entire system becomes unavailable for both reads and writes.”

If r = 1, this doesn’t mean the system only has one replica, it means it only requires that one replica respond in order to report a successful read op to the user and consider the system available. Say we have 10 servers and r = 1. Then only one of our ten servers has to respond successfully, so our chances of success are high, so our availability is high. In contrast, say r = 10. Then all ten servers have to respond successfully, so our chances of success are low, so our availability is low.

Consistency:

You mention that larger r means writes need to propagate to all replicas, and lower r means writes just need to propagate to a single replica. Do you mean larger/lower w? w refers to the number of replicas we need to write to, whereas r refers to the number we need to read from.

Ali_Hassan · April 22, 2024, 7:34am

Hi Isaac,

Thank you for sticking with us for so long to clear up the confusion. After a detailed discussion with some senior resources, we ended up with the following response to your queries related to tradeoffs in quorum-based replication:

“r” refers to the number of replicas required to respond successfully for a read operation to be successful in a system. So, the value of “r” affects the availability and consistency as follows:

The larger value of “r”

Decreases availability and improves consistency: If “r” replicas are required to respond, and there are fewer than “r” functional servers, the read operation fails. The system becomes less available as the probability of encountering unavailable servers increases with a higher “r” value. Similarly, updates (writes) need to be propagated to at least “r” replicas, reducing the chance of users seeing outdated data and improving consistency.

Example: r = 5, all 5 servers need to respond successfully for a read operation. This ensures strong consistency (users see the latest data) but reduces availability. If even one server fails out of 5, the system becomes unavailable for reads and writes.

The smaller value of “r”

Increases availability and decreases consistency: Even if some servers are down, as long as “r” replicas respond, the read operation succeeds. The system remains more available with a lower“r” value. Updates might only be written to a smaller subset of replicas, potentially leading to situations where users see inconsistent data across different replicas until updates are fully propagated.

Example: let’s say r = 1, the system remains available as long as at least one server can respond to a read request. This offers high availability but comes at the cost of potential inconsistency (users might see outdated data if the responding server hasn’t received the latest update).

Similarly, The parameter “w” refers to the number of replicas that need to acknowledge the write operation before it’s considered successful. This is known as the write quorum. Here’s what it means in practice:

If “w” is set to a larger value, the write operation needs to be acknowledged by more replicas. This can improve consistency because more replicas will have the latest data. However, it can decrease availability if some replicas are slow or unavailable because the system has to wait for more acknowledgments before it can consider the write operation successful.
If “w” is set to a smaller value, the write operation needs to be acknowledged by fewer replicas. This can improve availability because the system can respond to write requests more quickly. However, it can decrease consistency if some replicas don’t get updated with the latest data.

So, the choice of w involves a trade-off between consistency and availability.

So as a conclusion, the statement “The reason is that for the larger value of r, we focus more on availability and compromise consistency” is backward compatible as well, and we’ll add/update such details for a better understanding of the learners in our next revamp soon.

We hope this clears up the confusion.

Thank you.