Not able to understand Vector clock usage example

Sai2 · October 1, 2023, 7:35am

I have read the “Vector clock usage example” several times and still cannot understand it. Can someone please explain how it became ([A,2],[B,1]) and ([A,2],[C,1]), and then how it became [A, 5]?

Javeria_Tariq · October 2, 2023, 2:02am

Hi @Sai2 !!
In the “Vector clock usage example,” vector clocks are used to track the causal relationships between events in a distributed system. Let’s break down the example step by step:

1. Initial Event E1 ([A,1]):
- Node A handles the first version of the write request, denoted as E1.
- The corresponding vector clock for this event is [A,1], indicating that Node A has performed one event.
2. Second Event E2 ([A,2]):
- Node A handles another write for the same object, which we denote as E2.
- Since Node A has now performed two events, the vector clock for E2 is [A,2].
3. Network Partition and Handling by Nodes B and C:
- Suppose a network partition occurs, and the request is now handled by two different nodes, Node B and Node C.
- These nodes each create their own events, E3 and E4, and update the object.
- The vector clocks for E3 and E4 are ([A,2],[B,1]) and ([A,2],[C,1]), respectively.
4. Network Partition Repaired:
- After the network partition is repaired, the client requests a write again, but now we have conflicts because the events E3 and E4, with their clocks ([A,2],[B,1]) and ([A,2],[C,1]), are both in the system.
5. Conflict Resolution:
- The context of the conflicts, which includes the vector clocks ([A,2], [B,1], [C,1]), is returned to the client.
- After the client performs reconciliation and Node A coordinates the write, we have Event E5.
- The vector clock for E5 becomes [A,4], indicating that Node A has performed four events.

So, to summarize:

- Node A initially performed two events (E1 and E2) with vector clocks [A,1] and [A,2].
- During the network partition, Node B and Node C each performed one event (E3 and E4), resulting in vector clocks ([A,2],[B,1]) and ([A,2],[C,1]).
- After conflict resolution, when Node A coordinates the write, it updates its vector clock to [A,4] for Event E5.
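
To make the clock values in steps 1 to 3 concrete, here is a minimal Python sketch. It is only an illustration: representing a clock as a dict and the `increment` helper are assumptions made for this example, not code from the lesson.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


e1 = increment({}, "A")  # step 1: Node A handles the first write
e2 = increment(e1, "A")  # step 2: Node A handles the second write
e3 = increment(e2, "B")  # step 3: Node B writes on top of E2 during the partition
e4 = increment(e2, "C")  # step 3: Node C also writes on top of E2

print(e1)  # {'A': 1}
print(e2)  # {'A': 2}
print(e3)  # {'A': 2, 'B': 1}
print(e4)  # {'A': 2, 'C': 1}
```

Both E3 and E4 start from E2, which is why each of them still carries [A,2] next to its own new entry.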
I hope it helps. Happy Learning

Sai2 · October 2, 2023, 1:40pm

@Javeria_Tariq Thanks for your time. I have a few doubts:

1. In step 3, what exactly do you mean by “a network partition occurs”? Is it the same as horizontal scaling?
2. In step 3, when a network partition happens and a new request comes in, are both nodes B and C responding to the same single request? Why both nodes?

Javeria_Tariq · October 3, 2023, 4:08am

@Sai2 In step 3, network partition refers to a situation where a distributed system, which consists of multiple nodes or servers, becomes divided into isolated sub-networks due to network issues or failures.

It is not the same as horizontal scaling. Horizontal scaling involves adding more servers or nodes to a system to handle increased load or traffic, whereas a network partition is an unintended situation where existing nodes become temporarily disconnected from each other.

Network Partition Handling (Step 3):
- In Step 3, when a network partition occurs, it means that some nodes in the distributed system are temporarily unable to communicate with other nodes. In this scenario, Node B and Node C are both involved because they are the nodes that are available and able to handle the request due to the network partition.
Handling the New Request (Step 3):
- When the network partition happens, the client’s request for a write operation is processed by the nodes that are available and can communicate with the client. In this case, it’s Node B and Node C.
- Both Node B and Node C independently handle the request, which results in separate events E3 and E4, respectively.
- This means that during the network partition, multiple nodes may handle requests simultaneously, and each node keeps its own record of the events it processes.

The key point is that network partitions can lead to situations where multiple nodes handle requests independently due to the temporary isolation of parts of the distributed system. This can result in divergent event histories, which is why vector clocks are used to track and reconcile these events when the network partition is resolved.

Vector clocks help establish causality and determine the order of events, even when multiple nodes process requests independently during a network partition. Once the network partition is repaired, the system needs to reconcile these divergent event histories, as explained in the example.
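
As a rough illustration of how two clocks can be compared, here is a small Python sketch. The function names `descends` and `compare` and the dict representation are assumptions made for this example; a real system may expose this differently.

```python
def descends(a, b):
    """True if clock `a` has seen every event that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify clock `a` relative to clock `b`."""
    if a == b:
        return "equal"
    if descends(b, a):
        return "before"      # a happened before b
    if descends(a, b):
        return "after"       # a happened after b
    return "concurrent"      # neither has seen the other: a conflict


e2 = {"A": 2}
e3 = {"A": 2, "B": 1}
e4 = {"A": 2, "C": 1}

print(compare(e2, e3))  # before
print(compare(e3, e2))  # after
print(compare(e3, e4))  # concurrent
```

E3 and E4 come out as concurrent because each has an entry the other lacks, which is exactly the divergence created by the partition.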
I hope it helps. Happy Learning

Sai2 · October 4, 2023, 4:08am

Hello @Javeria_Tariq,
I am loving all your explanations. To get conceptually clearer, could you clarify these points?

1. After the network partition is repaired, why can’t the clock be [A, 3], and why is it [A, 4]?
2. What advantage did we get by using vector clocks here rather than just timestamps?
3. Can you explain a bit about the reconciliation process?

Sai2 · October 5, 2023, 2:20pm

Please clarify this soon🙃

Sai2 · October 13, 2023, 12:50pm

@Javeria_Tariq please respond.
I have been waiting for 10 days.

vivekshankar.ram · October 13, 2023, 5:56pm

Hi Sai2,

This is how I understand your question about why it is [A, 4] instead of [A, 3]. The article contains this statement: “Suppose the network partition is repaired, and the client requests a write again, but now we have conflicts. The context ([A,3],[B,1],[C,1]) of the conflicts are returned to the client.”

From the above, my understanding is that the client initiated another write, which went through Node A and incremented Node A’s counter from 2 to 3, hence [A, 3]. The entire conflict context ([A,3],[B,1],[C,1]) is then sent back to the client, which reconciles the versions and sends another write (the reconciled value) to Node A. That moves the counter from 3 to 4, ending up with [A, 4]. Hope this helps.
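
The reading above can be replayed in a few lines of Python. This is only a sketch of that interpretation: the `merge` and `increment` helpers and the dict representation are assumptions for illustration. The sketch also keeps the B and C entries in E5, whereas the thread writes E5 in shorthand as [A, 4].

```python
def merge(a, b):
    """Element-wise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


e3 = {"A": 2, "B": 1}
e4 = {"A": 2, "C": 1}

seen = merge(e3, e4)            # everything the two conflicting versions have seen
context = increment(seen, "A")  # new write through Node A: 2 -> 3
e5 = increment(context, "A")    # reconciled write through Node A: 3 -> 4

print(sorted(seen.items()))     # [('A', 2), ('B', 1), ('C', 1)]
print(sorted(context.items()))  # [('A', 3), ('B', 1), ('C', 1)]
print(sorted(e5.items()))       # [('A', 4), ('B', 1), ('C', 1)]
```

Note that Node A's counter changes only when Node A coordinates a write; the counters for B and C are carried along unchanged.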

For your 2nd and 3rd questions, I would suggest reading Amazon’s Dynamo paper → https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf. The entire article is based on this paper!

Thanks,
Vivek

James_Gatenby · December 15, 2023, 12:25am

I think I understand the write reconciliation, but what happens when there’s a read operation while there’s still a conflict? For example, after the network partition is resolved, we are left with ([A,2],[B,1],[C,1]) if we try to do a read (three different values for the provided key). Is this context returned to the client making the read request, and they decide how to reconcile and repair the conflict with a write?

Javeria_Tariq · December 15, 2023, 4:09am

I apologize for the delayed response, @Sai2. Thank you for your patience. Let’s address your additional questions:
1. After the network partition is repaired, why can’t the clock be [A, 3], and why is it [A, 4]?
In the reconciliation process, when the network partition is repaired, the vector clocks are used to merge the divergent event histories from different nodes. The vector clock [A, 4] indicates that Node A has processed four events, not just three.

The events up to and during the network partition were as follows:

Event E1: [A, 1]
Event E2: [A, 2]
Event E3 (handled by Node B): ([A, 2], [B, 1])
Event E4 (handled by Node C): ([A, 2], [C, 1])

When the network partition is repaired, the client issues another write, and Node A coordinates it, which moves Node A’s counter from 2 to 3. Because E3 and E4 conflict, the context ([A,3],[B,1],[C,1]) is returned to the client. The client reconciles the two versions and sends the reconciled value as a new write, again coordinated by Node A, so the counter moves from 3 to 4 and Event E5 carries [A, 4]. Node A’s counter only counts writes that Node A itself coordinated; the events handled by Nodes B and C stay recorded under [B,1] and [C,1].
2. What advantage did we get by using vector clocks here rather than just timestamps?
Vector clocks provide a more detailed and context-aware way of tracking causality in a distributed system compared to simple timestamps. In a distributed environment, events can occur concurrently on different nodes, and it’s crucial to establish the causal relationships between these events.

Advantages of Vector Clocks over Timestamps:

- Concurrency Detection: Vector clocks can detect concurrent events accurately. In the example, if two nodes independently processed events during the network partition, the vector clocks capture this concurrency by including the node-specific counters.
- Causality Tracking: Vector clocks not only tell us the order of events but also the causal relationships. This is important for understanding how events relate to each other, especially in scenarios involving distributed databases or systems where nodes may operate independently for a time.
- Resolution of Conflicts: When conflicts arise due to concurrent events, vector clocks help in resolving these conflicts during the reconciliation process. Timestamps might not provide enough information to resolve conflicts based on causality.
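
A small Python sketch may make the difference visible. The wall-clock timestamps, the cart values, and the helper names `last_write_wins` and `is_concurrent` are all hypothetical and chosen only for illustration.

```python
# Two writes made on different sides of the partition (hypothetical data).
write_b = {"value": "cart + book", "ts": 1696230000.120, "clock": {"A": 2, "B": 1}}
write_c = {"value": "cart + pen", "ts": 1696230000.450, "clock": {"A": 2, "C": 1}}


def last_write_wins(x, y):
    """Timestamp-only resolution: keep whichever write has the later timestamp."""
    return x if x["ts"] >= y["ts"] else y


def is_concurrent(a, b):
    """True if each clock has an entry that is ahead of the other clock."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    return a_ahead and b_ahead


winner = last_write_wins(write_b, write_c)
print(winner["value"])                                    # cart + pen
print(is_concurrent(write_b["clock"], write_c["clock"]))  # True
```

With timestamps alone, the write through Node B is silently dropped. The vector clocks instead report that neither write has seen the other, so both can be kept and reconciled.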
3. Can you explain a bit about the reconciliation process?
The reconciliation process involves merging the divergent event histories from different nodes to create a consistent and agreed-upon state of the system. Here’s a simplified overview:
1. Identify Divergent Histories: Nodes share their local event histories, including vector clocks, which may have diverged during the network partition.
2. Conflict Detection: Nodes identify conflicting events, i.e., events that occurred concurrently or in a different order on different nodes.
3. Resolution: Using the information in vector clocks, the system determines the causal relationships between conflicting events. This can involve choosing a specific version of an event based on the vector clock information.
4. Update State: The agreed-upon events are used to update the state of the system consistently across all nodes. This ensures that, after reconciliation, all nodes have a common understanding of the events and their order.

In the example, after reconciliation, the vector clock for the coordinated write (Event E5) carries Node A’s counter incremented once more, because Node A coordinated that write. Since E5 builds on both E3 and E4, it supersedes them, which helps maintain a coherent and causally consistent state across the distributed system.
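
The conflict detection part of this overview can be sketched in Python as follows. The version labels `v2`, `v3`, `v4` and the helper `surviving_versions` are hypothetical names used only for this illustration.

```python
def descends(a, b):
    """True if clock `a` has seen every event that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def surviving_versions(versions):
    """Drop every version whose clock is superseded by another version's clock."""
    survivors = []
    for value, clock in versions:
        dominated = any(
            other != clock and descends(other, clock)
            for _, other in versions
        )
        if not dominated:
            survivors.append((value, clock))
    return survivors


versions = [
    ("v2", {"A": 2}),          # E2
    ("v3", {"A": 2, "B": 1}),  # E3, written through Node B
    ("v4", {"A": 2, "C": 1}),  # E4, written through Node C
]

print([value for value, _ in surviving_versions(versions)])  # ['v3', 'v4']
```

The version from E2 is dropped automatically because both later versions have seen it. Two survivors remain, which means the clocks alone cannot pick a winner and the values themselves have to be reconciled.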
I hope it helps. Happy Learning

Javeria_Tariq · December 15, 2023, 4:10am

@James_Gatenby Absolutely, your understanding is on the right track. When a read operation encounters conflicting values for a given key, the context of the conflict is often returned to the client. The client can then make decisions on how to reconcile or handle the conflict based on the application’s requirements.

Here’s how the process typically unfolds:

Read Operation:
- The client initiates a read operation for a specific key.
- The read operation may involve multiple nodes if the system is distributed.
Conflict Detection:
- If the key has conflicting values due to events processed independently during a network partition or other issues, the system may return the conflicting values along with their respective vector clocks.
- In your example, the context ([A,2],[B,1],[C,1]) summarizes two conflicting versions, ([A,2],[B,1]) from Node B and ([A,2],[C,1]) from Node C, both of which build on Node A’s second write.
Context Returned to Client:
- The conflicting values and their vector clocks are returned to the client as part of the response.
- The client now has the context needed to understand the conflicting events and their causal relationships.
Client-Side Reconciliation:
- The client, based on its reconciliation strategy or application logic, can decide how to handle the conflicting values.
- This may involve choosing the most recent value, merging values, prompting user interaction for manual resolution, or applying other conflict resolution strategies.
Optional Write Operation:
- Depending on the client’s decision and the application’s requirements, the client may initiate a write operation to resolve the conflict.
- The write operation would include the reconciled value and an updated vector clock to maintain causality.

This approach empowers the client to make decisions about conflict resolution, as the client often has a better understanding of the application’s logic and the significance of the conflicting values. It allows for flexibility in handling conflicts based on the specific use case and business requirements.
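
Here is a rough Python sketch of that read, reconcile, and write-back flow. `TinyStore` is a hypothetical single-process stand-in, not a real client library, and merging by set union is just one possible application-level strategy.

```python
class TinyStore:
    """Hypothetical in-memory stand-in for a replicated key-value store."""

    def __init__(self):
        self.siblings = {}  # key -> list of (value, clock)

    def get(self, key):
        """Return all conflicting values plus a context that covers all of them."""
        versions = self.siblings.get(key, [])
        context = {}
        for _, clock in versions:
            for node, count in clock.items():
                context[node] = max(context.get(node, 0), count)
        return [value for value, _ in versions], context

    def put(self, key, value, context, coordinator):
        """Write `value` with the context's clock, bumped for the coordinator."""
        clock = dict(context)
        clock[coordinator] = clock.get(coordinator, 0) + 1
        # Keep only the versions that the new clock has not already seen.
        kept = [
            (v, c) for v, c in self.siblings.get(key, [])
            if not all(clock.get(n, 0) >= k for n, k in c.items())
        ]
        self.siblings[key] = kept + [(value, clock)]
        return clock


store = TinyStore()
store.siblings["cart"] = [
    ({"book"}, {"A": 2, "B": 1}),  # version written through Node B
    ({"pen"}, {"A": 2, "C": 1}),   # version written through Node C
]

values, context = store.get("cart")  # the read returns both versions
merged = set().union(*values)        # client-side reconciliation
new_clock = store.put("cart", merged, context, coordinator="A")
after_values, _ = store.get("cart")

print(sorted(merged))             # ['book', 'pen']
print(sorted(new_clock.items()))  # [('A', 3), ('B', 1), ('C', 1)]
print(len(after_values))          # 1
```

In this read-first scenario no extra write has gone through Node A before the reconciling write, so Node A's counter moves from 2 to 3. After the write-back, a read returns a single version again.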
I hope it helps. Happy Learning

James_Gatenby · December 15, 2023, 5:52am

Wow, what a great response!!! Thanks so much, @Javeria_Tariq! This is super detailed and very helpful. Much appreciated!