educative.io

The Spectrum of Failure Models - Grokking Modern System Design Interview for

Fail-stop
In this type of failure, a node in the distributed system halts permanently. However, the other nodes can still detect that node by communicating with it.

If a node has halted, how could other nodes communicate with it?
Thank you!

Hi @Luis_Tung

Here halt means that the node detected that due to some failure, it can no longer perform its tasks. So it halts doing its work. Though it can still reply to simple ping-style queries or say it is no longer operational. Due to this thing, it is easy to manage, otherwise, remotely detecting a failure of a node is very difficult to do reliably.

Happy learning!

1 Like

So in situation of “Crash”, ping-style queries can’t even be replied?
What is the difference between “Crash” and “omission failures”?
thank you!

Hi again @Luis_Tung

We decided to update the lesson’s content to clarify it for interested learners like yourself. Please be patient with us till we add examples to each failure type. In the meantime, please continue to explore the course.

Happy learning!