Hi @unybble
In simple terms, a distributed system is considered reliable if it keeps delivering its services even when one or more of its software or hardware components fail. That makes reliability one of the main characteristics of any distributed system. In such systems, when a machine fails, another healthy machine can usually take over its work so that the requested task still completes.
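To make the "healthy machine takes over" idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular system implements failover: the names `run_with_failover`, `broken_node`, `healthy_node`, and `AllReplicasFailed` are made up for this example, and each "machine" is just a function.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when no replica could complete the task (hypothetical name)."""


def run_with_failover(replicas: Sequence[Callable[[str], str]], task: str) -> str:
    """Try each replica in order and return the first successful result."""
    failures = 0
    for replica in replicas:
        try:
            return replica(task)
        except Exception:
            # A real system would catch narrower errors and log them.
            failures += 1
    raise AllReplicasFailed(f"{failures} replicas failed for task {task!r}")


def broken_node(task: str) -> str:
    # Stands in for a machine that has crashed or is unreachable.
    raise ConnectionError("node is down")


def healthy_node(task: str) -> str:
    # Stands in for a machine that is working normally.
    return f"done: {task}"


result = run_with_failover([broken_node, healthy_node], "resize-image")
print(result)  # done: resize-image
```

The caller never sees the failure of the first node; it only sees the task complete. If every replica fails, the sketch raises an error instead of pretending to succeed.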
Reliability is the ability of a system to remain available over a period of time. Reliable systems can continuously perform their core functions without service disruptions, errors, or significant reductions in performance. However, a system can fail in many different ways, especially as it becomes larger, more dynamic, and more complex. Our systems, and the people operating them, must be able to recover from these failures. This recoverability is called resilience. To maximize availability, systems must be both reliable and resilient.
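One common way to put numbers on this is steady-state availability, MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. Neither term appears above; I am using MTBF as a rough stand-in for reliability and MTTR as a rough stand-in for resilience. The figures below are made up for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative numbers only, not measurements from a real system.
baseline = availability(mtbf_hours=1000, mttr_hours=10)
more_reliable = availability(mtbf_hours=2000, mttr_hours=10)  # fails half as often
more_resilient = availability(mtbf_hours=1000, mttr_hours=5)  # recovers twice as fast

print(f"{baseline:.4f}")        # 0.9901
print(f"{more_reliable:.4f}")   # 0.9950
print(f"{more_resilient:.4f}")  # 0.9950
```

In this model, failing half as often and recovering twice as fast give the same availability gain, which is why both properties matter.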
I hope this clears up your confusion. Happy learning!