What's Live? Understanding Distributed Consensus
Distributed consensus algorithms such as Paxos have been studied extensively. They all use the same definition of safety. Liveness is especially important in practice, despite well-known theoretical impossibility results. However, many different liveness properties and assumptions have been stated, and there has been no systematic comparison to help understand these properties. This paper studies and compares the liveness properties stated for over 30 well-known consensus algorithms and variants. We build a lattice of liveness properties by combining a lattice of the assumptions used with a lattice of the assertions made, and we compare the strengths and weaknesses of the algorithms that ensure these properties. Our precise specifications and systematic comparisons led to the discovery of a range of problems in various stated liveness properties, from missing or too-weak assumptions, under which no liveness assertion can hold, to overly strong assumptions that make the assertions trivial or uninteresting to achieve. We also developed TLA+ specifications of these liveness properties. We show that model checking execution steps using TLC can illustrate liveness patterns for single-valued Paxos with up to 4 proposers and 4 acceptors within a few hours, but becomes too expensive for multi-valued Paxos or for more processes.
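To make the lattice construction more concrete, here is a minimal Python sketch of one natural way to order properties of the form "assumption implies assertion": a property is at least as strong as another when it assumes no more and asserts no less, so the property order is the product of the reversed assumption order and the assertion order. The assumption and assertion names, the implication edges between them, and the names closure and at_least_as_strong are hypothetical illustrations, not the paper's actual lattices or specifications.

```python
# A minimal sketch of ordering liveness properties "assumption => assertion".
# All names and implication edges are HYPOTHETICAL, for illustration only.

# Toy implication edges: (x, y) means "x implies y".
# "true" stands for the trivial assumption (assume nothing).
ASSUMPTIONS = ["no_crashes", "majority_alive", "true"]
ASSUMPTION_EDGES = {("no_crashes", "majority_alive"), ("majority_alive", "true")}

ASSERTIONS = ["all_correct_decide", "some_decides"]
ASSERTION_EDGES = {("all_correct_decide", "some_decides")}  # toy edge


def closure(edges, elems):
    """Reflexive-transitive closure of an implication relation (Warshall)."""
    rel = {(x, y) for x in elems for y in elems if x == y or (x, y) in edges}
    for k in elems:
        for i in elems:
            for j in elems:
                if (i, k) in rel and (k, j) in rel:
                    rel.add((i, j))
    return rel


A_REL = closure(ASSUMPTION_EDGES, ASSUMPTIONS)
G_REL = closure(ASSERTION_EDGES, ASSERTIONS)


def at_least_as_strong(p, q):
    """p = (A1, G1) implies q = (A2, G2) when p assumes no more
    (A2 implies A1) and asserts no less (G1 implies G2)."""
    (a1, g1), (a2, g2) = p, q
    return (a2, a1) in A_REL and (g1, g2) in G_REL


p = ("majority_alive", "all_correct_decide")
q = ("no_crashes", "some_decides")
r = ("true", "some_decides")

print(at_least_as_strong(p, q))  # True
print(at_least_as_strong(q, p))  # False
print(at_least_as_strong(p, r), at_least_as_strong(r, p))  # False False (incomparable)
```

Here p dominates q because it relies on a weaker assumption yet guarantees a stronger assertion, while p and r are incomparable: r assumes less but also asserts less. Comparisons of this kind are what a lattice of properties makes systematic.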