SEER: Performance-Aware Leader Election in Single-Leader Consensus
Modern stateful web services and distributed SDN controllers rely on log replication to prevent data loss under fail-stop failures. In single-leader execution, the leader replica orders log updates and initiates distributed commits to guarantee log consistency. Network congestion, resource-heavy computation, and imbalanced resource allocation may, however, lead to the election of an unsuitable leader and to increased cluster response times. We present SEER, a logically centralized approach to performance prediction and efficient leader election in leader-based consensus systems. SEER autonomously identifies the replica that minimizes the average cluster response time, using prediction models trained dynamically at runtime. To balance exploration and exploitation, SEER explores replicas' performance and updates their prediction models only after detecting significant system changes. We evaluate SEER in a traffic management scenario comprising 3 to 7 Raft replicas and well-known data-center and WAN topologies. Compared to Raft's uniform leader election, SEER decreases the mean control plane response time by up to 32%. This benefit comes at the cost of an adapted election procedure and a slight increase in leader reconfiguration frequency, the latter being tunable with a guaranteed upper bound. SEER does not invalidate any of Raft's safety properties.
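The selection loop described above can be illustrated with a minimal sketch. It is not SEER's implementation: the per-replica predictions are plain numbers rather than trained models, and every name here (`pick_leader`, `significant_change`, `LeaderSelector`, the `explore` callback, the 20% relative threshold) is a hypothetical choice made for illustration only.

```python
from statistics import mean


def pick_leader(predicted_ms):
    """Return the replica with the lowest predicted mean cluster response time."""
    return min(predicted_ms, key=predicted_ms.get)


def significant_change(baseline_ms, recent_ms, threshold=0.2):
    """True if the recent mean deviates from the baseline mean by more than
    `threshold` (relative). The threshold value is an assumption."""
    base = mean(baseline_ms)
    return abs(mean(recent_ms) - base) / base > threshold


class LeaderSelector:
    """Exploit the current predictions; re-explore only after a significant change."""

    def __init__(self, predictions, threshold=0.2):
        self.predictions = dict(predictions)
        self.threshold = threshold
        self.leader = pick_leader(self.predictions)

    def observe(self, baseline_ms, recent_ms, explore):
        # `explore` is a hypothetical callback that measures each replica as
        # leader and returns fresh {replica: predicted_ms} estimates.
        if significant_change(baseline_ms, recent_ms, self.threshold):
            self.predictions = dict(explore())
            self.leader = pick_leader(self.predictions)
        return self.leader
```

In this sketch the exploration step, which is the costly part, runs only when observed response times drift past the threshold; otherwise the current leader is kept, which is the intuition behind bounding how often the leader is reconfigured.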