Multi-level Forwarding and Scheduling Recovery Algorithm in Rapidly-changing Network for Erasure-coded Clusters
A key design goal of erasure-coded clusters is to reduce repair time. Existing repair schemes for erasure-coded data fall roughly into two categories: (1) rapid repair schemes designed for homogeneous environments (e.g., PPR), and (2) bandwidth-based repair schemes designed for heterogeneous environments (e.g., PPT). However, these solutions cope poorly with the heterogeneous and rapidly changing networks found in erasure-coded clusters. To address this problem, a bandwidth-aware multi-level forwarding repair algorithm, called BMFRepair, is proposed. BMFRepair monitors network bandwidth in real time while data is being forwarded and selects idle nodes with high-bandwidth links to assist in forwarding, which reduces the time bottleneck caused by low-bandwidth links. In addition, multi-node repair becomes very complicated when bandwidth changes drastically. For this case, a multi-node scheduling repair algorithm, called MSRepair, is proposed; it repairs multiple failed blocks in parallel by scheduling node resources. The two algorithms adapt flexibly to a rapidly changing network environment and make full use of the bandwidth resources of idle nodes. Most importantly, they continuously adjust the repair plan as bandwidth changes in a fast, dynamic network. The algorithms have been evaluated through simulations on Mininet and real experiments on the Aliyun cloud platform ECS. Results show that, compared with the state-of-the-art repair schemes PPR and PPT, the algorithms significantly reduce repair time in rapidly changing networks.
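The idea of routing repair traffic through idle nodes with high-bandwidth links can be illustrated with a minimal sketch. This is not the BMFRepair algorithm itself, whose details are not given in the abstract; it is a standard widest-path (maximum-bottleneck) search, used here as a stand-in. The function name `widest_forwarding_path`, the nested-dictionary bandwidth map, and the node labels are all assumptions made for illustration.

```python
import heapq


def widest_forwarding_path(bandwidth, src, dst, idle_nodes):
    """Find the src -> dst path whose slowest link is as fast as possible.

    bandwidth: hypothetical map {node: {neighbor: measured link bandwidth}}.
    idle_nodes: nodes that may act as intermediate forwarders.
    Returns (bottleneck_bandwidth, path); (0.0, []) if dst is unreachable.
    """
    allowed = set(idle_nodes) | {src, dst}
    best = {src: float("inf")}  # best known bottleneck to each node
    prev = {}                   # predecessor on the best path
    heap = [(-float("inf"), src)]  # max-heap via negated bottleneck

    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0.0):
            continue  # stale heap entry
        if node == dst:
            break
        for nxt, link_bw in bandwidth.get(node, {}).items():
            if nxt not in allowed:
                continue  # busy nodes are not used as forwarders
            cand = min(width, link_bw)  # bottleneck if we extend via nxt
            if cand > best.get(nxt, 0.0):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (-cand, nxt))

    if dst not in best:
        return 0.0, []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return best[dst], path


# Illustrative bandwidth measurements (made-up values, arbitrary units).
links = {
    "A": {"B": 10, "C": 100},
    "C": {"B": 80, "D": 200},
    "D": {"B": 150},
}
direct = widest_forwarding_path(links, "A", "B", idle_nodes=[])
one_hop = widest_forwarding_path(links, "A", "B", idle_nodes=["C"])
multi_level = widest_forwarding_path(links, "A", "B", idle_nodes=["C", "D"])
```

In this toy topology the direct link A to B is the bottleneck, and each additional idle forwarder raises the achievable bottleneck bandwidth. Adapting to a changing network would amount to re-running the search whenever fresh bandwidth measurements arrive, which is one simple way to read "continuously adjust the repair plan".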