Explicit constructions of MSR codes for clustered distributed storage: The rack-aware storage model
The paper is devoted to the problem of erasure coding in distributed storage. We consider a storage model in which the nodes are organized into equally sized groups, called racks; within each rack the nodes can communicate freely without taxing the system bandwidth, and only the information transmitted between racks is counted. This assumption implies that the nodes within each rack can collaborate before providing information to the failed node. The main emphasis of the paper is on code construction for this storage model. We present an explicit family of MDS array codes that support recovery of a single failed node from any number of helper racks using the minimum possible amount of inter-rack communication (such codes are said to provide optimal repair). The codes are constructed over finite fields of size comparable to the code length. We also derive a bound on the number of symbols accessed at helper nodes for the purpose of repair, and construct a code family that approaches this bound while still maintaining the optimal repair property. Finally, we present a construction of scalar Reed-Solomon codes that support optimal repair in the rack-oriented storage model.
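The abstract does not state what "the minimum possible amount of inter-rack communication" is. As a hedged sketch, the snippet below evaluates the cut-set bound for MDS codes in the rack model as it is commonly stated in the rack-aware regenerating-codes literature, assuming n = n̄u nodes in n̄ racks of u nodes, k = k̄u + v with 0 ≤ v < u, each node storing ℓ symbols, and d̄ helper racks with k̄ ≤ d̄ ≤ n̄ − 1. Under these assumptions each helper rack sends at least ℓ/(d̄ − k̄ + 1) symbols, for a total of d̄ℓ/(d̄ − k̄ + 1). The function names `rack_aware_msr_bound` and `trivial_rack_repair` are hypothetical, and the "trivial" baseline (each of k̄ helper racks aggregates its nodes locally and ships one ℓ-symbol combination) is an illustrative assumption, not a scheme taken from the paper.

```python
from fractions import Fraction


def rack_aware_msr_bound(n, k, u, d_bar, ell):
    """Lower bound on inter-rack repair traffic for one failed node (sketch).

    Assumed model: n nodes in n_bar = n // u racks of u nodes each,
    k = k_bar * u + v with 0 <= v < u, d_bar helper racks with
    k_bar <= d_bar <= n_bar - 1, and ell symbols stored per node.
    Returns (beta, b): symbols sent per helper rack, and in total.
    """
    if n % u != 0:
        raise ValueError("racks must be equally sized: u must divide n")
    if not (u <= k <= n):
        raise ValueError("assumes u <= k <= n (otherwise repair is rack-local)")
    n_bar = n // u
    k_bar = k // u  # v = k % u is the part of k left over after whole racks
    if not (k_bar <= d_bar <= n_bar - 1):
        raise ValueError("need k_bar <= d_bar <= n_bar - 1")
    beta = Fraction(ell, d_bar - k_bar + 1)  # per helper rack
    return beta, d_bar * beta                # total inter-rack traffic


def trivial_rack_repair(n, k, u, ell):
    """Hypothetical baseline: k_bar racks each send one aggregated ell-symbol vector."""
    return Fraction((k // u) * ell)


if __name__ == "__main__":
    # n = 12 nodes in 4 racks of 3; k = 7, so k_bar = 2 and v = 1.
    beta, b = rack_aware_msr_bound(12, 7, 3, d_bar=3, ell=2)
    print(beta, b, trivial_rack_repair(12, 7, 3, ell=2))
```

With d̄ = k̄ the bound reduces to k̄ℓ, which matches the aggregation baseline; each additional helper rack spreads the load and lowers the total inter-rack traffic, which is the regime the optimal-repair constructions target.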