Synergy via Redundancy: Adaptive Replication Strategies and Fundamental Limits
The maximum possible throughput (or the rate of job completion) of a multi-server system is typically the sum of the service rates of individual servers. Recent work shows that launching multiple replicas of a job and canceling them as soon as one copy finishes can boost the throughput, especially when the service time distribution has high variability. This means that redundancy can, in fact, create synergy among servers such that their overall throughput is greater than the sum of individual servers. This work seeks to find the fundamental limit of the throughput boost achieved by job replication and the optimal replication policy to achieve it. While most previous works consider upfront replication policies, we expand the set of possible policies to delayed launch of replicas. The search for the optimal adaptive replication policy can be formulated as a Markov Decision Process, using which we propose two myopic replication policies, MaxRate and AdaRep, to adaptively replicate jobs. In order to quantify the optimality gap of these and other policies, we derive upper bounds on the service capacity, which provide fundamental limits on the throughput of queueing systems with redundancy.
READ FULL TEXT