Achievable Stability in Redundancy Systems
We consider a system with N parallel servers where each incoming job is immediately replicated to d servers. Each of the N servers has its own queue and follows a first-come-first-served (FCFS) discipline. As soon as the first replica of a job completes, the remaining replicas are abandoned. We investigate the achievable stability region for a fairly general workload model with multiple job types and heterogeneous servers, reflecting job-server affinity relations that may arise from data locality issues and soft compatibility constraints. Under the assumption that job types are known beforehand, we show that for New-Better-than-Used (NBU) distributed speed variations, no replication (d=1) gives a strictly larger stability region than replication (d>1). Strikingly, this does not depend on the underlying distribution of the intrinsic job sizes, but observing the job types is essential for this statement to hold. For non-observable job types, we show that for New-Worse-than-Used (NWU) distributed speed variations, full replication (d=N) gives a larger stability region than no replication (d=1).
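The role of NBU versus NWU speed variations can be glimpsed through a back-of-the-envelope calculation; this is not the stability analysis of the paper. As a simplifying assumption, let a replica's service time be the intrinsic job size X times an i.i.d. speed factor S_i, with identical servers, no job types, and all d replicas starting service at the same moment. The job then occupies d servers until min(X S_1, ..., X S_d), so it consumes d X E[min(S_1, ..., S_d)] units of server time, versus X E[S] without replication; X factors out of this particular comparison. Writing F̄ for the survival function of S, E[min] is the integral of F̄(t)^d, and the NBU property F̄(s+t) <= F̄(s)F̄(t) gives F̄(t)^d >= F̄(dt), hence d E[min] >= E[S] (the inequality reverses for NWU). The function names below are hypothetical; the uniform law (increasing failure rate, hence NBU) and a two-phase hyperexponential (decreasing failure rate, hence NWU) are illustrative choices, both with E[S] = 1.

```python
from math import comb


def min_mean_uniform(d: int, b: float = 2.0) -> float:
    """E[min(S_1..S_d)] for i.i.d. S_i ~ Uniform(0, b) (IFR, hence NBU)."""
    # Survival (1 - t/b)^d integrates over [0, b] to b / (d + 1).
    return b / (d + 1)


def min_mean_exponential(d: int, rate: float = 1.0) -> float:
    """E[min] for i.i.d. Exp(rate): the minimum is Exp(d * rate)."""
    return 1.0 / (d * rate)


def min_mean_hyperexp(d: int, p: float, rate1: float, rate2: float) -> float:
    """E[min] for i.i.d. two-phase hyperexponentials (DFR, hence NWU)."""
    # Survival is p*exp(-rate1*t) + (1-p)*exp(-rate2*t); expand its d-th
    # power binomially and integrate each exponential term over [0, inf).
    return sum(
        comb(d, k) * p**k * (1 - p) ** (d - k) / (k * rate1 + (d - k) * rate2)
        for k in range(d + 1)
    )


def capacity_ratio(d: int, min_mean: float, mean: float) -> float:
    """Server time used by d replicas relative to one copy: d * E[min] / E[S]."""
    return d * min_mean / mean


if __name__ == "__main__":
    # All three speed-factor distributions below have mean E[S] = 1.
    for d in (1, 2, 3, 4):
        print(
            d,
            round(capacity_ratio(d, min_mean_uniform(d), 1.0), 3),
            round(capacity_ratio(d, min_mean_exponential(d), 1.0), 3),
            round(capacity_ratio(d, min_mean_hyperexp(d, 0.5, 2.0, 2 / 3), 1.0), 3),
        )
```

With d = 2, the uniform case uses 4/3 of the single-copy server time, the exponential case exactly 1, and the hyperexponential case 7/8. This matches the direction of the stated results, but it ignores queueing, replicas that start at different times, heterogeneous servers, and job types, all of which the paper's model takes into account.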