A Model of Job Parallelism for Latency Reduction in Large-Scale Systems
Processing computation-intensive jobs at multiple processing cores in parallel is essential in many real-world applications. In this paper, we consider an idealised model for job parallelism in which a job can be served simultaneously by d distinct servers. The job is considered complete when the total amount of work done on it by the d servers equals its size. We study the effect of parallelism on the average delay of jobs. Specifically, we analyze a system consisting of n parallel processor-sharing servers in which jobs arrive according to a Poisson process of rate nλ (λ < 1) and each job brings an exponentially distributed amount of work with unit mean. Upon arrival, a job selects d servers uniformly at random and joins all the chosen servers simultaneously. We show by a mean-field analysis that, for fixed d ≥ 2 and large n, the average occupancy of servers is O(log(1/(1-λ))) as λ → 1, in comparison to O(1/(1-λ)) average occupancy for d = 1. Thus, we obtain an exponential reduction in the response time of jobs through parallelism. We make significant progress towards rigorously justifying the mean-field analysis.
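The model described above can be explored numerically with a small discrete-event simulation. The sketch below (an illustration, not the paper's analysis) simulates n processor-sharing servers where each arriving job joins d servers chosen uniformly at random, receives service from all of them simultaneously, and departs once the total work done on it reaches its exponentially distributed size. The function name `simulate` and all parameter defaults are hypothetical choices for this sketch; the time-average occupancy per server it returns can be compared for d = 1 versus d ≥ 2.

```python
import random

def simulate(n, lam, d, T, seed=0):
    """Simulate n processor-sharing servers up to time T.

    Jobs arrive as a Poisson process of rate n*lam, bring Exp(1) work,
    join d servers chosen uniformly at random, and are served by all d
    servers at once; each server splits unit capacity equally among the
    jobs currently present at it. Returns the time-average occupancy
    (number of jobs) per server.
    """
    rng = random.Random(seed)
    jobs = []       # each entry: [remaining_work, tuple_of_servers]
    occ = [0] * n   # number of jobs currently at each server
    t = 0.0
    next_arrival = rng.expovariate(n * lam)
    area = 0.0      # time-integral of total occupancy

    while t < T:
        # Aggregate service rate of each job under processor sharing:
        # sum over its d servers of 1/(occupancy of that server).
        rates = [sum(1.0 / occ[s] for s in servers)
                 for _, servers in jobs]
        # Time until the earliest departure among current jobs.
        dep_dt, dep_idx = float("inf"), -1
        for i, (work, _) in enumerate(jobs):
            dt_i = work / rates[i]
            if dt_i < dep_dt:
                dep_dt, dep_idx = dt_i, i
        arr_dt = next_arrival - t
        dt = min(arr_dt, dep_dt, T - t)
        # Advance time; every job receives dt * (its aggregate rate) of work.
        for i, job in enumerate(jobs):
            job[0] -= dt * rates[i]
        area += dt * sum(occ)
        t += dt
        if t >= T:
            break
        if dt == arr_dt:
            # Arrival: new job joins d distinct servers at random.
            servers = tuple(rng.sample(range(n), d))
            jobs.append([rng.expovariate(1.0), servers])
            for s in servers:
                occ[s] += 1
            next_arrival = t + rng.expovariate(n * lam)
        else:
            # Departure: the job whose remaining work just hit zero leaves.
            _, servers = jobs.pop(dep_idx)
            for s in servers:
                occ[s] -= 1

    return area / (T * n)
```

Running this at a load close to 1 (say λ = 0.9) with d = 2 versus d = 1 illustrates the qualitative gap the abstract describes: occupancy of order log(1/(1-λ)) with parallelism versus order 1/(1-λ) without it.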