Memory at Your Service: Fast Memory Allocation for Latency-critical Services
Co-location and memory sharing between latency-critical services, such as key-value store and web search, and best-effort batch jobs is an appealing approach to improving memory utilization in multi-tenant datacenter systems. However, we find that the very diverse goals of job co-location and the GNU/Linux system stack can lead to severe performance degradation of latency-critical services under memory pressure in a multi-tenant system. We address memory pressure for latency-critical services via fast memory allocation and proactive reclamation. We find that memory allocation latency dominates the overall query latency, especially under memory pressure. We analyze the default memory management mechanism provided by GNU/Linux system stack and identify the reasons why it is inefficient for latency-critical services in a multi-tenant system. We present Hermes, a fast memory allocation mechanism in user space that adaptively reserves memory for latency-critical services. It advises Linux OS to proactively reclaim memory of batch jobs. We implement Hermes in GNU C Library. Experimental result shows that Hermes reduces the average and the 99^th percentile memory allocation latency by up to 54.4 latency-critical services, Hermes reduces both the average and the 99^th percentile tail query latency by up to 40.3 jemalloc and TCMalloc, Hermes reduces Service Level Objective violation by up to 84.3
READ FULL TEXT 
  
  
     share
 share