nanoBench: A Low-Overhead Tool for Running Microbenchmarks on x86 Systems
We present nanoBench, a tool for evaluating small microbenchmarks using hardware performance counters on Intel and AMD x86 systems. Most existing tools and libraries are intended to either benchmark entire programs, or program segments in the context of their execution within a larger program. In contrast, nanoBench is specifically designed to evaluate small, isolated pieces of code. Such code is common in microbenchmark-based hardware analysis techniques. Unlike previous tools, nanoBench can execute microbenchmarks directly in kernel space. This allows to benchmark privileged instructions, and it enables more accurate measurements. The reading of the performance counters is implemented with minimal overhead avoiding functions calls and branches. As a consequence, nanoBench is precise enough to measure individual memory accesses. We illustrate the utility of nanoBench at the hand of two case studies. First, we briefly discuss how nanoBench has been used to determine the latency, throughput, and port usage of more than 12,000 instruction variants on recent x86 processors. Second, we show how to generate microbenchmarks to precisely characterize the cache architectures of ten Intel Core microarchitectures. This includes the most comprehensive analysis of the employed cache replacement policies to date.
READ FULL TEXT