A Loop-Based Methodology for Reducing Computational Redundancy in Workload Sets
The design of general purpose processors relies heavily on a workload gathering step in which representative programs are collected from various application domains. Processor performance, when running the workload set, is profiled using simulators that model the targeted processor architecture. However, simulating the entire workload set is prohibitively time-consuming, which precludes considering a large number of programs. To reduce simulation time, several techniques in the literature have exploited the internal program repetitiveness to extract and execute only representative code segments. Existing so- lutions are based on reducing cross-program computational redundancy or on eliminating internal-program redundancy to decrease execution time. In this work, we propose an orthogonal and complementary loop- centric methodology that targets loop-dominant programs by exploiting internal-program characteristics to reduce cross-program computational redundancy. The approach employs a newly developed framework that extracts and analyzes core loops within workloads. The collected characteristics model memory behavior, computational complexity, and data structures of a program, and are used to construct a signature vector for each program. From these vectors, cross-workload similarity metrics are extracted, which are processed by a novel heuristic to exclude similar programs and reduce redundancy within the set. Finally, a reverse engineering approach that synthesizes executable micro-benchmarks having the same instruction mix as the loops in the original workload is introduced. A tool that automates the flow steps of the proposed methodology is developed. Simulation results demonstrate that applying the proposed methodology to a set of workloads reduces the set size by half, while preserving the main characterizations of the initial workloads.
READ FULL TEXT