Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
Statically estimating the number of processor clock cycles it takes to execute a basic block of assembly instructions in steady state (throughput) is important for compiler backend optimizations such as register allocation, instruction selection and instruction scheduling. This is complicated specially in modern x86-64 Complex Instruction Set Computer (CISC) machines with sophisticated processor microarchitectures. Traditionally, compiler writers invest time experimenting and referring to processor manuals to analytically model modern processors with incomplete specifications. This is tedious, error prone and should be done for each processor generation. We present Ithemal, the first automatically learnt estimator to statically predict throughput of a set of basic block instructions using machine learning. Ithemal uses a novel Directed Acyclic Graph-Recurrent Neural Network (DAG-RNN) based data-driven approach for throughput estimation. We show that Ithemal is accurate than state-of-the-art hand written tools used in compiler backends and static machine code analyzers. In particular, our model has a worst case average error of 10.53 of 19.57 machine code analyzer when compared on three different microarchitectures, while predicting throughput values at a faster rate than aforementioned tools. We also show that Ithemal is portable, learning throughput estimation for Intel Nehalem, Haswell and Skylake microarchitectures without requiring changes to its structure.
READ FULL TEXT