Automatically Tuning the GCC Compiler to Optimize the Performance of Applications Running on Embedded Systems
This paper introduces a novel method for automatically tuning the selection of compiler flags to optimize the performance of software intended to run on embedded hardware platforms. We begin by developing our approach on code compiled by the GNU C Compiler (GCC) for the ARM Cortex-M3 (CM3) processor; and we show how our method outperforms the industry standard -O3 optimization level across a diverse embedded benchmark suite. First we quantify the potential gains by using existing iterative compilation approaches that time-intensively search for optimal configurations for each benchmark. Then we adapt iterative compilation to output a single configuration that optimizes performance across the entire benchmark suite. Although this is a time-consuming process, our approach constructs an optimized variation of -O3, which we call -Ocm3, that realizes nearly two thirds of known available gains on the CM3 and significantly outperforms a more complex state-of-the-art predictive method in cross-validation experiments. Finally, we demonstrate our method on additional platforms by constructing two more optimization levels that find even more significant speed-ups on the ARM Cortex-A8 and 8-bit AVR processors.
READ FULL TEXT