Characterizing Performance Bugs in Deep Learning Systems
Deep learning (DL) has been increasingly applied to a variety of domains. The programming paradigm shift from traditional systems to DL systems poses unique challenges in engineering DL systems. Performance is one of the challenges, and performance bugs(PBs) in DL systems can cause severe consequences such as excessive resource consumption and financial loss. While bugs in DL systems have been extensively investigated, PBs in DL systems have hardly been explored. To bridge this gap, we present the first comprehensive study to characterize symptoms, root causes, and introducing and exposing stages of PBs in DL systems developed in TensorFLow and Keras, with a total of 238 PBs collected from 225 StackOverflow posts. Our findings shed light on the implications on developing high performance DL systems, and detecting and localizing PBs in DL systems. We also build the first benchmark of 56 PBs in DL systems, and assess the capability of existing approaches in tackling them. Moreover, we develop a static checker DeepPerf to detect three types of PBs, and identify 488 new PBs in 130 GitHub projects.62 and 18 of them have been respectively confirmed and fixed by developers.
READ FULL TEXT