Path-Based Function Embedding and its Application to Specification Mining

02/21/2018
by Daniel Defreez, et al.

Relationships among program elements are useful for program understanding, debugging, and analysis. One such relationship is function synonymy. Function synonyms are functions that play a similar role in code; examples include functions that perform initialization for different device drivers, and functions that implement different symmetric-key encryption schemes. Function synonyms are not necessarily semantically equivalent and can be syntactically dissimilar; consequently, approaches for identifying code clones or functional equivalence cannot be used to identify them. This paper presents func2vec, an algorithm that maps each function to a vector in a vector space such that function synonyms are grouped together. We compute the function embedding by training a neural network on sentences generated from random walks over the interprocedural control-flow graph. We show the effectiveness of func2vec in identifying function synonyms in the Linux kernel. Furthermore, we show how knowing function synonyms enables mining error-handling specifications with high support in Linux file systems and drivers.
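
The sentence-generation step can be illustrated with a minimal sketch. It assumes a heavily simplified graph whose nodes are function names and whose edges are possible successors; it is not the paper's actual interprocedural control-flow graph construction. The graph contents, the helper names (`random_walk`, `generate_sentences`), and the parameters (`walks_per_node`, `max_length`) are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy graph (an assumption, not data from the paper):
# each key is a function name, each value lists possible successors.
TOY_GRAPH = {
    "driver_a_probe": ["driver_a_init", "alloc_buffer"],
    "driver_a_init": ["register_device"],
    "driver_b_probe": ["driver_b_init", "alloc_buffer"],
    "driver_b_init": ["register_device"],
    "alloc_buffer": [],
    "register_device": [],
}


def random_walk(graph, start, max_length, rng):
    """Follow randomly chosen successor edges from `start`.

    The walk stops after `max_length` nodes or at a node with no successors.
    """
    walk = [start]
    while len(walk) < max_length:
        successors = graph.get(walk[-1], [])
        if not successors:
            break
        walk.append(rng.choice(successors))
    return walk


def generate_sentences(graph, walks_per_node, max_length, seed=0):
    """Produce `walks_per_node` walks from every node; each walk is one sentence."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    sentences = []
    for node in sorted(graph):
        for _ in range(walks_per_node):
            sentences.append(random_walk(graph, node, max_length, rng))
    return sentences


sentences = generate_sentences(TOY_GRAPH, walks_per_node=3, max_length=4)
for sentence in sentences:
    print(" ".join(sentence))
```

Each printed line is a sequence of function names that could serve as a training sentence for a word2vec-style embedding model. In this toy graph, `driver_a_init` and `driver_b_init` appear in similar contexts (each is followed by `register_device`), which is the kind of signal that would let an embedding place them close together.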
