How Different Are Different diff Algorithms in Git? Use --histogram for Code Changes

02/07/2019
by Yusuf Sulistyo Nugroho, et al.

Automatically identifying the differences between two versions of a file is a common, basic task in many applications that mine code repositories. Git, a version control system, has a diff utility that lets users choose among several diff algorithms, from the default Myers algorithm to the more advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. On the impact on code churn metrics in 14 Java projects, we obtained different values in 1.7 the different diff algorithms. Regarding bug-introducing change identification, we found 6.0 of bug-introducing changes from 10 Java projects. For patch application, our manual analysis found that Histogram is more suitable than Myers for presenting code changes. Thus, we strongly recommend using the Histogram algorithm when mining Git repositories to analyze differences in source code.
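
As a rough illustration of how the choice of diff algorithm can change a churn measurement, the sketch below runs `git diff --numstat` between two revisions once with Myers and once with Histogram, and sums the added and deleted lines. It is not the authors' tooling: the helper names (`build_numstat_command`, `parse_numstat`, `churn_by_algorithm`) are hypothetical, and the only Git features assumed are the `--numstat` and `--diff-algorithm=<name>` options of `git diff`.

```python
import subprocess

# Algorithm names accepted by git's --diff-algorithm option.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def build_numstat_command(old_rev, new_rev, algorithm):
    """Build the git command line (hypothetical helper, not from the paper)."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    return ["git", "diff", "--numstat",
            f"--diff-algorithm={algorithm}", old_rev, new_rev]


def parse_numstat(output):
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line is "<added>\t<deleted>\t<path>"; binary files report "-"
    in both count columns and are skipped here.
    """
    added = deleted = 0
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-" or parts[1] == "-":
            continue
        added += int(parts[0])
        deleted += int(parts[1])
    return added, deleted


def churn_by_algorithm(repo_path, old_rev, new_rev):
    """Return {algorithm: (added, deleted)} for Myers and Histogram."""
    results = {}
    for algorithm in ("myers", "histogram"):
        proc = subprocess.run(
            build_numstat_command(old_rev, new_rev, algorithm),
            cwd=repo_path, capture_output=True, text=True, check=True,
        )
        results[algorithm] = parse_numstat(proc.stdout)
    return results
```

Calling `churn_by_algorithm(".", "HEAD~1", "HEAD")` inside a repository returns one `(added, deleted)` pair per algorithm; any commit where the two pairs differ is one where the churn metric depends on the algorithm. For interactive use, `git diff --histogram` selects the same algorithm for a single command.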
