Repeated Builds During Code Review: An Empirical Study of the OpenStack Community
Code review is a popular practice where developers critique each other's changes. Since automated builds can identify low-level issues (e.g., syntactic errors, regression bugs), it is not uncommon for software organizations to incorporate automated builds into the code review process. In such code review deployment scenarios, submitted change sets must be approved for integration by both peer code reviewers and automated build bots. Since automated builds may produce an unreliable signal of the status of a change set (e.g., due to “flaky” or non-deterministic execution behaviour), code review tools, such as Gerrit, allow developers to request a “recheck”, which repeats the build process without updating the change set. We conjecture that an unconstrained recheck command will waste time and resources if it is not applied judiciously. To explore how the recheck command is applied in a practical setting, in this paper, we conduct an empirical study of 66,932 code reviews from the OpenStack community. We quantitatively analyze (i) how often build failures are rechecked; (ii) the extent to which invoking recheck changes build failure outcomes; and (iii) how much waste is generated by invoking recheck. We observe that (i) 55% of code reviews invoke the recheck command after a failing build is reported; (ii) invoking the recheck command only changes the outcome of a failing build in 42% of cases; and (iii) invoking the recheck command increases review waiting time by an average of 2,200% and consumes enough compute resources to compete with the age of the oldest living land animal on Earth.
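To make the measured quantities concrete, the sketch below shows one way the first two observations could be mined from review timelines. It is not the paper's analysis pipeline: the record layout and the field names ("events", "type", "verdict", "text") are hypothetical stand-ins for whatever a Gerrit data export would provide, and the recheck detection is simplified to an exact "recheck" comment after a failed build.

```python
# Minimal sketch (assumed data model, not Gerrit's API) of counting
# (i) reviews that invoke "recheck" after a failing build, and
# (ii) how often the repeated build flips from FAILURE to SUCCESS.

def recheck_stats(reviews):
    rechecked_after_failure = 0
    outcome_changed = 0
    for review in reviews:
        events = review["events"]  # chronological builds and comments
        saw_failure = False
        counted = False
        for i, event in enumerate(events):
            if event["type"] == "build" and event["verdict"] == "FAILURE":
                saw_failure = True
            elif (event["type"] == "comment"
                  and event["text"].strip() == "recheck"
                  and saw_failure):
                if not counted:
                    rechecked_after_failure += 1  # count each review once
                    counted = True
                # Did the next build after this recheck succeed?
                nxt = next((e for e in events[i + 1:] if e["type"] == "build"), None)
                if nxt and nxt["verdict"] == "SUCCESS":
                    outcome_changed += 1
                    break
    return rechecked_after_failure, outcome_changed


if __name__ == "__main__":
    sample = [
        {"events": [
            {"type": "build", "verdict": "FAILURE"},
            {"type": "comment", "text": "recheck"},
            {"type": "build", "verdict": "SUCCESS"},
        ]},
        {"events": [
            {"type": "build", "verdict": "FAILURE"},
            {"type": "comment", "text": "recheck"},
            {"type": "build", "verdict": "FAILURE"},
        ]},
    ]
    print(recheck_stats(sample))  # -> (2, 1)
```

Dividing these counts by the total number of reviews with a failing build would yield proportions comparable in spirit to the 55% and 42% figures reported above, though the paper's exact methodology may differ.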