Moving Beyond the Mean: Analyzing Variance in Software Engineering Experiments
Software Engineering (SE) experiments are traditionally analyzed with statistical tests (e.g., t-tests, ANOVAs, etc.) that assume equally spread data across treatments (i.e., the homogeneity of variances assumption). Differences across treatments' variances in SE are not seen as an opportunity to gain insights on technology performance, but instead, as a hindrance to analyze the data. We have studied the role of variance in mature experimental disciplines such as medicine. We illustrate the extent to which variance may inform on technology performance by means of simulation. We analyze a real-life industrial experiment on Test-Driven Development (TDD) where variance may impact technology desirability. Evaluating the performance of technologies just based on means (as traditionally done in SE) may be misleading. Technologies that make developers resemble more to each other (i.e., technologies with smaller variances) may be more suitable if the aim is minimizing the risk of adopting them in real practice.
READ FULL TEXT