Are University Rankings Statistically Significant? A Comparison among Chinese Universities and with the USA
Purpose: We address the question of whether differences in the rankings of universities are statistically significant. We propose methods for measuring the statistical significance of differences among universities and illustrate the results with empirical data.

Design/methodology/approach: Based on z-testing and overlapping confidence intervals, and using data about the 205 Chinese universities included in the Leiden Rankings 2020, we argue that three main groups of Chinese research universities can be distinguished.

Findings: When the sample of 205 Chinese universities is merged with the 197 US universities included in the Leiden Rankings 2020, the results similarly indicate three main groups: high, middle, and low. Using these data (Leiden Rankings and Web of Science), the z-scores of the Chinese universities are significantly below those of the US universities, albeit with some overlap.

Research limitations: We show empirically that differences in ranking may be due to changes in the data, the models, or the effects of the modeling on the data. The scientometric groupings are not always stable when different methods are used.

R&D policy implications: Differences among universities can be tested for their statistical significance. The statistics relativize the value of decimals in the rankings. One can operate with a scheme of low/middle/high in policy debates and leave the more fine-grained rankings of individual universities to operational management and local settings.

Originality/value: In the discussion about the rankings of universities, the question of whether differences are statistically significant is, in our opinion, insufficiently addressed.
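As a minimal illustration of the kind of z-test referred to above (a sketch of the general two-proportion test, not necessarily the authors' exact procedure), the snippet below compares the shares of top-10% publications of two universities; the function name and the publication counts are hypothetical and chosen only for illustration.

```python
from math import sqrt

def z_test_proportions(top1: int, n1: int, top2: int, n2: int) -> float:
    """Two-proportion z-test comparing the shares of top-10% publications
    (e.g., PP(top 10%)) of two universities.

    Returns the z-score; |z| > 1.96 indicates a statistically significant
    difference at the 5% level (two-sided)."""
    p1, p2 = top1 / n1, top2 / n2
    p_pooled = (top1 + top2) / (n1 + n2)          # pooled proportion under H0
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 1,200 of 10,000 papers in the top 10% vs. 950 of 9,000.
z = z_test_proportions(1200, 10_000, 950, 9_000)
print(f"z = {z:.2f}")  # |z| > 1.96 -> difference significant at p < 0.05
```

On the same logic, two universities whose confidence intervals around these proportions overlap would not be separated into different groups, which is how the low/middle/high grouping described above can be operationalized.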