Constructing Temporal Networks of OSS Programming Language Ecosystems
One of the primary factors that encourage developers to contribute to open source software (OSS) projects is the collaborative nature of OSS development. However, the collaborative structure of these communities remains largely unclear, partly because of the enormous scale of the data that must be gathered, processed, and analyzed. In this work, we use the World of Code dataset, which contains commit activity data for millions of OSS projects, to build collaboration networks for ten popular programming language ecosystems, together comprising over 290M commits across more than 18M projects. We build a collaboration graph for each language ecosystem, with authors and projects as nodes, which enables various forms of social network analysis at the scale of entire language ecosystems. Moreover, we capture how the ecosystems evolve by slicing each network into 30 historical snapshots. Additionally, we calculate multiple collaboration metrics that characterize the state of each ecosystem. We make the resulting dataset publicly available, including the constructed graphs and the pipeline that enables the analysis of further ecosystems.
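The abstract describes the graph model only at a high level. The following is a minimal Python sketch, under stated assumptions, of how commit records could be turned into an author-project graph and sliced into cumulative historical snapshots. The commit tuple layout, the evenly spaced cutoffs, the cumulative (rather than windowed) slicing, the two example metrics, and the helper names `build_graph`, `snapshot_cutoffs`, `temporal_snapshots`, and `collaboration_stats` are all hypothetical. None of them is taken from the paper's pipeline or from any World of Code API.

```python
from collections import defaultdict

# Hypothetical commit record: (author_id, project_id, unix_timestamp).
# World of Code's real schema differs; this tuple is an illustrative stand-in.
Commit = tuple[str, str, int]


def build_graph(commits: list[Commit]) -> dict[tuple[str, str], int]:
    """Bipartite author-project edge map, weighted by number of commits."""
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for author, project, _ts in commits:
        edges[(author, project)] += 1
    return dict(edges)


def snapshot_cutoffs(commits: list[Commit], n_snapshots: int = 30) -> list[int]:
    """Evenly spaced timestamp cutoffs over the observed range (an assumption).

    Integer arithmetic guarantees the last cutoff equals the latest timestamp.
    """
    timestamps = [ts for _, _, ts in commits]
    start, end = min(timestamps), max(timestamps)
    return [start + (end - start) * (i + 1) // n_snapshots for i in range(n_snapshots)]


def temporal_snapshots(commits: list[Commit], n_snapshots: int = 30) -> list[dict]:
    """Cumulative snapshots: snapshot k holds every commit up to cutoff k."""
    if not commits:
        return []
    return [
        build_graph([c for c in commits if c[2] <= cutoff])
        for cutoff in snapshot_cutoffs(commits, n_snapshots)
    ]


def collaboration_stats(edges: dict[tuple[str, str], int]) -> dict[str, float]:
    """Two illustrative descriptors: node counts and mean authors per project."""
    authors_per_project: dict[str, set[str]] = defaultdict(set)
    for author, project in edges:
        authors_per_project[project].add(author)
    n_projects = len(authors_per_project)
    mean_team = (
        sum(len(team) for team in authors_per_project.values()) / n_projects
        if n_projects
        else 0.0
    )
    return {
        "authors": len({author for author, _ in edges}),
        "projects": n_projects,
        "mean_authors_per_project": mean_team,
    }


if __name__ == "__main__":
    toy_commits = [
        ("alice", "proj-a", 100),
        ("bob", "proj-a", 200),
        ("alice", "proj-b", 300),
        ("carol", "proj-b", 400),
    ]
    for k, graph in enumerate(temporal_snapshots(toy_commits, n_snapshots=3)):
        print(k, collaboration_stats(graph))
```

Cumulative slicing was chosen here because it makes each snapshot a superset of the previous one, so growth metrics are easy to compare across snapshots. A sliding-window scheme, which keeps only recent activity, is an equally plausible alternative; the paper's actual slicing rule may differ from either.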