Structure and Stability of Internet Top Lists
Active Internet measurement studies rely on a list of targets to be scanned. While probing the entire IPv4 address space is feasible for scans of limited complexity, more complex scans do not scale to measuring the full Internet. Thus, a sample of the Internet can be used instead, often in form of a "top list". The most widely used list is the Alexa Global Top1M list. Despite their prevalence, use of top lists is seldomly questioned. Little is known about their creation, representativity, potential biases, stability, or overlap between lists. As a result, potential consequences of applying top lists in research are not known. In this study, we aim to open the discussion on top lists by investigating the aptness of frequently used top lists for empirical Internet scans, including stability, correlation, and potential biases of such lists.
READ FULL TEXT