Size Generalizability of Graph Neural Networks on Biological Data: Insights and Practices from the Spectral Perspective
We investigate the question of whether the knowledge learned by graph neural networks (GNNs) from small graphs is generalizable to large graphs in the same domain. Prior works suggest that the distribution shift, particularly in the degree distribution, between graphs of different sizes can lead to performance degradation in the graph classification task. However, this may not be the case for biological datasets where the degrees are bounded and the distribution shift of degrees is small. Even with little degree distribution shift, our observations show that GNNs' performance on larger graphs from the same datasets still degrades, suggesting other causes. In fact, there has been a lack of exploration in real datasets to understand the types and properties of distribution shifts caused by various graph sizes. Furthermore, previous analyses of size generalizability mostly focus on the spatial domain. To fill these gaps, we take the spectral perspective and study the size generalizability of GNNs on biological data. We identify a distribution shift between small and large graphs in the eigenvalues of the normalized Laplacian/adjacency matrix, indicating a difference in the global node connectivity, which is found to be correlated with the node closeness centrality. We further find that despite of the variations in global connectivity, graphs of different sizes share similar local connectivity, which can be utilized to improve the size generalizability of GNNs. Based on our spectral insights and empirical observations, we propose a model-agnostic strategy, SIA, which uses size-irrelevant local structural features, i.e., the local closeness centrality of a node, to guide the learning process. Our empirical results demonstrate that our strategy improves the graph classification performance of various GNNs on small and large graphs when training with only small graphs.
READ FULL TEXT