Sampling connected subgraphs: nearly-optimal mixing time bounds, nearly-optimal ε-uniform sampling, and perfect uniform sampling
We study the connected subgraph sampling problem: given an integer k ≥ 3 and a simple graph G=(V,E), sample a connected induced k-node subgraph of G (also called k-graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. In this work we give several results that improve substantially upon the current state of the art: 1) We settle the mixing time of the graphlet random walk. We show it is Θ̃(t(G) ·ρ(G)^k-1), where t(G) is the mixing time of G and ρ(G) is the ratio between its maximum and minimum degree. As a consequence, we obtain state-of-the-art running time bounds for random-walk graphlet sampling. 2) We give the first efficient algorithm for perfectly uniform graphlet sampling. The algorithm has a preprocessing phase that uses time O(n + m logΔ) and space O(n), and a sampling phase that uses k^O(k)O(logΔ) time per sample. 3) We give a nearly-optimal algorithm for ϵ-uniform graphlet sampling. The preprocessing runs in time k^O(k)O(1/ϵ n log n ), which is sublinear when G is not too sparse, and space O(n). The sampling takes k^O(k)O((1/ϵ)^10log1/ϵ) expected time per sample. This is nearly optimal, as any ϵ-uniform algorithm has worst-case expected running time Ω(n).
READ FULL TEXT