Kudu: An Efficient and Scalable Distributed Graph Pattern Mining Engine
This paper proposes Kudu, a general distributed execution engine with a well-defined abstraction that can be integrated with various existing single-machine graph pattern mining (GPM) systems. With this approach, the programming interfaces and codes based on existing GPM systems do not change and Kudu can transparently enable the distributed execution. The key novelty is extendable embedding which can express pattern enumeration algorithm and enable fine-grained task scheduling. To enable efficient scheduling, we propose a novel BFS-DFS hybrid exploration method that generates sufficient concurrent tasks without incurring high memory consumption. The computation and communication of Kudu can be further optimized with several effective techniques. We implemented two scalable distributed GPM systems by porting Automine and GraphPi on Kudu. Our evaluation shows that Kudu-based systems significantly outperform state-of-the-art graph partition-based GPM systems by up to three orders of magnitude, achieve similar or even better performance compared with the fastest graph replication-based systems, and scale to large datasets with graph partitioning.
READ FULL TEXT