Sparse Partitioning Around Medoids
Partitioning Around Medoids (PAM, k-Medoids) is a popular clustering technique to use with arbitrary distance functions or similarities, where each cluster is represented by its most central object, called the medoid or the discrete median. In operations research, this family of problems is also known as facility location problem (FLP). FastPAM recently introduced a speedup for large k to make it applicable for larger problems, but the method still has a runtime quadratic in N. In this chapter, we discuss a sparse and asymmetric variant of this problem, to be used for example on graph data such as road networks. By exploiting sparsity, we can avoid the quadratic runtime and memory requirements, and make this method scalable to even larger problems, as long as we are able to build a small enough graph of sufficient connectivity to perform local optimization. Furthermore, we consider asymmetric cases, where the set of medoids is not identical to the set of points to be covered (or in the interpretation of facility location, where the possible facility locations are not identical to the consumer locations). Because of sparsity, it may be impossible to cover all points with just k medoids for too small k, which would render the problem unsolvable, and this breaks common heuristics for finding a good starting condition. We, hence, consider determining k as a part of the optimization problem and propose to first construct a greedy initial solution with a larger k, then to optimize the problem by alternating between PAM-style "swap" operations where the result is improved by replacing medoids with better alternatives and "remove" operations to reduce the number of k until neither allows further improving the result quality. We demonstrate the usefulness of this method on a problem from electrical engineering, with the input graph derived from cartographic data.
READ FULL TEXT