GRMR: Generalized Regret-Minimizing Representatives
Extracting a small subset of representative tuples from a large database is an important task in multi-criteria decision making. The regret-minimizing set (RMS) problem is recently proposed for representative discovery from databases. Specifically, for a set of tuples (points) in d dimensions, an RMS problem finds the smallest subset such that, for any possible ranking function, the relative difference in scores between the top-ranked point in the subset and the top-ranked point in the entire database is within a parameter ε∈ (0,1). Although RMS and its variations have been extensively investigated in the literature, existing approaches only consider the class of nonnegative (monotonic) linear functions for ranking, which have limitations in modeling user preferences and decision-making processes. To address this issue, we define the generalized regret-minimizing representative (GRMR) problem that extends RMS by taking into account all linear functions including non-monotonic ones with negative weights. For two-dimensional databases, we propose an optimal algorithm for GRMR via a transformation into the shortest cycle problem in a directed graph. Since GRMR is proven to be NP-hard even in three dimensions, we further develop a polynomial-time heuristic algorithm for GRMR on databases in arbitrary dimensions. Finally, we conduct extensive experiments on real and synthetic datasets to confirm the efficiency, effectiveness, and scalability of our proposed algorithms.
READ FULL TEXT