On Range Summary Queries
We study the query version of the approximate heavy hitter and quantile problems. In the former problem, the input is a parameter ε and a set P of n points in ℝ^d where each point is assigned a color from a set C, and we want to build a structure s.t. given any geometric range γ, we can efficiently find a list of approximate heavy hitters in γ∩ P, i.e., colors that appear at least ε |γ∩ P| times in γ∩ P, as well as their frequencies with an additive error of ε |γ∩ P|. In the latter problem, each point is assigned a weight from a totally ordered universe and the query must output a sequence S of 1+1/ε weights s.t. the i-th weight in S has approximate rank iε|γ∩ P|, meaning, rank iε|γ∩ P| up to an additive error of ε|γ∩ P|. Previously, optimal results were only known in 1D [WY11] but a few sub-optimal methods were available in higher dimensions [AW17, ACH+12]. We study the problems for 3D halfspace and dominance queries. We consider the real RAM model with integer registers of size w=Θ(log n) bits. For dominance queries, we show optimal solutions for both heavy hitter and quantile problems: using linear space, we can answer both queries in time O(log n + 1/ε). Note that as the output size is 1/ε, after investing the initial O(log n) searching time, our structure takes on average O(1) time to find a heavy hitter or a quantile! For more general halfspace heavy hitter queries, the same optimal query time can be achieved by increasing the space by an extra log_w1/ε (resp. loglog_w1/ε) factor in 3D (resp. 2D). By spending extra log^O(1)1/ε factors in time and space, we can also support quantile queries.
READ FULL TEXT