Coresets for Data Discretization and Sine Wave Fitting

03/06/2022
by   Alaa Maalouf, et al.
58

In the monitoring problem, the input is an unbounded stream P=p_1,p_2⋯ of integers in [N]:={1,⋯,N}, that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the n points received so far in P by a single frequency sin, e.g. min_c∈ Ccost(P,c)+λ(c), where cost(P,c)=∑_i=1^n sin^2(2π/N p_ic), C⊆ [N] is a feasible set of solutions, and λ is a given regularization function. For any approximation error ε>0, we prove that every set P of n integers has a weighted subset S⊆ P (sometimes called core-set) of cardinality |S|∈ O(log(N)^O(1)) that approximates cost(P,c) (for every c∈ [N]) up to a multiplicative factor of 1±ε. Using known coreset techniques, this implies streaming algorithms using only O((log(N)log(n))^O(1)) memory. Our results hold for a large family of functions. Experimental results and open source code are provided.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset