Trapezoidal Sketch: A Sketch Structure for Frequency Estimation of Data Streams

12/16/2019
by   Ning Li, et al.
0

The sketch is one of the typical and widely used data structures for estimating the frequencies of items in data streams. However, since the counter sizes in the traditional rectangular-structure sketch (rstructure sketch) are the same, it is hard to achieve small space usage, high capacity (i.e., the maximum counter size of the sketch), and high estimation accuracy simultaneously. Moreover, when considering the high skewness of data streams, this will become even worse [15]. Therefore, in this paper, we firstly propose the trapezoidal-structure sketch (t-structure sketch). In the t-structure sketch, different from the r-structure sketch, the counter sizes in different layers are different. Based on this innovation, the low space usage and high capacity can be achieved simultaneously in the t-structure sketch. Then, based on the basic t-structure sketch, we propose the space saving t-structure sketch and the capacity improvement t-structure sketch, and analyze the properties of these two t-structure sketches. Finally, considering the estimation accuracy in the original version of the t-structure sketch is slightly worse than that in the r-structure sketch, we propose the probabilistic based estimation error reducing algorithm to improve the estimation accuracy in t-structure sketch. Compared with the CM sketch, CU sketch, C sketch, and A sketch, the simulation results show that the performances on space usage, capacity, and estimation accuracy are improved successfully by t-structure sketch.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset