Intermediate Value Linearizability: A Quantitative Correctness Criterion
Big data processing systems often employ batched updates and data sketches to estimate certain properties of large data. For example, a CountMin sketch approximates the frequencies at which elements occur in a data stream, and a batched counter counts events in batches. This paper focuses on the correctness of concurrent implementations of such objects. Specifically, we consider quantitative objects, whose return values are from a totally ordered domain, with an emphasis on (e,d)-bounded objects that estimate a quantity with an error of at most e with probability at least 1 - d. The de facto correctness criterion for concurrent objects is linearizability. Under linearizability, when a read overlaps an update, it must return the object's value either before the update or after it. Consider, for example, a single batched increment operation that counts three new events, bumping a batched counter's value from 7 to 10. In a linearizable implementation of the counter, an overlapping read must return one of these. We observe, however, that in typical use cases, any intermediate value would also be acceptable. To capture this degree of freedom, we propose Intermediate Value Linearizability (IVL), a new correctness criterion that relaxes linearizability to allow returning intermediate values, for instance 8 in the example above. Roughly speaking, IVL allows reads to return any value that is bounded between two return values that are legal under linearizability. A key feature of IVL is that concurrent IVL implementations of (e,d)-bounded objects remain (e,d)-bounded. To illustrate the power of this result, we give a straightforward and efficient concurrent implementation of an (e, d)-bounded CountMin sketch, which is IVL (albeit not linearizable). Finally, we show that IVL allows for inherently cheaper implementations than linearizable ones.
READ FULL TEXT