Equivalence-Invariant Algebraic Provenance for Hyperplane Update Queries
The algebraic approach for provenance tracking, originating in the semiring model of Green et. al, has proven useful as an abstract way of handling metadata. Commutative Semirings were shown to be the "correct" algebraic structure for Union of Conjunctive Queries, in the sense that its use allows provenance to be invariant under certain expected query equivalence axioms. In this paper we present the first (to our knowledge) algebraic provenance model, for a fragment of update queries, that is invariant under set equivalence. The fragment that we focus on is that of hyperplane queries, previously studied in multiple lines of work. Our algebraic provenance structure and corresponding provenance-aware semantics are based on the sound and complete axiomatization of Karabeg and Vianu. We demonstrate that our construction can guide the design of concrete provenance model instances for different applications. We further study the efficient generation and storage of provenance for hyperplane update queries. We show that a naive algorithm can lead to an exponentially large provenance expression, but remedy this by presenting a normal form which we show may be efficiently computed alongside query evaluation. We experimentally study the performance of our solution and demonstrate its scalability and usefulness, and in particular the effectiveness of our normal form representation.
READ FULL TEXT