Assessing Achievability of Queries and Constraints
Assessing and improving the quality of data in data-intensive systems are fundamental challenges that have given rise to numerous applications targeting transformation and cleaning of data. However, while schema design, data cleaning, and data migration are nowadays reasonably well understood in isolation, not much attention has been given to the interplay between the tools that address issues in these areas. Our focus is on the problem of determining whether there exist sequences of data-transforming procedures that, when applied to the (untransformed) input data, would yield data satisfying the conditions required for performing the task in question. Our goal is to develop a framework that would address this problem, starting with the relational setting. In this paper we abstract data-processing tools as black-box procedures. This abstraction describes procedures by a specification of which parts of the database might be modified by the procedure, as well as by the constraints that specify the required states of the database before and after applying the procedure. We then proceed to study fundamental algorithmic questions arising in this context, such as understanding when one can guarantee that sequences of procedures apply to original or transformed data, when they succeed at improving the data, and when knowledge bases can represent the outcomes of procedures. Finally, we turn to the problem of determining whether the application of a sequence of procedures to a database results in the satisfaction of properties specified by either queries or constraints. We show that this problem is decidable for some broad and realistic classes of procedures and properties, even when procedures are allowed to alter the schema of instances.
READ FULL TEXT