Integrity Constraints Revisited: From Exact to Approximate Implication
Integrity constraints such as functional dependences (FD), and multi-valued dependences are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antencedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication between FDs can be relaxed to a simple linear inequality where all coefficients are 1; we call it a unit relaxation. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Third, we restrict the information theoretic inequalities to Shannon-inequalities, and show that in that case the implication problem between CIs has a unit relaxation (under some additional assumptions). Finally, we relax the information theoretic inequalities to rank-inequalities, and show that every implication problem between CIs has a unit relaxation.
READ FULL TEXT