New Algorithms for Unordered Tree Inclusion
The tree inclusion problem is, given two node-labeled trees P and T (the "pattern tree" and the "text tree"), to locate every minimal subtree in T (if any) that can be obtained by applying a sequence of node insertion operations to P. The ordered tree inclusion problem is known to be solvable in polynomial time while the unordered tree inclusion problem is NP-hard. The currently fastest algorithm for the latter is from 1995 and runs in O(poly(m,n) · 2^2d) = O^∗(4^d) time, where m and n are the sizes of the pattern and text trees, respectively, and d is the degree of the pattern tree. Here, we develop a new algorithm that improves the exponent 2d to d by considering a particular type of ancestor-descendant relationships and applying dynamic programming, thus reducing the time complexity to O^∗(2^d). We then study restricted variants of the unordered tree inclusion problem where the number of occurrences of different node labels and/or the input trees' heights are bounded and show that although the problem remains NP-hard in many such cases, if the leaves of P are distinctly labeled and each label occurs at most c times in T then it can be solved in polynomial time for c = 2 and in O^∗(1.8^d) time for c = 3.
READ FULL TEXT