Hardness of Approximation of (Multi-)LCS over Small Alphabet
The problem of finding longest common subsequence (LCS) is one of the fundamental problems in computer science, which finds application in fields such as computational biology, text processing, information retrieval, data compression etc. It is well known that (decision version of) the problem of finding the length of a LCS of an arbitrary number of input sequences (which we refer to as Multi-LCS problem) is NP-complete. Jiang and Li [SICOMP'95] showed that if Max-Clique is hard to approximate within a factor of s then Multi-LCS is also hard to approximate within a factor of Θ(s). By the NP-hardness of the problem of approximating Max-Clique by Zuckerman [ToC'07], for any constant δ>0, the length of a LCS of arbitrary number of input sequences of length n each, cannot be approximated within an n^1-δ-factor in polynomial time unless P=. However, the reduction of Jiang and Li assumes the alphabet size to be Ω(n). So far no hardness result is known for the problem of approximating Multi-LCS over sub-linear sized alphabet. On the other hand, it is easy to get 1/|Σ|-factor approximation for strings of alphabet Σ. In this paper, we make a significant progress towards proving hardness of approximation over small alphabet by showing a polynomial-time reduction from the well-studied densest k-subgraph problem with perfect completeness to approximating Multi-LCS over alphabet of size poly(n/k). As a consequence, from the known hardness result of densest k-subgraph problem (e.g. [Manurangsi, STOC'17]) we get that no polynomial-time algorithm can give an n^-o(1)-factor approximation of Multi-LCS over an alphabet of size n^o(1), unless the Exponential Time Hypothesis is false.
READ FULL TEXT