Efficiently Testing Simon's Congruence
Simon's congruence ∼_k is defined as follows: two words are ∼_k-equivalent if they have the same set of subsequences of length at most k. We propose an algorithm which computes, given two words s and t, the largest k for which s∼_k t. Our algorithm runs in linear time O(|s|+|t|) when the input words are over the integer alphabet {1,...,|s|+|t|} (or other alphabets which can be sorted in linear time). This approach leads to an optimal algorithm in the case of general alphabets as well. Our results are based on a novel combinatorial approach and a series of efficient data structures.
READ FULL TEXT