14. Knuth-Morris-Pratt Pattern Matching

Like with everyone in Computer Science, we can do better. This is where Knuth's, Morris', and Pratt's idea of if one has matched the prefix of $P$ (pattern) with $k$ characters of $T$ (text), then one will know the these first $k$ characters of $T$ are equal to those of the prefix $P[0 \cdots k-1]$ of $P$. From this information we can deduce the place where to start the next comparison.

public int KunthMorrisPrattFind(String text, String pattern) {
    int N = text.length();
    int M = pattern.length();
    int[] ot = new int[M];
    ot[0] = 0;

    int i = 0;
    int j = 0;

    for (int pos = 1; pos < M; pos++){
        int prev_overlap = ot[pos - 1];

        if (pattern.charAt(pos) == pattern.charAt(prev_overlap)) {
            ot[pos] = prev_overlap + 1;
        } else {
            while ((pattern.charAt(pos) != pattern.charAt(prev_overlap)) && prev_overlap >= 1) {
                prev_overlap = ot[prev_overlap - 1];
            }

            if (pattern.charAt(pos) == pattern.charAt(prev_overlap)) {
                ot[pos] = prev_overlap + 1;
            }
        }
    }

    while (i < N) {
        while (j < M && i < N && text.charAt(i) == pattern.charAt(j)) {
            i++;
            j++;
        }

        if (j == M) {
            return (i - M);
        }

        if (j == 0) {
            i++;
        } else {
            j = ot[j - 1];
        }
    }
    return -1;
}

This algorithm has $O(n)$ time complexity.

Next Page →