14. Knuth-Morris-Pratt Pattern Matching
Like with everyone in Computer Science, we can do better. This is where Knuth's, Morris', and Pratt's idea of if one has matched the prefix of $P$ (pattern) with $k$ characters of $T$ (text), then one will know the these first $k$ characters of $T$ are equal to those of the prefix $P[0 \cdots k-1]$ of $P$. From this information we can deduce the place where to start the next comparison.
public int KunthMorrisPrattFind(String text, String pattern) {
int N = text.length();
int M = pattern.length();
int[] ot = new int[M];
ot[0] = 0;
int i = 0;
int j = 0;
for (int pos = 1; pos < M; pos++){
int prev_overlap = ot[pos - 1];
if (pattern.charAt(pos) == pattern.charAt(prev_overlap)) {
ot[pos] = prev_overlap + 1;
} else {
while ((pattern.charAt(pos) != pattern.charAt(prev_overlap)) && prev_overlap >= 1) {
prev_overlap = ot[prev_overlap - 1];
}
if (pattern.charAt(pos) == pattern.charAt(prev_overlap)) {
ot[pos] = prev_overlap + 1;
}
}
}
while (i < N) {
while (j < M && i < N && text.charAt(i) == pattern.charAt(j)) {
i++;
j++;
}
if (j == M) {
return (i - M);
}
if (j == 0) {
i++;
} else {
j = ot[j - 1];
}
}
return -1;
}
This algorithm has $O(n)$ time complexity.