15. Boyer-Moore Pattern Matching
Instead of comparing the pattern from front to back, we now do back to front. Boyer-Moore has various rules that establish the underlying algorithm. These rules allows Boyer-Moore to skip wasted comparisons.
- Bad Character Rule: If we mismatch, use knowledge of the mismatched text character to skip alignments1.
- Good Suffix Rule: If we match some characters, use knowledge of the matched characters to skip alignments1.
import java.util.*;
public int BoyerMooreFind(String text, String pattern) {
Map<Character, Integer> right = new HashMap<Character, Integer>();
int M = pattern.length();
int N = text.length();
for (int i = M - 1; i >= 0; i--) {
if (right.get(pattern.charAt(i)) == null) {
right.put(pattern.charAt(i), i);
}
}
int skip;
for (int i = 0; i <= N - M; i += skip) {
skip = 0;
for (int j = M - 1; j >= 0; j--) {
if (pattern.charAt(j) != text.charAt(i + j)) {
if (right.get(text.charAt(i)) == null) {
skip = j + 1;
} else {
skip = Math.max(1, j - right.get(text.charAt(i)));
}
break;
}
}
if (skip == 0) {
return i;
}
}
return -1;
}
This allows the algorithm to be sublinear time complexity on the average. However, with a bad suffix rule it will be $O(MN)$ and the good suffix rule guarantees $O(N)$.