15. Boyer-Moore Pattern Matching

Instead of comparing the pattern from front to back, we now do back to front. Boyer-Moore has various rules that establish the underlying algorithm. These rules allows Boyer-Moore to skip wasted comparisons.

Bad Character Rule: If we mismatch, use knowledge of the mismatched text character to skip alignments¹.
Good Suffix Rule: If we match some characters, use knowledge of the matched characters to skip alignments¹.

import java.util.*;

public int BoyerMooreFind(String text, String pattern) {
    Map<Character, Integer> right = new HashMap<Character, Integer>(); 

    int M = pattern.length();
    int N = text.length();

    for (int i = M - 1; i >= 0; i--) {
        if (right.get(pattern.charAt(i)) == null) {
            right.put(pattern.charAt(i), i);
        }
    }

    int skip;
    for (int i = 0; i <= N - M; i += skip) {
        skip = 0;
        for (int j = M - 1; j >= 0; j--) {
            if (pattern.charAt(j) != text.charAt(i + j)) {
                if (right.get(text.charAt(i)) == null) {
                    skip = j + 1;
                } else {
                    skip = Math.max(1, j - right.get(text.charAt(i)));
                }
                break;
            }
        }
        if (skip == 0) {
            return i;
        }
    }
    return -1;
}

This allows the algorithm to be sublinear time complexity on the average. However, with a bad suffix rule it will be $O(MN)$ and the good suffix rule guarantees $O(N)$.

Next Page →

From John Hopkins Whiting School of Engineering, slides. ↩↩