Rabin Karp Algorithm

Given a text string and a pattern string, implement the Rabin-Karp algorithm to find all occurrences of the pattern in the text. The Rabin-Karp algorithm uses hashing to find any substring in a text that matches the pattern.

Input/Output Examples

plaintext
Example 1:
Input:
text = "abracadabra"
pattern = "abra"
Output: [0, 7]
Explanation: The pattern abra is found at indices 0 and 7 in the text.

Example 2:
Input:
text = "hello world"
pattern = "world"
Output: [6]
Explanation: The pattern world is found at index 6 in the text.

Example 3:
Input:
text = "abcde"
pattern = "xyz"
Output: []
Explanation: The pattern xyz is not found in the text.

Approach to Implement Rabin-Karp Algorithm

  1. Compute Hash for Pattern and Initial Text Window:
    • Calculate the hash value of the pattern and the first window of the text (of the same length as the pattern) using a rolling hash function.
  2. Slide Over the Text:
    • For each window in the text, compare the hash value of the window with the hash value of the pattern.
    • If the hash values match, verify by comparing the actual substring to the pattern to avoid false positives.
  3. Use Rolling Hash for Efficient Rehashing:
    • Use a rolling hash to compute the hash of the next window by removing the first character of the current window and adding the next character in the text.
  4. Store and Return All Occurrences:
    • Store the starting indices where the pattern matches the text and return the list of indices.

C++ Program to Implement Rabin-Karp Algorithm

cpp
#include <iostream>
#include <vector>
#include <string>
using namespace std;

const int d = 256; // Number of characters in the input alphabet
const int q = 101; // A prime number

// Function to find pattern occurrences using Rabin-Karp algorithm
vector<int> rabinKarp(const string& text, const string& pattern) {
    int m = pattern.length();
    int n = text.length();
    int p = 0; // Hash value for pattern
    int t = 0; // Hash value for text
    int h = 1;
    vector<int> result;

    // Calculate h = pow(d, m-1) % q
    for (int i = 0; i < m - 1; i++) {
        h = (h * d) % q;
    }

    // Calculate the hash value of the pattern and first window of text
    for (int i = 0; i < m; i++) {
        p = (d * p + pattern[i]) % q;
        t = (d * t + text[i]) % q;
    }

    // Slide the pattern over text one by one
    for (int i = 0; i <= n - m; i++) {
        if (p == t) {
            if (text.substr(i, m) == pattern) {
                result.push_back(i);
            }
        }

        if (i < n - m) {
            t = (d * (t - text[i] * h) + text[i + m]) % q;
            if (t < 0) t = (t + q);
        }
    }

    return result;
}

int main() {
    string text = "abracadabra";
    string pattern = "abra";
    vector<int> result = rabinKarp(text, pattern);

    cout << "Pattern found at indices: ";
    for (int index : result) {
        cout << index << " ";
    }
    cout << endl;

    return 0;
}

Java Program to Implement Rabin-Karp Algorithm

java
import java.util.ArrayList;
import java.util.List;

public class RabinKarpAlgorithm {

    private static final int d = 256; // Number of characters in the input alphabet
    private static final int q = 101; // A prime number

    // Function to find pattern occurrences using Rabin-Karp algorithm
    public static List<Integer> rabinKarp(String text, String pattern) {
        int m = pattern.length();
        int n = text.length();
        int p = 0; // Hash value for pattern
        int t = 0; // Hash value for text
        int h = 1;
        List<Integer> result = new ArrayList<>();

        for (int i = 0; i < m - 1; i++) {
            h = (h * d) % q;
        }

        for (int i = 0; i < m; i++) {
            p = (d * p + pattern.charAt(i)) % q;
            t = (d * t + text.charAt(i)) % q;
        }

        for (int i = 0; i <= n - m; i++) {
            if (p == t) {
                if (text.substring(i, i + m).equals(pattern)) {
                    result.add(i);
                }
            }

            if (i < n - m) {
                t = (d * (t - text.charAt(i) * h) + text.charAt(i + m)) % q;
                if (t < 0) t = (t + q);
            }
        }

        return result;
    }

    public static void main(String[] args) {
        String text = "hello world";
        String pattern = "world";
        List<Integer> result = rabinKarp(text, pattern);

        System.out.println("Pattern found at indices: " + result);
    }
}

Python Program to Implement Rabin-Karp Algorithm

python
# Function to find pattern occurrences using Rabin-Karp algorithm
def rabin_karp(text, pattern):
    d = 256   # Number of characters in the input alphabet
    q = 101   # A prime number
    m = len(pattern)
    n = len(text)
    p = 0     # Hash value for pattern
    t = 0     # Hash value for text
    h = 1
    result = []

    for i in range(m - 1):
        h = (h * d) % q

    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q

    for i in range(n - m + 1):
        if p == t:
            if text[i:i + m] == pattern:
                result.append(i)

        if i < n - m:
            t = (d * (t - ord(text[i]) * h) + ord(text[i + m])) % q
            if t < 0:
                t += q

    return result

# Example usage
if __name__ == "__main__":
    text = "abcdeabcde"
    pattern = "cde"
    result = rabin_karp(text, pattern)
    print("Pattern found at indices:", result)
  • Time Complexity: O((n - m + 1) * m) in the worst case due to hash collisions, where n is the length of the text and m is the length of the pattern. However, average-case complexity is O(n + m).
  • Space Complexity: O(1), as the algorithm only requires a few variables for hashing and does not use additional space proportional to the input size.

DSA

4016

565

Related Articles