<Data structure> KMP algorithm

Posted by galvin on Sat, 04 Dec 2021 04:56:43 +0100

next array

definition

  • Strict definition: next[i] is the largest k such that s[0...k] == s[i-k...i]. The prefix and the suffix may overlap, but the prefix must not be the whole of s[0...i], so k < i.
  • Meaning: next[i] is the index of the last character of the longest prefix of s[0...i] that is also a suffix of s[0...i] (the longest equal prefix and suffix). If there is none, next[i] is -1.
  • Graphical explanation: starting from s[0], find the longest substring that coincides exactly with the end of s[0...i] when it is slid to the right end of that string.
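
To make the definition concrete, here is a minimal brute-force sketch that applies the strict definition literally. The helper name bruteNext and the use of std::string and std::vector are assumptions for illustration only; the article's own code, further down, uses a faster method.

```cpp
#include <string>
#include <vector>

// Brute-force next array, computed directly from the strict definition:
// next[i] is the largest k < i with s[0..k] == s[i-k..i], or -1 if none exists.
// bruteNext is a hypothetical helper name, not part of the original code.
std::vector<int> bruteNext(const std::string& s) {
    const int len = static_cast<int>(s.size());
    std::vector<int> next(len, -1);
    for (int i = 0; i < len; ++i) {
        // Try the longest candidate first; k < i excludes the whole of s[0..i].
        for (int k = i - 1; k >= 0; --k) {
            // Compare prefix s[0..k] with suffix s[i-k..i], both of length k+1.
            if (s.compare(0, k + 1, s, i - k, k + 1) == 0) {
                next[i] = k;
                break;
            }
        }
    }
    return next;
}
```

For the assumed sample string "ababaab" this yields -1, -1, 0, 1, 2, 0, 1. The sketch takes cubic time in the worst case, which is why the recursive method described next is worth the effort.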

computation

recursion

The judgment above can be summarized as a recursive process:
Write the string on two lines: one line provides the prefix and the other line provides the suffix.
When a new character s[i] is read, the suffix line continues to slide to the left.
If the new character can be matched, the index of the last matched character in the suffix line is the next value.
Otherwise, the suffix line slides to the right until an exact match is found.

example

Suppose next[0] to next[3] are already known. How can next[4] and next[5] be found recursively?

next[4]:
It is known that next[3]=1. Since s[4] == s[next[3]+1], the longest equal prefix and suffix is extended by one character, so next[4] = next[3]+1.
Writing j = next[3], these two relations become s[4] == s[j+1] and next[4] = j+1.

next[5]:
It is known that next[4]=2, so j=2. This time s[5] != s[j+1], so the longest equal prefix and suffix cannot be extended. The suffix line has to slide right to some position where s[5] == s[j+1] holds, as shown in the rightmost part of figure 12-3.

Now determine j. In essence, this means determining the segment ~, the part of "aba" that still overlaps after the slide.
Since ~ is obtained by sliding "aba" to the right, it is a prefix of "aba".
Since ~ is also a suffix of "aba", as shown in the rightmost part of figure 12-3, ~ is the longest string that is both a prefix and a suffix of "aba".
"aba" occupies indices 0 to 2 in the suffix line, so the new j is next[2] (reread the definition of the next array if this is unclear). Because next[4] = 2, this equals next[next[4]], that is, next[j'], where j' is the value j had when next[4] was computed.

Therefore, to solve next[5], first set j = next[2], then check whether s[5] == s[j+1] holds.
If it holds, next[5] = j+1.
Otherwise, keep setting j = next[j] until j == -1 or s[5] == s[j+1] holds. If j reaches -1 and s[5] != s[0], then next[5] = -1.
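
The fallback described in this example can be traced with a short sketch. It assumes the sample string is "ababaab", which agrees with next[3]=1 and next[4]=2 above but is not stated in the text; the function name nextWithTrace and the visited parameter are likewise hypothetical.

```cpp
#include <string>
#include <vector>

// Compute next[i] from the already known next[0..i-1] (requires i >= 1) and
// record every value j takes on the way. nextWithTrace is a hypothetical
// helper for illustration, not part of the original code.
int nextWithTrace(const std::string& s, const std::vector<int>& next,
                  int i, std::vector<int>& visited) {
    int j = next[i - 1];            // start from the previous next value
    visited.push_back(j);
    while (j != -1 && s[i] != s[j + 1]) {
        j = next[j];                // fall back to a shorter equal prefix and suffix
        visited.push_back(j);
    }
    if (s[i] == s[j + 1]) {
        ++j;                        // the match can be extended by one character
    }
    return j;                       // this is next[i]
}
```

With s = "ababaab" and next[0..4] = -1, -1, 0, 1, 2, the call for i = 4 visits only j = 1 and returns 2. The call for i = 5 visits j = 2, 0, -1 and returns 0, because s[5] matches s[0] once j has fallen back to -1.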

implementation

steps

  1. Initialize the next array: next[0] = j = -1
  2. Repeat steps 3 and 4 for i from 1 to len-1
  3. Keep setting j = next[j] until j == -1 or s[i] == s[j+1]
  4. If s[i] == s[j+1], increment j. Then set next[i] = j (this is -1 when nothing matched)

code

//getNext computes the next array of string s, whose length is len
//next is the global int array declared in the complete code below
void getNext(char s[], int len){
    int j = -1;
    next[0] = -1;  //Initialize j = next[0] = -1
    for(int i = 1; i < len; i++){  //Compute next[1] ~ next[len-1]
        while(j != -1 && s[i] != s[j+1]){
            j = next[j];  //Repeat j = next[j]
        }  //Until j falls back to -1, or s[i] == s[j+1]
        if(s[i] == s[j+1]){
            j++;  //Then next[i] = j + 1, so first advance j to this position
        }
        next[i] = j;  //Set next[i] = j
    }
}

Note that j is an intermediate variable: it supplies the value assigned to next[i], and it carries the previous next value from one step of the recursive solution to the next. The code uses a loop instead of recursion, but the underlying idea is recursive.

KMP algorithm

analysis

String matching involves two strings: the text string text, which is searched, and the pattern string patten, which is searched for.

Initialize j = -1 and i = 0.

As shown in the following figure, traverse text. While text[i] == patten[j+1], both i and j keep moving to the right.

As shown in the following figure, when text[i] != patten[j+1], slide patten to the right until the condition text[i] == patten[j+1] is met.
This process is very similar to handling a mismatch when solving the next array, and the same idea applies: set j = next[j] so that patten moves quickly to the right position. In other words, next[j] is the position j should fall back to when a mismatch occurs at the current j.
Finally, if the last position of patten (j == 5 in the figure's example) is also matched successfully, patten is a substring of text.

implementation

steps

  1. Initialize j = -1
  2. Let i traverse text. For each i, execute steps 3 and 4 to try to match text[i] against patten[j+1]
  3. Keep setting j = next[j] until j == -1 or text[i] == patten[j+1]
  4. If text[i] == patten[j+1], increment j. When j == m-1, patten is a substring of text

code

//KMP algorithm: decide whether patten is a substring of text
/*Time complexity O(m+n), where n is the length of text and m the length of patten*/
bool KMP(char text[], char patten[]){
    int n = strlen(text), m = strlen(patten);  //String lengths
    getNext(patten, m);  //Compute the next array of patten
    int j = -1;  //j = -1 means that no character has been matched yet
    for(int i = 0; i < n; i++){  //Try to match text[i]
        while(j != -1 && text[i] != patten[j+1]){
            j = next[j];  //Keep falling back until j is -1 or text[i] == patten[j+1]
        }
        if(text[i] == patten[j+1]){
            j++;  //text[i] matches patten[j+1], so advance j by 1
        }
        if(j == m-1){
            return true;  //All of patten is matched, so patten is a substring of text
        }
    }
    return false;  //text is exhausted without a full match, so patten is not a substring of text
}
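
The O(m+n) bound in the comment above follows from the fact that j increases at most once per character of text, while every j = next[j] step decreases it. The sketch below is an illustrative check of that argument, not the article's code: it reimplements the two routines with std::string and std::vector and counts the fallbacks. The names buildNext, kmpCounted and MatchStats are hypothetical, and treating an empty pattern as found is an assumption.

```cpp
#include <string>
#include <vector>

// Result of a search plus the number of times j = next[j] was executed.
struct MatchStats {
    bool found;
    long fallbacks;
};

// Same recurrence as getNext, but returning a local vector (hypothetical helper).
std::vector<int> buildNext(const std::string& p) {
    const int m = static_cast<int>(p.size());
    std::vector<int> next(m, -1);
    int j = -1;
    for (int i = 1; i < m; ++i) {
        while (j != -1 && p[i] != p[j + 1]) j = next[j];
        if (p[i] == p[j + 1]) ++j;
        next[i] = j;
    }
    return next;
}

// Same loop as KMP, instrumented with a fallback counter (hypothetical helper).
MatchStats kmpCounted(const std::string& text, const std::string& p) {
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(p.size());
    if (m == 0) return {true, 0};   // assumption: an empty pattern always matches
    const std::vector<int> next = buildNext(p);
    long fallbacks = 0;
    int j = -1;
    for (int i = 0; i < n; ++i) {
        while (j != -1 && text[i] != p[j + 1]) {
            j = next[j];
            ++fallbacks;            // each fallback undoes at least one earlier j++
        }
        if (text[i] == p[j + 1]) ++j;
        if (j == m - 1) return {true, fallbacks};
    }
    return {false, fallbacks};
}
```

Searching "aaaaab" for "aab" reports a match after 3 fallbacks, fewer than the 6 characters of text. In general the counter never exceeds the length of text, since j is incremented at most once per character.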

Complete code

#include<stdio.h>
#include<string.h>
const int MaxLen = 100;  //Capacity of next, so the pattern length must not exceed MaxLen
int next[MaxLen];
//getNext computes the next array of string s, whose length is len
void getNext(char s[], int len){
    int j = -1;
    next[0] = -1;  //Initialize j = next[0] = -1
    for(int i = 1; i < len; i++){  //Compute next[1] ~ next[len-1]
        while(j != -1 && s[i] != s[j+1]){
            j = next[j];  //Repeat j = next[j]
        }  //Until j falls back to -1, or s[i] == s[j+1]
        if(s[i] == s[j+1]){
            j++;  //Then next[i] = j + 1, so first advance j to this position
        }
        next[i] = j;  //Set next[i] = j
    }
}

//KMP algorithm: decide whether patten is a substring of text
/*Time complexity O(m+n), where n is the length of text and m the length of patten*/
bool KMP(char text[], char patten[]){
    int n = strlen(text), m = strlen(patten);  //String lengths
    getNext(patten, m);  //Compute the next array of patten
    int j = -1;  //j = -1 means that no character has been matched yet
    for(int i = 0; i < n; i++){  //Try to match text[i]
        while(j != -1 && text[i] != patten[j+1]){
            j = next[j];  //Keep falling back until j is -1 or text[i] == patten[j+1]
        }
        if(text[i] == patten[j+1]){
            j++;  //text[i] matches patten[j+1], so advance j by 1
        }
        if(j == m-1){
            return true;  //All of patten is matched, so patten is a substring of text
        }
    }
    return false;  //text is exhausted without a full match, so patten is not a substring of text
}

relationship

Solving the next array is the process of matching the pattern string patten against itself.
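
One way to see this relationship is to factor the shared loop body into a single function and call it from both phases. The following is a sketch under assumptions rather than the article's code: the names step, selfMatch and contains are hypothetical, std::string and std::vector replace the global array, and an empty pattern is assumed to match any text.

```cpp
#include <string>
#include <vector>

// One matching step shared by both phases: p[0..j] is currently matched,
// character c arrives, and the new value of j is returned.
int step(const std::string& p, const std::vector<int>& next, int j, char c) {
    while (j != -1 && c != p[j + 1]) {
        j = next[j];                 // fall back on mismatch
    }
    if (c == p[j + 1]) {
        ++j;                         // extend the match
    }
    return j;
}

// Building next: the pattern is matched against itself, starting at p[1].
std::vector<int> selfMatch(const std::string& p) {
    const int m = static_cast<int>(p.size());
    std::vector<int> next(m, -1);
    int j = -1;
    for (int i = 1; i < m; ++i) {
        j = step(p, next, j, p[i]);  // only next[0..i-1] is read here
        next[i] = j;
    }
    return next;
}

// Searching: the text is matched against the pattern with the same step.
bool contains(const std::string& text, const std::string& p) {
    const int m = static_cast<int>(p.size());
    if (m == 0) return true;         // assumption: an empty pattern always matches
    const std::vector<int> next = selfMatch(p);
    int j = -1;
    for (char c : text) {
        j = step(p, next, j, c);
        if (j == m - 1) return true; // the whole pattern is matched
    }
    return false;
}
```

The only difference between the two callers is where the characters come from: selfMatch feeds the pattern's own characters into step and stores each resulting j, while contains feeds in the text and stops at the first full match.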
