KMP study notes: puzzles I got stuck on, asked myself, and answered myself

Posted by Paghilom on Sat, 27 Jul 2019 12:01:35 +0200

Over the past two days I studied the KMP algorithm and went through various explanations, blog posts, and videos.
The implementations differ from one source to the next, and most of them skip over the key points.
(I still don't fully understand it, ha ha ha... awkward laugh.)
Further reading: "KMP, a classical algorithm: an in-depth explanation of how to compute the next array"

Let's start with a question.

HDOJ Problem 1686
Count how many times the word W occurs in the text T (occurrences may overlap).
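
To pin down what is being counted before looking at KMP, here is a minimal brute-force sketch. It is not the AC solution, and the helper name countNaive is made up for illustration. It tries every start position in the text, so overlapping occurrences are counted, at a worst-case cost proportional to |T| times |W|.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): counts every position
// in `text` at which `word` starts. Overlapping occurrences are counted.
int countNaive(const std::string& text, const std::string& word) {
    if (word.empty() || word.size() > text.size()) return 0;
    int count = 0;
    for (std::size_t start = 0; start + word.size() <= text.size(); ++start) {
        // compare(pos, len, str) returns 0 when text[pos, pos+len) equals str
        if (text.compare(start, word.size(), word) == 0) ++count;
    }
    return count;
}
```

For example, countNaive("AZAZAZA", "AZA") returns 3, because the matches at positions 0, 2 and 4 overlap. KMP gives the same count without ever moving backwards in the text.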

AC code

#include<iostream>
#include<cstdio>
#include<algorithm>
#include<cstring>
using namespace std;
char w[10005],t[1000005];
int nex[10005];
void getnext(char *B,int n)/* n is the length of string B */
{
	nex[0]=0;
	for(int i=1,j=0;i<n;i++)
	{
//		j=nex[i-1]; at this point j already equals nex[i-1], the length of the longest common prefix-suffix of B[0]~B[i-1].
		while(B[j]!=B[i]&&j>0)//Mismatch: try a shorter common prefix-suffix. The j characters before B[i] (B[i-j]~B[i-1]) equal the first j characters (B[0]~B[j-1]), so the next candidate can be looked up inside those first j characters.
			j=nex[j-1];//Fall back to the next-longest common prefix-suffix. j-- cannot be used: only nex[j-1] guarantees that the prefix still equals a suffix of B[0]~B[i-1].
		if(B[j]==B[i])//The character after the common prefix matches B[i], so the length grows by 1
			j++;
		nex[i]=j;//j is the length of the longest common prefix-suffix of B[0]~B[i]; it is 0 if in the end B[j]!=B[i]
	}
}//https://www.cnblogs.com/c-cloud/p/3224788.html
int kmp(int m,int n)//m is the length of the text t, n is the length of the word w
{
	int ans=0,cmp=0;//cmp is the number of characters of w matched so far
	for(int i=0;i<m;i++){
		while(cmp>0&&t[i]!=w[cmp])//Mismatch at position cmp of w
			cmp=nex[cmp-1];//Fall back to the longest common prefix-suffix of the part that has already matched
		if(t[i]==w[cmp])//The current character matches, move on to the next one
			cmp++;
		if(cmp==n){//The whole word has been matched
			ans++;
			cmp=nex[cmp-1];//In this problem occurrences may overlap, so fall back instead of resetting cmp to 0
		}
	}
	return ans;
}
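int main()
{
	int n;
	cin>>n;
	getchar();//Consume the line break after n
	while(n--){
		fgets(w,sizeof(w),stdin);//gets() was removed in C++14; fgets is the bounded replacement
		fgets(t,sizeof(t),stdin);
		w[strcspn(w,"\r\n")]='\0';//fgets keeps the line break, so strip it
		t[strcspn(t,"\r\n")]='\0';
		getnext(w,strlen(w));
		printf("%d\n",kmp(strlen(t),strlen(w)));
	}
	return 0;
}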
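
The AC code works on global char arrays. As a minimal sketch of the same two loops in a form that is easier to test, the version below uses std::string and returns the count. The names buildNext, countOccurrences and the allowOverlap flag are assumptions added for illustration; they are not part of the original solution.

```cpp
#include <string>
#include <vector>

// Same idea as getnext: next[i] is the length of the longest proper prefix
// of pattern[0..i] that is also a suffix of pattern[0..i].
std::vector<int> buildNext(const std::string& pattern) {
    const int n = static_cast<int>(pattern.size());
    std::vector<int> next(n, 0);
    for (int i = 1, j = 0; i < n; ++i) {
        while (j > 0 && pattern[j] != pattern[i]) j = next[j - 1];
        if (pattern[j] == pattern[i]) ++j;
        next[i] = j;
    }
    return next;
}

// Counts occurrences of `pattern` in `text`.
// allowOverlap (hypothetical flag): true falls back to next[n-1] after a
// full match, as the AC code does; false restarts matching from 0.
int countOccurrences(const std::string& text, const std::string& pattern,
                     bool allowOverlap = true) {
    const int n = static_cast<int>(pattern.size());
    if (n == 0) return 0;
    const std::vector<int> next = buildNext(pattern);
    int count = 0;
    int matched = 0;  // number of pattern characters matched so far
    for (char c : text) {
        while (matched > 0 && c != pattern[matched]) matched = next[matched - 1];
        if (c == pattern[matched]) ++matched;
        if (matched == n) {
            ++count;
            matched = allowOverlap ? next[matched - 1] : 0;
        }
    }
    return count;
}
```

With overlaps allowed, countOccurrences("AZAZAZA", "AZA") returns 3; with allowOverlap set to false it returns 2. That difference is exactly what the line cmp=nex[cmp-1] after ans++ is for.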

Clearing Up Doubts About Computing the next Array

The difficult part is computing the next array.

void getnext(char *B,int n)/* n is the length of string B */
{
	nex[0]=0;
	for(int i=1,j=0;i<n;i++)
	{
//		j=nex[i-1]; at this point j already equals nex[i-1], the length of the longest common prefix-suffix of B[0]~B[i-1].
		while(B[j]!=B[i]&&j>0)//Mismatch: try a shorter common prefix-suffix. The j characters before B[i] (B[i-j]~B[i-1]) equal the first j characters (B[0]~B[j-1]), so the next candidate can be looked up inside those first j characters.
			j=nex[j-1];//Fall back to the next-longest common prefix-suffix. j-- cannot be used: only nex[j-1] guarantees that the prefix still equals a suffix of B[0]~B[i-1].
		if(B[j]==B[i])//The character after the common prefix matches B[i], so the length grows by 1
			j++;
		nex[i]=j;//j is the length of the longest common prefix-suffix of B[0]~B[i]; it is 0 if in the end B[j]!=B[i]
	}
}

next[i] stores the length of the longest common prefix-suffix of the substring str[0]~str[i], that is, the longest proper prefix of that substring that is also its suffix. Because indices start at 0, this length is also the index of the character that comes right after that prefix.
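
To make the definition concrete, here is a brute-force sketch that computes the array directly from it, without any of the KMP tricks. The name nextByDefinition is made up for illustration, and the cubic worst-case cost makes it suitable only for checking small examples.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: computes next[] straight from the definition.
// For each i, tries every proper length from longest to shortest and keeps
// the first one where prefix s[0..len-1] equals suffix s[i-len+1..i].
std::vector<int> nextByDefinition(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n, 0);
    for (int i = 0; i < n; ++i) {
        for (int len = i; len > 0; --len) {  // len <= i keeps the prefix proper
            if (s.compare(0, len, s, i - len + 1, len) == 0) {
                next[i] = len;
                break;
            }
        }
    }
    return next;
}
```

For "ababaca" this gives 0 0 1 2 3 0 1. For instance next[4]=3 because "aba" is both a prefix and a suffix of "ababa", and str[3] is the character right after that prefix.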
When next[i] is being computed, next[0]~next[i-1] are already known, so j starts out as next[i-1], the length of the longest common prefix-suffix (the longest prefix that is also a suffix) of str[0]~str[i-1].
Note: the j characters before position i (str[i-j]~str[i-1]) are the same as the first j characters (str[0]~str[j-1]).
(1) If str[j]==str[i], the newly added character extends the previous longest common prefix-suffix by one, so next[i]=j+1 (at this point j=next[i-1]).

(2) If str[j]!=str[i], the new character cannot be appended to the previous longest common prefix-suffix, so try a shorter one: keep replacing j with next[j-1], the longest common prefix-suffix of the current one, until the character after the prefix, str[j], matches str[i], or until j=0. Then exit the loop. If the loop ended with j=0, one last comparison of str[0] with str[i] decides whether next[i] is 1 or 0.
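
Cases (1) and (2) can be watched in action with a small instrumented sketch. The function name buildNextTraced and the fallbacks output are assumptions added for illustration: fallbacks[i] counts how many times case (2) fired while next[i] was being computed.

```cpp
#include <string>
#include <vector>

// Hypothetical instrumented version of getnext.
// Returns next[]; fallbacks[i] is the number of times j was replaced by
// next[j-1] (case 2) before next[i] was settled.
std::vector<int> buildNextTraced(const std::string& s, std::vector<int>& fallbacks) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n, 0);
    fallbacks.assign(n, 0);
    for (int i = 1, j = 0; i < n; ++i) {
        // Case (2): s[j] != s[i], shrink to the next-longest candidate.
        while (j > 0 && s[j] != s[i]) {
            j = next[j - 1];
            ++fallbacks[i];
        }
        // Case (1), or the final comparison at j == 0.
        if (s[j] == s[i]) ++j;
        next[i] = j;
    }
    return next;
}
```

For "aabaaab" the result is next = 0 1 0 1 2 2 3 and fallbacks = 0 0 1 0 0 1 0. At i=5, j starts at 2, str[2]='b' does not match str[5]='a', so j becomes next[1]=1; then str[1]='a' matches and next[5]=2.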

Note: j-- cannot be used here.

while(B[j]!=B[i]&&j>0)
	j=nex[j-1];//This cannot be replaced by j--

The blog post "The Simplest Understanding of KMP Algorithms" mentions this, but it doesn't say why.
The reason: j-- only yields the first j-1 characters of the current longest common prefix-suffix. That shorter string is certainly a prefix of str[0]~str[i-1], but nothing guarantees that it is also a suffix of str[0]~str[i-1], so the newly added character str[i] cannot simply be appended to it. Falling back to next[j-1] is what keeps the prefix equal to a suffix.
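
A small counterexample makes this visible. The sketch below puts the correct fallback next to a deliberately wrong variant that uses --j; both function names are made up for illustration.

```cpp
#include <string>
#include <vector>

// Correct fallback: j = next[j - 1].
std::vector<int> buildNextCorrect(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n, 0);
    for (int i = 1, j = 0; i < n; ++i) {
        while (j > 0 && s[j] != s[i]) j = next[j - 1];
        if (s[j] == s[i]) ++j;
        next[i] = j;
    }
    return next;
}

// Deliberately WRONG variant, for comparison only: falls back with --j.
// The shortened prefix is no longer guaranteed to equal a suffix.
std::vector<int> buildNextDecrement(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n, 0);
    for (int i = 1, j = 0; i < n; ++i) {
        while (j > 0 && s[j] != s[i]) --j;
        if (s[j] == s[i]) ++j;
        next[i] = j;
    }
    return next;
}
```

For "ababb" the correct version gives 0 0 1 2 0, while the --j variant gives 0 0 1 2 2. At i=4 the wrong variant steps from j=2 to j=1, sees that str[1]='b' equals str[4]='b', and reports length 2, even though the prefix "ab" is not equal to the suffix "bb".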

Other questions

"How to better understand and master the KMP algorithm?" (answer by Salted fish white on Zhihu)

Is the principle of KMP correct?
