0x14: content overview and example exercises

Posted by All4172 on Thu, 03 Feb 2022 08:41:30 +0100

String hash

Here we mainly introduce string hashing, which is a brute-force technique. When we are stuck on a string problem, hashing is often worth a try. After $O(n)$ preprocessing of the hash values of all prefixes of the string, a string hash can answer a query for the hash value of any substring in $O(1)$.
Generally, the hash function takes a fixed value $P$ and treats the string as a base-$P$ number, assigning each character a value greater than 0. It then takes a fixed value $M$ and uses the remainder of that base-$P$ number modulo $M$ as the hash value of the string.
Generally speaking, when $P$ is 131 or 13331, the probability of a collision is very small in practice. We take $M = 2^{64}$. To keep the code simple, we define the hash value type as unsigned long long, so that unsigned overflow performs the reduction modulo $2^{64}$ automatically.
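Written out, the definitions above amount to the following recurrences. This is a sketch in our own notation, not the original author's: $c_i$ is the value assigned to the $i$-th character and $H_i$ is the hash of the prefix of length $i$.

```latex
H_0 = 0, \qquad H_i = \left(H_{i-1} \cdot P + c_i\right) \bmod M \quad (1 \le i \le n)

\operatorname{hash}(s[l..r]) = \left(H_r - H_{l-1} \cdot P^{\,r-l+1}\right) \bmod M
```

For example, with $P = 131$ and the characters a, b, c mapped to 1, 2, 3, the string abc gives $H_1 = 1$, $H_2 = 1 \cdot 131 + 2 = 133$ and $H_3 = 133 \cdot 131 + 3 = 17426$. The hash of the substring bc is then $H_3 - H_1 \cdot 131^2 = 17426 - 17161 = 265$, which equals $2 \cdot 131 + 3$ as expected.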
Let's take a look at examples and exercises to experience the simple application of string hash.

[example] rabbit and rabbit (AcWing138)

Problem link
Idea: to quickly check whether two substrings are equal, string hashing is the natural choice. After preprocessing the hash values of all prefixes, the hash value of any substring can be obtained in $O(1)$, provided the powers of $P$ are precomputed as well. The code below instead computes each power with fast exponentiation (qpow), which costs $O(\log n)$ per query. The implementation is fairly simple; see the code.
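As a sketch of the $O(1)$ query, the variant below stores the powers of $P$ next to the prefix hashes. The names PrefixHash, HASH_BASE, get and same are illustrative and not part of the original solution, and the mapping of 'a'..'z' to 1..26 assumes lowercase input.

```cpp
#include <string>
#include <vector>

typedef unsigned long long ull;

// Assumed base; any of the commonly used values would work here.
const ull HASH_BASE = 131;

// Hypothetical helper (not from the original post): prefix hashes plus
// precomputed powers. Unsigned arithmetic wraps modulo 2^64 on overflow.
struct PrefixHash {
    std::vector<ull> h;  // h[i]: hash of the first i characters
    std::vector<ull> p;  // p[i]: HASH_BASE^i

    explicit PrefixHash(const std::string& s)
        : h(s.size() + 1, 0), p(s.size() + 1, 1) {
        for (size_t i = 1; i <= s.size(); ++i) {
            // Map 'a'..'z' to 1..26 so that no character has value 0.
            h[i] = h[i - 1] * HASH_BASE + (ull)(s[i - 1] - 'a' + 1);
            p[i] = p[i - 1] * HASH_BASE;
        }
    }

    // Hash of the substring at 1-based positions l..r (inclusive).
    ull get(int l, int r) const { return h[r] - h[l - 1] * p[r - l + 1]; }

    // True if l1..r1 and l2..r2 have the same length and the same hash.
    bool same(int l1, int r1, int l2, int r2) const {
        return r1 - l1 == r2 - l2 && get(l1, r1) == get(l2, r2);
    }
};
```

With this helper, a query (l1, r1, l2, r2) reduces to one call of same, at the cost of one extra array of n + 1 powers.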

AC Code:

#include<bits/stdc++.h>
#define N 1000005
#define ull unsigned long long

using namespace std;

const ull P = 13131;

int n;
char s[N];
ull h[N];

ull qpow(ull a,int b){
    ull res = 1;
    while(b){
        if(b & 1) res *= a;
        a *= a;
        b >>= 1;
    }
    return res;
}

void solve(){
    scanf("%s",s + 1);
    n = strlen(s + 1);
    
    for(int i = 1;i <= n;i ++){
        // Map 'a'..'z' to 1..26: with 'a' as 0, "a" and "aa" would get the same hash.
        h[i] = h[i - 1] * P + (s[i] - 'a' + 1);
    }
    
    int q;scanf("%d",&q);
    while(q --){
        int l1,r1,l2,r2;
        scanf("%d%d%d%d",&l1,&r1,&l2,&r2);
        if(h[r1] - h[l1 - 1] * qpow(P,r1 - l1 + 1) == h[r2] - h[l2 - 1] * qpow(P,r2 - l2 + 1))
            puts("Yes");
        else 
            puts("No");
    }
}

int main(){
    solve();
    return 0;
}

[example] maximum length of palindrome substring

Problem link
Idea: first, we unify the handling of odd-length and even-length palindromic substrings. A common method is to insert a special character that does not occur in the string between every two adjacent characters, which doubles the length of the original string. The code below places this character in front of every original character.
Then we preprocess the hash values of all prefixes of the new string and of its reverse.
Finally, we enumerate each position as the palindrome center and binary search the largest radius. To test a candidate radius $mid$, we check whether the $mid$ characters to the left of the center in the new string have the same hash value as the corresponding substring of the reversed string, which holds the $mid$ characters to the right of the center read backwards. The corresponding positions are easy to work out with a small drawing.
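The index mapping between a string and its reverse can be sketched as follows. This is a minimal illustration under our own assumptions, not the original solution: PalindromeChecker and isPalindrome are hypothetical names, and the check tests a whole range l..r rather than a center and radius.

```cpp
#include <string>
#include <vector>

typedef unsigned long long ull;

// Hypothetical helper (not from the original post): hashes of a string
// and of its reverse, used to test whether s[l..r] is a palindrome.
struct PalindromeChecker {
    int n;
    std::vector<ull> h, hr, p;  // forward hashes, reverse hashes, powers

    explicit PalindromeChecker(const std::string& s)
        : n((int)s.size()), h(n + 1, 0), hr(n + 1, 0), p(n + 1, 1) {
        const ull P = 131;  // assumed base
        for (int i = 1; i <= n; ++i) {
            h[i] = h[i - 1] * P + (unsigned char)s[i - 1];
            // The i-th character of the reversed string is s[n - i] (0-based).
            hr[i] = hr[i - 1] * P + (unsigned char)s[n - i];
            p[i] = p[i - 1] * P;
        }
    }

    // Hash of positions l..r (1-based, inclusive) in the given prefix array.
    ull get(const std::vector<ull>& a, int l, int r) const {
        return a[r] - a[l - 1] * p[r - l + 1];
    }

    // s[l..r] read backwards occupies positions n-r+1 .. n-l+1 of the
    // reversed string, so equal hashes indicate a palindrome
    // (up to hash collisions).
    bool isPalindrome(int l, int r) const {
        return get(h, l, r) == get(hr, n - r + 1, n - l + 1);
    }
};
```

The same mapping, position i in the string corresponding to position n - i + 1 in its reverse, is what the get calls in the solution below rely on.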

AC Code:

#include<bits/stdc++.h>
#define N 2000005
#define ull unsigned long long

using namespace std;

int n;
char s[N],sr[N];
ull h[N],hr[N];
ull p[N];
int kase;

ull get(ull h[], int l, int r){
    return h[r] - h[l - 1] * p[r - l + 1];
}

void solve(){
    n = strlen(s + 1);
    for(int i = 2 * n;i > 0;i -= 2){
        s[i] = s[i / 2];
        s[i - 1] = 'z' + 1;
    }
    
    n *= 2;
    for(int i = 1, j = n;i <= n;i ++, j --) sr[j] = s[i];
    p[0] = 1;
    
    for(int i = 1;i <= n;i ++)
    {
        h[i] = h[i - 1] * 13131 + s[i] - 'a';
        hr[i] = hr[i - 1] * 13131 + sr[i] - 'a';
        p[i] = p[i - 1] * 13131;
    }

    int ans = 0;
    for(int i = 1;i <= n;i ++){
        int l = 0,r = min(i - 1,n - i);
        while(l < r){
            int mid = l + r + 1 >> 1;
            if(get(h, i - mid, i - 1) == get(hr, n - (i + mid) + 1, n - (i + 1) + 1)) l = mid;
            else r = mid - 1;
        }
        if(s[i - l] == 'z' + 1) ans = max(ans, l);
        else ans = max(ans,l + 1);
    }
    
    printf("Case %d: %d\n", ++kase, ans);
}

int main(){
    // Compare against 1 so that the loop also stops if the input ends without "END".
    while(scanf("%s",s + 1) == 1 && !(s[1] == 'E' && s[2] == 'N' && s[3] == 'D'))
        solve();
    return 0;
}

[example] suffix array (AcWing140)

Problem link
Idea: building a suffix array by hashing costs one more $\log$ factor in time than the radix sort method, but it is a classic technique worth knowing. We take the simplest approach: sort all suffixes with a custom comparison function. Because the hash value of any substring can be queried in $O(1)$, we can binary search the length of the longest common prefix of two different suffixes. The position right after that prefix is the first position where the two suffixes differ, and comparing the two characters there orders the suffixes. Each comparison costs $O(\log n)$, so the whole sort takes $O(n \log^2 n)$. The height array is computed in exactly the same way, by binary searching the longest common prefix of suffixes that are adjacent in sorted order.
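The comparison described above can be sketched on its own as follows. The names HashedSuffixes, lcp, suffixLess and suffixArray are hypothetical, indices are 0-based here (unlike the 1-based solution below), and the end-of-string case is handled explicitly instead of relying on a terminating character.

```cpp
#include <algorithm>
#include <string>
#include <vector>

typedef unsigned long long ull;

// Hypothetical helper (not from the original post): compares suffixes by
// binary searching their longest common prefix with hashes.
struct HashedSuffixes {
    int n;
    std::string s;
    std::vector<ull> h, p;  // prefix hashes and powers of the base

    explicit HashedSuffixes(const std::string& str)
        : n((int)str.size()), s(str), h(n + 1, 0), p(n + 1, 1) {
        const ull P = 131;  // assumed base
        for (int i = 1; i <= n; ++i) {
            h[i] = h[i - 1] * P + (unsigned char)s[i - 1];
            p[i] = p[i - 1] * P;
        }
    }

    // Hash of the len characters starting at 0-based index a.
    ull get(int a, int len) const { return h[a + len] - h[a] * p[len]; }

    // Length of the longest common prefix of the suffixes starting at a and b.
    int lcp(int a, int b) const {
        int lo = 0, hi = n - std::max(a, b);
        while (lo < hi) {
            int mid = (lo + hi + 1) / 2;
            if (get(a, mid) == get(b, mid)) lo = mid;
            else hi = mid - 1;
        }
        return lo;
    }

    // True if the suffix starting at a is lexicographically smaller.
    bool suffixLess(int a, int b) const {
        int k = lcp(a, b);
        // One suffix is a prefix of the other: the shorter one (larger
        // start index) is the smaller one.
        if (a + k == n || b + k == n) return a > b;
        return s[a + k] < s[b + k];
    }

    // Start indices of all suffixes in sorted order.
    std::vector<int> suffixArray() const {
        std::vector<int> sa(n);
        for (int i = 0; i < n; ++i) sa[i] = i;
        std::sort(sa.begin(), sa.end(),
                  [this](int a, int b) { return suffixLess(a, b); });
        return sa;
    }
};
```

For the string banana this yields the order 5, 3, 1, 0, 4, 2, and calling lcp on neighbouring entries of that order gives the height values.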

AC Code:

#include<bits/stdc++.h>

using namespace std;

typedef unsigned long long ULL;
const int N = 300005;
const int base = 131;

int n;
char s[N];
ULL h[N],p[N];
int sa[N];

ULL get(int l,int r){
    return h[r] - h[l - 1] * p[r - l + 1];   
}

bool cmp(int a,int b){
    int l = 0,r = min(n - a + 1,n - b + 1);
    while(l < r){
        int mid = l + r + 1 >> 1;
        if(get(a,a + mid - 1) == get(b,b + mid - 1)) l = mid;
        else r = mid - 1;
    }
    return s[a + l] < s[b + l];
}

void solve(){
    scanf("%s",s + 1);
    n = strlen(s + 1);
    
    p[0] = 1;
    for(int i = 1;i <= n; i ++){
        p[i] = p[i - 1] * base;
        h[i] = h[i - 1] * base + s[i] - 'a';
        sa[i] = i;
    }   
    
    sort(sa + 1, sa + 1 + n, cmp);
    
    for(int i = 1; i <= n;i ++){
        if(i != 1) printf(" ");
        printf("%d",sa[i] - 1);
    }
    puts("");
    
    printf("0");
    for(int i = 2;i <= n;i ++){
        int l = 0,r = min(n - sa[i] + 1,n - sa[i - 1] + 1);
        while(l < r){
            int mid = l + r + 1 >> 1;
            if(get(sa[i],sa[i] + mid - 1) == get(sa[i - 1],sa[i - 1] + mid - 1)) l = mid;
            else r = mid - 1;
        }
        printf(" %d",l);
    }
}

int main(){
    solve();
    return 0;
}
