Dynamic Programming: Edit Distance

Posted by jgires on Sat, 22 Jan 2022 13:29:42 +0100

// dp[i][j] is the length of the longest common subsequence, ending with s[i-1], of s[0, i-1] and t[0, j-1]

1. Is Subsequence

392. Is Subsequence
Given strings s and t, determine whether s is a subsequence of t.

A subsequence of a string is a new string formed from the original by deleting some (or no) characters without changing the relative order of the remaining characters. (For example, "ace" is a subsequence of "abcde", while "aec" is not.)

This is easy to solve with two pointers, but how would you do it with dynamic programming?
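For comparison, here is a minimal sketch of the two-pointer approach mentioned above. The function name isSubsequenceTwoPointers is hypothetical and not part of the original solution; it is only meant to show the baseline the dp version is measured against.

```cpp
#include <string>

// Two-pointer check: i walks through s, j walks through t.
// i only advances when t[j] matches the character s is waiting for.
bool isSubsequenceTwoPointers(const std::string& s, const std::string& t) {
    std::size_t i = 0;
    for (std::size_t j = 0; j < t.size() && i < s.size(); ++j) {
        if (s[i] == t[j]) ++i;
    }
    // Every character of s was matched in order.
    return i == s.size();
}
```

This runs in a single pass over t and needs no extra table, which is why the dp version below is mainly useful as a warm-up for the later problems.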

1. dp array meaning

dp[i][j] denotes the length of the common subsequence ending with s[i-1] that is contained in the prefix of t ending at index j-1.
For example, if t is bnmncla and s is abc, then dp[3][7] = 2: scanning t from the back, c is found first, and only b comes before that c, so the matched length is 2.

2. Recursive Formula

  • If s[i-1] == t[j-1], then dp[i][j] = dp[i-1][j-1] + 1, because a matching character has been found and extends the match counted in dp[i-1][j-1] by one.
  • If s[i-1] != t[j-1], then t[j-1] contributes nothing and the result is the same as if t ended at index j-2, so dp[i][j] = dp[i][j-1].

Why is dp[i-1][j] not involved here?
dp[i][j] is the length of the subsequence ending with s[i-1] contained in the prefix of t ending at index j-1;
dp[i-1][j] is the length of the subsequence ending with s[i-2] contained in the prefix of t ending at index j-1.

  • For example, take s = abc and t = abeft. Looking in t for a subsequence ending with b gives length 2, but looking in t for a subsequence ending with c gives 0. So dp[i][j] is not related to dp[i-1][j].

  • Looking for a subsequence ending with b in abef gives length 2. Looking for one in abeft also gives length 2 (t is not equal to b, so the length is the same as in abef). So dp[i][j] is related to dp[i][j-1].
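To check these examples by hand, it can help to build the whole table. The sketch below applies the recurrence as stated, with every cell starting at 0; the helper name buildSubsequenceTable is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the full dp table for problem 392.
// dp has (s.size() + 1) rows and (t.size() + 1) columns, all starting at 0.
std::vector<std::vector<int>> buildSubsequenceTable(const std::string& s,
                                                    const std::string& t) {
    std::vector<std::vector<int>> dp(s.size() + 1,
                                     std::vector<int>(t.size() + 1, 0));
    for (std::size_t i = 1; i <= s.size(); ++i) {
        for (std::size_t j = 1; j <= t.size(); ++j) {
            if (s[i - 1] == t[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1;  // match: extend by one
            } else {
                dp[i][j] = dp[i][j - 1];          // mismatch: ignore t[j-1]
            }
        }
    }
    return dp;
}
```

With s = abc and t = abeft, the table gives dp[2][4] = 2 and dp[2][5] = 2 (the row for b), and dp[3][5] = 0 (the row for c), which matches the two bullets above. With s = abc and t = bnmncla, the last cell dp[3][7] is 2.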

3. Initialization

Based on the recursive formula, we need to initialize dp[i][0] and dp[0][j]. By definition both are 0, since an empty string has no characters to match.

4. Traversal order

Left to right, top to bottom

5. Return Value

Check whether dp[s.size()][t.size()] is equal to s.size(). If it is, s is a subsequence of t; otherwise it is not.

6. Code implementation

class Solution {
public:
    bool isSubsequence(string s, string t) {
        vector<vector<int>> dp(s.size() + 1, vector<int>(t.size() + 1, 0));

        for(int i = 1; i <= s.size(); i++){
            for(int j = 1; j <= t.size(); j++){
                if(s[i - 1] == t[j - 1]) dp[i][j] = dp[i - 1][j - 1] + 1;
                else dp[i][j] = dp[i][j - 1];
            }
        }
        if(dp[s.size()][t.size()] == s.size()) return true;
        return false;
    }
};

2. Distinct Subsequences

115. Distinct Subsequences
Given a string s and a string t, count the number of times t occurs as a subsequence of s.

A subsequence of a string is a new string formed by deleting some (or no) characters without changing the relative order of the remaining characters. (For example, "ACE" is a subsequence of "ABCDE", while "AEC" is not.)

The problem guarantees that the answer fits in a 32-bit signed integer.

1. dp array meaning

dp[i][j]: the number of times the prefix of t ending at index i-1 occurs as a subsequence of the prefix of s ending at index j-1. Note that this is a count of subsequences, not a length.

2. Recursive Formula

When dp[i][j] is evaluated, there are two cases:

The first is s[j-1] == t[i-1], where the count comes from two sources:

  • The first uses s[j-1] to match t[i-1], which gives dp[i-1][j-1] subsequences.
  • The second does not use s[j-1], which gives dp[i][j-1] subsequences.
    Therefore, in this case dp[i][j] = dp[i-1][j-1] + dp[i][j-1].

The second case is s[j-1] != t[i-1]. Here s[j-1] cannot be used, so dp[i][j] = dp[i][j-1].
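The two cases can also be read as a recursion from the last cell backwards. The following is a sketch of a memoized top-down version, not the approach used in this post; the names countWays and numDistinctTopDown are hypothetical, and the base cases are assumed from the dp definition (an empty prefix of t occurs once, a non-empty prefix of t never occurs in an empty s).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: number of ways t[0, i-1] occurs as a subsequence of
// s[0, j-1]. done[i][j] marks cells whose value is already stored in memo.
std::uint64_t countWays(const std::string& s, const std::string& t,
                        std::size_t i, std::size_t j,
                        std::vector<std::vector<std::uint64_t>>& memo,
                        std::vector<std::vector<char>>& done) {
    if (i == 0) return 1;  // the empty prefix of t occurs exactly once
    if (j == 0) return 0;  // a non-empty prefix of t cannot occur in an empty s
    if (done[i][j]) return memo[i][j];
    // Source 2: do not use s[j-1]. This is always allowed.
    std::uint64_t ways = countWays(s, t, i, j - 1, memo, done);
    // Source 1: use s[j-1], allowed only when it matches t[i-1].
    if (s[j - 1] == t[i - 1]) {
        ways += countWays(s, t, i - 1, j - 1, memo, done);
    }
    done[i][j] = 1;
    memo[i][j] = ways;
    return ways;
}

std::uint64_t numDistinctTopDown(const std::string& s, const std::string& t) {
    std::vector<std::vector<std::uint64_t>> memo(
        t.size() + 1, std::vector<std::uint64_t>(s.size() + 1, 0));
    std::vector<std::vector<char>> done(
        t.size() + 1, std::vector<char>(s.size() + 1, 0));
    return countWays(s, t, t.size(), s.size(), memo, done);
}
```

For example, numDistinctTopDown("rabbbit", "rabbit") returns 3 and numDistinctTopDown("babgbag", "bag") returns 5.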

3. Initialization of dp arrays

Based on the recursive formula, we need to initialize dp[i][0] and dp[0][j].
Unlike the previous problem, by the definition of the dp array here:

  • dp[i][0]: the number of times the prefix of t ending at index i-1 occurs in an empty s, which is 0. The one exception is dp[0][0] = 1, because the empty string occurs once in the empty string.
  • dp[0][j]: the number of times the empty string occurs in the prefix of s ending at index j-1. The empty string is obtained only by deleting every character, so the count is 1.

So the initialization method is:

vector<vector<uint64_t>> dp(t.size() + 1, vector<uint64_t>(s.size() + 1, 0));

for(int j = 0; j <= s.size(); j++){
    dp[0][j] = 1;
}

4. Traversal method

Left to right, top to bottom

5. Return Value

Based on the DP array definition, the final return value is dp[t.size()][s.size()]

6. Code implementation

class Solution {
public:
    int numDistinct(string s, string t) {
        // --- dp array initialization ---
        vector<vector<uint64_t>> dp(t.size() + 1, vector<uint64_t>(s.size() + 1, 0));
        for(int j = 0; j <= s.size(); j++){
            dp[0][j] = 1;
        }

        // --- Recursive formula ---
        for(int i = 1; i <= t.size(); i++){
            for(int j = 1; j <= s.size(); j++){
                if(s[j - 1] == t[i - 1]) dp[i][j] = dp[i - 1][j - 1] + dp[i][j - 1];
                else dp[i][j] = dp[i][j - 1];
            }
        }

        return dp[t.size()][s.size()];
    }
};

3. Delete Operation for Two Strings

583. Delete Operation for Two Strings
Given two words word1 and word2, find the minimum number of steps required to make word1 and word2 the same, where each step deletes one character from either string.

1. dp array meaning

dp[i][j]: the minimum number of deletions needed to make the prefix of word1 ending at index i-1 equal to the prefix of word2 ending at index j-1

2. Recursive Formula

When word1[i-1] is the same as word2[j-1], dp[i][j] = dp[i-1][j-1];

When word1[i-1] is different from word2[j-1], there are three cases:

  • Case 1: delete word1[i-1]. The minimum number of operations is dp[i-1][j] + 1 (dp[i-1][j] operations to make the remaining prefixes equal, plus one to delete word1[i-1]).

  • Case 2: delete word2[j-1]. The minimum number of operations is dp[i][j-1] + 1.

  • Case 3: delete both word1[i-1] and word2[j-1]. The minimum number of operations is dp[i-1][j-1] + 2.

dp[i][j] is the minimum of the three.
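One observation that may simplify the recurrence: Case 3 should never be strictly better than Case 1. Starting from the dp[i-1][j-1] solution and deleting word2[j-1] as well shows that dp[i-1][j] is at most dp[i-1][j-1] + 1, so dp[i-1][j] + 1 is at most dp[i-1][j-1] + 2. The sketch below drops Case 3 on that assumption. The function name minDeletionsTwoCases is hypothetical, and the base cases dp[i][0] = i and dp[0][j] = j are the ones from the initialization step.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Variant of the recurrence that only considers Case 1 and Case 2
// when the characters differ.
int minDeletionsTwoCases(const std::string& word1, const std::string& word2) {
    const std::size_t n = word1.size();
    const std::size_t m = word2.size();
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) dp[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) dp[0][j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (word1[i - 1] == word2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                // Case 1: delete word1[i-1]; Case 2: delete word2[j-1].
                dp[i][j] = std::min(dp[i - 1][j], dp[i][j - 1]) + 1;
            }
        }
    }
    return dp[n][m];
}
```

For example, minDeletionsTwoCases("sea", "eat") returns 2 and minDeletionsTwoCases("leetcode", "etco") returns 4.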

3. dp Initialization

By definition, the dp array initialization results are:

vector<vector<int>> dp(word1.size() + 1, vector<int>(word2.size() + 1));
for (int i = 0; i <= word1.size(); i++) dp[i][0] = i;  // the first i characters of word1 need i deletions to match an empty word2
for (int j = 0; j <= word2.size(); j++) dp[0][j] = j;

4. Traversal order

Left to right, top to bottom

5. Return Value

By the definition of the dp array, the return value is dp[word1.size()][word2.size()]

6. Code implementation

class Solution {
public:
    int minDistance(string word1, string word2) {
        vector<vector<int>> dp(word1.size() + 1, vector<int>(word2.size() + 1, 0));

        for(int i = 1; i <= word1.size(); i++){
            dp[i][0] = i;
        }

        for(int j = 1; j <= word2.size(); j++){
            dp[0][j] = j;
        }

        for(int i = 1; i <= word1.size(); i++){
            for(int j = 1; j <= word2.size(); j++){
                if(word1[i - 1] == word2[j - 1]) dp[i][j] = dp[i - 1][j - 1];
                else dp[i][j] = min(dp[i - 1][j] + 1, min(dp[i][j - 1] + 1, dp[i - 1][j - 1] + 2));
            }
        }

        return dp[word1.size()][word2.size()];
    }
};
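A different way to get the same answer, not the method used above, is to go through the longest common subsequence (LCS): every character outside the LCS has to be deleted from its own string, so the answer is word1.size() + word2.size() - 2 * LCS. Here is a sketch under that assumption; the name minDistanceViaLcs is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Computes the LCS length of word1 and word2, then derives the number
// of deletions as len1 + len2 - 2 * LCS.
int minDistanceViaLcs(const std::string& word1, const std::string& word2) {
    const std::size_t n = word1.size();
    const std::size_t m = word2.size();
    // lcs[i][j]: LCS length of word1[0, i-1] and word2[0, j-1].
    std::vector<std::vector<int>> lcs(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (word1[i - 1] == word2[j - 1]) {
                lcs[i][j] = lcs[i - 1][j - 1] + 1;
            } else {
                lcs[i][j] = std::max(lcs[i - 1][j], lcs[i][j - 1]);
            }
        }
    }
    return static_cast<int>(n + m) - 2 * lcs[n][m];
}
```

For "sea" and "eat" the LCS is "ea" with length 2, so the result is 3 + 3 - 4 = 2, the same value the deletion dp produces.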

4. Edit Distance

72. Edit Distance
Given two words word1 and word2, compute the minimum number of operations needed to convert word1 into word2.
You can perform three operations on a word:
  • Insert a character
  • Delete a character
  • Replace a character

1. dp array

dp[i][j] denotes the minimum edit distance between the prefix of word1 ending at index i-1 and the prefix of word2 ending at index j-1

2. Recursive Formula

There are four cases in total: one when the two characters match and three when they differ.

In the first case, word1[i-1] == word2[j-1], so no edit is needed and dp[i][j] = dp[i-1][j-1].

In the second case, word1[i-1] != word2[j-1], and there are three options:

  • Delete word1[i-1]. The cost is the edit distance between word1 ending at index i-2 and word2 ending at index j-1, plus one delete operation. That is, dp[i][j] = dp[i-1][j] + 1 (which is equivalent to inserting a character into word2).
  • Insert word2[j-1] at the end of word1. The cost is the edit distance between word1 ending at index i-1 and word2 ending at index j-2, plus one insert operation. That is, dp[i][j] = dp[i][j-1] + 1 (which is equivalent to deleting a character from word2).
  • Replace word1[i-1] with word2[j-1]. Nothing is inserted or deleted, so the cost is the edit distance between word1 ending at index i-2 and word2 ending at index j-2, plus one replace operation. That is, dp[i][j] = dp[i-1][j-1] + 1.

To sum up, the recursive formula is:

if(word1[i - 1] == word2[j - 1]) dp[i][j] = dp[i - 1][j - 1];
else dp[i][j] = min({dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]}) + 1;

3. Initialization

By definition, the dp array initialization results are:

vector<vector<int>> dp(word1.size() + 1, vector<int>(word2.size() + 1));
// The first i characters of word1 need i deletions to become an empty word2;
// equivalently, an empty word2 needs i insertions to become those i characters
for (int i = 0; i <= word1.size(); i++) dp[i][0] = i;
for (int j = 0; j <= word2.size(); j++) dp[0][j] = j;

4. Traversal order

Top to bottom, left to right

5. Return Value

By the definition of the dp array, the return value is dp[word1.size()][word2.size()]

6. Code implementation

class Solution {
public:
    int minDistance(string word1, string word2) {
        // --- dp array initialization ---
        vector<vector<int>> dp(word1.size() + 1, vector<int>(word2.size() + 1, 0));

        for(int i = 1; i <= word1.size(); i++){
            dp[i][0] = i;
        }

        for(int j = 1; j <= word2.size(); j++){
            dp[0][j] = j;
        }
        // --- Recursive formula ---
        for(int i = 1; i <= word1.size(); i++){
            for(int j = 1; j <= word2.size(); j++){
                if(word1[i - 1] == word2[j - 1]) dp[i][j] = dp[i - 1][j - 1];
                else dp[i][j] = min({dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]}) + 1;
            }
        }

        return dp[word1.size()][word2.size()];
    }
};
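Since each row of the table only reads from the row directly above it and from the cell to its left, the full 2D array is not strictly needed. Below is a sketch of a two-row variant of the same recurrence. The name editDistanceTwoRows and the variables prev and curr are hypothetical and not part of the solution above.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Same recurrence as the 2D version, keeping only two rows:
// prev holds dp[i-1][*], curr holds dp[i][*].
int editDistanceTwoRows(const std::string& word1, const std::string& word2) {
    const std::size_t n = word1.size();
    const std::size_t m = word2.size();
    std::vector<int> prev(m + 1), curr(m + 1);
    for (std::size_t j = 0; j <= m; ++j) prev[j] = static_cast<int>(j);  // dp[0][j] = j

    for (std::size_t i = 1; i <= n; ++i) {
        curr[0] = static_cast<int>(i);  // dp[i][0] = i
        for (std::size_t j = 1; j <= m; ++j) {
            if (word1[i - 1] == word2[j - 1]) {
                curr[j] = prev[j - 1];
            } else {
                // delete, insert, replace
                curr[j] = std::min({prev[j], curr[j - 1], prev[j - 1]}) + 1;
            }
        }
        std::swap(prev, curr);  // the row just filled becomes the previous row
    }
    return prev[m];
}
```

For example, editDistanceTwoRows("horse", "ros") returns 3 and editDistanceTwoRows("intention", "execution") returns 5. Memory drops from O(n * m) to O(m), at the cost of no longer being able to trace back which operations were chosen.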
