HKUST - algorithm analysis and design - probabilistic algorithm review knowledge points

Posted by steanders on Tue, 04 Jan 2022 13:18:30 +0100

2, Probabilistic algorithm

Probabilistic algorithm classification:

Digital algorithm: approximate solution
Monte Carlo algorithm: solve the decision problem or the problem with only one correct solution. The answer may not be correct.
Las Vegas algorithm: never return wrong answers.
Sherwood algorithm: always give the right answer.

2.1 digital algorithm

Find the approximate solution of the numerical problem

$\ Pi $ value calculation: dart throwing simulation, unit circle and circumscribed square.

\[\ frac {fall in circle} {total} = \ frac{k}{n} =\frac{\pi r^2}{4r^2} = \frac{\pi}{4}\ \Rightarrow \pi = 4 \frac{k}{n} \]

Darts(n){
    k=0;
    for i=1 to n do {
        x = uniform(0,1); y = uniform(0,1);
        if(x*x+y*y<=1) then k++;
    }
    return 4k/n;
}

Digital integration: Monte Carlo integration, but not the MC algorithm in this lesson.

Calculate integral \ (S=\int_0^!f(x)dx \)

HitorMiss algorithm: darts are thrown into a square per unit area. k falls below the integral function, \ (\ frac{k}{n}=\frac{S}{1} \)

HitorMiss(f, n){
    k=0;
    for i=1 to n do {
        x = uniform(0,1); y = uniform(0,1);
        if(y<=f(x)) then k++;
    }
    return k/n;
}

Crude algorithm: randomly take points on the integral interval and multiply the arithmetic mean by the interval width\ (\int_a^bf(x)dx=(b-1)\frac{1}{n}\sum_{i=1}^{n}f(x_i)\)

Crude(f,n,a,b){
    sum = 0;
    for i=1 to n do{
        x = uniform(a,b);
        sum = sum + f(x);
    }
    return (b-a)*sum/n;
}

Relatively speaking, the variance of Crude algorithm is smaller, but the number of iterations of HitorMiss algorithm is more at the same time.

Solution ideas: \ (\ pi=4\int_0^1\sqrt{1-x^2}dx \)

Trapezoid deterministic algorithm: Trapezoidal algorithm, which divides the interval into n-1 sub intervals

$S=\delta*(f(a+\delta)+f(a+2\delta)+\cdots+\frac{f(a)+f(b)}{2})$

Trapezoid(f,n,a,b){
    // Suppose n > = 2
    delta = (b-a)/(n-1);
    sum = (f(a)+f(b))/2;
    for(x=a+delta; x<=b-delta; x+=delta){
        sum = sum + f(x);
    }
    return sum * delta;
}

The deterministic algorithm sometimes can not find the solution, and the probabilistic algorithm is suitable for high-dimensional definite integral.

Probability count: an estimate of an integer value, such as the size of a set or an estimate of the number of different objects in a set.

Find the size of the set: there are randomly selected elements put back, and k is the number selected before the first occurrence of repeated elements.

$when n is large enough, k=\beta\sqrt{n}=\sqrt{\frac{\pi}{2}n}, that is, n=\frac{2k^2}{\pi} $

SetCount(X){
    k = 0; S = {}; a = uniform(X);
    do{
        k++;
        S = S U {a};
        a = uniform(X);
    }while(a Not belong to S)
    return 2k**2/pi
}

Estimation of the number of different objects in multiple sets: map the word to the bit string with length m, \ (\ PI (y, b) = \ begin {cases}i, y [i] = the minimum I of B \ \ K + 1, y [i]= b \end{cases}\)

WordCount(){
    y[1...m+1] = 0;
    for Every word on the tape x do {
        i = pi(h(x), 1); // The minimum position equal to 1 in the hash value of X, indicating that x is expressed in 00 01.... initial 
        y[i] = 1;
    }
    return pi(y, 0); // Returns the smallest position in y equal to 0
}
2^(k-2)<=n<=2^k，More accurate n=2^k/1.54703

2.2 Sherwood algorithm

Always give the correct answer and smooth the execution time of different input instances.

Purpose: the average performance is better, but the worst performance is poor. For example, selection and sorting are sensitive to the relative order of elements.

Basic process:

Transform the solved instance to a random instance. (pretreatment)
Using the deterministic algorithm to solve the random example, a solution is obtained.
Transform this solution to the solution of the original instance. (post treatment)

RH(x){
    // Calculating f(x) with Sherwood algorithm
    n = length(x); // The size of x is n
    r = uniform(A[n]); // Select an element at random
    y = u(x,r); // Convert original instance x to random instance y
    s = f(y); // Finding the solution s of y with deterministic algorithm
    return v(r,s); // Restore the solution of s to the solution of x
}

Sherwood algorithm for selection and sorting: scramble the input instances without post-processing. Shuffle the cards before calling the deterministic algorithm

Shuffle(T){
    n = length(T);
    for i=1 to n-1 do {
        // Randomly select an element in T[1...n] and put it on T[i]
        j = uniform(1...n);
        sqap(T[i],T[j]);
    }
}

Discrete logarithm calculation: \ (a=g^x mod \ p, calculate x = log {g, P} a \)

Simple algorithm:

Calculate \ (g^x mod \ p,0 \le x\le p-1 \), X has limited values, because the discrete logarithm \ (< g > \) is a cyclic group
Verify that \ (a=g^x mod \ p \) is valid

dlog(g,a,p){ // When such an algorithm does not exist, the algorithm returns p
    x = 0; y = 1;
    do{
        x++;
        y = y*g; // Calculate y=g**x
    }while(a!=y%p && x!=p);
    return x;
}

Theorem:

$log_{g,p}(st \% p)=(log_{g,p}s+log_{g,p}t)\%(p-1)$
$log_{g,p}(g^r \% p)=r,0\le r \le p-2$

Sherwood transformation algorithm:

dlogRH(g, a, p){
    r = uniform(0..p-2);
    b = g^r mod p; // Power module b=g**r%p
    c = ba mod p; // Theorem 1
    y = dlog(g,c,p); // Log using deterministic algorithm_ {g,p}c, y=r+x
    return (y-r) mod (p-1); // Find x
}

\[\begin{align} y &= log_{g,p}(ba \% p) = (log_{g,p}b+log_{g,p}a)\%(p-1) \\ &= (log_{g,p}(g^r \% p)+log_{g,p}a)\%(p-1) \\ &= (r+x)\%(p-1) \end{align} \]

Search ordered list: numeric array and pointer array form an ordered static linked list.

Determination algorithm with time \ (O(n) \): direct sequential search, average time \ (O(\frac{n}{2}) \)

Search(x, i){
    while x > val[i] do
    	i = ptr[i];
     return i;
}
A(x){
    return Search(x, head);
}

Probabilistic algorithm with time of \ (O(n) \): start sequential search from about a random point, and the average time is \ (O(\frac{n}{3}) \)

D(x){
    i = uniform(1..n);
    y = val[i];
    case{
        x<y: return Search(x, head);   // Search from scratch
        x>y: return Search(x, ptr[i]); // Search from next location
        otherwise: return i; // case3 x==y
    }
}

The determination algorithm with an average time of \ (O(\sqrt{n}) \): find the subscript of the largest integer that is not greater than x in the first \ (\ left \lfloor \sqrt{n} \right \rfloor \) and look back from this number.

B(x){ // Let x be in val[1..n]
    i = head;
    max = val[i]; // The initial value of max is the minimum value in table val
    for j=1 to [sqrt(n)] do { // The subscript of the largest integer y greater than x cannot be found in the number n of the first radical
        y = val[j];
        if max < y <= x then {
            i = y; max = y;        
        }
    }
    return Search(x, i);
}

2.3 Las Vegas algorithm

characteristic:

More efficient algorithm
The upper bound of time may not exist
The solution may not be found, but if it is found, it must be correct
The probability of success increases with the increase of execution time

General form of algorithm: \ (LV(x,y,success),x is input, y is output \)

Convention mark:

$p(x), probability of algorithm success $
$s(x), expected time when the algorithm succeeds $
$e(x), expected time when the algorithm fails $

Generally, the LV algorithm will be run many times until it is successful

Obstinate(x){
    repeat
    	LV(x,y,success);
    until success;
    return y;
}

Expected time to find the correct solution:

\[\begin{align} t(x) &= p(x)s(x)+(1-p(x))(e(x)+t(x)) \\ &=s(x)+\frac{1-p(x)}{p(x)}e(x) \end{align} \]

2.3.1 eight queens problem

requirement:

Row no conflict: equal row numbers
Columns do not conflict: column numbers are equal
The main diagonals do not conflict: the difference between row and column numbers is equal
The secondary diagonals do not conflict: the sum of row and column numbers is equal

Traditional backtracking: decide where to put each line separately

i = j = 1;
while i ≤ 8 do { // Current line number i ≤ 8
    Check current row i: From current column j Try to find the safe train number one by one from the beginning to the back;
    if Secure column number found then {
        Place the queen, record the column number on the stack, and set the next row as the current row(i++);
        j=1; //The current column is set to 1
    } else {
        Backtracking to the previous line, i.e i--;
        Remove the placed queen in the row and take the next column of the Queen's column as the current column;
    }
}

Las Vegas algorithm:

appointment:

$try[1...8] indicates that the ith queen is placed on (i,try[i]) $
$try[1...k] is called k-promising $, if and only if the requirements of K are met
$\forall i!=j \in [1,k],try[i]-try[j]\notin \{i-j,0,j-i\}$

Basic steps:

Whether it is feasible to traverse the positions of all the columns to change the row, \ (\ frac{1}{nb} \) this column is selected under the probability
If there is a solution, put it. If there is no solution, exit

QueensLV(success){
    // Greedy LV algorithm, all queens are placed randomly
    col, diag45, diag135 = {}; // Columns and two diagonals are initialized to an empty set
    k = 0; // Line number
    repeat
        nb = 0; // Counter, nb value is the total number of open positions of (k+1)th queen
    	// Traverse the column number to test whether (k+1,i) is safe
        for i=1 to 8 do {
            if (i!∈col) and (i-k-1!∈diag45) and (i+k+1!∈diag135) then {
                // Column i is available for (i+1)th queen, but it is not placed immediately
                nb = nb + 1;
                if uniform(1..nb)=1 then // Maybe in column i
                	j = i;
            }
        }
        if(nb>0) then {
            // No safe position when nb=0
            // In all nb secure locations, the probability of (k+1)th queen selecting position j column is 1/nb
            k = k+1;
            try[k] = j;
            col = col ∪ {j};
            diag45 = diag45 ∪ {j-k};
            diag135 = diag135 ∪ {j+k};
        }
    until nb=0 or k=8; // Exit when the appropriate location cannot be found or try is 8-promising
    success = (nb>0);
}

In line 12 of the algorithm, at all possible column positions, the probability of selecting a column is \ (\ frac{1}{nb} \), which is pushed backward from the last item.

Improved algorithm: Joint backtracking method and LV algorithm, specifying that LV algorithm is used to determine only the position of the first few lines.

...Above algorithm...
until nb=0 or k=stepVegas; // stepVegas queens have been placed randomly
if nb>0 then
    backtrace(k, col, diag45, diag135, success);
else
    success = false;

2.3.2 modular p square root

$x \equiv y^2 (mod \ p),x\in[1,p-1],p is an odd prime number $, then

x is the quadratic residue of module p
y is the square root of x module p

Theorem:

The quadratic residue of any module p has two different square roots \ (b,(p-b) \)
Quadratic residue of module p when the integer between 1 and p-1 is exactly half
$\ forall x \in [1,p-1],p is an odd prime number, with x^{(p-1)/2} \equiv \pm1(mod \ p) $
$x is the quadratic residue of module p if and only if x^{(p-1)/2} \equiv 1(mod \ p) $

How to determine whether x is the quadratic residue of module p? Calculate whether \ (x ^ {\ frac {P-1} {2} (MOD \ P) \) is 1.

How to calculate the two square roots of x module p?

$if \ p\equiv 3 (mod \ 4) $, then the square root is \ (\ pm x^{(p+1)/4} \)
$if \ p\equiv 1 (mod \ 4) $, only LV algorithm can be used

Example: \ (p=53\equiv1(mod \ 4),x=7 \), find the square root of module 53 of 7.

\[\begin{align} & \frac{p-1}{2}=26 \\ & (2+\sqrt{7})^{26} \equiv 0+41\sqrt{7} (mod \ 53) \\ &Find the unique y such that 41y\equiv 1 mod \ 53,1\le y \le 52\ &Using Euclid algorithm, 41 \times22 \equiv 1 (mod \ 53)\ &\ therefore22 is a square root and 53-22 = 31 is another square root \end{align} \]

Basic steps:

Random \ (a\in[1,p-1] \)
Calculate c and d, \ ((a+\sqrt{x})^{(p-1)/2} \equiv c+d\sqrt{x} \ (mod \ p) \)
If \ (d=0 \), false is returned
If \ (c=0 \), find y, \ (dy \equiv 1 \ (mod \ p) \)

rootLV(x, p, y, success){ // Calculate y
    a = uniform(1..p-1); // I don't know how much a is
    if a**2 % p == x % p then { // Very unlikely
        success = true; y =a;
    }else{
        calculation c and d，Make 0<=c,d<=p-1，(a+sqrt(x))**((p-1)/2) % p == c+d*sqrt(x) % p;
        if d==0 then success = false;
        else{ // c=0
            success = true;
            calculation y So, d*y % p == 1, 1<=y<=p-1 // Modified Euclid algorithm
        }
    }
}

The probability of success of the algorithm is greater than 0.5, so the square root of x can be obtained by calling twice on average.

2.3.3 factorization of integers

Prime factorization of integers, \ (n=p_1^{m_1}p_2^{m_2}\cdots p_k^{m_k} \)

Naive split algorithm: find a nontrivial factor of n

split(n){
    // N is a prime number, return 1, otherwise return the minimum nontrivial factor of found n
    for i=2 to [sqrt(n)] do
        if n % i == 0 then return i;
    return 1;
}

Quadratic residue of composite number: the quadratic residue in the previous section requires that p is an odd prime number, then there are exactly two different square roots. If \ (n=pq,p and q are different prime numbers \), the quadratic residue has exactly 4 square roots.

Dixon factorization algorithm:

Basic steps:

Find the integers a and B coprime with n so that \ (a^2 \equiv b^2 \ (mod \ n) and a! \equiv \pm b \ (mod \ n)\)
Then the greatest common factor of N and a+b is a nontrivial factor, \ (x=gcd(n, a+b) \).

Dixon(n,x,success){ // Find a nontrivial factor of composite number n
    if n It's an even number then {
        x = 2; success = true;    
    }else{
        for i=2 to [log_{2}n] do {
            if n**(1/i) Is an integer then {
                x = n**(1/i); success = true; return;
            }
        }
        find a, b，bring a**2 % n == b**2 % n;
        if a % n == +-b % n then success = false;
        else {
            x = gcd(a+b, n); // Maximum common factor
            success = true;
        }
    }
}

K-smoothing: all prime factors of an integer x are in the first k prime numbers.

Determine the values of a and b: select n and k in advance.

Randomly select \ (x\in[1,n-1] \).
- If x is not coprime with n, a nontrivial factor is found
- Otherwise, \ (y=x^2 \ mod \ n \), if y is k smooth, the factorization of X and Y is saved in the table.
- Repeat until k+1 different integers x are selected.
Find a nonempty subset \ (\ {y_1y_2\cdots y_k \} \) in the k+1 factorization, so that the exponents of the first k primes in the product of the corresponding factorization are even (including 0), such as \ (y_1y_2y_4y_8=2^63^45^47^011^213^217^2 \)
If \ (s = \ {y_{s_1}, y_{s_2}, \ cdots, y_{s_k} \}, a = \ prod {y_i \ in s} x_i, B = \ sqrt {\ prod {y_i \ in s} y_i} \), and \ (if \ a! \ equiv \ PM B \ (MOD \ n) \), you only need to find the maximum common factor of a+b and n.

Time analysis: the value of K will affect the possibility that \ (x^2 \ mod \ n \) is k smoothing and the time of factor decomposition y. Usually take the following time

\[L=e^{\sqrt{ln(n)*ln(ln(n))}} \\ k=\sqrt{L}, expected time O(L^2), success probability at least 0.5 \]

2.4 Monte Carlo algorithm

There are some problems that can not find an effective algorithm to obtain the correct solution. MC finds the correct solution with high probability.

appointment:

p-correct: an MC algorithm returns a correct solution with a probability not less than p
Consistent: an MC algorithm returns the same correct solution for the same instance.

Call the same algorithm several times and select the solution with the most occurrences to increase the probability of success of the consistent and p- correct algorithm.

Biased algorithm:

Partial truth algorithm: the solution of MC algorithm when it returns true must be correct. Only when it returns false can it produce a wrong solution.
Partial \ (y_0 \) algorithm: it is always correct when the algorithm returns \ (y_0 \), and the p probability is correct when it returns non \ (y_0 \).

Repeat calling a consistent, p-correct, partial \ (y_0 \) MC algorithm k times to get a \ (1-(1-p)^k \) - correct algorithm.

2.4.1 main element problems

An array with n elements. If the number of elements is greater than \ (\ frac{n}{2} \), it is the main element.

Determine whether T has a main element: set an x randomly, and then see how many there are

maj(T){
    i = uniform(1..n);
    x = T[i];
    k = 0;
    for j=1 to n do
        if T[j] == x then
            k = k+1;
    return k>n/2;
}
Partial truth, 0.5-Correct algorithm

Repeated calls reduce the error rate:

maj2(T){
    if maj(T) then return true; // 1 success
    else return maj(T); // Call twice
}
Partial truth, 0.75-Correct algorithm

Algorithm improvement: the error probability of the control algorithm is less than \ (\ varepsilon > 0 \), and the number of calls should be \ (k=\left \lceil lg\frac{1}{\varepsilon} \right \rceil \)

// e is the error probability
majMC(T, e){
    k = [lg(1/e)]; // Rounding up
    for i=1 to k do
        if maj(T) then return true;
    return false;
}
// O(nlg(1/e))

2.4.2 prime number determination

To determine whether an integer is a prime number, no effective deterministic algorithm or LV algorithm has been found

Simple probabilistic algorithm:

prime(n){
    d = uniform(2..[sqrt(n)]); // Rounding down
    return (n mod d)!=0;
}

Fermat's small theorem: \ (if \ n is a prime number, then \ forall a \in [1,n-1], with a^{n-1} mod \ n = 1 \)

Transform proposition: \ (if \ \exist a \in [1,n-1], make a ^ {n-1}mod \ n! = 1, then n is not a prime \)

Primality determination algorithm: use Fermat's small theorem, partial false. That is, the algorithm returns false, which must be false.

Fermat(n){
    a = uniform(1..n-1);
    if a**(n-1) mod n == 1 then return true; // Not necessarily correct
    else return false; // It must be right
}

Pseudo prime number and prime pseudo evidence: the composite number conforming to Fermat's small theorem is called the pseudo prime number with a as the base, and a is called the prime pseudo evidence of n.

Repeating the Fermat test several times does not reduce the error.

Strong pseudo Prime: n is an odd number greater than 4, s and T are integers, so that \ (n-1=2^st,t is an odd number \), and the set \ (B(n) \) satisfies the following conditions:

\[\begin{align} &A \ in B (n), 2 \ Le a \ Le n-2, and one of the following two conditions is met:\ & 1. \ a^t mod \ n = 1 \\ & 2. \ \exist \ 0 \le i < s,a^{2^it} mod \ n = n-1 \end{align} \]

N is a prime number, then \ (\ forall a \in [2,n-2], a \in B(n) \)
N is a composite number, \ (if \ a \in B(n) \), then n is a strong pseudo prime number with a as the base, and a is the strong pseudo evidence of N prime.

Btest(a, n){
    // N is an odd number, 2 < = a < = n-2. If true is returned, it means that a belongs to B(n). Then n is a strong pseudo prime or prime
    s=0; t=n-1; // t starts with an even number
    repeat
        s++;t/=2;
    until t mod 2 = 1; //n-1=(2**s)t, t is an odd number
    x = a**t mod n;
    if x=1 or x=n-1 then return true; // Meet condition 1 or 2
    for i=1 to s-1 do{
        x = x**2 mod n;
        if x=n-1 then return true; // Meet condition 2
    }
    return false;
}
0.75-correct

The number of strong false evidence is much less than that of false evidence.

$strong false evidence \ Rightarrow false evidence $

Miller Rabin test: 0.75 - correct, false

MillRab(n){ // Odd n > 4. When it returns true, it represents prime number, and when it returns false, it represents composite number
    a = uniform(2..n-2);
    return Btest(a,n); // Test whether n is a strong pseudo prime
}

RepeatMillRob: repeat the experiment k times, \ (1-4^{-k} \) - the correct MC algorithm

RepeatMillRob(n,k){
    for i=1 to k do
        if MillRob(n) == false then return false; // It must be a composite number
    return true;
}

If the error probability does not exceed \ (\ varepsilon \), the number of repetitions is \ (k=\left \lceil \frac{1}{2}lg\frac{1}{\varepsilon} \right \rceil \)

Determine n prime time as: \ (O(lg^3n \ lg\frac{1}{\varepsilon}) \)

2.4.3 matrix multiplication verification

Verification \ (matrix AB=C \)

Set \ (X_{1 \times n}=(a_1,a_2,\cdots,a_n),a_i\in\{0,1 \} \), then the problem is converted to \ (XAB=XC \)

Basic steps:

Calculate \ (XA \)
Calculate the product of \ (XA \) and \ (B \)
Calculation \ (XC \)

GoodProduct(A,B,C,n){
    for i=1 to n do
        x[i] = uniform(0..1);
    if XAB=XC then return true;
    else return false;
}

False, 0.5-correct

Return false must be false

Improved algorithm: repeat k times

RepeatGoodProduct(A,B,C,n,k){
    for i=1 to k do // Repeat k times
        if GoodProduct(A,B,C,n) == false then return false;
    return true;
}

False, \ (1-2^{-k} \) - correct

Give the error probability: $2^{-k}=\varepsilon$

GP(A,B,C,n,e){
    k = [lg(1/e)]; // Rounding up
    return RepeatGoodProduct(A,B,C,n,k);
}
// O(n**2log(1/e))

Programmer Think

HKUST - algorithm analysis and design - probabilistic algorithm review knowledge points

2, Probabilistic algorithm

2.1 digital algorithm

2.2 Sherwood algorithm

2.3 Las Vegas algorithm

2.3.1 eight queens problem

2.3.2 modular p square root

2.3.3 factorization of integers

2.4 Monte Carlo algorithm

2.4.1 main element problems

2.4.2 prime number determination

2.4.3 matrix multiplication verification

Hot Topics