Data structures and algorithms [Python implementation]: searching and basic sorting

Posted by kidsleep on Tue, 04 Jan 2022 18:53:04 +0100

1, Tower of Hanoi problem

```python
def hanoi(n, a, b, c):  # Move n discs from a to c, using b as the spare peg
    if n > 0:
        hanoi(n - 1, a, c, b)  # Move the top n-1 discs from a to b, via c
        print("disc %d moving from %s to %s" % (n, a, c))
        hanoi(n - 1, b, a, c)  # Move those n-1 discs from b to c, via a

hanoi(2, 'A', 'B', 'C')
```

The number of moves follows the recurrence h(n) = 2h(n-1) + 1 with h(0) = 0, which gives h(n) = 2^n - 1, so the time complexity is O(2^n)
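The recurrence solves exactly to 2^n - 1 moves. As a quick check, the sketch below counts moves instead of printing them. The name `hanoi_moves` is introduced here for illustration and is not part of the original code.

```python
def hanoi_moves(n):
    """Return the number of single-disc moves needed for n discs."""
    if n == 0:
        return 0  # h(0) = 0: nothing to move
    # Move n-1 discs aside, move the largest disc, move the n-1 discs back on top
    return 2 * hanoi_moves(n - 1) + 1

for n in range(1, 6):
    print(n, hanoi_moves(n), 2 ** n - 1)  # the two counts agree on every row
```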

2, Search

1. Sequential search

```python
def linear_search(li, val):
    for ind, v in enumerate(li):
        if v == val:
            return ind
    else:  # The loop finished without finding val
        return None
```

2. Binary search

```python
def binary_search(li, val):
    left = 0
    right = len(li) - 1
    while left <= right:  # The candidate area still has values
        mid = (left + right) // 2  # Integer division by 2
        if li[mid] == val:
            return mid
        elif li[mid] > val:  # The value to be found is to the left of mid
            right = mid - 1
        else:                # The value to be found is to the right of mid
            left = mid + 1
    else:  # The candidate area is empty: val is not in the list
        return None
```

Time complexity: O(logn)
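Binary search only works on a list that is already sorted. For comparison, Python's standard `bisect` module performs the same O(logn) search. The sketch below wraps `bisect.bisect_left` so that it returns an index or `None`; the wrapper name `bisect_search` is hypothetical.

```python
import bisect

def bisect_search(li, val):
    """Return an index of val in the sorted list li, or None if it is absent."""
    pos = bisect.bisect_left(li, val)  # leftmost position where val could be inserted
    if pos < len(li) and li[pos] == val:
        return pos
    return None

data = [1, 3, 5, 7, 9, 11]
print(bisect_search(data, 7))  # 3
print(bisect_search(data, 4))  # None
```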

3, Sort

Primary sort:

1. Bubble sort

It sorts in place, so no new memory space is needed.

Compare every two adjacent numbers in the list. If the one in front is larger than the one behind, swap them;

After each pass, the unordered area shrinks by one number and the ordered area grows by one number.

```python
def bubble_sort(li):  # Ascending order
    for i in range(len(li) - 1):  # Pass i, counting from 0
        exchange = False
        for j in range(len(li) - i - 1):  # Pointer; n-i-1 comparisons in this pass
            if li[j] > li[j + 1]:
                li[j], li[j + 1] = li[j + 1], li[j]
                exchange = True
        print(li)
        if not exchange:  # No swap in this pass: the list is already sorted
            return
```

Time complexity O(n^2)
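The O(n^2) bound is the worst case. Thanks to the `exchange` flag, an already sorted list is finished after a single pass. The sketch below is an instrumented variant that reports how many passes were made; the name `bubble_sort_passes` and the returned count are additions for illustration only.

```python
def bubble_sort_passes(li):
    """Sort li in place (ascending) and return the number of passes made."""
    passes = 0
    for i in range(len(li) - 1):
        passes += 1
        exchange = False
        for j in range(len(li) - i - 1):
            if li[j] > li[j + 1]:
                li[j], li[j + 1] = li[j + 1], li[j]
                exchange = True
        if not exchange:  # no swap in this pass: the list is already sorted
            break
    return passes

print(bubble_sort_passes([1, 2, 3, 4, 5]))  # 1
print(bubble_sort_passes([5, 4, 3, 2, 1]))  # 4
```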

2. Selection sort

In the first pass, find the smallest number in the list and put it in the first position

In the next pass, find the smallest number in the remaining unordered area and put it in the second position

... and so on until the whole list is sorted

```python
def select_sort(li):
    for i in range(len(li) - 1):  # Pass i
        min_loc = i
        for j in range(i + 1, len(li)):  # Scan the unordered area
            if li[j] < li[min_loc]:  # Smaller than the current minimum: save its index in min_loc
                min_loc = j
        li[i], li[min_loc] = li[min_loc], li[i]  # Swap the minimum with the first number of the unordered area
        print(li)
```

Time complexity O(n^2)

3. Insertion sort

At the beginning, you hold only one card in your hand (the ordered area)

Each time, draw a card from the unordered area and insert it at the correct position among the cards already in your hand

```python
def insert_sort(li):
    for i in range(1, len(li)):  # i is the index of the card just drawn
        tmp = li[i]
        j = i - 1  # j is the index of the card in hand, moving from right to left
        while j >= 0 and li[j] > tmp:  # Check j >= 0 first; stop at a card no larger than the drawn one
            li[j + 1] = li[j]  # Shift one position to the right
            j -= 1
        li[j + 1] = tmp
        print(li)
```

Time complexity O(n^2)

3. Quick sort

Take an element p (here, the first element) and move it to its final sorted position

p then divides the list into two parts: everything on the left is smaller than p and everything on the right is larger than p

Recursively sort the two parts to complete the sort

```python
def partition(li, left, right):  # One pass of quick sort: moves li[left] to its final position
    tmp = li[left]
    while left < right:
        while left < right and li[right] >= tmp:  # From the right, find a number smaller than tmp
            right -= 1   # Move one step to the left
        li[left] = li[right]  # Put the value on the right into the empty slot on the left

        while left < right and li[left] <= tmp:  # From the left, find a number larger than tmp
            left += 1
        li[right] = li[left]  # Put the value on the left into the empty slot on the right
    li[left] = tmp  # Put tmp into its final position
    return left  # Return mid; left and right are equal at this point

def quick_sort(li, left, right):
    if left < right:   # There are at least two elements
        mid = partition(li, left, right)
        quick_sort(li, left, mid - 1)
        quick_sort(li, mid + 1, right)   # mid splits the list into left and right parts; sort each recursively
```

Time complexity: O(nlogn) on average

Worst case: O(n^2), for example when the list is already sorted and the first element is always chosen as p

How to avoid it: shuffle the list first, or pick p at random instead of always taking the first number
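A minimal sketch of the random choice: swap a randomly chosen element into the first position, then partition with the same hole-filling scheme as above. The names `random_partition` and `random_quick_sort` are made up for this example.

```python
import random

def random_partition(li, left, right):
    """Partition li[left..right] around a random pivot; return the pivot's final index."""
    k = random.randint(left, right)      # randint includes both end points
    li[left], li[k] = li[k], li[left]    # move the random pivot to the front
    tmp = li[left]
    while left < right:
        while left < right and li[right] >= tmp:
            right -= 1
        li[left] = li[right]
        while left < right and li[left] <= tmp:
            left += 1
        li[right] = li[left]
    li[left] = tmp
    return left

def random_quick_sort(li, left, right):
    if left < right:
        mid = random_partition(li, left, right)
        random_quick_sort(li, left, mid - 1)
        random_quick_sort(li, mid + 1, right)

data = list(range(2000))  # already sorted: a bad case for a fixed first-element pivot
random_quick_sort(data, 0, len(data) - 1)
print(data[:5])  # [0, 1, 2, 3, 4]
```

With a fixed first-element pivot, a sorted input of this size recurses once per element and would exceed Python's default recursion limit; the random pivot keeps the expected depth logarithmic.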

4. Heap sort

A binary tree can be stored in two ways: linked storage and sequential (list) storage

Sequential storage, with indices starting from 0:

For a parent node at index i, the left child is at 2i+1 and the right child is at 2i+2

For a child node at index i, the parent is at (i-1) // 2
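These index formulas can be wrapped in small helpers. The names `left_child`, `right_child` and `parent` are illustrative, and `tree` is example data stored level by level.

```python
def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2

# A complete binary tree stored level by level:
#         9
#       /   \
#      8     7
#     / \   /
#    6   5 0
tree = [9, 8, 7, 6, 5, 0]
print(tree[left_child(1)], tree[right_child(1)])  # children of 8: 6 5
print(tree[parent(5)])                            # parent of 0: 7
```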

Heap: a special kind of complete binary tree

Large root heap (max-heap): a complete binary tree in which every node is larger than its child nodes

Small root heap (min-heap): a complete binary tree in which every node is smaller than its child nodes

One downward adjustment (sift-down): when the left and right subtrees of a node are both heaps but the tree rooted at that node is not, one downward adjustment turns the whole tree into a heap
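With sequential storage, the large root heap property can be checked directly from the index formulas. The sketch below uses a hypothetical helper `is_max_heap`; it accepts a parent equal to its child, as list-based heaps usually do.

```python
def is_max_heap(li):
    """Return True if every node in li is >= each of its children."""
    n = len(li)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and li[child] > li[i]:
                return False
    return True

print(is_max_heap([9, 8, 7, 6, 5, 0]))   # True
print(is_max_heap([9, 8, 7, 6, 10, 0]))  # False: 10 is larger than its parent 8
```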

Heap sort process:

(1) Build the heap (for a large root heap, every node is larger than its child nodes)

(2) Take the top element of the heap, which is the largest element

(3) Remove the top of the heap and move the last element of the heap to the top. The conditions for one downward adjustment are now met, so a single adjustment restores the heap

(4) The new top element is the second largest element

(5) Repeat step 3 until the heap is empty

Build heap:

Start from the last non-leaf node and work upward level by level, adjusting each subtree downward

```python
import random

def sift(li, low, high):   # One downward adjustment for a large root heap
    """
    :param li: list
    :param low: position of the heap top (root node)
    :param high: position of the last element of the heap
    :return:
    """
    i = low            # i initially points to the root node
    j = 2 * i + 1      # j starts as the left child
    tmp = li[low]      # Store the heap top
    while j <= high:   # Loop as long as position j is inside the heap
        if j + 1 <= high and li[j + 1] > li[j]:  # If the right child exists and is larger than the left child
            j = j + 1  # Point j to the right child
        if li[j] > tmp:
            li[i] = li[j]  # Move the larger number up to position i
            i = j          # i moves down one level
            j = 2 * i + 1
        else:              # tmp is larger: put tmp at i
            li[i] = tmp    # Put tmp in the parent position at this level
            break
    else:
        li[i] = tmp        # j > high: put tmp into the leaf position

def heap_sort(li):
    n = len(li)
    # Build the heap
    for i in range((n - 2) // 2, -1, -1):
        # i is the root of the subtree being adjusted, starting from the parent of the last element (n-1) and going backward
        sift(li, i, n - 1)  # low is the root of the subtree; high can always be the last element
    # The heap is built; now take the elements off one by one
    for i in range(n - 1, -1, -1):
        # i always points to the last position of the current heap
        li[0], li[i] = li[i], li[0]  # Swap the heap top with the last element
        sift(li, 0, i - 1)  # Re-adjust the heap from the top; i-1 is the new high
    # Sort complete

li = [i for i in range(100)]
random.shuffle(li)
print(li)
heap_sort(li)
print(li)
```

Time complexity: one sift takes O(logn); the entire heap sort is O(nlogn)

Actual performance: quick sort is usually faster than heap sort in practice

Python's built-in heap module: heapq

Common functions:

heapify(x): build a heap in place (small root heap)

heappop(heap): pop the smallest element each time

Example:

```python
import heapq   # q -> queue, i.e. a priority queue; heapq pops the smallest first
import random

li = [i for i in range(100)]
random.shuffle(li)

print(li)
heapq.heapify(li)  # Build the heap
print(li)

for i in range(len(li)):
    print(heapq.heappop(li), end=',')
```

Top-k problem: given n numbers, design an algorithm to find the k largest numbers (k < n)

Solutions:

Sort, then slice: O(nlogn)

Bubble / insertion / selection sort: O(kn)

Using a heap: O(nlogk), which is faster

Take the first k elements of the list and build a small root heap; the top of the heap is then the k-th largest number so far

Traverse the rest of the list in order. If an element is smaller than the top of the heap, ignore it; if it is larger than the top of the heap, replace the top with this element and adjust the heap once

After all elements have been traversed, pop the top of the heap repeatedly and output the popped values in reverse order

```python
# topk, adapted from heap sort
import random

def sift(li, low, high):   # One downward adjustment for a small root heap
    """
    :param li: list
    :param low: position of the heap top (root node)
    :param high: position of the last element of the heap
    :return:
    """
    i = low            # i initially points to the root node
    j = 2 * i + 1      # j starts as the left child
    tmp = li[low]      # Store the heap top
    while j <= high:   # Loop as long as position j is inside the heap
        if j + 1 <= high and li[j + 1] < li[j]:  # If the right child exists and is smaller than the left child
            j = j + 1  # Point j to the right child
        if li[j] < tmp:
            li[i] = li[j]  # Move the smaller number up to position i
            i = j          # i moves down one level
            j = 2 * i + 1
        else:              # tmp is smaller: put tmp at i
            li[i] = tmp    # Put tmp in the parent position at this level
            break
    else:
        li[i] = tmp        # j > high: put tmp into the leaf position

def topk(li, k):
    heap = li[0:k]
    # 1. Build the heap from the first k elements (sift the copy, not li)
    for i in range((k - 2) // 2, -1, -1):
        sift(heap, i, k - 1)
    # 2. Traverse all remaining elements, including the last one
    for i in range(k, len(li)):
        if li[i] > heap[0]:
            heap[0] = li[i]
            sift(heap, 0, k - 1)
    # 3. Take the elements off one by one; the result is in descending order
    for i in range(k - 1, -1, -1):
        heap[0], heap[i] = heap[i], heap[0]
        sift(heap, 0, i - 1)
    return heap

li = list(range(100))
random.shuffle(li)

print(topk(li, 10))
```
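The same small-root-heap approach can also be sketched with the built-in `heapq` module instead of a hand-written `sift`. The function name `topk_heapq` is hypothetical; `heapq.heapify`, `heapq.heapreplace` and `heapq.nlargest` are standard library functions.

```python
import heapq
import random

def topk_heapq(li, k):
    """Return the k largest elements of li in descending order (assumes 1 <= k <= len(li))."""
    heap = li[:k]
    heapq.heapify(heap)                  # small root heap of the first k elements
    for x in li[k:]:
        if x > heap[0]:                  # larger than the smallest of the current top k
            heapq.heapreplace(heap, x)   # pop the top and push x in one step
    return sorted(heap, reverse=True)

li = list(range(100))
random.shuffle(li)
print(topk_heapq(li, 10))      # [99, 98, 97, 96, 95, 94, 93, 92, 91, 90]
print(heapq.nlargest(10, li))  # the library's own top-k gives the same result
```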

5. Merge sort

One merge: suppose the list consists of two sorted segments; combining them into a single sorted list is called one merge

Merge sort, built on one merge:

(1) Decomposition: split the list into smaller and smaller pieces until each piece has one element

(2) Termination condition: a piece with only one element is already ordered

(3) Merge: merge pairs of sorted pieces, so the sorted pieces grow larger and larger

```python
def merge(li, low, mid, high):  # li[low..mid] and li[mid+1..high] are both sorted
    i = low
    j = mid + 1   # Start pointers of the two segments
    ltmp = []     # A temporary list
    while i <= mid and j <= high:  # As long as both segments still have elements
        if li[i] <= li[j]:  # <= takes from the left segment on ties, which keeps the sort stable
            ltmp.append(li[i])
            i += 1
        else:
            ltmp.append(li[j])
            j += 1
    # After the loop, one of the two segments still has elements left
    while i <= mid:
        ltmp.append(li[i])
        i += 1
    while j <= high:
        ltmp.append(li[j])
        j += 1
    li[low:high + 1] = ltmp   # Write the temporary list back (the slice end is exclusive, hence high+1)

def merge_sort(li, low, high):
    if low < high:   # At least two elements in the segment: recurse
        mid = (low + high) // 2
        merge_sort(li, low, mid)
        merge_sort(li, mid + 1, high)
        merge(li, low, mid, high)
```

Time complexity: one merge is O(n); the whole decompose + merge process is O(nlogn)

Space complexity: O(n), for the temporary list
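To make the extra O(n) space visible, here is a sketch of a variant that returns a new sorted list instead of sorting in place. The names `merge_lists` and `merge_sorted` are introduced for this example only.

```python
def merge_lists(a, b):
    """Merge two sorted lists into one new sorted list."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:   # <= takes from the left list on ties, keeping the sort stable
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])      # at most one of these two slices is non-empty
    out.extend(b[j:])
    return out

def merge_sorted(li):
    if len(li) <= 1:       # zero or one element is already ordered
        return list(li)
    mid = len(li) // 2
    return merge_lists(merge_sorted(li[:mid]), merge_sorted(li[mid:]))

print(merge_sorted([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```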