numpy_6 sorting, searching, counting and set operations

Posted by jmrothermel on Mon, 17 Jan 2022 20:54:26 +0100

sort

Direct sorting: sort()

numpy.sort(a[, axis=-1, kind='quicksort', order=None])
axis: the axis along which to sort. 0 sorts down each column (along the vertical axis), 1 sorts across each row (along the horizontal axis), and None flattens the array before sorting. The default is -1, which sorts along the last axis.
kind: the sorting algorithm. The options include quick sort 'quicksort', merge sort 'mergesort' and heap sort 'heapsort'; the default is 'quicksort'.
order: when a is a structured array, the name of the field (or list of fields) to sort by. The default is None. (See the example below.)

import numpy as np
dt = np.dtype([('name', 'S10'), ('age', int)])  # np.int was removed in NumPy 1.24; the built-in int works
a = np.array([("Mike", 21), ("Nancy", 25), ("Bob", 17), ("Jane", 27)], dtype=dt)
b = np.sort(a, order='name')
print(b)
# [(b'Bob', 17) (b'Jane', 27) (b'Mike', 21) (b'Nancy', 25)]
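
The effect of the axis parameter is easier to see on a small two-dimensional array. The following is a minimal sketch; the array m is made up purely for illustration.

```python
import numpy as np

# Hypothetical 2-D array, used only to illustrate the axis parameter.
m = np.array([[3, 1, 2],
              [9, 7, 8],
              [6, 4, 5]])

by_last_axis = np.sort(m)            # default axis=-1: each row is sorted
by_first_axis = np.sort(m, axis=0)   # each column is sorted
flattened = np.sort(m, axis=None)    # the array is flattened, then sorted

print(by_last_axis)
# [[1 2 3]
#  [7 8 9]
#  [4 5 6]]
print(by_first_axis)
# [[3 1 2]
#  [6 4 5]
#  [9 7 8]]
print(flattened)
# [1 2 3 4 5 6 7 8 9]
```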

Get the index positions of the sorted elements: argsort()

argsort() returns the index positions that would sort the array, instead of the sorted values themselves.
numpy.argsort(a[, axis=-1, kind='quicksort', order=None])

import numpy as np
np.random.seed(20200612)
x = np.random.randint(0, 10, 10)
print(x)
# [6 1 8 5 5 4 1 2 9 1]

y = np.argsort(x)
print(y)
# [1 6 9 7 5 3 4 0 2 8]

print(x[y])
# [1 1 1 2 4 5 5 6 8 9]

Sort by multiple keys in primary and secondary order: lexsort()

numpy.lexsort(keys[, axis=-1]) # performs an indirect stable sort using a sequence of keys.
Given multiple sort keys, which can be thought of as columns in a spreadsheet, lexsort returns an array of integer indices that describes the sort order by multiple columns. The last key in the sequence is the primary sort key, the penultimate key is the secondary sort key, and so on. The keys parameter must be a sequence of objects that can be converted to arrays of the same shape. If a 2D array is provided for keys, its rows are interpreted as the sort keys, and sorting is by the last row, then the penultimate row, and so on.

import numpy as np
x = np.array([1, 5, 1, 4, 3, 4, 4])
y = np.array([9, 4, 0, 4, 0, 2, 1])
a = np.lexsort([x])
b = np.lexsort([y])
print(a) # [0 2 4 3 5 6 1]
print(x[a]) # [1 1 3 4 4 4 5]
print(b) # [2 4 6 5 1 3 0]
print(y[b]) # [0 0 1 2 4 4 9]
z = np.lexsort([y, x])
print(z)
# [2 0 4 6 5 3 1]
print(x[z])
# [1 1 3 4 4 4 5]
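
A common use of lexsort is ordering the rows of a table by one column and breaking ties with another. The sketch below assumes a made-up table whose columns are (group, score); the names are illustrative only.

```python
import numpy as np

# Hypothetical table: each row is (group, score).
table = np.array([[2, 30],
                  [1, 50],
                  [2, 10],
                  [1, 20]])

# Primary key: group (column 0); secondary key: score (column 1).
# The primary key goes LAST in the key sequence.
order = np.lexsort((table[:, 1], table[:, 0]))
print(order)  # [3 1 2 0]
print(table[order])
# [[ 1 20]
#  [ 1 50]
#  [ 2 10]
#  [ 2 30]]
```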

Partition around the element at a given index: partition()

numpy.partition(a, kth, axis=-1, kind='introselect', order=None)
Returns a copy of the array in which the element at index kth is in the position it would occupy in a sorted array. All elements smaller than it are placed before it, and all elements equal to or greater than it are placed after it. The order of the elements within each of the two partitions is undefined.
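
Because the order inside each partition is undefined, an example should only rely on the guarantees themselves. The following sketch, with illustrative data, checks those guarantees rather than printing the partitioned array.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([7, 2, 9, 4, 1, 8, 3])
k = 3

p = np.partition(x, k)

# p[k] is the value that a full sort would put at index k.
print(p[k] == np.sort(x)[k])        # True
# Everything before index k is <= p[k]; everything after it is >= p[k].
print(np.all(p[:k] <= p[k]))        # True
print(np.all(p[k + 1:] >= p[k]))    # True
```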

Get the index positions of a partition: argpartition()

numpy.argpartition(a, kth, axis=-1, kind='introselect', order=None)
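
One typical use of argpartition is picking the k smallest elements without fully sorting the array. This is a minimal sketch with illustrative data; the selected values come back in no particular order, so they are sorted before printing.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([7, 2, 9, 4, 1, 8, 3])
k = 3

idx = np.argpartition(x, k)
smallest = x[idx[:k]]     # the k smallest values, in no particular order

print(np.sort(smallest))  # [1 2 3]
print(x[idx[k]])          # 4, the value a full sort would place at index k
```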

search

np.argmax()/np.argmin()

numpy.argmax(a[, axis=None, out=None]) # returns the index position of the maximum value
numpy.argmin(a[, axis=None, out=None]) # returns the index position of the minimum value
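
A short sketch of both functions on an illustrative two-dimensional array follows. With axis=None the index refers to the flattened array; np.unravel_index converts it back to a (row, column) pair.

```python
import numpy as np

# Hypothetical 2-D array, used only for illustration.
m = np.array([[4, 9, 2],
              [7, 1, 8]])

print(np.argmax(m))          # 1  (index into the flattened array)
print(np.argmin(m))          # 4
print(np.argmax(m, axis=0))  # [1 0 1]  (row index of the maximum in each column)
print(np.argmax(m, axis=1))  # [1 2]    (column index of the maximum in each row)

# Convert the flat index back to 2-D coordinates.
row, col = np.unravel_index(np.argmax(m), m.shape)
print(int(row), int(col))    # 0 1
```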

np.nonzero()

numpy.nonzero(a) # returns the indices of the non-zero elements

  1. Only the non-zero elements of a contribute indices; zero elements are skipped.
  2. It returns a tuple of length a.ndim. Each element of the tuple is an array of integers.
  3. Each array holds the indices along one dimension. For example, if a is a two-dimensional array, the tuple contains two arrays: the first holds the row indices and the second holds the column indices of the non-zero elements.
  4. np.transpose(np.nonzero(x)) groups the indices by element, giving one row of coordinates per non-zero element.
  5. All non-zero values in a can be obtained with a[np.nonzero(a)].
# Two dimensional array
import numpy as np
x = np.array([[3, 0, 0], [0, 4, 0], [5, 6, 0]])
print(x)
# [[3 0 0]
#  [0 4 0]
#  [5 6 0]]
print(x.shape)  # (3, 3)
print(x.ndim)  # 2

y = np.nonzero(x)
print(y)
# (array([0, 1, 2, 2], dtype=int64), array([0, 1, 0, 1], dtype=int64))
print(np.array(y))
# [[0 1 2 2]
#  [0 1 0 1]]
print(np.array(y).shape)  # (2, 4)
print(np.array(y).ndim)  # 2

y = x[np.nonzero(x)]
print(y)  # [3 4 5 6]

y = np.transpose(np.nonzero(x))
print(y)
# [[0 0]
#  [1 1]
#  [2 0]
#  [2 1]]

nonzero() can also be applied to a Boolean array: True counts as non-zero, so it returns the indices where the condition holds.

import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(x)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

y = x > 3
print(y)
# [[False False False]
#  [ True  True  True]
#  [ True  True  True]]

y = np.nonzero(x > 3)
print(y)
# (array([1, 1, 1, 2, 2, 2], dtype=int64), array([0, 1, 2, 0, 1, 2], dtype=int64))

y = x[np.nonzero(x > 3)]
print(y)
# [4 5 6 7 8 9]

y = x[x > 3]
print(y)
# [4 5 6 7 8 9]

np.where()

numpy.where(condition, [x=None, y=None])
1) If x and y are given, the result takes its elements from x where condition is satisfied and from y where it is not.
2) If only condition is given, the coordinates of the elements that satisfy it (i.e. that are non-zero) are returned, which is equivalent to numpy.nonzero. The coordinates are returned as a tuple that contains one index array per dimension.
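
Both forms of np.where can be sketched on a small one-dimensional array. The data here is illustrative only.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Three-argument form: keep values above 5, replace the rest with 0.
y = np.where(x > 5, x, 0)
print(y)  # [0 0 0 0 0 6 7 8]

# One-argument form: a tuple of index arrays where the condition holds.
idx = np.where(x > 5)
print(idx[0])  # [5 6 7]
print(x[idx])  # [6 7 8]
```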

np.searchsorted()

numpy.searchsorted(a, v[, side='left', sorter=None]) # finds the indices at which elements should be inserted to maintain order.
a: one-dimensional input array. If sorter is None, a must be sorted in ascending order; otherwise sorter must be an array of indices that sort a into ascending order.
v: the value or values to insert into a; this can be a single element, a list or an ndarray.
side: search direction. With 'left', the index of the first suitable position is returned; with 'right', the index of the last suitable position is returned. The two differ only when v is equal to elements already in a.
sorter: optional one-dimensional array of indices that sort a into ascending order, typically the result of argsort.

import numpy as np
x = np.array([0, 1, 5, 9, 11, 18, 26, 33])
y = np.searchsorted(x, 15)
print(y)  # 5

import numpy as np
x = np.array([0, 1, 5, 9, 11, 18, 26, 33])
np.random.shuffle(x)  # shuffle the array in place; the resulting order varies from run to run
print(x)  # e.g. [33  9 11 26 18  5  1  0]

x_sort = np.argsort(x)
print(x_sort)  # [7 6 5 1 2 4 3 0]

y = np.searchsorted(x, [-1, 0, 11, 15, 33, 35], sorter=x_sort)
print(y)  # [0 0 4 5 7 8]

y = np.searchsorted(x, [-1, 0, 11, 15, 33, 35], side='right', sorter=x_sort)
print(y)  # [0 1 5 5 8 8]

count

numpy.count_nonzero(a, axis=None) # returns the number of non-zero elements in the array.
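
A minimal sketch of count_nonzero follows, reusing the two-dimensional array from the nonzero() example. Passing a Boolean condition counts the elements that satisfy it.

```python
import numpy as np

x = np.array([[3, 0, 0], [0, 4, 0], [5, 6, 0]])

print(np.count_nonzero(x))          # 4
print(np.count_nonzero(x, axis=0))  # [2 2 0]  (per column)
print(np.count_nonzero(x, axis=1))  # [1 1 2]  (per row)
print(np.count_nonzero(x > 3))      # 3        (elements greater than 3)
```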

Set operations

Construct a set (deduplication): numpy.unique()

numpy.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None)
return_index=True # also returns the index of the first occurrence of each unique element in the original array.
return_inverse=True # also returns, for each element of the original array, its index in the unique array.
return_counts=True # also returns the number of times each unique element appears in the original array.

import numpy as np
x = np.array(['a', 'b', 'b', 'c', 'a'])
u, index = np.unique(x, return_index=True)
print(u)  # ['a' 'b' 'c']
print(index)  # [0 1 3]
print(x[index])  # ['a' 'b' 'c']

x = np.array([1, 2, 6, 4, 2, 3, 2])
u, index = np.unique(x, return_inverse=True)
print(u)  # [1 2 3 4 6]
print(index)  # [0 1 4 3 1 2 1]
print(u[index])  # [1 2 6 4 2 3 2]

u, count = np.unique(x, return_counts=True)
print(u)  # [1 2 3 4 6]
print(count)  # [1 3 1 1 1]

Supplement: Python's set() function

set(iterable) creates an empty set or converts an iterable object into an unordered set.
iterable: the iterable object to convert into an unordered set, such as a list, string or dictionary. It can be omitted, in which case an empty set with no elements is created.

  1. When the parameter is omitted: create an empty set, assign it to a variable, and print the type and value of the variable
a = set()
print(type(a)) # <class 'set'>
print(a) # set()
# At this point, the variable a is a set without any elements.
  2. Convert a string to a set
a = "Hello World !"
b = set(a)
print(b) # {'l', 'e', 'H', '!', 'o', 'd', 'r', ' ', 'W'}
  3. Convert a list to a set
a = []
b = [21,7, 21, 22, 530, 'wdf']
print(set(a)) # set()
print(set(b)) # {7, 530, 21, 22, 'wdf'}
  4. Convert a tuple to a set
a = ()
b = (7, 7, 23, 22, 530, 'wdf')
print(set(a)) # set()
print(set(b)) # {'wdf', 7, 530, 22, 23}
  5. Convert a dictionary to a set
a = {}
b = {'China': 'Beijing', 'Japan': 'Tokyo', 'Mongolia': 'Ulan Bator'}
print(set(a)) # set()
print(set(b)) # {'China', 'Japan', 'Mongolia'}

Note: when a dictionary is converted into a set, only the keys of the dictionary are included.

  6. Iterable objects with repeated elements
a = 'uuuvvvaaa'
b = [1, 1, 1]
c = ('sdf', 'sdf', 'sdf')
d = {1: 12, 2: 23, 1: 345}
print(set(a)) # {'a', 'v', 'u'}
print(set(b)) # {1}
print(set(c)) # {'sdf'}
print(set(d)) # {1, 2}

Notes:

  1. The argument passed to set() must be an iterable object. Python raises a TypeError when it is not. For example:
a = 1.23
print(set(a))  # TypeError: 'float' object is not iterable

  2. A set is unordered, so the order of the elements after conversion may differ from their order before conversion.
  3. Since a set does not contain duplicate elements, duplicates in the original iterable are removed by set().
  4. Converting a dictionary into a set keeps only its keys. To convert the values of a dictionary into a set, use the dictionary method values(), for example set(d.values()).

Reference for the set() section: https://blog.csdn.net/TCatTime/article/details/82312600
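
The notes on set() can be checked with a short sketch. The dictionary name capitals is made up for illustration; because sets are unordered, the results are compared with set literals rather than printed directly.

```python
# Hypothetical dictionary, reusing the data from the dictionary example.
capitals = {'China': 'Beijing', 'Japan': 'Tokyo', 'Mongolia': 'Ulan Bator'}

keys = set(capitals)             # iterating a dict yields its keys
values = set(capitals.values())  # use values() to collect the values instead

print(keys == {'China', 'Japan', 'Mongolia'})        # True
print(values == {'Beijing', 'Tokyo', 'Ulan Bator'})  # True

# A non-iterable argument raises TypeError.
try:
    set(1.23)
except TypeError as exc:
    message = str(exc)
    print(message)  # 'float' object is not iterable
```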

Boolean operation: numpy.in1d()

numpy.in1d(ar1, ar2, assume_unique=False, invert=False)
Tests whether each element of the first array is also present in the second array. The result is a Boolean array with the same length as the first array, in which each value corresponds one-to-one to the element at the same position. With invert=True the result is negated, as if testing "not in". (The NumPy documentation recommends numpy.isin over in1d for new code.)

import numpy as np
test = np.array([0, 1, 2, 5, 0])
states = [0, 2]
mask = np.in1d(test, states)
print(mask)  # [ True False  True False  True]
print(test[mask])  # [0 2 0]

mask = np.in1d(test, states, invert=True)
print(mask)  # [False  True False  True False]
print(test[mask])  # [1 5]

Find the intersection of two sets: numpy.intersect1d(ar1, ar2)

numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False) # returns the sorted, unique values that are in both arrays.

import numpy as np
from functools import reduce

x = np.intersect1d([1, 3, 4, 3], [3, 1, 2, 1])
print(x)  # [1 3]

x = np.array([1, 1, 2, 3, 4])
y = np.array([2, 1, 4, 6])
xy, x_ind, y_ind = np.intersect1d(x, y, return_indices=True)
print(x_ind)  # [0 2 4]
print(y_ind)  # [1 0 2]
print(xy)  # [1 2 4]
print(x[x_ind])  # [1 2 4]
print(y[y_ind])  # [1 2 4]

x = reduce(np.intersect1d, ([1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]))
print(x)  # [3]

Supplement: Python's reduce() function

from functools import reduce
reduce(function, iterable[, initializer])
function: a function that takes two parameters
iterable: an iterable object
initializer: optional, the initial value
reduce() accumulates all the data in an iterable (list, tuple, etc.) as follows: the function passed to reduce (which takes two parameters) is first applied to the first and second elements, then applied to that result and the third element, and so on, until a single result is obtained.
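
The accumulation described above can be sketched with a simple sum. The list numbers and the lambda are illustrative; the second call shows how the optional initializer seeds the accumulation.

```python
from functools import reduce

# Hypothetical data, chosen only for illustration.
numbers = [1, 2, 3, 4, 5]

# Evaluated as ((((1 + 2) + 3) + 4) + 5).
total = reduce(lambda acc, n: acc + n, numbers)

# With an initializer, the accumulation starts from 100 instead of the first element.
total_from_100 = reduce(lambda acc, n: acc + n, numbers, 100)

print(total)           # 15
print(total_from_100)  # 115
```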

Find the union of two sets: numpy.union1d(ar1, ar2)

numpy.union1d(ar1, ar2) # computes the union of two arrays and returns the unique values, sorted.
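
A minimal sketch of union1d follows. As with intersect1d, functools.reduce can be used to take the union of more than two arrays; the input lists are illustrative.

```python
import numpy as np
from functools import reduce

x = np.union1d([-1, 0, 1], [-2, 0, 2])
print(x)  # [-2 -1  0  1  2]

# Union of more than two arrays, folded pairwise with reduce.
y = reduce(np.union1d, ([1, 3, 4, 3], [3, 1, 2, 1], [6, 3, 4, 2]))
print(y)  # [1 2 3 4 6]
```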

Find the difference of two sets: numpy.setdiff1d(ar1, ar2)

numpy.setdiff1d(ar1, ar2, assume_unique=False) # set difference: the sorted, unique values that are in the first array but not in the second.

import numpy as np
a = np.array([1, 2, 3, 2, 4, 1])
b = np.array([3, 4, 5, 6])
x = np.setdiff1d(a, b)
print(x)  # [1 2]

Find the exclusive-or of two sets: numpy.setxor1d(ar1, ar2)

numpy.setxor1d(ar1, ar2, assume_unique=False) # set exclusive-or (symmetric difference): the sorted, unique values that are in exactly one of the two arrays.

import numpy as np
a = np.array([1, 2, 3, 2, 4, 1])
b = np.array([3, 4, 5, 6])
x = np.setxor1d(a, b)
print(x)  # [1 2 5 6]
