Translation of articles on numpy official website

Posted by Kifebear on Sun, 16 Jan 2022 03:23:34 +0100

What is NumPy

NumPy is the fundamental package for scientific computing in Python. It provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, and more.
At the core of the NumPy package is the ndarray object, which encapsulates n-dimensional arrays of homogeneous data types. Many operations on it are performed in compiled code, which improves performance.

NumPy array objects differ from standard Python sequences in several important ways:

NumPy arrays have a fixed size at creation, unlike Python lists, which can grow dynamically. Changing the size of an ndarray creates a new array and deletes the original.
The elements of a NumPy array must all have the same data type, so each occupies the same amount of memory. The exception: an array of objects (Python objects, including NumPy ones) can hold elements of different sizes.
NumPy arrays make it easier to perform advanced mathematical and other operations on large amounts of data. Such operations typically run more efficiently, and with less code, than with Python's built-in sequences.
A growing number of scientific and mathematical Python packages use NumPy arrays. Although they typically accept Python's built-in sequences as input, they convert them to NumPy arrays before processing. In other words, to use today's scientific and mathematical Python packages effectively, knowing Python's built-in sequence types is not enough; you also need to know how to use NumPy arrays.
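
The first two differences can be seen in a short session. This is only an illustrative sketch: np.append is used here as one convenient way to "resize" an array, and the variable names are made up for the example.

```python
import numpy as np

a = np.array([1, 2, 3])

# "Growing" an array with np.append builds a new array;
# the original array keeps its size.
b = np.append(a, 4)
print(a.shape, b.shape)   # (3,) (4,)
print(b is a)             # False

# All elements share one dtype, so the total size is
# simply (number of elements) * (bytes per element).
print(a.nbytes == a.size * a.itemsize)   # True

# The exception: an object array stores references to
# arbitrary Python objects of different types and sizes.
mixed = np.array([1, "two", 3.5], dtype=object)
print(mixed.dtype)        # object
```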

Sequence size and speed are especially important in scientific computing. As a simple example, consider multiplying each element of a one-dimensional sequence by the corresponding element of another sequence of the same length. If the data are stored in two Python lists, a and b, we can iterate over each element:

c = []
for i in range(len(a)):
    c.append(a[i]*b[i])

This produces the correct answer, but if a and b each contain millions of numbers, we pay the price for the inefficiency of looping in Python. C can accomplish the same task much faster (for clarity we omit variable declarations, initialization and memory allocation):

for (i = 0; i < rows; i++) {
    c[i] = a[i]*b[i];
}

This saves all the overhead involved in interpreting Python code and manipulating Python objects, but at the expense of the benefits of coding in Python. Furthermore, the coding work required grows with the dimensionality of the data. For a two-dimensional array, for example, the C code expands to:

for (i = 0; i < rows; i++) {
  for (j = 0; j < columns; j++) {
    c[i][j] = a[i][j]*b[i][j];
  }
}

NumPy gives us the best of both worlds: element-by-element operations are the default mode when an ndarray is involved, but they are executed quickly by pre-compiled C code. In NumPy,

c = a*b

does what the earlier examples do, at near-C speed, but with the code simplicity we expect from Python. Indeed, the NumPy idiom is even simpler! This last example illustrates two of NumPy's advantages: vectorization and broadcasting.
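
As a quick check that the loop and the one-line NumPy expression compute the same thing, here is a minimal sketch with small sample lists (the values and the names c_loop and c_vec are chosen only for illustration):

```python
import numpy as np

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Pure-Python version: an explicit loop over the indices.
c_loop = []
for i in range(len(a)):
    c_loop.append(a[i] * b[i])

# NumPy version: one elementwise expression; the loop runs in compiled code.
c_vec = np.array(a) * np.array(b)

print(c_loop)  # [4.0, 10.0, 18.0]
print(c_vec)   # [ 4. 10. 18.]
```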

Vectorization

Vectorization describes the absence of any explicit looping, indexing, etc., in the code. These things still take place, of course, but "behind the scenes" in optimized, pre-compiled C code.
Vectorized code has many advantages:

Vectorized code is more concise and easier to read
Fewer lines of code generally means fewer bugs
The code more closely resembles standard mathematical notation (making it easier to code mathematical constructs correctly)
Vectorization results in more "Pythonic" code. Without vectorization, code would be littered with inefficient and hard-to-read for loops.

Broadcasting

Broadcasting is the term used to describe the implicit element-by-element behavior of operations. Generally speaking, in NumPy all operations, not just arithmetic operations but also logical, bitwise, functional, etc., behave in this implicit element-by-element fashion, i.e., they broadcast. Moreover, in the example above, a and b could be multidimensional arrays of the same shape, or a scalar and an array, or even two arrays with different shapes, provided that the smaller array can be expanded to the shape of the larger one in such a way that the resulting broadcast is unambiguous. For more information, see broadcasting.
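
A small sketch of these cases, using made-up arrays: a scalar, a one-dimensional row and a single-column array are each combined with a 2 x 3 array.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

scaled = a * 2       # the scalar is applied to every element
plus_row = a + row   # the row is added to each row of a
plus_col = a + col   # the column is added to each column of a

print(scaled)    # [[ 2  4  6] [ 8 10 12]]
print(plus_row)  # [[11 22 33] [14 25 36]]
print(plus_col)  # [[101 102 103] [204 205 206]]
```

Shapes that cannot be matched this way, for example (2, 3) with (2,), raise a ValueError instead of guessing.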

NumPy fully supports an object-oriented approach, starting with ndarray. For example, ndarray is a class with numerous methods and attributes. Many of its methods are mirrored by functions in the outermost NumPy namespace, allowing programmers to code in whichever paradigm they prefer. This flexibility has made the NumPy array dialect and the NumPy ndarray class the de facto language of multidimensional data interchange in Python.
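
For instance, sum and reshape exist both as ndarray methods and as top-level functions. A minimal sketch with an arbitrary sample array:

```python
import numpy as np

a = np.array([[3, 1, 2],
              [6, 5, 4]])

# Object-oriented style: call the method on the array.
total_method = a.sum()
reshaped_method = a.reshape(3, 2)

# Functional style: call the mirrored function in the numpy namespace.
total_func = np.sum(a)
reshaped_func = np.reshape(a, (3, 2))

print(total_method, total_func)                        # 21 21
print(np.array_equal(reshaped_method, reshaped_func))  # True
```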

Installation

The only prerequisite for installing NumPy is Python itself. If you don't have Python yet and want the simplest way to get started, we recommend Anaconda, which includes Python, NumPy, and many other Python packages commonly used for scientific computing and data analysis.
On Linux and macOS, NumPy can be installed with pip or conda, or built from the downloaded source. For more information, refer to https://numpy.org/install/#py....

conda
You can install NumPy from the defaults or conda-forge channels.

# Best practice, use an environment rather than install in the base env
conda create -n my-env
conda activate my-env
# If you want to install from conda-forge
conda config --env --add channels conda-forge
# The actual install command
conda install numpy

pip
pip install numpy
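
Whichever installer you use, a quick way to confirm the installation is to import the package and run a trivial operation. This is only a sanity check, not part of the official instructions; the version printed depends on your environment.

```python
import numpy as np

# Print the installed version (the value depends on your environment).
print(np.__version__)

# A trivial array operation to confirm the package works.
check = np.arange(3) * 2
print(check)  # [0 2 4]
```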

NumPy quick start

Before using NumPy you need to know some Python; to run the examples, matplotlib also needs to be installed.

Learning objectives

Understand the difference between one-dimensional, two-dimensional and n-dimensional arrays
Learn how to apply some linear algebra operations to n-dimensional arrays without using for loops
Understand the axis and shape properties of n-dimensional arrays

Basics

NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.
For example, the array of coordinates of a point in three-dimensional space, [1, 2, 1], has one axis. That axis contains three elements, so we say it has a length of 3. The array in the following example has 2 axes. The first axis has a length of 2 and the second axis has a length of 3.

[[1., 0., 0.],
 [0., 1., 2.]]

NumPy's array class is called ndarray, also known by the alias array. Note that numpy.array is not the same as the Python standard library class array.array, which only handles one-dimensional arrays and offers less functionality.

The most important attributes of an ndarray:

ndarray.ndim: the number of axes (dimensions) of the array.
ndarray.shape: the dimensions of the array. This is a tuple of integers giving the size of the array along each dimension. For a matrix with n rows and m columns, shape is (n, m). The length of the shape tuple is therefore the number of axes, ndim.
ndarray.size: the total number of elements of the array, equal to the product of the elements of shape.
ndarray.dtype: the data type of the elements in the array. You can create or specify a dtype using standard Python types, or use the types NumPy provides, such as numpy.int32, numpy.int16 and numpy.float64.
ndarray.itemsize: the size in bytes of each element of the array. For example, an array of float64 elements has itemsize 8 (64 / 8), while an array of float32 elements has itemsize 4 (32 / 8). Equivalent to ndarray.dtype.itemsize.
ndarray.data: the buffer containing the actual elements of the array. Normally you do not need this attribute, because elements are accessed through indexing.
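
The attributes can be inspected on any array. A short sketch with a sample 3 x 5 array; the dtype is set explicitly to float64 so that the results do not depend on the platform's default integer type:

```python
import numpy as np

a = np.arange(15, dtype=np.float64).reshape(3, 5)

print(a.ndim)      # 2  (two axes)
print(a.shape)     # (3, 5)
print(a.size)      # 15 (= 3 * 5)
print(a.dtype)     # float64
print(a.itemsize)  # 8  (64 bits / 8)
print(a.itemsize == a.dtype.itemsize)  # True
```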

Create array

Convert a list or tuple to an array

import numpy as np
a = np.array([2, 3, 4])
a #array([2, 3, 4])
a.dtype #dtype('int64')
b = np.array([1.2, 3.5, 5.1])
b.dtype #dtype('float64')

A common mistake is to call array with several separate arguments, instead of passing a single sequence as the argument.

a = np.array(1, 2, 3, 4)    # WRONG
Traceback (most recent call last):
  ...
TypeError: array() takes from 1 to 2 positional arguments but 4 were given
a = np.array([1, 2, 3, 4])  # RIGHT

The array function converts sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.

b = np.array([(1.5, 2, 3), (4, 5, 6)])
b
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])
b.shape #(2,3)

You can specify the data type when creating an array

c = np.array([[1, 2], [3, 4]], dtype=complex)
c
array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

Create an array with placeholder content
Often the elements of an array are not known in advance, but its size is. NumPy therefore offers several functions that create arrays with initial placeholder content, which reduces the need to grow arrays, an expensive operation.
zeros, ones, empty
The zeros function creates an array full of zeros,
The ones function creates an array full of ones,
The empty function creates an array whose initial content is arbitrary and depends on the state of the memory.
By default, the dtype of an array created by these three functions is float64, but it can be changed with the keyword argument dtype.

np.zeros((3, 4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
np.ones((2, 3, 4), dtype=np.int16)
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)
np.empty((2, 3))
array([[3.73603959e-262, 6.02658058e-154, 6.55490914e-260],  # may vary
       [5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])

Create numeric sequences
The arange function creates a sequence of numbers, analogous to Python's built-in range, but returns an array.

np.arange(10, 30, 5)  # start, stop (exclusive) and step
array([10, 15, 20, 25])
np.arange(0, 2, 0.3)  # it accepts float arguments
array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

linspace
When arange is used with floating-point arguments, it is generally not possible to predict the number of elements obtained, because of finite floating-point precision. For this reason, it is usually better to use the linspace function, which takes the number of elements we want as an argument instead of the step.

from numpy import pi
np.linspace(0, 2, 9)  
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
x = np.linspace(0, 2 * pi, 100)
f = np.sin(x)
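
The difference can be sketched as follows. The particular start, stop and step values are just an illustration; the arange length is printed rather than stated, because it depends on floating-point rounding.

```python
import numpy as np

# With linspace the number of points is explicit, and both
# endpoints are included by default.
x = np.linspace(0.0, 1.0, 11)
print(len(x))        # 11
print(x[0], x[-1])   # 0.0 1.0

# With a float step, the length depends on how
# (stop - start) / step is rounded, so it is harder to predict.
n = len(np.arange(1.0, 1.3, 0.1))
print(n)
```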

other

Print array

When you print an array, NumPy displays it in a way similar to nested lists, with the following layout:

The last axis is printed from left to right
The second-to-last axis is printed from top to bottom
The rest are also printed from top to bottom, with each slice separated from the next by an empty line
If an array is too large to be printed, NumPy automatically skips the central part and prints only the corners. To force NumPy to print the entire array, change the printing options with set_printoptions.

c = np.arange(24).reshape(2, 3, 4)  # 3d array
print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

np.set_printoptions(threshold=sys.maxsize)  # sys module should be imported
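
To see the summarizing behavior without changing the global settings, the sketch below uses the np.printoptions context manager, which applies the option only inside the with block. The array size of 10000 is an arbitrary choice that exceeds the default threshold.

```python
import sys
import numpy as np

big = np.arange(10000)

# Default behavior: large arrays are summarized with "...".
short = str(big)
print(short)

# Inside the context manager the full array is rendered;
# the previous options are restored when the block exits.
with np.printoptions(threshold=sys.maxsize):
    full = str(big)

print(len(short) < len(full))  # True
```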

Basic operations

Arithmetic operators on arrays apply element by element. A new array is created and filled with the result.

a = np.array([20, 30, 40, 50])
b = np.arange(4)
b #array([0, 1, 2, 3])
c = a - b
c #array([20, 29, 38, 47])
b**2 #array([0, 1, 4, 9])
10 * np.sin(a) #array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
a < 35
array([ True,  True, False, False])

The * operator multiplies arrays element by element. To compute the matrix product, use the @ operator or the dot function (or method).

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])
A * B     # elementwise product
array([[2, 0],
       [0, 4]])
A @ B     # matrix product
array([[5, 4],
       [3, 4]])
A.dot(B)  # another matrix product
array([[5, 4],
       [3, 4]])

Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.

rg = np.random.default_rng(1)  # create instance of default random number generator
a = np.ones((2, 3), dtype=int)
b = rg.random((2, 3))
a *= 3
a
array([[3, 3, 3],
       [3, 3, 3]])
b += a
b
array([[3.51182162, 3.9504637 , 3.14415961],
       [3.94864945, 3.31183145, 3.42332645]])
a += b  # b is not automatically converted to integer type
Traceback (most recent call last):
 ...
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

When operating on arrays of different types, the type of the resulting array is the more general or more precise one (a behavior known as upcasting).

a = np.ones(3, dtype=np.int32)
b = np.linspace(0, pi, 3)
b.dtype.name
'float64'
c = a + b
c
array([1.        , 2.57079633, 4.14159265])
c.dtype.name
'float64'
d = np.exp(c * 1j)
d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
       -0.54030231-0.84147098j])
d.dtype.name
'complex128'
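
If you want to know the result type without performing the operation, np.result_type reports it. It is not part of the text above and is shown here only as a convenient way to inspect upcasting:

```python
import numpy as np

# Ask NumPy which dtype a combination of types would produce.
print(np.result_type(np.int32, np.float64))       # float64
print(np.result_type(np.float64, np.complex128))  # complex128

# The same rule applied by an actual operation.
x = np.ones(3, dtype=np.int32) + np.ones(3, dtype=np.float64)
print(x.dtype)  # float64
```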

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.

a = rg.random((2, 3))
a
array([[0.82770259, 0.40919914, 0.54959369],
       [0.02755911, 0.75351311, 0.53814331]])
a.sum()
3.1057109529998157
a.min()
0.027559113243068367
a.max()
0.8277025938204418

By default, these operations treat the array as a flat list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along a given axis of the array.

b = np.arange(12).reshape(3, 4)
b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
b.sum(axis=0)     # sum of each column
array([12, 15, 18, 21])
b.min(axis=1)     # min of each row
array([0, 4, 8])
b.cumsum(axis=1)  # cumulative sum along each row
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])
