Python numpy detailed tutorial

Posted by jefffan24 on Sun, 23 Jan 2022 06:58:30 +0100

Introduction to Numpy

NumPy(Numerical Python) is a basic library of scientific computing, which provides a large number of functions related to scientific computing, such as data statistics, random number generation and so on. The core type it provides is multi-dimensional array type (ndarray), which supports a large number of dimensional array and matrix operations. Numpy support vector handles ndarray objects to improve program operation speed.

Numpy (Numerical Python) is an open source Python scientific computing library, which is used to quickly process arrays with arbitrary dimensions.

Numpy supports common array and matrix operations. For the same numerical calculation task, using numpy is much simpler than using Python directly.

Numpy uses the ndarray object to handle multidimensional arrays, which is a fast and flexible big data container.

Introduction to ndarray

NumPy provides an N-dimensional array type, the ndarray,
which describes a collection of "items" of the same type.

NumPy provides an N-dimensional array type ndarray, which describes a collection of "items" of the same type.

Store with ndarray:

import numpy as np 
# Create ndarray 
score = np.array( 
[[80, 89, 86, 67, 79], 
[78, 97, 89, 67, 81], 
[90, 94, 78, 67, 74], 
[91, 91, 90, 67, 69], 
[76, 87, 75, 67, 86], 
[70, 79, 84, 67, 84], 
[94, 92, 93, 67, 64], 
[86, 85, 83, 67, 80]]) 
score

Return result:

array([[80, 89, 86, 67, 79], 
	  [78, 97, 89, 67, 81], 
	  [90, 94, 78, 67, 74], 
	  [91, 91, 90, 67, 69], 
	  [76, 87, 75, 67, 86], 
	  [70, 79, 84, 67, 84], 
	  [94, 92, 93, 67, 64], 
	  [86, 85, 83, 67, 80]])

put questions to:

Python lists can be used to store one-dimensional arrays, and multi-dimensional arrays can be realized through the nesting of lists. So why use Numpy's ndarray?

Comparison of operation efficiency between ndarray and Python native list

Here we realize the benefits of ndarray by running a piece of code

import random 
import time 
import numpy as np 
a = [] for i in range(100000000): 
a.append(random.random()) 
# Through the% time magic method, view the time taken for the current line of code to run once 
%time sum1=sum(a) 
b=np.array(a) 
%time sum2=np.sum(b)

The first time shows the time calculated using native Python, and the second content is the time calculated using numpy:

CPU times: user 852 ms, sys: 262 ms, total: 1.11 s 
Wall time: 1.13 s

CPU times: user 133 ms, sys: 653 µs, total: 133 ms 
Wall time: 134 ms

From this, we can see that the calculation speed of ndarray is much faster and saves time.

The biggest feature of machine learning is a large number of data operations. Without a fast solution, python may not achieve good results in the field of machine learning.

Numpy is specially designed for ndarray operations and operations, so the storage efficiency and input-output performance of arrays are much better than those of nested lists in Python. The larger the array, the more obvious the advantages of numpy.

Why can darray be so fast?

Advantages of ndarray

Memory block style

How is ndarray different from the native python list? Please see a figure:

We can see from the figure that when ndarray stores data, the data and data addresses are continuous, which makes the batch operation of array elements faster.

This is because the types of all elements in the ndarray are the same, and the element types in the python list are arbitrary. Therefore, the memory of the ndarray can be continuous when storing elements, while the python native list can only find the next element through addressing. Although this also leads to the fact that the ndarray of numpy is inferior to the python native list in terms of general performance, in scientific calculation, Numpy's ndarray can eliminate many circular statements, and the code is much simpler than Python's native list.

ndarray supports parallelization (vectorization)

Numpy has built-in parallel computing function. When the system has multiple cores, numpy will automatically perform parallel computing when doing some computing

Much more efficient than pure Python code

The bottom layer of Numpy is written in C language, and the GIL (global interpreter lock) is released internally. Its operation speed on the array is not limited by the Python interpreter. Therefore, its efficiency is much higher than that of pure Python code.

Numpy installation

The easiest way to install NumPy is to use the pip tool. The syntax format is as follows:

pip install numpy

arange function test environment installation

import numpy as np 
a=np.arange(10) 
print(a)

In the above program, only an orange function in the numpy module is involved. This function can pass in an integer type parameter n. The return value of the function looks like a list. In fact, the return value type is numpy ndarray. This is a unique array type in numpy. If the parameter value passed into the range function is n, the range function returns an array of ndarray types from 0 to n-1.

N-dimensional array - ndarray

array creation

The array function of numpy module can generate multi-dimensional arrays. For example, if you want to generate a two-dimensional array, you need to pass a list type parameter to the array function. Each list element is a one-dimensional array of ndarray type as the row of the two-dimensional array. In addition, the number of elements in each dimension of the array (in tuple form) can be obtained through the shape attribute of the ndarray class, or the number of elements in each dimension can be obtained in the form of shape[n], where n is the dimension, starting from 0.

The syntax format is as follows:

Use of ndmin parameter

import numpy as np 
a=np.array([1,2,3,4,5,6],ndmin=3) 
print(a)

Use of dtype parameters

a=np.array([1,2,3,4,5,6],dtype=complex) 
print(a)

Properties of ndarray

Array properties reflect the information inherent in the array itself.

Shape of ndarray

First create some arrays.

# Create arrays of different shapes 
>>> a = np.array([[1,2,3],[4,5,6]]) 
>>> b = np.array([1,2,3,4]) 
>>> c = np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])

Print out shapes separately

>>> a.shape 
>>> b.shape 
>>> c.shape 
(2, 3) # Two dimensional array 
(4,) # One dimensional array 
(2, 2, 3) # 3D array

How to understand the shape of an array?

2D array:

3D array:

Type of ndarray

>>> type(score.dtype) 
<type 'numpy.dtype'>

dtype is numpy dtype type, first look at the types of arrays:

Specify the type when creating an array

>>> a = np.array([[1, 2, 3],[4, 5, 6]], dtype=np.float32) >>> a.dtype 
dtype('float32') 

>>> arr = np.array(['python', 'tensorflow', 'scikit-learn', 'numpy'], dtype = np.string_) 
>>> arr 
array([b'python', b'tensorflow', b'scikit-learn', b'numpy'], dtype='|S12')

Note: if not specified, integer defaults to int64 and decimal defaults to float64

summary

Basic properties of array

basic operation

Method of generating array

Generate an array of 0 and 1

np.ones(shape, dtype)
np.ones_like(a, dtype)
np.zeros(shape, dtype)
np.zeros_like(a, dtype)

ones = np.ones([4, 8])
ones

array([[1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.]])

np.zeros_like(ones)

array([[0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.]])

np.empy()

numpy. The empty method is used to create an uninitialized array with a specified shape and data type. The value of the element in the array is the value of the previous memory:

numpy.empty(shape, dtype = float, order = 'C')

empty create

x=np.empty([3,2],dtype=int) 
print(x)

Generate from existing array

Generation mode

np.array(object, dtype)
np.asarray(a, dtype)

a = np.array([[1,2,3],[4,5,6]]) 
# Create from an existing array 
a1 = np.array(a) 
# Equivalent to the form of an index, there is no real creation of a new one 
a2 = np.asarray(a)

About the difference between array and asarray

Generate a fixed range array

np.linspace (start, stop, num, endpoint)

np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

Create an isometric array - specify the number
Parameters:
- start: the starting value of the sequence
- stop: the end value of the sequence
- num: the number of equally spaced samples to be generated. The default value is 50
- endpoint: whether the sequence contains the stop value. The default value is true
- retstep: if True, the spacing will be displayed in the generated array, otherwise it will not be displayed.
- Dtype: data type of ndarray

# Generate equally spaced arrays 
np.linspace(0, 100, 11)

Return result:

array([ 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100.])

np.arange(start,stop, step, dtype)

Create an isometric array - specify the step size
parameter
- Step: step size. The default value is 1

np.arange(10, 50, 2)

Return result:

array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48])

np.logspace(start,stop, num)

Create an isometric sequence
Parameters:
- num: the number of proportional series to be generated. The default value is 50

# Generate 10^x 
np.logspace(0, 2, 3)

Return result:

array([ 1., 10., 100.])

Generate random array

Introduction to using module: NP Random module

Normal distribution

What is normal distribution?

Normal distribution is a probability distribution. The normal distribution has two parameters μ and σ Distribution of continuous random variables, first parameter μ Is the mean value of random variables subject to normal distribution, the second parameter σ Is the standard deviation of this random variable, so the normal distribution is recorded as n( μ，σ ).

Application of normal distribution

The probability distribution of many random variables in life, production and scientific experiments can be approximately described by normal distribution.

Normal distribution characteristics

μ Determines its position and its standard deviation σ Determines the magnitude of the distribution. When μ = 0 σ = The normal distribution at 1 is the standard normal distribution.

How does the standard deviation come from?

variance
It is a measure of the degree of dispersion when probability theory and statistical variance measure a set of data

Where M is the average value and n is the total number of data, σ Is the standard deviation, σ ^ 2 it can be understood that a whole is variance

Significance of standard deviation and variance
It can be understood as a measure of the degree of dispersion of data

Normal distribution creation method

np.random.randn(d0, d1, ..., dn)
Function: return one or more sample values from standard normal distribution
np.random.normal(loc=0.0, scale=1.0, size=None)

loc: float
The mean value of this probability distribution (corresponding to the center of the whole distribution)
scale: float
The standard deviation of this probability distribution (corresponding to the width of the distribution, the larger the scale, the fatter, and the smaller the scale, the thinner)
size: int or tuple of ints
The output shape is None by default, and only one value is output
np.random.standard_normal(size=None)
Returns an array of standard normal distributions for a specified shape.

Example 1: generate 100000000 normal distribution data with mean value of 1.75 and standard deviation of 1

x1 = np.random.normal(1.75, 1, 100000000)

Return result:

array([2.90646763, 1.46737886, 2.21799024, ..., 1.56047411, 1.87969135, 0.9028096 ])

# 1. Create canvas
plt.figure(figsize=(20, 8), dpi=100)

# 2. Draw image
plt.hist(x1, 1000)  # 1000 groups

# 3. Display image
plt.show()

For example, we can simulate the data of the rise and fall of a group of stocks

Example 2: randomly generate the daily increase data of four stocks for one week

How to obtain the weekly (5-day) fluctuation data of 4 stocks?

Randomly generated rise and fall within a normal distribution, such as mean 0 and variance 1

Creation of stock fluctuation data

# Create 5-day rise and fall data of 4 stocks in line with normal distribution
stock_change = np.random.normal(0, 1, (4, 5)) 
stock_change

array([[-0.90442265, -1.17400313, -0.25891122, -0.60057691,  0.19973249],
       [-0.23449971, -1.01396609, -0.08161324, -1.61459823, -2.05132145],
       [ 0.0700032 ,  1.28480273, -1.17703456,  0.0929877 , -0.42675005],
       [ 1.14477016, -1.16018108,  0.76815853, -0.15971892, -1.79386421]])

# 1. Create canvas
plt.figure(figsize=(20, 8), dpi=100)

days = ['one', 'two', ]
# 2. Draw images and multi maps
for i in range(len(stock_change)):
    plt.plot(stock_change[i], label='Stock'+str(i))

# 2.1. Set x-axis scale
days = ['The first{}day'.format(i+1) for i in range(0, 4)]
plt.xticks(range(0, 4), days)

# 2.2. Set x, y axis labels
plt.xlabel('Days')
plt.ylabel('Share price')

# 2.3 setting legend
plt.legend(loc=0)

# 3. Display image
plt.show()

uniform distribution

np.random.rand(d0, d1, ..., dn)
- Returns a set of evenly distributed numbers in [0.0, 1.0).
np.random.uniform(low=0.0, high=1.0, size=None)
- Function: randomly sample from a uniform distribution [low, high]. Note that the definition domain is closed left and open right, that is, it contains low and does not contain high
- Parameter introduction:
  - low: sampling lower bound, float type, the default value is 0;
  - high: sampling upper bound, float type, the default value is 1;
  - Size: the number of output samples, which is of type int or tuple. For example, if size=(m,n,k), mnk samples will be output, and 1 value will be output by default.
- Return value: ndarray type, whose shape is consistent with that described in the parameter size.
np.random.randint(low, high=None, size=None, dtype='l')
- Randomly sample from a uniform distribution to generate an integer or N-dimensional integer array,
- Access range: if high is not None, the random integer between [low, high]) is taken; otherwise, the random integer between [0, low]) is taken.

# Generate uniformly distributed random numbers 
x2 = np.random.uniform(-1, 1, 100000000)

Return result:

array([ 0.22411206, 0.31414671, 0.85655613, ..., -0.92972446, 0.95985223, 0.23197723])

Draw a picture to see the distribution:

# 1. Create canvas
plt.figure(figsize=(20, 8), dpi=100)

# 2. Draw image
plt.hist(x2, 1000)  # 1000 groups

# 3. Display image
plt.show()

Index and slice of array

The contents of the ndarray object can be accessed and modified by indexing or slicing, just like the slicing operation of list in Python.

The ndarray array can be indexed based on the subscript of 0 - n, and set the start, stop and step parameters to cut a new array from the original array.

How to index one-dimensional, two-dimensional and three-dimensional arrays?

Direct indexing, slicing
Object [:,:] – first and last columns

Two dimensional array index method:

For example: obtain the rise and fall data of the first stock in the first three trading days

# Two dimensional array, two dimensions 
stock_change[0, 0:3]

Return result:

array([-0.03862668, -1.46128096, -0.75596237])

3D array index method:

# three-dimensional 
a1 = np.array([ [[1,2,3],[4,5,6]], [[12,3,34],[5,6,7]]]) 
# Return results 
array([[[ 1, 2, 3], [ 4, 5, 6]],
	  [[12, 3, 34], [ 5, 6, 7]]])
# Index, slice 
>>> a1[0, 0, 1] # Output: 2

Index is negative to get

print('Get last row') 
print(a[-1]) 
print('Reverse the order of rows')
print(a[::-1]) 
print('The ranks are in reverse order') 
print(a[::-1,::-1])

Shape modification

An important task in processing arrays is to change the dimensions of the array, including increasing and decreasing the dimensions of the array, as well as the transpose of the array. Numpy provides a large number of API s that can easily complete the operation of these arrays. For example, a one-dimensional array can be changed into a two-dimensional, three-dimensional or multi-dimensional array through the reshape method. The multi-dimensional array can be changed into one-dimensional array through the t ravel method or the flatten method. To change the dimension of the array, you can also directly set the shape attribute (tuple type) of the numpy array. You can also change the dimension of the array through the resize method.

ndarray.reshape(shape, order)

Returns a view with the same data field but different shape s
Rows and columns are not interchangeable

# When converting shapes, be sure to pay attention to the element matching of the array 
stock_change.reshape([5, 4]) 
stock_change.reshape([-1,10]) # The shape of the array is modified to: (2, 10), - 1: indicates that it passes through the to be calculated

ndarray.resize(new_shape)

Modify the shape of the array itself (keep the number of elements the same)
Rows and columns are not interchangeable

stock_change.resize([5, 4]) 

# View modified results 
stock_change.shape 
(5, 4)

ndarray.T

Transpose of array
Exchange the rows and columns of the array

stock_change.T.shape 
(4, 5)

ndarray.ravel

#Use the t ravel function to turn a three-dimensional b into a one-dimensional array
a1=b.ravel()
print(a1)
print('-'*30)

ndarray.flatten

#Use the flatten function to turn a two-dimensional c into a one-dimensional array 
a2=c.flatten() 
print(a2) 
print('-'*30)

Type modification

ndarray.astype(type)

Returns the array after the type is modified\

stock_change.astype(np.int32)

ndarray.tostring([order]) or ndarray tobytes([order])

Construct Python bytes that contain the original data bytes in the array

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[12, 3, 34], [5, 6, 7]]]) 
arr.tostring()

Too large jupyter output may cause crash problems

If encountered

IOPub data rate exceeded. The notebook server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable --NotebookApp.iopub_data_rate_limit.

The problem is that there is a limit on the number of bytes output in jupyer. You need to modify the configuration file:

create profile

jupyter notebook --generate-config 
vi ~/.jupyter/jupyter_notebook_config.py

Uncomment and add more

## (bytes/sec) Maximum rate at which messages can be sent on iopub before they 
# are limited. 
c.NotebookApp.iopub_data_rate_limit = 10000000

However, it is not recommended to modify it in this way. The output of jupyter will crash if it is too large

Array de duplication

np.unique()

temp = np.array([[1, 2, 3, 4],[3, 4, 5, 6]]) 
>>> np.unique(temp) 
array([1, 2, 3, 4, 5, 6])

Darray operation

Logical operation

# Generate data for 10 students and 5 courses 
>>> score = np.random.randint(40, 100, (10, 5)) 

# Take out the scores of the last four students for logical judgment 
>>> test_score = score[6:, 0:5] 

# Logical judgment. If the score is greater than 60, it is marked as True; otherwise, it is False 
>>> test_score > 60 
array([[ True, True, True, False, True], [ True, True, True, False, True], [ True, True, False, False, True], [False, True, True, True, True]]) 

# BOOL assignment, which sets the satisfying condition to the specified value - Boolean index 
>>> test_score[test_score > 60] = 1 
>>> test_score 
array([[ 1, 1, 1, 52, 1], 
	  [ 1, 1, 1, 59, 1], 
	  [ 1, 1, 44, 44, 1], 
	  [59, 1, 1, 1, 1]])

General judgment function

np.all()

# Judge whether the first two students have passed [0:2,:] 
>>> np.all(score[0:2, :] > 60) 
False

np.any()

# Judge whether the score of the first two students [0:2,:] is greater than 90 
>>> np.any(score[0:2, :] > 80) 
True

np.where (ternary operator)

By using NP Where can perform more complex operations

# Judge the top four students. In the top four courses, the score greater than 60 is set as 1, otherwise it is 0 
temp = score[:4, :4] 
np.where(temp > 60, 1, 0)

Compound logic needs to be combined with NP logical_ And and NP logical_ Or use

# Judge the top four students. In the first four courses, the score greater than 60 and less than 90 is changed to 1, otherwise it is 0 
np.where(np.logical_and(temp > 60, temp < 90), 1, 0) 

# Judge the top four students. In the first four courses, the score greater than 90 or less than 60 is changed to 1, otherwise it is 0 
np.where(np.logical_or(temp > 90, temp < 60), 1, 0)

Statistical operation

What should I do if I want to know the student's maximum score or make a small score?

statistical indicators

In the field of data mining / machine learning, the value of statistical indicators is also a way for us to analyze problems. Common indicators are as follows:

Case: student achievement statistical operation

During statistics, the value of axis is not necessarily the same. The values of different API axes in Numpy are different. Here, axis 0 represents a column and axis 1 represents a row for statistics

# Next, for the top four students, do some statistical operations 
# Specify column de statistics 
temp = score[:4, 0:5] 
print("Top four students,Maximum score of each subject:{}".format(np.max(temp, axis=0))) 
print("Top four students,Minimum score of each subject:{}".format(np.min(temp, axis=0))) 
print("Top four students,Performance fluctuation of each subject:{}".format(np.std(temp, axis=0))) 
print("Top four students,Average score of each subject:{}".format(np.mean(temp, axis=0)))

result:

Top four students,Maximum score of each subject:[96 97 72 98 89] 
Top four students,Minimum score of each subject:[55 57 45 76 77] 
Top four students,Performance fluctuation of each subject:[16.25576821 14.92271758 10.40432602 8.0311892 4.32290412] 
Top four students,Average score of each subject:[78.5 75.75 62.5 85. 82.25]

If it is necessary to calculate which student has the highest score in a subject?

np.argmax(temp, axis=)
np.argmin(temp, axis=)

print("For the top four students, the subscript of the student with the highest score in each subject:{}".format(np.argmax(temp, axis=0)))

result:

For the top four students, the subscript of the student with the highest score in each subject:[0 2 0 0 1]

Array separator

split separation

numpy. The split function divides an array into subarrays along a specific axis. The format is as follows:

numpy.split(ary, indices_or_sections, axis)

Parameter Description:

ary: array to be split
indices_or_sections: if it is an integer, the number is used to divide evenly. If it is an array, it is the position along the axis.
axis: along which dimension is the tangent, the default value is 0, and the horizontal tangent is 0. When it is 1, vertical segmentation.

split separated one-dimensional array

import numpy as np 
x=np.arange(1,9) 
a=np.split(x,4) 
print(a) 
print(a[0]) 
print(a[1]) 
print(a[2]) 
print(a[3]) 

#Passing arrays for separation 
b=np.split(x,[3,5]) 
print(b)

split separated two-dimensional array

#Import numpy
import numpy as np 
#Create two arrays 
a=np.array([[1,2,3],[4,5,6],[11,12,13],[14,15,16]]) 
print('axis=0 Vertical average separation') 
r=np.split(a,2,axis=0) 
print(r[0]) 
print(r[1]) 
print('axis=1 The horizontal direction is separated by position') 
r=np.split(a,[2],axis=1) 
print(r)

Horizontally separated array

Separated array is the inverse process of combined array. Like combined array, separated array is also divided into horizontal separated array and vertical separated array. The horizontally separated array corresponds to the horizontally combined array. Horizontal combined array is to end and connect two or more arrays horizontally, while horizontal separated array is to separate the arrays that have been combined horizontally.

The hsplit function can be used to horizontally separate arrays. The function has two parameters. The first parameter represents the array to be separated, and the second parameter represents that the array should be horizontally separated into several small arrays. Now let's take a look at an example.

The following is a 2 * 6 two-dimensional array X

Now, array X is separated into three arrays with 2 columns. However, if hsplit(X,4) is used to separate array X, an exception will be thrown. This is because array X cannot be separated into four arrays with the same number of columns. Therefore, a rule for separating arrays using hsplit function is that the second parameter value must be able to divide the number of columns of the array to be separated.

hsplit

grid=np.arange(16).reshape(4,4) 
a,b=np.hsplit(grid,2) 
print(a) 
print(b)

Vertically separated array

Vertically separating arrays is the inverse process of vertically combining arrays. Vertically combined arrays connect two or more arrays vertically, and vertically separated arrays separate the arrays that have been vertically combined.

vsplit function can be used to vertically separate arrays. The function has two parameters. The first parameter represents the array to be separated, and the second parameter represents the vertical separation of the array into several small arrays.

As an example, a 4 * 3 two-dimensional array X.

vsplit

grid=np.arange(16).reshape(4,4) 
a,b=np.vsplit(grid,[3]) 
print(a) 
print(b) 
print('tripartite'*10) 
a,b,c=np.vsplit(grid,[1,3]) 
print(a) 
print(b) 
print(c)

transpose

#transpose
#2D transpose 
a=np.arange(1,13).reshape(2,6) 
print('Original array a') 
print(a) 
print('Transposed array') 
print(a.transpose()) 
#Multidimensional array transpose 
aaa=np.arange(1,37).reshape(1,3,3,4) 
#Convert 1,3,3,4 to 3,3,4,1 
print(np.transpose(aaa,[1,2,3,0]).shape)

Inter array operation

Arithmetic function

If both objects involved in the operation are ndarray and have the same shape, the bits will be (+ - * /) operated on each other. NumPy arithmetic functions contain simple addition, subtraction, multiplication and division: add(), subtract(), multiply(), and divide().

Mathematical function

NumPy provides standard trigonometric functions: sin(), cos(), tan().

numpy. The around() function returns the rounded value of the specified number.

Aggregate function

NumPy provides many statistical functions to find the minimum element, maximum element, percentile standard deviation and variance from the array. The details are as follows:

numpy. The mean() function returns the arithmetic mean of the elements in the array. If an axis is provided, the arithmetic mean calculated along it is the sum of the elements along the axis divided by the number of elements.

numpy. The power() function takes the element in the first input array as the base and calculates its power to the corresponding element in the second input array.

Reference code:

# Import numpy module
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.array([10, 10, 10])
print('addition')
print(np.add(a, b))
print((a+b))

print('subtraction')
print(np.subtract(b, a))
print(b-a)

# Use of out parameter
y = np.empty((3, 3), dtype=np.int)
np.multiply(a, 10, out=y)
print(y)

# Mathematical function
a = np.array([0, 30, 60, 90])
print(np.sin(a))

# around ceil floor
a = np.array([1.0, 4.55, 123, 0.567, 25.532])
print('around:', np.round(a))
print('ceil:', np.ceil(a))
print('floor', np.floor(a))

# Statistical function
# power
a = np.arange(1, 13).reshape(3, 4)
print('Primitive function a')
print(a)
print('power', np.power(a, 2))

# Use of out in power
x = np.arange(5)
y = np.zeros(10)
np.power(2, x, out=y[:5])
print(y)

# median()
# Median of one-dimensional array
a = np.array([4, 3, 2, 5, 2, 1])  # Sort the array [1,2,2,3,4,5] the number of elements in the array is even, and the median refers to the average of the middle two numbers
print(np.median(a))

a=np.array([4, 3, 2, 5, 2])  # Sort the array [2,2,3,4,5] the number of elements in the array is odd, and the median refers to the number in the middle
print(np.median(a))

# The axis of a two-dimensional array should be specified through axis
a = np.arange(1, 13).reshape(3, 4)
print(a)
print('vertical direction', np.median(a, axis=0))
print('horizontal direction', np.median(a, axis=1))

# mean average
# One dimensional array
a = np.array([4, 3, 2, 5, 2])
print(np.mean(a))

# The two-dimensional array axis specifies the average of the axes
a = np.arange(13).reshape(3, 4)
print(a)
print('axis=0 vertical direction', np.mean(a, axis=0))
print('axis=1 horizontal direction', np.mean(a, axis=1))

# sum() max() min()
a = np.array([4, 3, 2, 5, 2])
print('max:', np.max(a))
print('sum:', np.sum(a))
print('min:', np.min(a))

# argmax argmin: gets the index of the highest value
print('argmin:', np.argmin(a))
print('argmax:', np.argmax(a))

Operation of array and number

arr = np.array([[1, 2, 3, 2, 1, 4], [5, 6, 1, 2, 3, 1]]) 
arr + 1 
arr / 2 

# You can compare the operations of python list to see the difference 
a = [1, 2, 3, 4, 5] 
a * 3

Array and array operation

arr1 = np.array([[1, 2, 3, 2, 1, 4], [5, 6, 1, 2, 3, 1]]) 
arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])

Broadcasting mechanism

When the array is vectorized, the shape of the array is required to be equal. When arrays with unequal shapes perform arithmetic operations, a broadcast mechanism will appear. This mechanism will expand the array to make the shape attribute value of the array the same. In this way, vectorization operation can be carried out. The following is an example:

arr1 = np.array([[0],[1],[2],[3]]) 
arr1.shape 
# (4, 1) 

arr2 = np.array([1,2,3]) 
arr2.shape 
# (3,) 
arr1+arr2 
# The result is: 
array([[1, 2, 3], 
	   [2, 3, 4], 
	   [3, 4, 5], 
	   [4, 5, 6]])

In the above code, array arr1 is 4 rows and 1 column, and arr2 is 1 row and 3 columns. The two arrays should be added. According to the broadcast mechanism, both arrays arr1 and arr2 will be extended, so that both arrays arr1 and arr2 become 4 rows and 3 columns.

This sentence is the core of understanding broadcasting. Broadcasting mainly occurs in two cases: one is that the dimensions of two arrays are not equal, but the axis lengths of their trailing edge dimensions are consistent, and the other is that the length of one side is 1.

The broadcast mechanism realizes the operation of two or more arrays. Even if the shape s of these arrays are not exactly the same, it only needs to meet any of the following conditions.

If the axis lengths of the trailing dimension (i.e. the dimension from the end) of the two arrays match,
Or one of them has a length of 1.

The broadcast mechanism will be on the dimension of missing and / or length 1.

The broadcast mechanism needs to expand the array with small dimension to make it the same as the shape value of the array with the largest dimension, so as to use element level functions or operators for operation.

If this is the case, there is no match:

A (1d array): 10 
B (1d array): 12 
A (2d array): 2 x 1 
B (3d array): 8 x 4 x 3

These two can

arr1 = np.array([[1, 2, 3, 2, 1, 4], [5, 6, 1, 2, 3, 1]]) 
arr2 = np.array([[1], [3]])

Array splicing

Horizontal array combination

Through the hstack function, two or more arrays can be combined horizontally to form an array. What is the horizontal combination of arrays.

Now there are two 2 * 3 arrays A and B.

As you can see, array A and array B are connected horizontally to form a new array. This is the horizontal combination of arrays. The effect of horizontal combination of multiple arrays is similar. However, the horizontal combination of arrays must meet one condition, that is, the number of rows of all arrays participating in the horizontal combination must be the same, otherwise the horizontal combination will throw an exception.

Vertical array combination

Through vstack function, two or more arrays can be vertically combined to form an array. What is the vertical combination of arrays?

Now take two 2 * 3 arrays A and B as an example

concatenate

numpy. The concatenate function is used to connect two or more arrays of the same shape along the specified axis. The format is as follows:

numpy.concatenate((a1, a2, ...), axis)

Parameter Description:
a1, a2, ...: Arrays of the same type
Axis: the axis along which the array is connected. The default value is 0

concatenate to realize the splicing of arrays

a=np.array([[1,2,3],[4,5,6]]) 
print(a) 
b=np.array([['a','b','c'],['d','e','f']]) 
print(b) 
print(np.concatenate([a,b]))

print('Vertical splicing is equivalent to vstack') 
print(np.concatenate([a,b],axis=0)) 
print('Horizontal splicing is equivalent to hstack') 
print(np.concatenate([a,b],axis=1))

hstack&vstack

numpy. Hsstack, which generates arrays by stacking horizontally.
numpy.vstack generates arrays by stacking vertically.

vstack and hstack realize array splicing

print('x Axial direction and vertical stacking') 
print(np.vstack([a,b])) 
print('y Axial direction and horizontal stacking') 
print(np.hstack([a,b]))

Note that if the number of rows and columns spliced is different, an error will be reported

Splicing of 3D arrays

aa=np.arange(1,37).reshape(3,4,3) 
print(aa) 
bb=np.arange(101,137).reshape(3,4,3) 
print(bb) 
print('axis=0'*10) 
print(np.concatenate((aa,bb),axis=0)) #6 4 3 
print('axis=1'*10) 
print(np.concatenate((aa,bb),axis=1)) #3,8,3 
print('axis=2'*10) 
print(np.concatenate((aa,bb),axis=2)) #3,4,6

axis=0 can be replaced with vstack
axis=1 can be replaced with hstack
axis=2 can be replaced with dstack

Mathematics: matrix

matrix

The difference between matrix, English matrix and array must be two-dimensional, but array can be multi-dimensional.

As shown in the figure: This is 3 × 2 matrix, that is, 3 rows and 2 columns. If m is row and n is column, then m × n is 3 × two

The dimension of a matrix is the number of rows × Number of columns

Matrix element (matrix item):

Aij refers to the element in row i and column j.

vector

Vector is a special matrix. The vectors in the handout are generally column vectors. The following is a three-dimensional column vector (3) × 1).

Addition and Scalar Multiplication

matrix-vector multiplication

The multiplication of matrix and vector is shown in the figure: m × The matrix of N times n × The vector of 1 gets m × Vector of 1

Example:

1 * 1+3 * 5 = 16
4 * 1+0 * 5 = 4
2 * 1+1 * 5 = 7

Matrix multiplication follows the following guidelines:

(row M, column N) * (row N, column L) = (row M, column L)

Matrix multiplication

Properties of matrix multiplication

Inverse & transpose

Matrix multiplication

np.matmul
np.dot

np.matmul and NP Differences between dot:

Both are matrix multiplication. np. Matrix and scalar multiplication are prohibited in matmul. In the inner product operation of vector multiplication vector, NP Matmu and NP Dot makes no difference.

There's a lot of content. I've been sorting it out for a long time. I'm tired.

come on.

thank!

strive!

Topics: Python

Programmer Think

Python numpy detailed tutorial

Introduction to Numpy

Introduction to ndarray

Comparison of operation efficiency between ndarray and Python native list

Advantages of ndarray

Memory block style

ndarray supports parallelization (vectorization)

Much more efficient than pure Python code

Numpy installation

N-dimensional array - ndarray

array creation

Properties of ndarray

Shape of ndarray

Type of ndarray

summary

basic operation

Method of generating array

Generate an array of 0 and 1

np.empy()

Generate from existing array

About the difference between array and asarray

Generate a fixed range array

np.linspace (start, stop, num, endpoint)

np.arange(start,stop, step, dtype)

np.logspace(start,stop, num)

Generate random array

Normal distribution

uniform distribution

Index and slice of array

Shape modification

ndarray.reshape(shape, order)

ndarray.resize(new_shape)

ndarray.T

ndarray.ravel

ndarray.flatten

Type modification

ndarray.astype(type)

ndarray.tostring([order]) or ndarray tobytes([order])

Too large jupyter output may cause crash problems

Array de duplication

np.unique()

Darray operation

Logical operation

General judgment function

np.all()

np.any()

np.where (ternary operator)

Statistical operation

statistical indicators

Case: student achievement statistical operation

Array separator

split separation

Horizontally separated array

Vertically separated array

transpose

Inter array operation

Arithmetic function

Mathematical function

Aggregate function

Operation of array and number

Array and array operation

Broadcasting mechanism

Array splicing

Horizontal array combination

Vertical array combination

concatenate

hstack&vstack

Splicing of 3D arrays

Mathematics: matrix

matrix

vector

Addition and Scalar Multiplication

matrix-vector multiplication

Matrix multiplication

Properties of matrix multiplication

Inverse & transpose

Matrix multiplication

Hot Topics