2.2.5 conversion between tensor and numpy
We can use numpy() and from_numpy() to convert between a Tensor and a NumPy array. Note that the Tensor and the NumPy array produced by these two functions share the same memory (which is why the conversion is fast), so changing one of them in place also changes the other.
Another common way to turn a NumPy array into a Tensor is torch.tensor(). This method always copies the data (which costs more time and memory), so the returned Tensor and the original array no longer share memory.
Use of numpy()
import torch

a = torch.ones(5)
b = a.numpy()
Use of from_numpy()
import numpy as np
import torch

a = np.ones(5)
b = torch.from_numpy(a)
All Tensors on the CPU support conversion to and from NumPy arrays.
Use of torch.tensor()
import numpy as np
import torch

a = np.ones(5)
b = torch.tensor(a)
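The difference between sharing and copying is easiest to see by modifying one side in place and inspecting the other. The following is a small sketch; the variable names are chosen only for illustration.

```python
import numpy as np
import torch

# numpy(): the array views the tensor's memory
a = torch.ones(5)
b = a.numpy()
a += 1  # in-place update of the tensor
shared_after_numpy = b.tolist()  # the array sees the change

# from_numpy(): the tensor views the array's memory
c = np.ones(5)
d = torch.from_numpy(c)
c += 1  # in-place update of the array
shared_after_from_numpy = d.tolist()  # the tensor sees the change

# torch.tensor(): the data is copied, so the two are independent
e = np.ones(5)
f = torch.tensor(e)
e += 1
copied = f.tolist()  # the tensor keeps its original values

print(shared_after_numpy)       # [2.0, 2.0, 2.0, 2.0, 2.0]
print(shared_after_from_numpy)  # [2.0, 2.0, 2.0, 2.0, 2.0]
print(copied)                   # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Only in-place operations such as `a += 1` propagate. An assignment like `a = a + 1` creates a new tensor and leaves `b` untouched.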
3.1 linear regression
3.1.1 basic elements of linear regression
3.1.1.1 model definition
3.1.1.2 model training
(1) Training data
(2) Loss function
(3) Optimization algorithm
When the model and loss function are simple, the solution of the error minimization problem above can be written directly as a formula. Such solutions are called analytical solutions. Linear regression with the squared error, as used in this section, falls into this category. However, most deep learning models have no analytical solution, and the value of the loss function can only be reduced by updating the model parameters over a finite number of iterations of an optimization algorithm. Such solutions are called numerical solutions.
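To make the distinction concrete, the sketch below solves the same small linear regression problem twice: once with the closed-form normal equations and once with plain gradient descent. The data, the true parameters (2, -3.4, 4.2), the learning rate and the iteration count are all illustrative assumptions, not values taken from this section.

```python
import torch

torch.manual_seed(0)
n, d = 200, 2
X = torch.randn(n, d, dtype=torch.float64)
true_w = torch.tensor([[2.0], [-3.4]], dtype=torch.float64)  # assumed values
true_b = 4.2                                                 # assumed value
y = X @ true_w + true_b  # noise-free labels, for a clean comparison

# Append a column of ones so the bias becomes the last entry of theta
X1 = torch.cat([X, torch.ones(n, 1, dtype=torch.float64)], dim=1)

# Analytical solution: solve the normal equations (X1^T X1) theta = X1^T y
theta_exact = torch.linalg.solve(X1.T @ X1, X1.T @ y)

# Numerical solution: gradient descent on (1 / 2n) * ||X1 theta - y||^2
theta = torch.zeros(d + 1, 1, dtype=torch.float64)
lr = 0.1
for _ in range(500):
    residual = X1 @ theta - y
    grad = X1.T @ residual / n
    theta -= lr * grad

print(theta_exact.flatten())
print(theta.flatten())
```

Both approaches should end up at essentially the same parameters here. The closed form is only available because the model is linear and the loss is quadratic.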
3.1.1.3 model prediction
3.1.2 expression method of linear regression
3.1.2.1 neural network diagram
3.1.2.2 vector calculation expression
Broadly speaking, when the number of data samples is $n$ and the number of features is $d$, the vector expression of linear regression is:
$$\widehat{y} = Xw + b \qquad (1)$$
where the model output is $\widehat{y}\in\mathbb{R}^{n\times1}$, the batch sample features are $X\in\mathbb{R}^{n\times d}$, the weights are $w\in\mathbb{R}^{d\times1}$, and the bias is $b\in\mathbb{R}$. Correspondingly, the batch sample labels are $y\in\mathbb{R}^{n\times1}$. Setting the model parameters $\theta=[w_1,w_2,b]^T$, we can rewrite the loss function as:
$$\ell(\theta)=\frac{1}{2n}(\widehat{y}-y)^T(\widehat{y}-y) \qquad (12)$$
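The vectorized loss can be checked numerically on a tiny example. The following is a minimal sketch; the function name `squared_loss_vectorized` and the sample values are hypothetical and chosen so the result is easy to verify by hand.

```python
import torch

def squared_loss_vectorized(X, w, b, y):
    """Compute (1 / 2n) * (y_hat - y)^T (y_hat - y) with y_hat = Xw + b."""
    y_hat = X @ w + b          # shape (n, 1); b is broadcast over the rows
    diff = y_hat - y
    n = X.shape[0]
    return (diff.T @ diff / (2 * n)).item()

X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # n = 2 samples, d = 2 features
w = torch.tensor([[1.0], [1.0]])
b = torch.tensor(0.0)
y = torch.tensor([[1.0], [5.0]])

# y_hat = [3, 7], y_hat - y = [2, 2], so the loss is (4 + 4) / (2 * 2) = 2
loss = squared_loss_vectorized(X, w, b, y)
print(loss)  # 2.0
```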
3.2 implementation of linear regression from scratch
3.2.1 generating data sets
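The snippets in the rest of this section refer to features, labels and num_inputs, which are not defined in the text. A minimal sketch of how such a synthetic data set could be generated follows. The sample count, the true weights [2, -3.4], the true bias 4.2 and the noise scale 0.01 are assumptions made for illustration only.

```python
import numpy as np
import torch

num_inputs = 2       # number of features (assumed)
num_examples = 1000  # number of samples (assumed)
true_w = [2, -3.4]   # assumed ground-truth weights
true_b = 4.2         # assumed ground-truth bias

# Features drawn from a standard normal distribution
features = torch.tensor(
    np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float32
)
# Labels from the linear model plus a small amount of Gaussian noise
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(
    np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float32
)

print(features[0], labels[0])
```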
Drawing:
from IPython import display
from matplotlib import pyplot as plt

def use_svg_display():
    # Render figures as vector graphics (SVG).
    # Note: newer IPython releases deprecate this call in favor of
    # matplotlib_inline.backend_inline.set_matplotlib_formats.
    display.set_matplotlib_formats('svg')

def set_figure(figsize=(3.5, 2.5)):
    use_svg_display()
    # Set the size of the figure
    plt.rcParams['figure.figsize'] = figsize

# After adding the two functions above to ../d2lzh_pytorch, you can import them like this:
# import sys
# sys.path.append("..")
# from d2lzh_pytorch import *

set_figure()
plt.scatter(features[:, 1].numpy(), labels.numpy(), 1)
3.2.2 reading data
When training the model, we need to traverse the data set and repeatedly read mini-batches of samples, as mini-batch stochastic gradient descent (SGD) requires. Here we define a function that returns the features and labels of batch_size random samples on each call.
import random
import torch

# This function has been saved in the d2lzh package for future use
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # Shuffle the order in which samples are read
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        # The last batch may contain fewer than batch_size samples
        j = torch.LongTensor(indices[i:min(i + batch_size, num_examples)])
        yield features.index_select(0, j), labels.index_select(0, j)
Read the first small batch data sample and print:
batch_size = 10

for X, y in data_iter(batch_size, features, labels):
    print(X, y)
    break
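The same kind of shuffled mini-batch reading can also be sketched with PyTorch's built-in data utilities instead of a hand-written generator. This is a different technique from the data_iter function above, shown only for comparison. The toy tensors below are placeholders, not the data set from this section.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data: 25 samples with 2 features each (placeholder values)
features = torch.arange(50, dtype=torch.float32).reshape(25, 2)
labels = torch.arange(25, dtype=torch.float32)

dataset = TensorDataset(features, labels)
# shuffle=True reorders the samples at the start of every pass
loader = DataLoader(dataset, batch_size=10, shuffle=True)

batch_sizes = [len(X) for X, y in loader]
print(batch_sizes)  # [10, 10, 5]: the last batch is smaller
```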
3.2.3 initialize model parameters
We initialize the weights to normally distributed random numbers with a mean of 0 and a standard deviation of 0.01, and the bias to 0.
w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)
During training we need the gradients of these parameters in order to update their values, so we set requires_grad=True on them.
w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)
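The effect of enabling gradient tracking can be seen on a tiny example. This is a sketch with made-up values: once requires_grad is set, calling backward() on a scalar result fills the parameter's .grad attribute.

```python
import torch

w = torch.zeros(2, 1)
before = w.requires_grad       # False: gradients are not tracked by default
w.requires_grad_(True)         # in-place switch, as in the text
after = w.requires_grad        # True

X = torch.tensor([[1.0, 2.0]])  # a single sample with two features
out = torch.mm(X, w).sum()      # scalar: 1 * w1 + 2 * w2
out.backward()                  # populates w.grad

print(before, after)
print(w.grad)  # d(out)/dw = [[1.], [2.]]
```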
3.2.4 defining the model
Note that the mm function is used here for matrix multiplication.
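A minimal sketch of such a model follows, based on the vector expression y_hat = Xw + b. The function name `linreg` and the sample values are assumptions for illustration.

```python
import torch

def linreg(X, w, b):
    """Linear regression output: (n, d) mm (d, 1) gives (n, 1); b is broadcast."""
    return torch.mm(X, w) + b

X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
w = torch.tensor([[2.0], [-3.4]])
b = torch.tensor([4.2])

y_hat = linreg(X, w, b)
print(y_hat.shape)  # torch.Size([2, 1])
print(y_hat)
```

torch.mm only accepts 2-D tensors, which is why w is kept with shape (d, 1) rather than as a 1-D vector.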