1, Random number module
The numpy.random module generates sequences of random numbers that follow specific probability distributions.
(1) Binomial distribution
The binomial distribution describes the number of successes in n independent Bernoulli trials. Each trial has only two possible outcomes (success or failure), which are mutually exclusive, and the probability of success is the same in every trial.
# Draw size random numbers; each one is the number of successes
# in n independent trials, where each trial succeeds with probability p.
np.random.binomial(n, p, size)
The binomial distribution can be used to estimate, by simulation, the probabilities in the following scenarios:
- A player's shooting accuracy is 0.3. What is the probability of scoring exactly 5 goals in 10 shots?
np.mean(np.random.binomial(10, 0.3, 200000) == 5)
- Each call to a customer service line is answered with probability 0.6. What is the probability that nobody answers in 3 calls?
np.mean(np.random.binomial(3, 0.6, 200000) == 0)
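The simulated frequencies can be checked against the exact binomial probability P(X = k) = C(n, k) p^k (1 - p)^(n - k). This sketch assumes NumPy's newer Generator API (`np.random.default_rng`) in place of the legacy `np.random.binomial`; the helper name `binomial_pmf` and the seed are illustrative choices.

```python
import math
import numpy as np

def binomial_pmf(k, n, p):
    """Exact probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible
trials = 200_000

# Shooting: 10 shots, success rate 0.3, exactly 5 goals
exact_goals = binomial_pmf(5, 10, 0.3)
sim_goals = np.mean(rng.binomial(10, 0.3, trials) == 5)

# Calls: 3 calls, answer rate 0.6, nobody answers
exact_none = binomial_pmf(0, 3, 0.6)
sim_none = np.mean(rng.binomial(3, 0.6, trials) == 0)

print(f"5 goals:   exact={exact_goals:.4f}  simulated={sim_goals:.4f}")
print(f"no answer: exact={exact_none:.4f}  simulated={sim_none:.4f}")
```

With 200,000 trials the simulated values should typically agree with the exact ones to about two or three decimal places.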
(2) Hypergeometric distribution
# Draw size random numbers. Each number t is the count of good items among
# nsample items drawn without replacement from a population of ngood good
# items and nbad bad items.
np.random.hypergeometric(ngood, nbad, nsample, size)
Ball-drawing game: put 25 good balls and 1 bad ball in a bag and draw 3 balls each round. If all 3 balls are good, add 1 point; if the bad ball is among them, subtract 6 points. Track how the score changes over 100 rounds.
import numpy as np
import matplotlib.pyplot as mp

# Each outcome is the number of good balls among the 3 drawn
outcomes = np.random.hypergeometric(25, 1, 3, 100)
scores = [0]
for outcome in outcomes:
    if outcome == 3:
        scores.append(scores[-1] + 1)
    else:
        scores.append(scores[-1] - 6)
scores = np.array(scores)

mp.figure('Hypergeometric Distribution', facecolor='lightgray')
mp.title('Hypergeometric Distribution', fontsize=20)
mp.xlabel('Round', fontsize=14)
mp.ylabel('Score', fontsize=14)
mp.tick_params(labelsize=12)
mp.grid(linestyle=':')
# Indices of the opening, highest, lowest and closing scores
o, h, l, c = 0, scores.argmax(), scores.argmin(), scores.size - 1
if scores[o] < scores[c]:
    color = 'orangered'
elif scores[c] < scores[o]:
    color = 'limegreen'
else:
    color = 'dodgerblue'
mp.plot(scores, c=color, label='Score')
mp.axhline(y=scores[o], linestyle='--', color='deepskyblue', linewidth=1)
mp.axhline(y=scores[h], linestyle='--', color='crimson', linewidth=1)
mp.axhline(y=scores[l], linestyle='--', color='seagreen', linewidth=1)
mp.axhline(y=scores[c], linestyle='--', color='orange', linewidth=1)
mp.legend()
mp.show()
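Whether the score tends to rise or fall can be predicted before running the game. A round scores +1 only when all three balls are good, which happens with probability C(25, 3) / C(26, 3) = 23/26, so the expected change per round is 23/26 - 6 × 3/26 = 5/26 ≈ 0.19 points. The sketch below computes this exactly with `fractions.Fraction` and compares it with a simulated average over many rounds; the seed and the round count are arbitrary.

```python
import math
from fractions import Fraction
import numpy as np

ngood, nbad, nsample = 25, 1, 3

# Probability that every drawn ball is good (drawing without replacement)
p_all_good = Fraction(math.comb(ngood, nsample), math.comb(ngood + nbad, nsample))
expected = p_all_good * 1 + (1 - p_all_good) * (-6)
print(p_all_good, expected, float(expected))  # 23/26 5/26 0.1923...

# Average score change per round over many simulated rounds
rng = np.random.default_rng(0)
outcomes = rng.hypergeometric(ngood, nbad, nsample, 100_000)
round_scores = np.where(outcomes == nsample, 1, -6)
print(round_scores.mean())
```

Because the expectation is positive, the score curve over 100 rounds drifts upward on average, even though individual runs can end below zero.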
(3) Normal distribution
# Draw size numbers from the standard normal distribution (mean 0, standard deviation 1).
# size must be passed by keyword: the first positional argument of normal() is loc.
np.random.normal(size=size)
# Draw size numbers from a normal distribution with mean 1 and standard deviation 10.
np.random.normal(loc=1, scale=10, size=size)
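loc and scale set the mean and the standard deviation of the distribution, so the sample statistics of a large draw should land close to them. A quick check, assuming a sample size of 100,000 and an arbitrary seed; `np.random.seed` is used so the legacy `np.random.normal` calls are reproducible.

```python
import numpy as np

np.random.seed(7)  # seed the legacy global generator
samples = np.random.normal(loc=1, scale=10, size=100_000)

# Sample mean and standard deviation approximate loc and scale
print(samples.mean(), samples.std())

standard = np.random.normal(size=100_000)  # standard normal: loc=0, scale=1
print(standard.mean(), standard.std())
```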
Probability density of the standard normal distribution:
$$f(x) = \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}}$$
Case: generate 10,000 random numbers from the standard normal distribution and draw a frequency histogram of their values.
import numpy as np
import matplotlib.pyplot as mp

samples = np.random.normal(size=10000)
mp.figure('Normal Distribution', facecolor='lightgray')
mp.title('Normal Distribution', fontsize=20)
mp.xlabel('Sample', fontsize=14)
mp.ylabel('Density', fontsize=14)
mp.tick_params(labelsize=12)
mp.grid(axis='y', linestyle=':')
# 100 bins; density=True (the replacement for the removed normed=True)
# scales the bars so that the histogram has a total area of 1
mp.hist(samples, 100, density=True, edgecolor='steelblue',
        facecolor='deepskyblue', label='Normal')
mp.legend()
mp.show()
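Normalising the histogram to unit area (density=True in current Matplotlib, normed=True in old versions) makes the bar heights directly comparable with the density f(x) = e^(-x²/2) / √(2π). The sketch below uses `np.histogram`, the non-plotting counterpart of `mp.hist`, and evaluates the formula at the bin centres; the bin count, range, sample size, seed and the helper name `std_normal_pdf` are illustrative assumptions.

```python
import numpy as np

def std_normal_pdf(x):
    """Probability density of the standard normal distribution."""
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(1)
samples = rng.normal(size=100_000)

# density=True rescales counts so that sum(heights * bin_widths) == 1
heights, edges = np.histogram(samples, bins=50, range=(-4, 4), density=True)
centres = (edges[:-1] + edges[1:]) / 2
widths = np.diff(edges)

area = np.sum(heights * widths)
max_gap = np.max(np.abs(heights - std_normal_pdf(centres)))
print("total area:", area)
print("max |histogram - pdf|:", max_gap)
```

The remaining gap comes from sampling noise and shrinks as the sample size grows.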
2, Sorting and common SciPy APIs
(1) Sorting
1. Joint indirect sorting
Joint indirect sorting orders elements by a primary key sequence; when several elements have the same primary key, a secondary (reference) key sequence decides their order. It returns the array of indices that would sort the data, not the sorted values themselves.
indices = numpy.lexsort((secondary_keys, primary_keys))  # the LAST key is the primary key
Case: sort products by price in ascending order; for products with the same price, sort by sales volume in descending order.
import numpy as np

prices = np.array([92, 83, 71, 92, 40, 12, 64])
volumes = np.array([100, 251, 4, 12, 709, 34, 75])
names = ['Product1', 'Product2', 'Product3', 'Product4',
         'Product5', 'Product6', 'Product7']
# Primary key: price (ascending); tie-breaker: volume (descending, via negation)
ind = np.lexsort((volumes * -1, prices))
print(ind)  # [5 4 6 2 1 0 3]
for i in ind:
    print(names[i], end=' ')
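Because `np.lexsort` treats the last key as the primary one, it is easy to pass the keys in the wrong order. The same ordering can be cross-checked with Python's built-in `sorted` and a tuple key (price ascending, then volume descending); this is only a sanity check of the semantics, not how `lexsort` works internally.

```python
import numpy as np

prices = np.array([92, 83, 71, 92, 40, 12, 64])
volumes = np.array([100, 251, 4, 12, 709, 34, 75])

# lexsort: last key (prices) is primary, -volumes breaks ties
ind = np.lexsort((-volumes, prices))

# Equivalent ordering with a tuple key: (primary, secondary)
expected = sorted(range(len(prices)), key=lambda i: (prices[i], -volumes[i]))

print(ind.tolist())  # [5, 4, 6, 2, 1, 0, 3]
print(expected)      # [5, 4, 6, 2, 1, 0, 3]
```

Products 1 and 4 share the price 92, so the higher-volume Product1 (index 0) comes before Product4 (index 3).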
2. Complex array sorting
numpy.sort_complex sorts a complex array in ascending order of real part; elements with the same real part are ordered by ascending imaginary part. It returns the sorted array itself rather than indices.
numpy.sort_complex(complex_array)
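A small example with made-up values: elements are ordered by real part first, and the three elements that share the real part 3 are then ordered by imaginary part.

```python
import numpy as np

values = np.array([3 + 5j, 1 + 2j, 3 - 2j, 2 - 1j, 3 - 3j])
result = np.sort_complex(values)
print(result.tolist())  # [(1+2j), (2-1j), (3-3j), (3-2j), (3+5j)]
```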
3. Inserting into a sorted array
To insert elements into a sorted array while keeping it sorted, use numpy.searchsorted to find where each new element belongs; it returns an array of insertion positions.
indices = numpy.searchsorted(sorted_array, values_to_insert)
Then numpy.insert places each element of the values array at the corresponding position in the target array and returns a new array.
numpy.insert(A, indices, B)  # insert the elements of B into A before the positions given in indices
Case:
import numpy as np

#             0  1  2  3  4  5  6   (indices)
a = np.array([1, 2, 4, 5, 6, 8, 9])
b = np.array([7, 3])
c = np.searchsorted(a, b)
print(c)  # [5 2]
d = np.insert(a, c, b)
print(d)  # [1 2 3 4 5 6 7 8 9]
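`np.searchsorted` performs a binary search, and with its default `side='left'` it returns the same positions as Python's `bisect.bisect_left`. The cross-check below reuses the arrays from the case; it illustrates the semantics rather than NumPy's implementation.

```python
import bisect
import numpy as np

a = np.array([1, 2, 4, 5, 6, 8, 9])
b = np.array([7, 3])

positions = np.searchsorted(a, b)  # [5 2]
by_bisect = [bisect.bisect_left(a.tolist(), v) for v in b.tolist()]

# np.insert interprets every position relative to the ORIGINAL array a,
# so the order of the values in b does not matter
merged = np.insert(a, positions, b)
print(positions.tolist(), by_bisect, merged.tolist())
```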
(2) Interpolation
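Motivating example: the table below lists, for several communities, the number of lottery participants and the number of lottery tickets they bought. How many tickets would a community with 45 participants be expected to buy?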
| Number of lottery participants | Tickets purchased |
|---|---|
| 30 | 100 |
| 40 | 120 |
| 50 | 135 |
| 60 | 155 |
| 45 | ? |
| 65 | 170 |
scipy.interpolate provides common interpolation algorithms. Given a set of discrete sample points, it builds an interpolator function that follows a chosen rule; passing this function a (usually denser) sequence of x coordinates returns the corresponding y coordinates.
func = si.interp1d(
    x_points,       # discrete x coordinates
    y_points,       # discrete y coordinates
    kind='linear'   # interpolation algorithm; 'linear' is the default
)
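Applied to the lottery table, linear interpolation estimates the missing value for 45 participants from its neighbours 40 and 50: 120 + (45 - 40) / (50 - 40) × (135 - 120) = 127.5 tickets. The sketch uses `np.interp`, NumPy's one-dimensional linear interpolator, which for points inside the data range gives the same result as `si.interp1d` with its default `kind='linear'`; the variable names are illustrative.

```python
import numpy as np

# Known rows of the table (x must be increasing for np.interp)
participants = np.array([30, 40, 50, 60, 65])
tickets = np.array([100, 120, 135, 155, 170])

estimate = np.interp(45, participants, tickets)
print(estimate)  # 127.5
```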
Case:
import numpy as np
import scipy.interpolate as si

# Original data: 11 sample points
min_x = -50
max_x = 50
dis_x = np.linspace(min_x, max_x, 11)
dis_y = np.sinc(dis_x)
# Build an interpolator from the scatter points using linear interpolation (the default kind)
linear = si.interp1d(dis_x, dis_y)
lin_x = np.linspace(min_x, max_x, 200)
lin_y = linear(lin_x)
# Cubic spline interpolation produces a smooth curve
cubic = si.interp1d(dis_x, dis_y, kind='cubic')
cub_x = np.linspace(min_x, max_x, 200)
cub_y = cubic(cub_x)
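Linear interpolation is simple enough to write by hand: find the interval that contains each query point (here with `np.searchsorted`), then blend the two endpoint values in proportion to the distance. The hypothetical helper `linear_interp` below sketches that idea for query points inside the sample range and is checked against `np.interp`; it does not reproduce `interp1d`'s error handling for points outside the range.

```python
import numpy as np

def linear_interp(xq, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing
    and every query point must satisfy xs[0] <= xq <= xs[-1]."""
    xq = np.asarray(xq, dtype=float)
    # Index of the right endpoint of the interval containing each query point
    j = np.clip(np.searchsorted(xs, xq), 1, len(xs) - 1)
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    t = (xq - x0) / (x1 - x0)  # relative position inside the interval
    return y0 + t * (y1 - y0)

dis_x = np.linspace(-50, 50, 11)
dis_y = np.sinc(dis_x)
lin_x = np.linspace(-50, 50, 200)

mine = linear_interp(lin_x, dis_x, dis_y)
reference = np.interp(lin_x, dis_x, dis_y)
print(np.max(np.abs(mine - reference)))
```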
(3) Integral
Intuitively, for a positive real-valued function, the definite integral over an interval [a, b] is the area of the region bounded by the curve, the vertical lines x = a and x = b, and the x-axis.
The following case uses the core idea of calculus, splitting that area into many thin pieces and adding them up, to show what an integral is.
Case:
- Draw the curve of the quadratic function y = 2x^2 + 3x + 4 on the interval [-5, 5]:
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.patches as mc

def f(x):
    return 2 * x ** 2 + 3 * x + 4

a, b = -5, 5
x1 = np.linspace(a, b, 1001)
y1 = f(x1)
mp.figure('Integral', facecolor='lightgray')
mp.title('Integral', fontsize=20)
mp.xlabel('x', fontsize=14)
mp.ylabel('y', fontsize=14)
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
mp.plot(x1, y1, c='orangered', linewidth=6, label=r'$y=2x^2+3x+4$', zorder=0)
mp.legend()
mp.show()
- Using the idea of differentials, split [-5, 5] into n small intervals and approximate the area between the curve and the x-axis on each one by a small trapezoid; draw the trapezoids and add up their areas:
# Place this code before mp.legend() and mp.show() in the previous snippet
n = 50
x2 = np.linspace(a, b, n + 1)
y2 = f(x2)
area = 0
for i in range(n):
    # Trapezoid area: (left height + right height) * width / 2
    area += (y2[i] + y2[i + 1]) * (x2[i + 1] - x2[i]) / 2
print(area)
for i in range(n):
    mp.gca().add_patch(mc.Polygon([
        [x2[i], 0], [x2[i], y2[i]],
        [x2[i + 1], y2[i + 1]], [x2[i + 1], 0]],
        fc='deepskyblue', ec='dodgerblue', alpha=0.5))
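The trapezoid sum can be compared with the exact value given by the antiderivative F(x) = 2x³/3 + 3x²/2 + 4x. The sketch repeats the summation from the case in vectorised form for several values of n; for the trapezoid rule the error should shrink by a factor of about 4 each time n doubles.

```python
import numpy as np

def f(x):
    return 2 * x ** 2 + 3 * x + 4

def F(x):
    """Antiderivative of f."""
    return 2 * x ** 3 / 3 + 3 * x ** 2 / 2 + 4 * x

a, b = -5, 5
exact = F(b) - F(a)

errors = {}
for n in (50, 100, 200):
    x = np.linspace(a, b, n + 1)
    y = f(x)
    # Sum of trapezoid areas: (left height + right height) * width / 2
    area = np.sum((y[:-1] + y[1:]) * np.diff(x) / 2)
    errors[n] = area - exact
    print(n, area, errors[n])
```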
Call the quad function of the scipy.integrate module to compute the integral:
import scipy.integrate as si

# quad takes the function f and the integration limits a and b (defined above)
# and returns a tuple (integral value, estimated absolute error)
area = si.quad(f, a, b)[0]
print(area)
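For this polynomial the integral can also be worked out by hand, which gives a reference value for both the trapezoid sum and quad:

$$\int_{-5}^{5} (2x^2 + 3x + 4)\,dx = \left[\frac{2x^3}{3} + \frac{3x^2}{2} + 4x\right]_{-5}^{5} = \frac{500}{3} + 0 + 40 = \frac{620}{3} \approx 206.667$$

Because f''(x) = 4 is constant, the composite trapezoid rule with step h = (b - a)/n overestimates this integral by exactly (b - a)h²f''/12. For n = 50 (h = 0.2):

$$E_{\text{trap}} = \frac{(b-a)\,h^2}{12}\,f'' = \frac{10 \times 0.2^2}{12} \times 4 = \frac{2}{15} \approx 0.133$$

So the trapezoid loop should print about 206.8, while quad should agree with 620/3 up to floating-point rounding.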
(4) Image
scipy.ndimage provides simple image-processing functions such as filtering and blurring (median filter, Gaussian blur), rotation by an arbitrary angle, and edge detection.
import numpy as np
import scipy.ndimage as sn
import matplotlib.pyplot as mp
from PIL import Image

# Read the file as a grayscale image
# (scipy.misc.imread has been removed from SciPy; Pillow is used instead)
original = np.asarray(Image.open('../../data/head.jpg').convert('L'), dtype=float)
# Median filter with a 21 x 21 window
median = sn.median_filter(original, 21)
# Rotate by 45 degrees (counterclockwise)
rotate = sn.rotate(original, 45)
# Edge detection with the Prewitt operator
prewitt = sn.prewitt(original)

mp.figure('Image', facecolor='lightgray')
mp.subplot(221)
mp.title('Original', fontsize=16)
mp.axis('off')
mp.imshow(original, cmap='gray')
mp.subplot(222)
mp.title('Median', fontsize=16)
mp.axis('off')
mp.imshow(median, cmap='gray')
mp.subplot(223)
mp.title('Rotate', fontsize=16)
mp.axis('off')
mp.imshow(rotate, cmap='gray')
mp.subplot(224)
mp.title('Prewitt', fontsize=16)
mp.axis('off')
mp.imshow(prewitt, cmap='gray')
mp.tight_layout()
mp.show()
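The Prewitt operator finds edges by combining a central difference along one axis with a 3-point smoothing along the other. The sketch below applies those two one-dimensional kernels by hand to a synthetic step image (a made-up 5 × 6 array), computing interior pixels only so that border handling can be ignored. `sn.prewitt(img)` differentiates along the last axis by default and should agree on these interior pixels, but this is an illustration of the idea, not SciPy's implementation.

```python
import numpy as np

# Synthetic image: dark left half (0), bright right half (1)
img = np.zeros((5, 6))
img[:, 3:] = 1.0

# Central difference along columns: img[:, j+1] - img[:, j-1], for j = 1..4
diff = img[:, 2:] - img[:, :-2]

# Smoothing along rows: each pixel plus its upper and lower neighbours, rows 1..3
edges = diff[:-2] + diff[1:-1] + diff[2:]

print(edges)
```

The response is non-zero only in the two columns next to the brightness step, which is where an edge detector should fire.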
(5) Financial functions
# np.fv, np.pv, np.npv, np.irr, np.pmt, np.nper and np.rate were removed in
# NumPy 1.20; the same functions now live in the numpy-financial package.
import numpy_financial as npf

# Future value = npf.fv(rate, nper, pmt, pv)
# Deposit 1000 yuan at an annual interest rate of 1% for 5 years, adding 100 yuan
# each year. What is the total principal plus interest at maturity?
fv = npf.fv(0.01, 5, -100, -1000)
print(round(fv, 2))

# Present value = npf.pv(rate, nper, pmt, fv)
# At an annual interest rate of 1% for 5 years, adding 100 yuan each year,
# how much must be deposited at the start to end up with fv yuan?
pv = npf.pv(0.01, 5, -100, fv)
print(pv)

# Net present value = npf.npv(rate, cash_flows)
# Deposit 1000 yuan at 1% for 5 years, adding 100 yuan each year.
# What single deposit made today is equivalent?
npv = npf.npv(0.01, [-1000, -100, -100, -100, -100, -100])
print(round(npv, 2))
fv = npf.fv(0.01, 5, 0, npv)
print(round(fv, 2))

# Internal rate of return = npf.irr(cash_flows)
# Deposit 1000 yuan, then withdraw 100, 200, 300, 400 and 500 yuan over the next
# 5 years, the last withdrawal paying out all remaining principal and interest.
# What interest rate makes the net present value 0?
irr = npf.irr([-1000, 100, 200, 300, 400, 500])
print(round(irr, 2))
npv = npf.npv(irr, [-1000, 100, 200, 300, 400, 500])
print(npv)

# Payment per period = npf.pmt(rate, nper, pv)
# Borrow 1000 yuan at an annual interest rate of 1% and repay it in 5 years.
# How much must be repaid each year?
pmt = npf.pmt(0.01, 5, 1000)
print(round(pmt, 2))

# Number of periods = npf.nper(rate, pmt, pv)
# Borrow 1000 yuan at 1% and repay pmt yuan each year. How many years does it take?
nper = npf.nper(0.01, pmt, 1000)
print(round(float(nper)))  # round, not int: 4.999... would truncate to 4

# Interest rate = npf.rate(nper, pmt, pv, fv)
# Borrow 1000 yuan, repay pmt yuan per year and finish in nper years.
# What is the annual interest rate?
rate = npf.rate(nper, pmt, 1000, 0)
print(round(rate, 2))
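For payments made at the end of each period, the future value and the payment per period have simple closed forms, which makes it possible to check the printed fv and pmt results without any library. The helper names `future_value` and `payment` are hypothetical; the sign convention (money paid out is negative) follows the calls in the case, and the formulas assume a non-zero interest rate.

```python
def future_value(rate, nper, pmt, pv):
    """Future value with payments at the end of each period (rate != 0)."""
    growth = (1 + rate) ** nper
    return -(pv * growth + pmt * (growth - 1) / rate)

def payment(rate, nper, pv, fv=0.0):
    """Constant end-of-period payment that turns pv into fv (rate != 0)."""
    growth = (1 + rate) ** nper
    return -(fv + pv * growth) * rate / (growth - 1)

fv = future_value(0.01, 5, -100, -1000)
pmt = payment(0.01, 5, 1000)
print(round(fv, 2), round(pmt, 2))  # 1561.11 -206.04
```

Here 1000 yuan grows to 1000 × 1.01⁵ ≈ 1051.01 yuan, and the five yearly deposits of 100 yuan add about 510.10 yuan, giving the total of about 1561.11 yuan.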