1. Exploring attributes
Before analyzing data, we need to understand it. The attributes of an array can be inspected as follows:
import numpy as np

a = np.arange(8)
print(a)             # Print the array
print(a.ndim)        # Number of dimensions (rank): 1
print(a.size)        # Total number of elements: 8
print(a.shape)       # Shape: (8,)
print(a.dtype.name)  # Data type of the elements: int64 on most 64-bit platforms
print(type(a))       # Type of the array object: <class 'numpy.ndarray'>
print(a.itemsize)    # Bytes occupied by one element: 8 for int64
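The same attributes are more informative on a multidimensional array. As a small sketch (the name `b` and the 2 x 4 shape are chosen here only for illustration), reshaping the eight values shows how `ndim` and `shape` change while `size` stays the same:

```python
import numpy as np

# Reshape the same eight values into 2 rows and 4 columns.
b = np.arange(8).reshape(2, 4)

print(b.ndim)    # 2: the array now has two axes
print(b.shape)   # (2, 4): 2 rows, 4 columns
print(b.size)    # 8: the element count is unchanged
print(b.nbytes)  # total bytes, equal to b.size * b.itemsize
```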
2. Exploring interfaces
- Maximum element, minimum element
a = np.array([[32, 15, 6, 9, 14],
              [12, 10, 5, 23, 1],
              [2, 16, 13, 40, 37]])
print(a.min())  # Minimum of all elements: 1
print(a.max())  # Maximum of all elements: 40
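`min()` and `max()` also accept an `axis` argument, and `argmax()` reports where the extreme value sits. The following is a minimal sketch on the same array; the variable names are illustrative only:

```python
import numpy as np

a = np.array([[32, 15, 6, 9, 14],
              [12, 10, 5, 23, 1],
              [2, 16, 13, 40, 37]])

col_min = a.min(axis=0)  # smallest value in each column
row_max = a.max(axis=1)  # largest value in each row
print(col_min)  # [ 2 10  5  9  1]
print(row_max)  # [32 23 40]

# argmax() returns a position in the flattened array;
# unravel_index converts it back to a (row, column) pair.
flat_pos = a.argmax()
row, col = np.unravel_index(flat_pos, a.shape)
print(row, col)  # 2 3
```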
- Summation, accumulation
(1) Summation of all elements
# Sum all elements, regardless of how many dimensions the array has.
print(a.sum())  # 235
(2) Sum by row or column
# Because arrays are multidimensional, array methods can be applied along a specific axis.
# axis=0 operates down the rows and gives one result per column;
# axis=1 operates across the columns and gives one result per row.
print(a.sum(axis=0))  # Sum of each column: [46 41 24 72 52]
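For comparison, here is the same array summed along `axis=1`, which gives one total per row. The row totals, the column totals and the overall sum all add up to the same number:

```python
import numpy as np

a = np.array([[32, 15, 6, 9, 14],
              [12, 10, 5, 23, 1],
              [2, 16, 13, 40, 37]])

row_sums = a.sum(axis=1)  # one total per row
col_sums = a.sum(axis=0)  # one total per column
print(row_sums)  # [ 76  51 108]

# Both sets of partial sums add up to the overall total.
print(row_sums.sum() == col_sums.sum() == a.sum())  # True
```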
(3) Accumulate by row or column
# Accumulate along the given axis
print(a.cumsum(axis=1))
------------------------------------
# In each row the first element is unchanged, the second is the sum of the first two elements, the third is the sum of the first three, and so on.
[[ 32  47  53  62  76]
 [ 12  22  27  50  51]
 [  2  18  31  71 108]]
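Accumulating along `axis=0` instead builds running totals down each column, so the last row equals the column sums. With no `axis` argument, `cumsum()` flattens the array first. A short sketch on the same array:

```python
import numpy as np

a = np.array([[32, 15, 6, 9, 14],
              [12, 10, 5, 23, 1],
              [2, 16, 13, 40, 37]])

down_cols = a.cumsum(axis=0)  # running totals down each column
print(down_cols[-1])          # last row equals a.sum(axis=0): [46 41 24 72 52]

flat = a.cumsum()             # no axis: the array is flattened first
print(flat.shape)             # (15,)
print(flat[-1])               # last running total equals a.sum(): 235
```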
- Sort: np.sort()
(1) Sorting a random sample, either as a whole or along an axis
# Draw 50 samples from a normal distribution (mean 10, standard deviation 2) and sort them in ascending order.
a = np.random.normal(10, 2, 50)
np.sort(a)

# For a multidimensional array, axis=0 sorts the values within each column; axis=1 sorts the values within each row.
a = np.random.normal(10, 3, (2, 4))
print(a)
b = np.sort(a, axis=0)
print(b)
-------------------------
[[11.26599373  8.77551005 12.02658342  9.74330763]
 [12.90387694  6.31457854 17.79464722  2.81888163]]

[[11.26599373  6.31457854 12.02658342  2.81888163]
 [12.90387694  8.77551005 17.79464722  9.74330763]]
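A few related sorting idioms are worth knowing. The sketch below uses a small fixed array (`scores` is a made-up example, not data from this article) so that the results are reproducible:

```python
import numpy as np

scores = np.array([7, 2, 9, 4])      # hypothetical sample values

ascending = np.sort(scores)          # returns a sorted copy; scores is unchanged
descending = np.sort(scores)[::-1]   # reverse the sorted copy for descending order
order = np.argsort(scores)           # indices that would sort the array

print(ascending)   # [2 4 7 9]
print(descending)  # [9 7 4 2]
print(order)       # [1 3 0 2]

scores.sort()                        # ndarray.sort() sorts in place and returns None
print(scores)      # [2 4 7 9]
```

`np.sort()` leaves the input untouched, while the `sort()` method modifies the array itself, so the choice depends on whether the original order is still needed.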
(2) Spotting outliers
heights = np.array([49.7, 46.9, 62, 47.2, 47, 48.3, 48.7])
np.sort(heights)
-------------------
# The result is shown below; after sorting, the outlier 62 stands out at the end.
array([46.9, 47. , 47.2, 48.3, 48.7, 49.7, 62. ])
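Sorting only makes the outlier visible; removing it needs a rule. One common convention, not taken from this article, is the 1.5 x IQR rule: keep only values within 1.5 interquartile ranges of the first and third quartiles. A minimal sketch under that assumption:

```python
import numpy as np

heights = np.array([49.7, 46.9, 62, 47.2, 47, 48.3, 48.7])

# Assumption: the 1.5 * IQR fences are used as the outlier criterion.
q1, q3 = np.percentile(heights, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

mask = (heights >= lower) & (heights <= upper)  # True for values to keep
cleaned = heights[mask]                          # boolean indexing drops the rest

print(cleaned)         # the six values other than 62, in their original order
print(heights[~mask])  # the values flagged as outliers
```

The 1.5 multiplier is a rule of thumb, so the threshold should be adjusted to suit the data.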
- Average value: np.mean()
(1) Find the average of an array
# Mean of a one-dimensional array
a = np.random.normal(10, 2, 50)
np.mean(a)  # e.g. 10.309453780901238; not exactly 10 because the sample is random

# Mean of a two-dimensional array: axis=0 averages down each column; axis=1 averages across each row.
ring_toss = np.array([[1, 0, 0],
                      [0, 0, 1],
                      [1, 0, 1]])
np.mean(ring_toss)          # Mean of all elements: 0.4444444444444444
np.mean(ring_toss, axis=0)  # Mean of each column: array([0.66666667, 0.        , 0.66666667])
(2) Calculate the proportion (or probability) of samples that satisfy a condition
# A comparison yields a Boolean array, and the mean of that array is the fraction of values that satisfy the condition.
class_year = np.array([
    1967, 1949, 2004, 1997, 1953, 1950, 1958, 1974, 1987, 2006,
    2013, 1978, 1951, 1998, 1996, 1952, 2005, 2007, 2003, 1955,
    1963, 1978, 2001, 2012, 2014, 1948, 1970, 2011, 1962, 1966,
    1978, 1988, 2006, 1971, 1994, 1978, 1977, 1960, 2008, 1965,
    1990, 2011, 1962, 1995, 2004, 1991, 1952, 2013, 1983, 1955,
    1957, 1947, 1994, 1978, 1957, 2016, 1969, 1996, 1958, 1994,
    1958, 2008, 1988, 1977, 1991, 1997, 2009, 1976, 1999, 1975,
    1949, 1985, 2001, 1952, 1953, 1949, 2015, 2006, 1996, 2015,
    2009, 1949, 2004, 2010, 2011, 2001, 1998, 1967, 1994, 1966,
    1994, 1986, 1963, 1954, 1963, 1987, 1992, 2008, 1979, 1987])
millennials = np.mean(class_year > 2005)
print(millennials)  # 0.2, i.e. 20% of the values are later than 2005

a = np.random.normal(10, 2, 50)
np.mean(a > 11)  # e.g. 0.4: the fraction of samples greater than 11, which estimates the probability that a value drawn from this distribution exceeds 11
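The reason this works is that `True` counts as 1 and `False` as 0, so the mean of a Boolean array is the count of `True` values divided by the total. A small sketch with a made-up ten-element array (`years` is hypothetical, not the full data set above):

```python
import numpy as np

years = np.array([1967, 2004, 2006, 2013, 1978,
                  2007, 1951, 1998, 2012, 1996])  # hypothetical sample

mask = years > 2005             # Boolean array, True where the condition holds
count = np.count_nonzero(mask)  # number of True values
fraction = mask.mean()          # same as count / mask.size

print(count)     # 4
print(fraction)  # 0.4
```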
- Standard deviation: np.std()
a = np.random.normal(10, 2, 50)
np.std(a)  # e.g. 1.7086488749575695; not exactly 2 because the sample is random
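Note that `np.std()` computes the population standard deviation by default (`ddof=0`). Passing `ddof=1` gives the sample standard deviation, which divides by n - 1. The sketch below uses a seeded generator only to make the run repeatable; the seed value is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, for repeatability only
a = rng.normal(10, 2, 50)

pop_std = np.std(a)                  # divides by n (ddof=0, the default)
sample_std = np.std(a, ddof=1)       # divides by n - 1
print(pop_std, sample_std)

# The two differ only by the factor sqrt(n / (n - 1)).
n = a.size
print(np.isclose(sample_std, pop_std * np.sqrt(n / (n - 1))))  # True
```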
- Median: np.median()
my_array = np.array([50, 38, 291, 59, 14])
np.median(my_array)  # 50.0
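Two details are easy to miss here: the median is much less sensitive to an extreme value such as 291 than the mean is, and for an even number of elements it is the average of the two middle values. A brief sketch (`even_array` is an illustrative name):

```python
import numpy as np

my_array = np.array([50, 38, 291, 59, 14])
print(np.mean(my_array))    # 90.4, pulled upward by 291
print(np.median(my_array))  # 50.0, the middle of the sorted values

# With an even number of elements, the two middle values are averaged.
even_array = np.array([50, 38, 291, 59])
print(np.median(even_array))  # 54.5, the average of 50 and 59
```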
- Percentile: np.percentile(array, percentage)
d = np.array([1, 2, 3, 4, 4, 4, 6, 6, 7, 8, 8])
np.percentile(d, 40)  # 4.0
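`np.percentile()` also accepts a list of percentages, which makes it convenient for getting the quartiles in one call. A short sketch on the same array, using NumPy's default linear interpolation between neighbouring values:

```python
import numpy as np

d = np.array([1, 2, 3, 4, 4, 4, 6, 6, 7, 8, 8])

q1, q2, q3 = np.percentile(d, [25, 50, 75])  # quartiles in one call
print(q1, q2, q3)  # 3.5 4.0 6.5

iqr = q3 - q1      # interquartile range
print(iqr)         # 3.0

# The 50th percentile is the median.
print(q2 == np.median(d))  # True
```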
Note: in addition to the above, NumPy also provides linear algebra routines, for example for solving systems of linear equations.
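As a brief illustration of the linear algebra side, `np.linalg.solve()` solves a square system of linear equations. The system below (2x + y = 5 and x + 3y = 10) is a made-up example:

```python
import numpy as np

# Coefficient matrix and right-hand side for:
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
rhs = np.array([5.0, 10.0])

solution = np.linalg.solve(A, rhs)  # solves A @ solution = rhs
print(solution)                     # approximately [1. 3.]

# Check the result by substituting it back into the equations.
print(np.allclose(A @ solution, rhs))  # True
```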