Python 3 Foundation Packages: NumPy Attributes and Interfaces

Posted by brianb on Fri, 04 Oct 2019 04:52:27 +0200

1. Exploring attributes

Before analysing data we need to understand it. A NumPy array exposes its basic properties as attributes, which can be inspected as follows:

import numpy as np

a = np.arange(8)

print(a)						# The array itself: [0 1 2 3 4 5 6 7]
print(a.ndim)					# Number of dimensions (rank): 1
print(a.size)					# Total number of elements: 8
print(a.shape)					# Length along each dimension: (8,)
print(a.dtype.name)				# Element data type, e.g. int64 (the default integer type is platform-dependent)
print(type(a))					# Type of the array object: <class 'numpy.ndarray'>
print(a.itemsize)				# Bytes occupied by one element: 8 for int64
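
The same attributes apply to arrays with more than one dimension. The following is a minimal sketch, not part of the original example: the array m and the use of nbytes (total bytes, equal to size * itemsize) are added here for illustration, and the dtype is fixed explicitly so that the byte counts do not depend on the platform.

```python
import numpy as np

# Hypothetical 2-D array: 3 rows, 4 columns, explicitly 64-bit floats
m = np.zeros((3, 4), dtype=np.float64)

print(m.ndim)        # 2
print(m.shape)       # (3, 4)
print(m.size)        # 12
print(m.dtype.name)  # float64
print(m.itemsize)    # 8
print(m.nbytes)      # 96, i.e. size * itemsize
```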

2. Exploring interfaces

  • Maximum element, minimum element
a = np.array([[32, 15, 6, 9, 14], 
              [12, 10, 5, 23, 1],
              [2, 16, 13, 40, 37]])

print(a.min())				# Smallest of all elements: 1
print(a.max())				# Largest of all elements: 40
  • Summation, accumulation

(1) Sum of all elements

# Sum every element, regardless of how many dimensions the array has.
print(a.sum())		# 235

(2) Sum by row or column

# Because arrays are multidimensional, array methods can be applied along a specific axis.
# axis=0 operates down the rows and gives one result per column; axis=1 operates across the columns and gives one result per row.
print(a.sum(axis=0))		# Sum of each column: [46 41 24 72 52]
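
To make the difference between the two axes concrete, here is a short sketch that sums the same array both ways. The variable names col_sums and row_sums are chosen for this illustration only.

```python
import numpy as np

a = np.array([[32, 15, 6, 9, 14],
              [12, 10, 5, 23, 1],
              [2, 16, 13, 40, 37]])

col_sums = a.sum(axis=0)   # collapses the rows: one sum per column
row_sums = a.sum(axis=1)   # collapses the columns: one sum per row

print(col_sums)            # [46 41 24 72 52]
print(row_sums)            # [ 76  51 108]
print(col_sums.shape)      # (5,)
print(row_sums.shape)      # (3,)
```

The axis that is named is the one that disappears from the result, which is why a (3, 5) array gives 5 column sums for axis=0 and 3 row sums for axis=1.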

(3) Accumulate by row or column

# Cumulative sum along the given axis
print(a.cumsum(axis=1))
------------------------------------
# In the result the first column is unchanged, the second column is the first + the second, the third column is the first + the second + the third, and so on.
[[ 32  47  53  62  76]
 [ 12  22  27  50  51]
 [  2  18  31  71 108]]
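
For comparison, a sketch of the same accumulation with axis=0, where the running total moves down each column instead. The name down_columns is introduced here for illustration. The last entry of a cumulative sum along an axis is the plain sum along that axis.

```python
import numpy as np

a = np.array([[32, 15, 6, 9, 14],
              [12, 10, 5, 23, 1],
              [2, 16, 13, 40, 37]])

down_columns = a.cumsum(axis=0)   # running total down each column
print(down_columns)
# [[32 15  6  9 14]
#  [44 25 11 32 15]
#  [46 41 24 72 52]]

# The last row of the column-wise cumsum equals the column sums,
# and the last column of the row-wise cumsum equals the row sums.
print(np.array_equal(down_columns[-1], a.sum(axis=0)))          # True
print(np.array_equal(a.cumsum(axis=1)[:, -1], a.sum(axis=1)))   # True
```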
  • Sort: np.sort()

(1) Sorting a random sequence, optionally along an axis

# Draw a one-dimensional array from a normal distribution, then sort it in ascending order.
a=np.random.normal(10,2,50)
np.sort(a)

# For a multidimensional array, axis=0 sorts the values within each column; axis=1 sorts the values within each row.
a=np.random.normal(10,3,(2,4))
print(a)
b=np.sort(a,axis=0)
print(b)

-------------------------
[[11.26599373  8.77551005 12.02658342  9.74330763]
 [12.90387694  6.31457854 17.79464722  2.81888163]]
 
[[11.26599373  6.31457854 12.02658342  2.81888163]
 [12.90387694  8.77551005 17.79464722  9.74330763]]
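
Because the random example changes on every run, the effect of axis is easier to verify on fixed values. A small sketch follows; the array x and its contents are made up for this illustration. It also shows that np.sort returns a sorted copy and leaves its input untouched.

```python
import numpy as np

# Hypothetical fixed values so the result is reproducible
x = np.array([[3, 1, 2],
              [0, 5, 4]])

by_column = np.sort(x, axis=0)   # sort the values within each column
by_row = np.sort(x, axis=1)      # sort the values within each row

print(by_column)
# [[0 1 2]
#  [3 5 4]]
print(by_row)
# [[1 2 3]
#  [0 4 5]]
print(x)
# [[3 1 2]
#  [0 5 4]]   (unchanged: np.sort returns a copy)
```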

(2) Spotting outliers

heights = np.array([49.7, 46.9, 62, 47.2, 47, 48.3, 48.7])
np.sort(heights)	

-------------------
# The result is as follows; 62 is an outlier.
array([46.9, 47. , 47.2, 48.3, 48.7, 49.7, 62. ])
  • Average value: np.mean()

(1) Find the average of an array

# Mean of a one-dimensional array
a = np.random.normal(10,2,50)
np.mean(a)					# e.g. 10.309453780901238; not exactly 10 because it is the mean of 50 random samples

# Mean of a two-dimensional array: axis=0 averages each column; axis=1 averages each row.
ring_toss = np.array([[1, 0, 0], 
                       [0, 0, 1], 
                       [1, 0, 1]])
                       
np.mean(ring_toss) 						# Mean of all elements: 0.4444444444444444
np.mean(ring_toss, axis=0)			# Mean of each column: array([0.66666667, 0.        , 0.66666667])

(2) Calculate the fraction of samples that satisfy a condition

# A comparison yields a boolean array; its mean is the fraction of values for which the condition is True.
class_year = np.array([1967, 1949, 2004, 1997, 1953, 1950, 1958, 1974, 1987, 2006, 2013, 1978, 1951, 1998, 1996, 1952, 2005, 2007, 2003, 1955, 1963, 1978, 2001, 2012, 2014, 1948, 1970, 2011, 1962, 1966, 1978, 1988, 2006, 1971, 1994, 1978, 1977, 1960, 2008, 1965, 1990, 2011, 1962, 1995, 2004, 1991, 1952, 2013, 1983, 1955, 1957, 1947, 1994, 1978, 1957, 2016, 1969, 1996, 1958, 1994, 1958, 2008, 1988, 1977, 1991, 1997, 2009, 1976, 1999, 1975, 1949, 1985, 2001, 1952, 1953, 1949, 2015, 2006, 1996, 2015, 2009, 1949, 2004, 2010, 2011, 2001, 1998, 1967, 1994, 1966, 1994, 1986, 1963, 1954, 1963, 1987, 1992, 2008, 1979, 1987])
millennials = np.mean(class_year > 2005)
print(millennials) 				# 0.2, i.e. 20% of the values are later than 2005.

a = np.random.normal(10,2,50)
np.mean(a > 11)					# e.g. 0.4: the fraction of samples greater than 11, an estimate of the probability that a draw exceeds 11.
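
Why the mean of a comparison gives a fraction can be seen on a small sample. In this sketch the array scores and the threshold 7 are hypothetical. Each True counts as 1 and each False as 0, so the mean is the count of matches divided by the number of elements.

```python
import numpy as np

# Hypothetical small sample
scores = np.array([3, 8, 12, 15, 7])

mask = scores > 7        # boolean array: [False  True  True  True False]
print(mask.sum())        # 3 values satisfy the condition
print(mask.mean())       # 0.6, i.e. 3 out of 5
```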
  • Standard deviation: np.std()
a = np.random.normal(10,2,50)
np.std(a)				# e.g. 1.7086488749575695; not exactly 2 because it is computed from 50 random samples.
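
One detail worth knowing when comparing the result with an expected value: by default np.std divides by N (ddof=0), which is the population standard deviation. Passing ddof=1 divides by N - 1 instead. A minimal sketch on made-up data, where the array data is chosen so that the population value comes out as a round number:

```python
import numpy as np

# Hypothetical data: mean is 5, squared deviations sum to 32
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

population_std = np.std(data)        # sqrt(32 / 8), default ddof=0
sample_std = np.std(data, ddof=1)    # sqrt(32 / 7)

print(population_std)   # 2.0
print(sample_std)       # about 2.138
```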
  • Median: np.median()
my_array = np.array([50, 38, 291, 59, 14])
np.median(my_array)				#  50.0
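
A short sketch of two properties of the median, using the array above plus one extra, hypothetical value (60) to get an even count: with an even number of elements the median is the average of the two middle values, and unlike the mean it is barely affected by the large value 291.

```python
import numpy as np

odd = np.array([50, 38, 291, 59, 14])
even = np.array([50, 38, 291, 59, 14, 60])   # 60 is an added, hypothetical value

print(np.median(odd))    # 50.0, the middle of 14, 38, 50, 59, 291
print(np.median(even))   # 54.5, the average of 50 and 59
print(np.mean(odd))      # 90.4, pulled upward by 291
```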
  • Percentile: np.percentile(array, percentage)
d = np.array([1, 2, 3, 4, 4, 4, 6, 6, 7,  8, 8])
np.percentile(d, 40)			# 4.0
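
A sketch of the quartiles of the same array. With the default settings np.percentile interpolates linearly between neighbouring sorted values when the requested position falls between two elements, and the 50th percentile coincides with the median.

```python
import numpy as np

d = np.array([1, 2, 3, 4, 4, 4, 6, 6, 7, 8, 8])

print(np.percentile(d, 25))   # 3.5, halfway between the sorted values 3 and 4
print(np.percentile(d, 50))   # 4.0, same as np.median(d)
print(np.percentile(d, 75))   # 6.5, halfway between the sorted values 6 and 7
```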

In addition, NumPy provides linear algebra routines, for example for solving systems of linear equations.
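
As a minimal illustration of the linear algebra routines, the sketch below solves a small system with np.linalg.solve. The system itself (2x + y = 5, x + 3y = 10) is a made-up example, not taken from the text above.

```python
import numpy as np

# Hypothetical system of equations:
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

solution = np.linalg.solve(A, b)
print(solution)                       # approximately [1. 3.], i.e. x = 1, y = 3
print(np.allclose(A @ solution, b))   # True: the solution satisfies both equations
```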
