NumPy syntax notes: input and output

Posted by vlcinsky on Sat, 12 Feb 2022 02:25:52 +0100

https://github.com/datawhalechina/team-learning-program/tree/master/IntroductionToNumpy

Input and output

1. NumPy binary files

The save(), savez() and load() functions write and read data in NumPy's own binary formats (.npy, .npz). They automatically handle metadata such as ndim, dtype and shape, which makes them very convenient for reading and writing arrays. The drawback is that files written by save() and savez() are hard to read from programs written in other languages.

[function]

def save(file, arr, allow_pickle=True, fix_imports=True):
  • save() function: saves an array to a binary file in .npy format.
  • .npy format: the file is stored in binary mode. The metadata of the array (dtype, shape, etc.) is saved as text in a header at the start of the file, so you can inspect it with a binary viewer.
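The header described above can be inspected without a separate binary viewer. The following is a minimal sketch, assuming an in-memory buffer stands in for a file on disk; the variable names are illustrative only.

```python
import io
import numpy as np

# Save a small array into an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
np.save(buf, np.arange(6, dtype=np.int32).reshape(2, 3))
raw = buf.getvalue()

magic = raw[:6]                 # magic string that starts every .npy file
major, minor = raw[6], raw[7]   # format version bytes
# Version 1.x stores the header length in 2 bytes, later versions in 4.
len_size = 2 if major == 1 else 4
header_len = int.from_bytes(raw[8:8 + len_size], 'little')
header = raw[8 + len_size:8 + len_size + header_len].decode('latin1')

print(magic)           # b'\x93NUMPY'
print(header.strip())  # a dict literal with 'descr', 'fortran_order', 'shape'
```

The header is a plain-text dictionary literal, which is why dtype and shape survive a round trip through save() and load().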

[function]

def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII'):
  • load() function: loads arrays or pickled objects from .npy, .npz or pickled files.
  • mmap_mode: {None, 'r+', 'r', 'w+', 'c'}: if not None, the file is memory-mapped in the given mode instead of being read fully into memory.
  • allow_pickle=False: whether to allow loading pickled object arrays stored in .npy files.
  • fix_imports=True: if True, pickle will attempt to map old Python 2 names to the new names used in Python 3.
  • encoding='ASCII': the encoding used when reading Python 2 strings; the default is 'ASCII'.
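To make allow_pickle and mmap_mode more concrete, here is a small sketch. It assumes a scratch directory created with tempfile, and the file names are made up for the demonstration.

```python
import os
import tempfile
import numpy as np

tmpdir = tempfile.mkdtemp()  # hypothetical scratch directory for this demo

# 1) Object arrays are pickled, so load() refuses them unless allow_pickle=True.
obj = np.empty(2, dtype=object)
obj[0] = {'a': 1}
obj[1] = [1, 2, 3]
obj_path = os.path.join(tmpdir, 'objects.npy')
np.save(obj_path, obj)

try:
    np.load(obj_path)            # allow_pickle defaults to False
    refused = False
except ValueError:
    refused = True
loaded = np.load(obj_path, allow_pickle=True)

# 2) mmap_mode='r' maps the file read-only instead of reading it into memory.
num_path = os.path.join(tmpdir, 'numbers.npy')
np.save(num_path, np.arange(10))
mapped = np.load(num_path, mmap_mode='r')
first_three = mapped[:3].tolist()
is_memmap = isinstance(mapped, np.memmap)
del mapped  # drop the reference so the mapping can be released

print(refused, loaded[0], first_three, is_memmap)
# True {'a': 1} [0, 1, 2] True
```

Since unpickling can execute arbitrary code, allow_pickle=True is best reserved for files from a trusted source.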

[example] save an array to a file and read it back.

import numpy as np

outfile = r'.\test.npy'
np.random.seed(20200619)
x = np.random.uniform(low=0, high=1,size = [3, 5])
np.save(outfile, x)
y = np.load(outfile)
print(y)
# [[0.01123594 0.66790705 0.50212171 0.7230908  0.61668256]
#  [0.00668332 0.1234096  0.96092409 0.67925305 0.38596837]
#  [0.72342998 0.26258324 0.24318845 0.98795012 0.77370715]]

[function]

def savez(file, *args, **kwds):
  • savez() function: saves multiple arrays to a single file in uncompressed .npz format.
  • .npz format: a zip archive that packs the arrays together; it can be opened with ordinary archive software. savez() stores the arrays without compression, while savez_compressed() compresses them.
  • savez() function: the first parameter is the file name, and the subsequent parameters are the arrays to save. You can use keyword arguments to name the arrays; arrays passed as positional arguments are automatically named arr_0, arr_1, ….
  • savez() function: the output is an archive file (extension .npz) in which each array is stored as a .npy file written by save(), with the file name matching the array name. load() recognizes .npz files automatically and returns a dictionary-like object, so you can retrieve each array using its name as the key.

[example] save multiple arrays to one file.

import numpy as np

outfile = r'.\test.npz'
x = np.linspace(0, np.pi, 5)
y = np.sin(x)
z = np.cos(x)
np.savez(outfile, x, y, z_d=z)
data = np.load(outfile)
np.set_printoptions(suppress=True)
print(data.files)  
# ['z_d', 'arr_0', 'arr_1']

print(data['arr_0'])
# [0.         0.78539816 1.57079633 2.35619449 3.14159265]

print(data['arr_1'])
# [0.         0.70710678 1.         0.70710678 0.        ]

print(data['z_d'])
# [ 1.          0.70710678  0.         -0.70710678 -1.        ]

Open the test.npz file with archive software and you will find three files: arr_0.npy, arr_1.npy and z_d.npy, which hold the contents of arrays x, y and z respectively.
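The archive layout can also be checked from Python with the standard zipfile module. The sketch below uses in-memory buffers rather than files on disk, and compares savez() with savez_compressed(); the array and the name "named" are arbitrary choices for illustration.

```python
import io
import zipfile
import numpy as np

x = np.zeros(10000)

plain = io.BytesIO()
np.savez(plain, x, named=x)              # stored without compression
packed = io.BytesIO()
np.savez_compressed(packed, x, named=x)  # same layout, but compressed

# An .npz file is an ordinary zip archive with one .npy member per array.
with zipfile.ZipFile(io.BytesIO(plain.getvalue())) as zf:
    members = sorted(zf.namelist())

plain_size = len(plain.getvalue())
packed_size = len(packed.getvalue())
print(members)                   # ['arr_0.npy', 'named.npy']
print(packed_size < plain_size)  # True for this highly repetitive data
```

How much compression helps depends on the data; an array of zeros is close to the best case.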

2. Text files

The savetxt(), loadtxt() and genfromtxt() functions are used to write and read text files (such as .txt and .csv). genfromtxt() is more powerful than loadtxt() and can handle missing data.
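The difference in how the two readers treat missing data can be seen with a two-line input. This is a minimal sketch that assumes the text is held in a StringIO object rather than a file.

```python
import io
import numpy as np

text = "1,2\n3,\n"   # the second row has an empty last field

# loadtxt() cannot convert the empty field to a float and raises ValueError.
try:
    np.loadtxt(io.StringIO(text), delimiter=',')
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

# genfromtxt() fills the missing field with nan by default.
filled = np.genfromtxt(io.StringIO(text), delimiter=',')

print(loadtxt_failed)  # True
print(filled)
```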

[function]

def savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n',header='', footer='', comments='# ', encoding=None):
  • fname: file path.
  • X: the array to be saved to the file.
  • fmt='%.18e': the format string used for each element written to the file. The default is '%.18e' (scientific notation with 18 digits after the decimal point).
  • delimiter=' ': the string separating columns; a single space by default.

[function]

def loadtxt(fname, dtype=float, comments='#', delimiter=None,
            converters=None, skiprows=0, usecols=None, unpack=False,
            ndmin=0, encoding='bytes', max_rows=None):
  • fname: file path.
  • dtype=float: data type; the default is float.
  • comments='#': a string or list of strings marking the start of a comment. The default is '#'.
  • skiprows=0: the number of lines to skip at the start of the file, typically used to skip a header row.
  • usecols=None: an int or tuple of column indices specifying which columns to read (the first column is 0).
  • unpack=False: if True, the result is transposed so that each column can be assigned to a separate variable.
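The header and comments parameters of savetxt() interact with the comments parameter of loadtxt(). The sketch below illustrates this with an in-memory buffer; the column names "a,b" are made up for the example.

```python
import io
import numpy as np

x = np.array([[1, 2.5], [3, 4.5]])

buf = io.StringIO()
# The header line is written with the comments prefix ('# ' by default).
np.savetxt(buf, x, fmt='%.2f', delimiter=',', header='a,b')
text = buf.getvalue()
print(repr(text))  # '# a,b\n1.00,2.50\n3.00,4.50\n'

# loadtxt() skips lines starting with '#', so skiprows is not needed here.
buf.seek(0)
a, b = np.loadtxt(buf, delimiter=',', unpack=True)
print(a)  # [1. 3.]
print(b)  # [2.5 4.5]
```

Because the header is written as a comment, the same file can be read back without skiprows.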

[example] write and read TXT files.

import numpy as np

outfile = r'.\test.txt'
x = np.arange(0, 10).reshape(2, -1)
np.savetxt(outfile, x)
y = np.loadtxt(outfile)
print(y)
# [[0. 1. 2. 3. 4.]
#  [5. 6. 7. 8. 9.]]

The test.txt file is as follows:

0.000000000000000000e+00 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00 4.000000000000000000e+00
5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00

[example] write and read CSV files.

import numpy as np

outfile = r'.\test.csv'
x = np.arange(0, 10, 0.5).reshape(4, -1)
np.savetxt(outfile, x, fmt='%.3f', delimiter=',')
y = np.loadtxt(outfile, delimiter=',')
print(y)
# [[0.  0.5 1.  1.5 2. ]
#  [2.5 3.  3.5 4.  4.5]
#  [5.  5.5 6.  6.5 7. ]
#  [7.5 8.  8.5 9.  9.5]]

The test.csv file is as follows:

0.000,0.500,1.000,1.500,2.000
2.500,3.000,3.500,4.000,4.500
5.000,5.500,6.000,6.500,7.000
7.500,8.000,8.500,9.000,9.500

[function]

def genfromtxt(fname, dtype=float, comments='#', delimiter=None,
               skip_header=0, skip_footer=0, converters=None,
               missing_values=None, filling_values=None, usecols=None,
               names=None, excludelist=None,
               deletechars=''.join(sorted(NameValidator.defaultdeletechars)),
               replace_space='_', autostrip=False, case_sensitive=True,
               defaultfmt="f%i", unpack=None, usemask=False, loose=True,
               invalid_raise=True, max_rows=None, encoding='bytes'):
  • genfromtxt() function: loads data from a text file and handles missing values as specified (it is geared toward structured arrays and missing data).
  • names=None: when set to True, the first row is read as the column names.

data.csv file (without missing values)

id,value1,value2,value3
1,123,1.4,23
2,110,0.5,18
3,164,2.1,19

[example]

import numpy as np

outfile = r'.\data.csv'
x = np.loadtxt(outfile, delimiter=',', skiprows=1)
print(x)
# [[  1.  123.    1.4  23. ]
#  [  2.  110.    0.5  18. ]
#  [  3.  164.    2.1  19. ]]

x = np.loadtxt(outfile, delimiter=',', skiprows=1, usecols=(1, 2))
print(x)
# [[123.    1.4]
#  [110.    0.5]
#  [164.    2.1]]

val1, val2 = np.loadtxt(outfile, delimiter=',', skiprows=1, usecols=(1, 2), unpack=True)
print(val1)  # [123. 110. 164.]
print(val2)  # [1.4 0.5 2.1]

[example]

import numpy as np

outfile = r'.\data.csv'
x = np.genfromtxt(outfile, delimiter=',', names=True)
print(x)
# [(1., 123., 1.4, 23.) (2., 110., 0.5, 18.) (3., 164., 2.1, 19.)]

print(type(x))  
# <class 'numpy.ndarray'>

print(x.dtype)
# [('id', '<f8'), ('value1', '<f8'), ('value2', '<f8'), ('value3', '<f8')]

print(x['id'])  # [1. 2. 3.]
print(x['value1'])  # [123. 110. 164.]
print(x['value2'])  # [1.4 0.5 2.1]
print(x['value3'])  # [23. 18. 19.]

data1.csv file (with missing values)

id,value1,value2,value3
1,123,1.4,23
2,110,,18
3,,2.1,19

[example]

import numpy as np

outfile = r'.\data1.csv'
x = np.genfromtxt(outfile, delimiter=',', names=True)
print(x)
# [(1., 123., 1.4, 23.) (2., 110., nan, 18.) (3.,  nan, 2.1, 19.)]

print(type(x))  
# <class 'numpy.ndarray'>

print(x.dtype)
# [('id', '<f8'), ('value1', '<f8'), ('value2', '<f8'), ('value3', '<f8')]

print(x['id'])  # [1. 2. 3.]
print(x['value1'])  # [123. 110.  nan]
print(x['value2'])  # [1.4 nan 2.1]
print(x['value3'])  # [23. 18. 19.]
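If nan is not a convenient placeholder, the filling_values parameter from the signature above can supply a different one. The following sketch assumes the same content as data1.csv, held in memory so that it runs on its own, and uses 0 as an arbitrary fill value.

```python
import io
import numpy as np

# Same content as data1.csv, kept in memory for a self-contained demo.
text = (
    "id,value1,value2,value3\n"
    "1,123,1.4,23\n"
    "2,110,,18\n"
    "3,,2.1,19\n"
)

# A single filling value is used for missing fields in every column.
x = np.genfromtxt(io.StringIO(text), delimiter=',', names=True,
                  filling_values=0)
print(x['value1'])
print(x['value2'])
```

Whether 0 is a sensible replacement depends on the data; it is used here only to make the substituted fields easy to spot.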

3. Text format options

[function]

def set_printoptions(precision=None, threshold=None, edgeitems=None,
                     linewidth=None, suppress=None, nanstr=None, infstr=None,
                     formatter=None, sign=None, floatmode=None, **kwarg):
  • set_printoptions() function: sets printing options. These options determine how floating-point numbers, arrays, and other NumPy objects are displayed.
  • precision=8: the number of digits of precision for floating-point output. The default is 8.
  • threshold=1000: the total number of array elements above which the output is summarized with "..." instead of being printed in full. The default is 1000.
  • linewidth=75: the number of characters per line, used to decide where line breaks are inserted. The default is 75.
  • suppress=False: when suppress=True, floating-point numbers are printed in fixed-point notation rather than scientific notation, and values that round to zero at the current precision are printed as 0. The default is False.
  • nanstr='nan': string representation of floating-point not-a-number. The default is nan.
  • infstr='inf': string representation of floating-point infinity. The default is inf.
  • formatter: a dictionary that customizes how array elements are formatted for display. Each key is the type to be formatted, and each value is a callable that returns the formatted string.
    'bool'
    'int'
    'float'
    'str' : all other strings
    'all' : sets all types
    ...
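As a small illustration of the formatter dictionary, the sketch below registers one callable for floats and one for integers. The format strings are arbitrary choices made for the example.

```python
import numpy as np

# Each value in the dictionary is a callable that returns a string.
np.set_printoptions(formatter={
    'float': lambda v: format(float(v), '.2f'),  # two decimal places
    'int': lambda v: format(int(v), '03d'),      # zero-padded to 3 digits
})

floats = str(np.array([1.5, 2.25]))
ints = str(np.array([1, 20]))
print(floats)  # [1.50 2.25]
print(ints)    # [001 020]

np.set_printoptions()  # calling set_printoptions again resets the formatter
```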

[example]

import numpy as np

np.set_printoptions(precision=4)
x = np.array([1.123456789])
print(x)  # [1.1235]

np.set_printoptions(threshold=20)
x = np.arange(50)
print(x)  # [ 0  1  2 ... 47 48 49]

np.set_printoptions(threshold=np.iinfo(int).max)  # np.int was removed in NumPy 1.24; use the built-in int
print(x)
# [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#  24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
#  48 49]

eps = np.finfo(float).eps
x = np.arange(4.)
x = x ** 2 - (x + eps) ** 2
print(x)  
# [-4.9304e-32 -4.4409e-16  0.0000e+00  0.0000e+00]
np.set_printoptions(suppress=True)
print(x)  # [-0. -0.  0.  0.]

x = np.linspace(0, 10, 10)
print(x)
# [ 0.      1.1111  2.2222  3.3333  4.4444  5.5556  6.6667  7.7778  8.8889
#  10.    ]
np.set_printoptions(precision=2, suppress=True, threshold=5)
print(x)  # [ 0.    1.11  2.22 ...  7.78  8.89 10.  ]

np.set_printoptions(formatter={'all': lambda x: 'int: ' + str(-x)})
x = np.arange(3)
print(x)  # [int: 0 int: -1 int: -2]

np.set_printoptions()  # formatter gets reset
print(x)  # [0 1 2]

[example] restore default options

np.set_printoptions(edgeitems=3, infstr='inf', linewidth=75,
                    nanstr='nan', precision=8, suppress=False, 
                    threshold=1000, formatter=None)
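When the options are only needed for a few lines, the np.printoptions context manager (available in NumPy 1.15 and later) is an alternative to restoring defaults by hand. A minimal sketch, with an arbitrary example array:

```python
import numpy as np

x = np.array([1.123456789, 2.0])
before = np.get_printoptions()

# The options apply only inside the with block.
with np.printoptions(precision=3, suppress=True):
    inside = str(x)

after = np.get_printoptions()
print(inside)
print(before == after)  # True: the previous options are back
```

Unlike the explicit reset above, this restores whatever options were active before the block, not necessarily the library defaults.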

[function]

def get_printoptions():
  • get_printoptions() function: returns the current printing options as a dictionary.

[example]

import numpy as np

x = np.get_printoptions()
print(x)
# {
# 'edgeitems': 3, 
# 'threshold': 1000, 
# 'floatmode': 'maxprec', 
# 'precision': 8, 
# 'suppress': False, 
# 'linewidth': 75, 
# 'nanstr': 'nan', 
# 'infstr': 'inf', 
# 'sign': '-', 
# 'formatter': None, 
# 'legacy': False
# }
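Because get_printoptions() returns a plain dictionary, it can be used to take a snapshot, see which options a later call changed, and put the snapshot back. The sketch below assumes the options are still at their defaults when it starts.

```python
import numpy as np

saved = np.get_printoptions()            # snapshot of the current options

np.set_printoptions(precision=3, threshold=10)
current = np.get_printoptions()

# Collect the options whose values differ from the snapshot.
changed = {k: (saved[k], current[k]) for k in current if saved[k] != current[k]}
print(sorted(changed))                   # ['precision', 'threshold']

np.set_printoptions(**saved)             # restore the snapshot
print(np.get_printoptions() == saved)    # True
```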