Save numpy array to file

Posted by bubbadawg on Thu, 09 Sep 2021 22:00:00 +0200

1. txt or csv file


import numpy as np

a = np.array(range(20)).reshape((4, 5))
print(a)

# The same code works with a .txt suffix
filename = 'data/a.csv'
# Write the file (the data/ directory must already exist)
np.savetxt(filename, a, fmt='%d', delimiter=',')

# Read the file
b = np.loadtxt(filename, dtype=np.int32, delimiter=',')
print(b)

Disadvantages:

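  • Only one-dimensional and two-dimensional numpy arrays can be saved. An array with more dimensions must be reshaped to two dimensions before saving, and its original shape is not recorded in the file.
  • When given a file name, np.savetxt() overwrites the previous content every time. Appending is only possible by passing a file object opened in append mode instead of a file name.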
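Both limitations can be worked around. The following is a minimal sketch, assuming a temporary directory and a hypothetical file name a3d.csv: it flattens a three-dimensional array to two dimensions before saving, restores the shape by hand after loading, and appends one more row through a file object opened in append mode.

```python
import os
import tempfile

import numpy as np

a = np.arange(24).reshape((2, 3, 4))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "a3d.csv")  # hypothetical file name

    # Merge all leading axes into rows so that savetxt accepts the array.
    np.savetxt(path, a.reshape(-1, a.shape[-1]), fmt="%d", delimiter=",")

    # The text file does not record the original shape, so restore it by hand.
    restored = np.loadtxt(path, dtype=np.int32, delimiter=",").reshape(a.shape)

    # Append: pass a file object opened in 'a' mode instead of a file name.
    extra = np.array([[100, 101, 102, 103]])
    with open(path, "a") as f:
        np.savetxt(f, extra, fmt="%d", delimiter=",")
    appended = np.loadtxt(path, dtype=np.int32, delimiter=",")

print(restored.shape)  # (2, 3, 4)
print(appended.shape)  # (7, 4)
```

The shape has to be known on the reading side, which is one reason the binary formats below are usually more convenient.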

2. Read and write npy or npz files through numpy

  • Read and write npy files
numpy.save(file, arr, allow_pickle=True, fix_imports=True)
 
file: file name or file path
arr: the array to store
allow_pickle: boolean; allows object arrays to be saved using Python pickles (optional, the default is usually fine)
fix_imports: makes pickled data saved from Python 3 readable in Python 2 (optional, the default is usually fine)
import numpy as np
a = np.array(range(20)).reshape((2, 2, 5))
print(a)

filename = 'data/a.npy'  # save path
# Write the file
np.save(filename, a)

# Read the file
b = np.load(filename)
print(b)
print(b.shape)
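The allow_pickle parameter only matters for object arrays. The sketch below uses a temporary directory and a hypothetical file name, ragged.npy, and assumes a recent NumPy release (1.16.3 or later), where np.load() defaults to allow_pickle=False and refuses to load pickled object arrays unless asked to.

```python
import os
import tempfile

import numpy as np

# An object array: the elements are arbitrary Python objects.
ragged = np.empty(2, dtype=object)
ragged[0] = [1, 2, 3]
ragged[1] = {"name": "example"}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "ragged.npy")  # hypothetical file name
    np.save(path, ragged, allow_pickle=True)

    # Assumption: np.load defaults to allow_pickle=False (NumPy >= 1.16.3).
    try:
        np.load(path)
        load_refused = False
    except ValueError:
        load_refused = True

    # Opting in explicitly loads the pickled objects.
    loaded = np.load(path, allow_pickle=True)

print(load_refused)
print(loaded[0], loaded[1])
```

Pickled data can execute arbitrary code when loaded, so allow_pickle=True is best reserved for files from a trusted source.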

Advantages:

(1) npy files can hold numpy arrays of any dimension, not only one- and two-dimensional ones

(2) npy files preserve the structure of the numpy array, including its shape and dtype

Disadvantages:

(1) Only one numpy array can be saved per file; each save overwrites the previous contents of the file
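To make the point about preserved structure concrete, here is a small comparison, written as a sketch with hypothetical file names in a temporary directory. The same float32 array is saved once as npy and once as text; only the npy round trip returns the original shape and dtype.

```python
import os
import tempfile

import numpy as np

a = np.arange(6, dtype=np.float32).reshape((1, 2, 3))

with tempfile.TemporaryDirectory() as tmpdir:
    npy_path = os.path.join(tmpdir, "a.npy")
    txt_path = os.path.join(tmpdir, "a.txt")

    # Binary format: shape and dtype are stored in the file header.
    np.save(npy_path, a)
    from_npy = np.load(npy_path)

    # Text format: the array must be 2D, and the dtype is not recorded.
    np.savetxt(txt_path, a.reshape(-1, a.shape[-1]))
    from_txt = np.loadtxt(txt_path)

print(from_npy.shape, from_npy.dtype)  # (1, 2, 3) float32
print(from_txt.shape, from_txt.dtype)  # (2, 3) float64
```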

  • Read and write npz files

Parameter introduction

numpy.savez(file, *args, **kwds)
 
file: file name or file path
*args: arrays to store; several can be passed. Arrays passed without a key are named 'arr_0', 'arr_1', and so on
**kwds: arrays to store under the given keyword names (optional)
import numpy as np

a = np.array(range(20)).reshape((2, 2, 5))
b = np.array(range(20, 44)).reshape((2, 3, 4))
print('a:\n', a)
print('b:\n', b)

filename = 'data/a.npz'
# Write the file; arrays passed without a key get the default keys 'arr_0', 'arr_1', and so on
np.savez(filename, a, b=b)

# Read the file
c = np.load(filename)
print('keys of NpzFile c:\n', c.keys())
print("c['arr_0']:\n", c['arr_0'])
print("c['b']:\n", c['b'])

Better still, instead of relying on the keys numpy assigns, we can give the arrays meaningful keys, so there is no guessing when the data is loaded.

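import numpy as np

# Example data (placeholder values for illustration)
x = np.arange(10)
y = np.sin(x)

# Save the data under the keys 'x' and 'y' (the .npz suffix is added automatically)
np.savez('newsave_xy', x=x, y=y)

# Read the saved data
npzfile = np.load('newsave_xy.npz')

# Access the arrays by the keys set when saving
print(npzfile['x'])
print(npzfile['y'])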

Advantages:

(1) npz files can hold numpy arrays of any dimension;

(2) npz files preserve the structure of numpy arrays;

(3) Multiple numpy arrays can be saved at the same time;

(4) You can specify the key under which each numpy array is stored, which makes reading convenient.

Disadvantages:

(1) All arrays must be written in a single call; an array cannot be added later to an existing npz file without rewriting the whole file.
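If an array does have to be added later, the usual workaround is to load everything, add the new array, and write the file again. The helper below is a sketch of that idea; add_array_to_npz is a hypothetical name and not part of numpy.

```python
import os
import tempfile

import numpy as np


def add_array_to_npz(path, key, array):
    """Rewrite the npz file at `path` with one extra array stored under `key`."""
    # Read every existing array into memory, then close the file.
    with np.load(path) as npz:
        arrays = {name: npz[name] for name in npz.files}
    arrays[key] = array
    # np.savez has no append mode, so the whole file is written again.
    np.savez(path, **arrays)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "arrays.npz")  # hypothetical file name
    np.savez(path, x=np.arange(3))
    add_array_to_npz(path, "y", np.ones((2, 2)))

    with np.load(path) as npz:
        keys = sorted(npz.files)
        y = npz["y"]

print(keys)  # ['x', 'y']
```

Every existing array passes through memory, so this only suits files that fit comfortably in RAM.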

  • Read and write hdf5 files through h5py

    import numpy as np
    import h5py
    
    a = np.array(range(20)).reshape((2, 2, 5))
    b = np.array(range(20)).reshape((1, 4, 5))
    print(a)
    print(b)
    
    filename = 'data/data.h5'
    # Write File
    h5f = h5py.File(filename, 'w')
    h5f.create_dataset('a', data=a)
    h5f.create_dataset('b', data=b)
    h5f.close()
    
    # read file
    h5f = h5py.File(filename, 'r')
    print(type(h5f))
    # numpy array from slice
    print(h5f['a'][:])
    print(h5f['b'][:])
    h5f.close()
    

    Assigning by slice

    import numpy as np
    import h5py
    
    a = np.array(range(20)).reshape((2, 2, 5))
    print(a)
    
    filename = 'data/a.h5'
    # Write File
    h5f = h5py.File(filename, 'w')
    # When array a is too large to load at once, h5f['a'] need not be initialized from data; create it empty and fill it slice by slice
    # The maxshape parameter can be omitted if the shape of h5f['a'] will not need to change later
    h5f.create_dataset('a', shape=(2, 2, 5), maxshape=(None, 2, 5), dtype=np.int32, compression='gzip')
    for i in range(2):
        # Assignment in the form of slices
        h5f['a'][i] = a[i]
    h5f.close()
    
    # read file
    h5f = h5py.File(filename, 'r')
    print(type(h5f))
    print(h5f['a'])
    # numpy array from slice
    print(h5f['a'][:])
    h5f.close()
    

    Advantages:

    (1) The number of dimensions is not limited, and the structure and data type of the numpy array are preserved;

    (2) Suited to large numpy arrays, and with compression the files stay small;

    (3) Each dataset is accessed by key and behaves like a numpy array, so reading is easy and unambiguous;

    (4) The existing contents of a file need not be overwritten: datasets can be added to a file that already exists.
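Point (4) and the maxshape parameter can be illustrated together. This is a minimal sketch, assuming h5py is installed and using a temporary directory: the file is reopened in 'a' (append) mode, a second dataset is added next to the first, and the first dataset is grown along its unlimited axis with resize().

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.h5")  # hypothetical file name

    # Create the file with one dataset whose first axis may grow.
    with h5py.File(path, "w") as h5f:
        h5f.create_dataset(
            "a", data=np.zeros((2, 2, 5), dtype=np.int32), maxshape=(None, 2, 5)
        )

    # 'a' mode opens the existing file read/write and keeps its contents.
    with h5py.File(path, "a") as h5f:
        h5f.create_dataset("b", data=np.arange(4))
        dset = h5f["a"]
        dset.resize(3, axis=0)  # grow from 2 to 3 blocks along axis 0
        dset[2] = np.ones((2, 5), dtype=np.int32)

    with h5py.File(path, "r") as h5f:
        names = sorted(h5f.keys())
        shape_a = h5f["a"].shape
        last_block = h5f["a"][2]

print(names)    # ['a', 'b']
print(shape_a)  # (3, 2, 5)
```

Using the file as a context manager closes it automatically, even if an exception is raised in between.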

3. Summary

  • csv and txt files can only store one-dimensional or two-dimensional numpy arrays;
  • npy stores a single numpy array, and npz stores several numpy arrays at once. Neither limits the number of dimensions, and both preserve the shape and dtype of the array. Writing to a file that already exists always overwrites its contents.
  • When the numpy array is large, hdf5 files are the best choice: they support compression and therefore take up relatively little space.
  • When the numpy array is so large that computing it in one piece is likely to raise MemoryError, you can compute it slice by slice and save each computed slice to an hdf5 file, which supports slice indexing.
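The last point can be sketched as follows, assuming h5py is installed. The sizes and the dataset name 'squares' are illustrative only, and the array here is small enough to fit in memory; the pattern is what matters. The result is computed a block of rows at a time, each block is written straight into the dataset, and only a small slice is read back.

```python
import os
import tempfile

import h5py
import numpy as np

n_rows, n_cols, chunk = 1000, 8, 250  # illustrative sizes

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "result.h5")  # hypothetical file name

    with h5py.File(path, "w") as h5f:
        # Allocate the full dataset on disk without building it in memory.
        dset = h5f.create_dataset("squares", shape=(n_rows, n_cols), dtype=np.int64)
        for start in range(0, n_rows, chunk):
            stop = min(start + chunk, n_rows)
            rows = np.arange(start, stop, dtype=np.int64)[:, None]
            # Only this block of rows exists in memory at any one time.
            dset[start:stop] = rows ** 2 + np.arange(n_cols)

    with h5py.File(path, "r") as h5f:
        # Slice indexing reads just the requested part from disk.
        part = h5f["squares"][10:12, :3]

print(part.tolist())  # [[100, 101, 102], [121, 122, 123]]
```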
