Processing files and directories in Python (shutil, filecmp, MD5, tarfile, zipfile)

Posted by Dorin85 on Mon, 27 Apr 2020 08:14:45 +0200

1, Advanced file processing interface

shutil

shutil is a high-level file operation module.
It builds on lower-level interfaces such as os, and its main strength is its support for copying, moving and deleting files and directories.

Usage

  • copyfile(src, dst) copies the contents of src to dst. dst must be writable; otherwise an IOError (an alias of OSError in Python 3) is raised. If dst already exists, it is overwritten
  • copymode(src, dst) copies only the permission bits. The file contents, owner and group are not copied
  • copystat(src, dst) copies the permission bits, last access time and last modification time
  • copy(src, dst) copies a file to a file or into a directory
  • copy2(src, dst) works like copy, but also copies the last access time and modification time of the file, similar to cp -p
  • move(src, dst) moves a file or directory. If both locations are on the same file system, this is just a rename; otherwise src is copied to dst and then removed
  • copytree(olddir, newdir, symlinks=False) copies the directory olddir to newdir. If the third parameter (symlinks) is True, symbolic links in the tree are preserved as links. If it is False, the files they point to are copied in their place
[root@python ~]# mkdir /tmp/demo
[root@python ~]# cd /tmp/demo/
[root@python demo]# mkdir -p dir1
[root@python demo]# touch a.txt b.txt c.txt
[root@python demo]# touch sh.py cc.py 001.jpg 002.jpg 003.jpg
//Create required files

[root@python demo]# ipython
//Open ipython

You can also create these files in PyCharm and run the examples there.

1. Copy files and folders

shutil.copy(file1,file2)     #file
shutil.copytree(dir1,dir2) #folder

(1) Copy file

In [1]: import shutil                                         

In [2]: shutil.copy('a.txt','aa.txt')                         
Out[2]: 'aa.txt'
//You can check whether there are generated files in the corresponding paths of PyCharm and Linux

In [3]: ls                                                    
001.jpg  003.jpg  a.txt  cc.py  sh.py
002.jpg  aa.txt   b.txt  c.txt

(2) Copy folder

In [5]: shutil.copytree('dir1','dir11')                       
Out[5]: 'dir11'

In [6]: ls                                                    
001.jpg  003.jpg  a.txt  cc.py  dir1/   sh.py
002.jpg  aa.txt   b.txt  c.txt  dir11/

(3) Copy the contents of the file to another file

# _*_ coding:utf-8 _*_
__author__ = 'junxi'

import shutil

# Copy the contents of one file object to another (the with block closes both files)
with open('old.txt', 'r') as src, open('new.txt', 'w') as dst:
    shutil.copyfileobj(src, dst)

# Copy files
shutil.copyfile('old.txt', 'old1.txt')

# Copy permission bits only; content, owner and group are not changed
shutil.copymode('old.txt', 'old1.txt')

# Copy permission, last access time, last modification time
shutil.copystat('old.txt', 'old1.txt')

# Copy a file to a file or directory
shutil.copy('old.txt', 'old2.txt')

# On the basis of copy, copy the last access time and modification time of the file
shutil.copy2('old.txt', 'old2.txt')

# Copy olddir to newdir. With symlinks=True, symbolic links in the tree are kept as links; with the default symlinks=False, the linked files are copied instead
shutil.copytree('C:/Users/xiaoxinsoso/Desktop/aaa', 'C:/Users/xiaoxinsoso/Desktop/bbb')

# Move directory or file
shutil.move('C:/Users/xiaoxinsoso/Desktop/aaa', 'C:/Users/xiaoxinsoso/Desktop/bbb') # Move aaa directory to bbb directory

# Delete a directory
shutil.rmtree('C:/Users/xiaoxinsoso/Desktop/bbb') # Delete bbb directory
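
The difference between copy and copy2 can be checked with a minimal sketch like the one below. It works in a temporary directory; the file names and the fixed timestamp are assumptions made only for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'old.txt')
    with open(src, 'w') as f:
        f.write('hello')
    # Give the source an old modification time so the difference is visible
    # (the timestamp is an arbitrary value chosen for this sketch)
    os.utime(src, (1000000000, 1000000000))

    plain = shutil.copy(src, os.path.join(tmp, 'old1.txt'))   # content and permission bits
    full = shutil.copy2(src, os.path.join(tmp, 'old2.txt'))   # also tries to keep timestamps

    src_mtime = int(os.stat(src).st_mtime)
    copy_keeps_mtime = int(os.stat(plain).st_mtime) == src_mtime
    copy2_keeps_mtime = int(os.stat(full).st_mtime) == src_mtime

print('copy keeps mtime: ', copy_keeps_mtime)
print('copy2 keeps mtime:', copy2_keeps_mtime)
```

Both functions return the path of the new file, which is why the results can be passed straight to os.stat.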

2. Rename and move files and folders

shutil.move(file1, file2)
shutil.move(file, dir)

(1) Rename of file

In [7]: shutil.move('aa.txt','dd.txt')                        
Out[7]: 'dd.txt'

In [8]: ls                                                    
001.jpg  003.jpg  b.txt  c.txt   dir1/   sh.py
002.jpg  a.txt    cc.py  dd.txt  dir11/

(2) File move to folder

In [9]: shutil.move('dd.txt','dir1')                          
Out[9]: 'dir1/dd.txt'

In [11]: ls dir1                                              
dd.txt

3. Delete directory

shutil.rmtree(dir)    # Delete directory
os.unlink(file)       # Delete file

Delete directory

In [15]: shutil.rmtree('dir1')                                

In [16]: ls                                                   
001.jpg  003.jpg  b.txt  c.txt   sh.py
002.jpg  a.txt    cc.py  dir11/
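
As a rough illustration of why rmtree is used for directories, the sketch below builds a non-empty directory in a temporary location (the names dir1, sub and dd.txt are placeholders) and shows that os.rmdir refuses to delete it while shutil.rmtree removes the whole tree.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, 'dir1')
os.makedirs(os.path.join(target, 'sub'))
with open(os.path.join(target, 'dd.txt'), 'w') as f:
    f.write('x')

# os.rmdir only removes empty directories
try:
    os.rmdir(target)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# shutil.rmtree removes the directory together with everything inside it
shutil.rmtree(target)
exists_after = os.path.exists(target)

shutil.rmtree(base)   # clean up the temporary directory itself
print(rmdir_failed, exists_after)
```

Because rmtree deletes without asking, it is worth double-checking the path before calling it.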

2, File content management

1. Directory and file comparison

The filecmp module compares files and directories, including recursive comparison of subdirectories.

filecmp is part of the standard library, so no installation is required.

(1) Directory structure

In directory dir1, the files a_copy.txt, a.txt and c.txt have the same content, while b.txt has different content. The touch commands below only create empty files, so write some content into them afterwards.

[root@python demo]# mkdir compare
[root@python demo]# cd compare/
[root@python compare]# mkdir -p dir1 dir2
[root@python compare]# mkdir dir1/subdir1
[root@python compare]# ls
dir1  dir2
[root@python compare]# touch dir1/a_copy.txt dir1/a.txt dir1/b.txt   dir1/c.txt
[root@python compare]# touch dir2/a.txt dir2/b.txt dir2/c.txt
[root@python compare]# mkdir -p dir2/subdir1 dir2/subdir2
[root@python compare]# touch dir2/subdir1/sb.txt
//Create required files

[root@python compare]# ipython
//Open ipython

filecmp provides three operations: cmp (single file comparison), cmpfiles (multi-file comparison), and dircmp (directory comparison).

(2) Example code:

Use the cmp function of filecmp module to compare whether two files are the same. If the files are the same, return True, otherwise False

In [1]: import filecmp 

In [2]: filecmp.cmp('a.txt','b.txt')                          
Out[2]: False

In [3]: filecmp.cmp('a.txt','c.txt')                          
Out[3]: True

In [4]: filecmp.cmp('a.txt','a_copy.txt')                     
Out[4]: True
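
One detail worth knowing is that cmp takes a third parameter, shallow, which defaults to True. In that mode, two files whose os.stat() signatures (type, size, modification time) are identical are reported as equal without reading them. The sketch below is a contrived illustration under that assumption: it forces identical timestamps on two temporary files of equal size but different content.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, 'a.txt')
    right = os.path.join(tmp, 'b.txt')
    with open(left, 'w') as f:
        f.write('abc')
    with open(right, 'w') as f:
        f.write('xyz')   # same size, different content
    # Force identical timestamps so that both os.stat() signatures match
    os.utime(left, (1000000000, 1000000000))
    os.utime(right, (1000000000, 1000000000))

    shallow_equal = filecmp.cmp(left, right)               # shallow=True is the default
    deep_equal = filecmp.cmp(left, right, shallow=False)   # compares the actual bytes

print(shallow_equal, deep_equal)
```

When the content itself matters, passing shallow=False is the safer choice.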

(3) Compare multiple files

The filecmp module also has a function named cmpfiles, which compares several files in two different directories at the same time and returns a triple: the files that are the same, the files that differ, and the files that could not be compared. An example is as follows:

In [9]: filecmp.cmpfiles('dir1','dir2',['a.txt','b.txt','c.txt','a_copy.txt'])                                      
Out[9]: (['b.txt'], ['a.txt', 'c.txt'], ['a_copy.txt'])
# Returns a triple. The first list holds the files that are the same, the second the files that differ, and the third the files that could not be compared (missing in one directory, or for other reasons)
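
The order of the three lists can be confirmed with a small self-contained sketch. The directory layout is created in a temporary location and the file contents are made up for the example.

```python
import filecmp
import os
import tempfile

def write(path, text):
    with open(path, 'w') as f:
        f.write(text)

with tempfile.TemporaryDirectory() as tmp:
    dir1 = os.path.join(tmp, 'dir1')
    dir2 = os.path.join(tmp, 'dir2')
    os.mkdir(dir1)
    os.mkdir(dir2)
    write(os.path.join(dir1, 'a.txt'), 'same')
    write(os.path.join(dir2, 'a.txt'), 'same')
    write(os.path.join(dir1, 'b.txt'), 'left')
    write(os.path.join(dir2, 'b.txt'), 'right side')
    write(os.path.join(dir1, 'a_copy.txt'), 'same')   # exists only in dir1

    # shallow=False forces a byte-by-byte comparison
    match, mismatch, errors = filecmp.cmpfiles(
        dir1, dir2, ['a.txt', 'b.txt', 'a_copy.txt'], shallow=False)

print(match, mismatch, errors)
```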

(4) Compare directories

The cmpfiles function compares files in two directories at the same time, so it can also be used to compare two directories. However, you then have to pass the list of candidate file names as a parameter, which is cumbersome.

filecmp also provides dircmp for comparing two directories. Calling dircmp returns an object of the dircmp class. This object has many attributes, and we can get the differences between the directories by inspecting them. As follows:

In [11]: d = filecmp.dircmp('dir1','dir2')                    
#Set test directory

In [12]: d.report()                                           
diff dir1 dir2
Only in dir1 : ['a_copy.txt']
Only in dir2 : ['subdir2']
Identical files : ['b.txt']
Differing files : ['a.txt', 'c.txt']
Common subdirectories: ['subdir1']

(5) Compare directories directly without specifying files

Directory comparison: create a directory comparison object with the filecmp.dircmp(a, b[, ignore[, hide]]) class to compare folders. Comparing two folders gives detailed results (such as the list of files that exist only in folder a), and recursive comparison of subfolders is supported.

In [17]: d.left_list      #View dir1 directory structure                                    
Out[17]: ['a.txt', 'a_copy.txt', 'b.txt', 'c.txt', 'subdir1']

In [18]: d.right_list     #View dir2 directory structure                                 
Out[18]: ['a.txt', 'b.txt', 'c.txt', 'subdir1', 'subdir2']

In [19]: d.left_only      #Exists only in dir1
Out[19]: ['a_copy.txt']

In [20]: d.right_only     #Exists only in dir2
Out[20]: ['subdir2']
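
The attributes above describe only the top level. For the recursive case, a dircmp object also exposes subdirs, a mapping from each common subdirectory name to another dircmp object. The helper below, collect_differences, is a hypothetical name introduced for this sketch; it walks that mapping and gathers the results for the whole tree.

```python
import filecmp
import os
import tempfile

def collect_differences(dcmp, prefix=''):
    """Return (kind, relative_path) pairs for the whole directory tree."""
    found = [('left_only', os.path.join(prefix, n)) for n in dcmp.left_only]
    found += [('right_only', os.path.join(prefix, n)) for n in dcmp.right_only]
    found += [('differs', os.path.join(prefix, n)) for n in dcmp.diff_files]
    # subdirs maps each common subdirectory name to another dircmp object
    for name, sub in dcmp.subdirs.items():
        found += collect_differences(sub, os.path.join(prefix, name))
    return found

def touch(path, text=''):
    with open(path, 'w') as f:
        f.write(text)

with tempfile.TemporaryDirectory() as tmp:
    dir1 = os.path.join(tmp, 'dir1')
    dir2 = os.path.join(tmp, 'dir2')
    os.makedirs(os.path.join(dir1, 'subdir1'))
    os.makedirs(os.path.join(dir2, 'subdir1'))
    os.makedirs(os.path.join(dir2, 'subdir2'))
    touch(os.path.join(dir1, 'a.txt'), 'same')
    touch(os.path.join(dir2, 'a.txt'), 'same')
    touch(os.path.join(dir1, 'a_copy.txt'), 'same')
    touch(os.path.join(dir2, 'subdir1', 'sb.txt'))
    differences = sorted(collect_differences(filecmp.dircmp(dir1, dir2)))

for kind, path in differences:
    print(kind, path)
```

The comparison is done inside the with block because dircmp computes its attributes lazily, when they are first accessed.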

2. MD5 checksum comparison

The check code is calculated by a hash function, which is a method of creating a small digital "fingerprint" from any data. The hash function compresses the message or data into a digest, making the data smaller and easier to compare. MD5 is one of the most widely used hash functions.

MD5 hashes are generally used to check the integrity of files, especially to check the correctness of files in case of file transfer, disk error or other situations.

Under Linux, the MD5 check code of a file is calculated as follows:

[root@192 demo]# md5sum a.txt
d41d8cd98f00b204e9800998ecf8427e  a.txt

It is also very simple to calculate the MD5 check code of a file in Python. You can use the standard library hashlib module. As follows:

import hashlib

d = hashlib.md5()
# Open in binary mode so that the digest matches md5sum byte for byte
with open('b.txt', 'rb') as f:
    for line in f:
        d.update(line)
print(d.hexdigest())

# Or hash a bytes value directly (the most common form, often used to generate names for pictures)
>>> import hashlib

>>> hashlib.md5(b'123').hexdigest()
'202cb962ac59075b964b07152d234b70'

# You can also use the generic constructor hashlib.new(name[, data]). name is the name of the hash algorithm, such as md5
>>> hashlib.new('md5', b'123').hexdigest()
'202cb962ac59075b964b07152d234b70'

Remember to create the b.txt file
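
For large files it is common to read fixed-size chunks instead of lines. The helper below, file_md5, is a hypothetical name used for this sketch, and the chunk size is an arbitrary choice. The two checks reuse the digests shown above: the empty a.txt and the bytes b'123'.

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in binary chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, 'a.txt')
    numbers = os.path.join(tmp, 'b.txt')
    open(empty, 'wb').close()          # empty file, like the one created with touch
    with open(numbers, 'wb') as f:
        f.write(b'123')
    empty_digest = file_md5(empty)
    numbers_digest = file_md5(numbers)

print(empty_digest)
print(numbers_digest)
```

Two files can then be compared by comparing their digests, which is convenient when the files live on different machines.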

3, Managing archives with Python

1,tarfile

Since there is a compression module zipfile, it is natural to have an archive module tarfile. The tarfile module is used to unpack and package files, including those compressed with gzip, bz2 or lzma. For files of type .zip, use the zipfile module instead. For higher-level functions, use the shutil module.

Defined classes and exceptions

tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

Returns an object of type TarFile. In essence, it opens a file object; this kind of file-like object design appears throughout Python, so it should feel familiar.

name is the file name or path.

bufsize specifies the size of the data block, which is 20 * 512 bytes by default.

mode is the open mode, a string of the form filemode[:compression]. The possible combinations are shown in the following table. The default is 'r'.

Mode            Description
'r' or 'r:*'    Open for reading with automatic detection of the compression (recommended)
'r:'            Open for reading, uncompressed archives only
'r:gz'          Open for reading with gzip compression
'r:bz2'         Open for reading with bzip2 compression
'r:xz'          Open for reading with lzma compression
'x' or 'x:'     Create a new archive without compression; fails if the file already exists
'x:gz'          Create a new archive with gzip compression
'x:bz2'         Create a new archive with bzip2 compression
'x:xz'          Create a new archive with lzma compression
'a' or 'a:'     Open for appending without compression; the file is created if it does not exist
'w' or 'w:'     Open for uncompressed writing
'w:gz'          Open for writing with gzip compression
'w:bz2'         Open for writing with bzip2 compression
'w:xz'          Open for writing with lzma compression

Note: the modes 'a:gz', 'a:bz2' and 'a:xz' are not supported.

If the file cannot be opened for reading in the specified mode, a ReadError exception is raised; in that case, use the 'r' mode. If the specified compression method is not supported, a CompressionError exception is raised.

In the modes w:gz, r:gz, w:bz2, r:bz2, x:gz and x:bz2, tarfile.open() accepts an additional compression level parameter, compresslevel, with a default value of 9.
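
A minimal round trip may make the mode strings more concrete. The sketch below writes a gzip-compressed archive with 'w:gz', reads it back with the default 'r' mode, and then shows that 'r:' (uncompressed only) rejects the same file. The file names and the compression level 6 are assumptions chosen for the example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'read.txt')
    with open(src, 'wb') as f:
        f.write(b'hello tar\n')
    archive = os.path.join(tmp, 'readme.tar.gz')

    # Write with gzip compression; compresslevel is optional
    with tarfile.open(archive, 'w:gz', compresslevel=6) as out:
        out.add(src, arcname='read.txt')

    # The default mode 'r' detects the compression automatically
    with tarfile.open(archive) as t:
        names = t.getnames()
        content = t.extractfile('read.txt').read()

    # 'r:' accepts only uncompressed archives, so this raises ReadError
    try:
        tarfile.open(archive, 'r:')
        read_error_raised = False
    except tarfile.ReadError:
        read_error_raised = True

print(names, content, read_error_raised)
```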

(1) Read file


import tarfile

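with tarfile.open('tengine-2.3.2.tar.gz') as t:
    # getmembers() returns the members of the archive as TarInfo objects
    for member in t.getmembers():
        print(member.name)

with tarfile.open('tengine-2.3.2.tar.gz') as t:
    # extractall(path, members) expects TarInfo objects, not a name string
    man = [m for m in t.getmembers() if m.name.startswith('tengine-2.3.2/man')]
    t.extractall('a', man)
    # extract(member, path) extracts a single member into directory b
    t.extract('tengine-2.3.2/man', 'b')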

Common methods:
  • getmembers(): returns the list of members in the tar package
  • member.name: the name of a member in the tar package
  • extract(member, path): extracts a single member
  • extractall(path, members): extracts all members, or only those listed in members

(2) Create tar package

Remember to create the read.txt file

import tarfile

with tarfile.open('readme.tar', mode='w') as out:
    out.add('read.txt')

You can check whether there is a readme.tar file in the corresponding location

(3) Read and create compressed package

import tarfile

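# Each call succeeds only if the archive was compressed with the matching
# format; otherwise a ReadError is raised
with tarfile.open('tarfile_add.tar', mode='r:gz') as out:
    pass
with tarfile.open('tarfile_add.tar', mode='r:bz2') as out:
    pass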

(4) Back up the specified file to a compressed package

import os
import fnmatch
import tarfile
import datetime

def is_file_math(filename, patterns):
    '''Return True if filename matches at least one of the patterns'''
    for pattern in patterns:
        if fnmatch.fnmatch(filename, pattern):
            return True
    # Only give up after every pattern has been tried
    return False

def find_files(root, patterns=['*']):
    for root, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if is_file_math(filename, patterns):
                yield os.path.join(root, filename)

patterns = ['*.txt','*.md']
now = datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
filename = 'backup_all_file_{0}.tar.gz'.format(now)
# 'w:gz' so that the archive really is gzip-compressed, as the .tar.gz name says
with tarfile.open(filename, 'w:gz') as f:
    for item in find_files('.', patterns):
        f.add(item)

You can check whether a backup_all_file_<timestamp>.tar.gz file has been created in the current directory
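
The same backup idea can be written as a function that is easier to test. The version below is only a sketch: matches_any and backup are hypothetical names, the archive is written outside the directory being backed up, and the sample files are invented for the demonstration.

```python
import fnmatch
import os
import tarfile
import tempfile

def matches_any(filename, patterns):
    """True if filename matches at least one shell-style pattern."""
    return any(fnmatch.fnmatch(filename, p) for p in patterns)

def backup(root, archive_path, patterns=('*',)):
    """Pack matching files under root into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, 'w:gz') as tar:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if matches_any(filename, patterns):
                    full = os.path.join(dirpath, filename)
                    # Store paths relative to root instead of absolute paths
                    tar.add(full, arcname=os.path.relpath(full, root))
    with tarfile.open(archive_path) as tar:
        return sorted(tar.getnames())

with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as out:
    os.mkdir(os.path.join(root, 'docs'))
    for rel in ('notes.txt', 'image.jpg', os.path.join('docs', 'readme.md')):
        with open(os.path.join(root, rel), 'w') as f:
            f.write('sample')
    stored = backup(root, os.path.join(out, 'backup.tar.gz'), ['*.txt', '*.md'])

print(stored)
```

Keeping the archive outside the source directory avoids the archive picking itself up when the patterns are broad, such as '*'.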

2,zipfile

zipfile is used for compressing and decompressing the zip format in Python. Because zip is a very common format, this module is used frequently.

zipfile has two very important classes, ZipFile and ZipInfo. In most cases, you only need to use these two classes.

  • ZipFile is the main class used to create and read zip files;
  • ZipInfo stores the information about each member of a zip file.

(1) Read zip file

import zipfile

with zipfile.ZipFile('read.zip') as demo_zip:
    print(demo_zip.namelist())
    demo_zip.extractall('1')
    demo_zip.extract('a.jpg', '2')
//The directories 1 and 2 are created if they do not exist. The first argument of extract() must be a name that appears in namelist().
Common methods:
  • namelist(): returns a list of the names of all files and folders contained in the zip file
  • extract(filename, path): extracts a single file from the zip file
  • extractall(path): extracts all files from the zip file

(2) Create zip file

import zipfile

with zipfile.ZipFile('new.zip', mode='w') as newZip:
    newZip.write('a.jpg')     #File must exist
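
Writing and reading can be combined into one self-contained sketch. It assumes zlib is available (needed for ZIP_DEFLATED) and uses placeholder names in a temporary directory. testzip() returns None when all members pass the CRC check.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'b.txt')
    with open(src, 'wb') as f:
        f.write(b'zip me')
    archive = os.path.join(tmp, 'new.zip')

    # The with block closes the archive, which writes the central directory
    with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname='b.txt')

    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        first_bad_member = zf.testzip()
        out_dir = os.path.join(tmp, '2')
        extracted = zf.extract('b.txt', out_dir)   # creates out_dir if needed
        out_dir_created = os.path.isdir(out_dir)
        with open(extracted, 'rb') as f:
            data = f.read()

print(names, first_bad_member, data)
```

Without the compression argument, ZipFile stores members uncompressed (ZIP_STORED).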

(3) Python command line calls zipfile

#Create zip file
python -m zipfile -c new1.zip b.txt

#View the contents of the zip file
python -m zipfile -l new1.zip
File Name                                             Modified             Size
b.txt                                          2020-04-26 14:35:12            0

#Extract the zip file to the specified directory
python -m zipfile -e new1.zip /

Options included in the command line interface provided by the zipfile module:

  • -l: List the files in a zip package
  • -e: Extract a zip package
  • -c: Create a zip package
  • -t: Verify that the file is a valid zip package

(4) Properties of zipfile

import zipfile, os
zipFile = zipfile.ZipFile(os.path.join(os.getcwd(), 'duoduo.zip'))
zipInfo = zipFile.getinfo('Files in files.txt')
print ('filename:', zipInfo.filename) #Get file name
print ('date_time:', zipInfo.date_time) #Gets the last modification time of the file. Returns a tuple containing six elements: (year, month, day, hour, minute, second)
print ('compress_type:', zipInfo.compress_type) #Compression type
print ('comment:', zipInfo.comment) #Document description
print ('extra:', zipInfo.extra) #Extension data
print ('create_system:', zipInfo.create_system) #Gets the system that created the zip document.
print ('create_version:', zipInfo.create_version) #Gets the PKZIP version of the zip document created.
print ('extract_version:', zipInfo.extract_version) #Get the PKZIP version required to extract the zip document.
print ('reserved:', zipInfo.reserved) # Reserved field. The current implementation always returns 0.
print ('flag_bits:', zipInfo.flag_bits) #zip flag bit.
print ('volume:', zipInfo.volume) # Volume number of the file header.
print ('internal_attr:', zipInfo.internal_attr) #Internal properties.
print ('external_attr:', zipInfo.external_attr) #External properties.
print ('header_offset:', zipInfo.header_offset) # File header offset.
print ('CRC:', zipInfo.CRC) # CRC-32 for uncompressed files.
print ('compress_size:', zipInfo.compress_size) #Gets the compressed size.
print ('file_size:', zipInfo.file_size) #Gets the uncompressed file size.
zipFile.close()
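
A few of these attributes can be tried without any file on disk. The sketch below builds a zip archive in memory; the member name repeat.txt and its content are made up, and zlib is assumed to be available.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
    # Highly repetitive data compresses well, which makes the sizes easy to compare
    zf.writestr('repeat.txt', 'a' * 10000)

with zipfile.ZipFile(buffer) as zf:
    info = zf.getinfo('repeat.txt')
    member_name = info.filename
    file_size = info.file_size            # uncompressed size in bytes
    compress_size = info.compress_size    # size of the compressed data
    compress_type = info.compress_type

print(member_name, file_size, compress_size)
```

infolist() returns the same kind of ZipInfo objects for every member, which is handy for printing a listing.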

3,shutil creates and reads compressed packages

The name shutil can be read as sh + util, that is, shell utilities. The shutil module complements the os module and is mainly used for copying, deleting, moving, compressing and decompressing files.

List the supported archive formats:

import shutil

print(shutil.get_archive_formats())
The output results are as follows:
[('bztar', "bzip2'ed tar-file"), ('gztar', "gzip'ed tar-file"), ('tar', 'uncompressed tar file'), ('xztar', "xz'ed tar-file"), ('zip', 'ZIP file')]

(1) Create a compressed package

import shutil
# Parameter 1: base name of the archive to create (the extension is added automatically)
# Parameter 2: archive format
# Parameter 3: directory to archive
shutil.make_archive('a.jpg','gztar', 'ddd')

You can check whether there are generated files in the corresponding location

(2) Unzip

import shutil
# Parameter 1: the archive to unpack
# Parameter 2: directory to extract into
# unpack_archive() returns None, so there is nothing useful to print
shutil.unpack_archive('a.jpg.tar.gz','jpg')

You can check whether there are generated files in the corresponding location
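
Both calls can be tried together in a temporary directory. In this sketch the names ddd, backup and restored are placeholders. make_archive returns the path of the file it created, with the extension for the chosen format appended to the base name.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_dir = os.path.join(tmp, 'ddd')
    os.mkdir(source_dir)
    with open(os.path.join(source_dir, 'a.txt'), 'w') as f:
        f.write('archive me')

    # The base name has no extension; '.tar.gz' is added for the gztar format
    archive_path = shutil.make_archive(os.path.join(tmp, 'backup'), 'gztar',
                                       root_dir=source_dir)
    archive_name = os.path.basename(archive_path)

    # The format is guessed from the file extension
    target = os.path.join(tmp, 'restored')
    shutil.unpack_archive(archive_path, target)
    restored = sorted(os.listdir(target))

print(archive_name, restored)
```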
