1, Advanced file processing interface
shutil is a high-level file operation module. Compared with the lower-level functions in os, its main strength is its support for copying, moving, and deleting files and whole directory trees.
Usage
- copyfile(src, dst) copies the contents of src to dst. The destination must be writable; otherwise an OSError (IOError in older Python versions) is raised. If dst already exists, it is overwritten.
- copymode(src, dst) copies only the permission bits. The contents, owner, and group of dst are unchanged.
- copystat(src, dst) copies the permission bits, last access time, and last modification time.
- copy(src, dst) copies a file to a file or into a directory, together with its permission bits.
- copy2(src, dst) works like copy() but also copies the last access time and modification time, similar to cp -p.
- move(src, dst): if both locations are on the same file system, this is equivalent to a rename; otherwise the data is copied to dst and the source is removed.
- copytree(olddir, newdir, True/False) copies olddir to newdir. The third parameter is symlinks: if it is True, symbolic links in the source tree are copied as symbolic links; if it is False (the default), the files they point to are copied instead.
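To see what copyfile(), copy() and copy2() actually carry over, here is a minimal sketch that works in a throwaway temporary directory. The file names and the demo_copy_functions() name are made up for illustration, and it assumes a POSIX system, since permission bits behave differently on Windows.

```python
import os
import shutil
import tempfile


def demo_copy_functions():
    """Copy one file three ways and report (permission bits, mtime) of each copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'src.txt')
        with open(src, 'w') as f:
            f.write('hello')
        os.chmod(src, 0o640)
        # Give the source an old timestamp so we can tell whether it was copied
        os.utime(src, (1_000_000_000, 1_000_000_000))

        shutil.copyfile(src, os.path.join(tmp, 'a.txt'))  # contents only
        shutil.copy(src, os.path.join(tmp, 'b.txt'))      # contents + permission bits
        shutil.copy2(src, os.path.join(tmp, 'c.txt'))     # contents + permissions + times

        result = {}
        for name in ('a.txt', 'b.txt', 'c.txt'):
            st = os.stat(os.path.join(tmp, name))
            result[name] = (st.st_mode & 0o777, int(st.st_mtime))
        return result


print(demo_copy_functions())
```

Only c.txt keeps the old modification time, and the permission bits of a.txt come from the process umask rather than from src.txt.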
```
[root@python ~]# mkdir /tmp/demo
[root@python ~]# cd /tmp/demo/
[root@python demo]# mkdir -p dir1
[root@python demo]# touch a.txt b.txt c.txt
[root@python demo]# touch sh.py cc.py 001.jpg 002.jpg 003.jpg    # Create the required files
[root@python demo]# ipython                                      # Start IPython
```
You can also create these files in PyCharm and run the examples there.
1. Copy files and folders
```python
shutil.copy(file1, file2)    # copy a file
shutil.copytree(dir1, dir2)  # copy a folder
```
(1) Copy file
```
In [1]: import shutil

In [2]: shutil.copy('a.txt', 'aa.txt')
Out[2]: 'aa.txt'
# Check in PyCharm or in the Linux shell that aa.txt has been created

In [3]: ls
001.jpg  003.jpg  a.txt  cc.py  sh.py
002.jpg  aa.txt   b.txt  c.txt
```
(2) Copy folder
```
In [5]: shutil.copytree('dir1', 'dir11')
Out[5]: 'dir11'

In [6]: ls
001.jpg  003.jpg  a.txt  cc.py  dir1/   sh.py
002.jpg  aa.txt   b.txt  c.txt  dir11/
```
(3) Copy file contents, metadata, and directory trees
```python
# _*_ coding:utf-8 _*_
__author__ = 'junxi'
import shutil

# Copy the contents of one file object to another (both files are closed afterwards)
with open('old.txt', 'r') as fsrc, open('new.txt', 'w') as fdst:
    shutil.copyfileobj(fsrc, fdst)
# Copy a file
shutil.copyfile('old.txt', 'old1.txt')
# Copy permission bits only; content, group and owner remain unchanged
shutil.copymode('old.txt', 'old1.txt')
# Copy permission bits, last access time and last modification time
shutil.copystat('old.txt', 'old1.txt')
# Copy a file to a file or directory
shutil.copy('old.txt', 'old2.txt')
# Like copy(), but also copy the last access time and modification time
shutil.copy2('old.txt', 'old2.txt')
# Copy the aaa directory tree to bbb (symlinks=False by default, so linked files are copied)
shutil.copytree('C:/Users/xiaoxinsoso/Desktop/aaa', 'C:/Users/xiaoxinsoso/Desktop/bbb')
# Move a directory or file: bbb already exists, so aaa is moved into it (bbb/aaa)
shutil.move('C:/Users/xiaoxinsoso/Desktop/aaa', 'C:/Users/xiaoxinsoso/Desktop/bbb')
# Delete the bbb directory and everything in it
shutil.rmtree('C:/Users/xiaoxinsoso/Desktop/bbb')
```
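The symlinks parameter of copytree() is easiest to understand by trying it. A minimal sketch, assuming a POSIX system where os.symlink() works without special privileges; the directory and file names are illustrative.

```python
import os
import shutil
import tempfile


def copytree_symlink_demo():
    """Show how copytree's symlinks flag treats a symbolic link (POSIX only)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'olddir')
        os.mkdir(src)
        with open(os.path.join(src, 'target.txt'), 'w') as f:
            f.write('data')
        # A relative symlink inside olddir pointing at target.txt
        os.symlink('target.txt', os.path.join(src, 'link.txt'))

        kept = os.path.join(tmp, 'newdir_links')
        resolved = os.path.join(tmp, 'newdir_copies')
        shutil.copytree(src, kept, symlinks=True)       # link copied as a link
        shutil.copytree(src, resolved, symlinks=False)  # link replaced by a real file

        return (os.path.islink(os.path.join(kept, 'link.txt')),
                os.path.islink(os.path.join(resolved, 'link.txt')))


print(copytree_symlink_demo())  # (True, False)
```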
2. Rename and move files and folders
```python
shutil.move(file1, file2)  # rename (or move) a file
shutil.move(file, dir)     # move a file into a directory
```
(1) Rename a file
```
In [7]: shutil.move('aa.txt', 'dd.txt')
Out[7]: 'dd.txt'

In [8]: ls
001.jpg  003.jpg  b.txt  c.txt   dir1/   sh.py
002.jpg  a.txt    cc.py  dd.txt  dir11/
```
(2) Move a file into a folder
```
In [9]: shutil.move('dd.txt', 'dir1')
Out[9]: 'dir1/dd.txt'

In [11]: ls dir1
dd.txt
```
3. Delete directories and files
```python
shutil.rmtree(dir)  # delete a directory and everything in it
os.unlink(file)     # delete a single file
```
Delete a directory:
```
In [15]: shutil.rmtree('dir1')

In [16]: ls
001.jpg  003.jpg  b.txt  c.txt  sh.py
002.jpg  a.txt    cc.py  dir11/
```
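The two deletion calls are not interchangeable: os.unlink() fails on a directory, and shutil.rmtree() refuses to run on a symbolic link. A sketch of a hypothetical remove_path() helper that chooses the right call:

```python
import os
import shutil
import tempfile


def remove_path(path):
    """Hypothetical helper: delete a file, a symlink, or a whole directory tree."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)  # directory and everything below it
    else:
        os.unlink(path)      # single file, or the link itself (not its target)


with tempfile.TemporaryDirectory() as tmp:
    d = os.path.join(tmp, 'dir11')
    os.makedirs(os.path.join(d, 'sub'))
    f = os.path.join(tmp, 'a.txt')
    open(f, 'w').close()
    remove_path(d)
    remove_path(f)
    print(os.listdir(tmp))  # []
```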
2, File content management
1. Directory and file comparison
The filecmp module provides functions for comparing files and directories, including recursive comparison of subdirectories.
It is part of the standard library, so no installation is required.
(1) Directory structure
In dir1, a_copy.txt, a.txt and c.txt have the same contents, while b.txt differs. The touch commands below only create empty files, so edit their contents afterwards to reproduce the results shown in the examples.
```
[root@python demo]# mkdir compare
[root@python demo]# cd compare/
[root@python compare]# mkdir -p dir1 dir2
[root@python compare]# mkdir dir1/subdir1
[root@python compare]# ls
dir1  dir2
[root@python compare]# touch dir1/a_copy.txt dir1/a.txt dir1/b.txt dir1/c.txt
[root@python compare]# touch dir2/a.txt dir2/b.txt dir2/c.txt
[root@python compare]# mkdir -p dir2/subdir1 dir2/subdir2
[root@python compare]# touch dir2/subdir1/sb.txt    # Create the required files
[root@python compare]# ipython                      # Start IPython
```
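filecmp provides three main tools: cmp() (single-file comparison), cmpfiles() (multi-file comparison), and the dircmp class (directory comparison).
(2) Compare two files
Use filecmp.cmp() to check whether two files are the same: it returns True if they are considered equal and False otherwise. By default (shallow=True), files whose os.stat() signatures (file type, size and modification time) are identical are treated as equal without reading their contents; otherwise they are compared by size and contents.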
```
In [1]: import filecmp

In [2]: filecmp.cmp('dir1/a.txt', 'dir1/b.txt')
Out[2]: False

In [3]: filecmp.cmp('dir1/a.txt', 'dir1/c.txt')
Out[3]: True

In [4]: filecmp.cmp('dir1/a.txt', 'dir1/a_copy.txt')
Out[4]: True
```
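Because of the shallow default, two files can compare as equal even when their bytes differ, as long as type, size and modification time match. This sketch forces that situation in a temporary directory (the contents and timestamps are made up) and compares the result with shallow=False.

```python
import filecmp
import os
import tempfile


def shallow_vs_deep():
    """Two same-size files with identical mtimes but different contents."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, 'a.txt')
        b = os.path.join(tmp, 'b.txt')
        with open(a, 'w') as f:
            f.write('abc')
        with open(b, 'w') as f:
            f.write('xyz')
        for path in (a, b):
            os.utime(path, (1_000_000_000, 1_000_000_000))
        # shallow=True (the default) trusts matching os.stat() signatures
        shallow = filecmp.cmp(a, b)
        # shallow=False reads and compares the bytes
        deep = filecmp.cmp(a, b, shallow=False)
        return shallow, deep


print(shallow_vs_deep())  # (True, False)
```

When correctness matters more than speed, pass shallow=False.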
(3) Compare multiple files
The filecmp module also provides cmpfiles(), which compares several files with the same names in two different directories in one call. It returns a tuple of three lists: files that match, files that differ, and files that could not be compared. For example:
```
In [9]: filecmp.cmpfiles('dir1', 'dir2', ['a.txt', 'b.txt', 'c.txt', 'a_copy.txt'])
Out[9]: (['b.txt'], ['a.txt', 'c.txt'], ['a_copy.txt'])
# The first list holds matching files, the second differing files, and the third files
# that could not be compared (missing from one directory, or other errors)
```
(4) Compare directories without listing files
cmpfiles() compares files in two directories at once, so it can also be used to compare two directories. However, you have to pass the names of the files to compare explicitly, which is cumbersome.
For comparing whole directories, filecmp provides the dircmp class. Creating a dircmp object gives you an object whose attributes describe the differences between the two directories. For example:
```
In [11]: d = filecmp.dircmp('dir1', 'dir2')   # Create a comparison object

In [12]: d.report()
diff dir1 dir2
Only in dir1 : ['a_copy.txt']
Only in dir2 : ['subdir2']
Identical files : ['b.txt']
Differing files : ['a.txt', 'c.txt']
Common subdirectories : ['subdir1']
```
(5) Inspect the comparison results
filecmp.dircmp(a, b[, ignore[, hide]]) creates a directory comparison object for folders a and b. ignore is a list of names to ignore (default filecmp.DEFAULT_IGNORES) and hide is a list of names to hide (default [os.curdir, os.pardir]). The object exposes detailed results, such as the list of files that exist only in a, and supports recursive comparison of subfolders through its subdirs attribute. Files are compared shallowly, in the same way as filecmp.cmp().
```
In [17]: d.left_list     # Contents of dir1
Out[17]: ['a.txt', 'a_copy.txt', 'b.txt', 'c.txt', 'subdir1']

In [18]: d.right_list    # Contents of dir2
Out[18]: ['a.txt', 'b.txt', 'c.txt', 'subdir1', 'subdir2']

In [19]: d.left_only     # Only in dir1
Out[19]: ['a_copy.txt']

In [20]: d.right_only    # Only in dir2
Out[20]: ['subdir2']
```
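diff_files only lists differences at the top level; the subdirs attribute maps each common subdirectory to its own dircmp object, so a short recursive function can gather differing files at every depth. The helpers all_diff_files() and make_tree() are hypothetical, written just for this sketch.

```python
import filecmp
import os
import tempfile


def all_diff_files(dcmp, prefix=''):
    """Recursively collect differing files of a dircmp object, including subdirectories."""
    found = [os.path.join(prefix, name) for name in dcmp.diff_files]
    for name, sub in dcmp.subdirs.items():
        found.extend(all_diff_files(sub, os.path.join(prefix, name)))
    return found


def make_tree(root, files):
    """Create files under root from a {relative_path: text} mapping."""
    for rel, text in files.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'w') as f:
            f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    left, right = os.path.join(tmp, 'dir1'), os.path.join(tmp, 'dir2')
    make_tree(left, {'a.txt': 'one', 'subdir1/sb.txt': 'short'})
    make_tree(right, {'a.txt': 'two!', 'subdir1/sb.txt': 'longer text'})
    print(sorted(all_diff_files(filecmp.dircmp(left, right))))
```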
2. MD5 checksum comparison
A checksum is computed with a hash function, which creates a small digital "fingerprint" from arbitrary data. The hash function compresses a message or data into a digest, making the data smaller and easier to compare. MD5 is the most widely used of these.
MD5 hashes are generally used to check the integrity of files, especially to check the correctness of files in case of file transfer, disk error or other situations.
On Linux, the MD5 checksum of a file is calculated with md5sum:
```
[root@192 demo]# md5sum a.txt
d41d8cd98f00b204e9800998ecf8427e  a.txt
```
Computing a file's MD5 checksum in Python is also simple with the standard library hashlib module:
```python
import hashlib

d = hashlib.md5()
# Read in binary mode so the digest matches md5sum (text mode may translate line endings)
with open('b.txt', 'rb') as f:
    for chunk in iter(lambda: f.read(8192), b''):
        d.update(chunk)
print(d.hexdigest())
```
Or, for bytes already in memory (the most common form, often used to name images):
```
>>> import hashlib
>>> hashlib.md5(b'123').hexdigest()
'202cb962ac59075b964b07152d234b70'
# The generic constructor hashlib.new(name[, data]) takes the algorithm name, such as 'md5'
>>> hashlib.new('md5', b'123').hexdigest()
'202cb962ac59075b964b07152d234b70'
```
Remember to create the b.txt file
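Building on hashlib, two files can be compared by checksum rather than byte by byte, which is handy when only a published digest is available. A sketch with hypothetical helper names md5_of_file() and same_md5():

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Hash a file in binary chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def same_md5(path1, path2):
    """Treat two files as identical if their MD5 digests match."""
    return md5_of_file(path1) == md5_of_file(path2)


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, 'a.txt')
    open(empty, 'wb').close()
    print(md5_of_file(empty))  # d41d8cd98f00b204e9800998ecf8427e, same as md5sum above
```

MD5 is fine for spotting accidental corruption, but it is not collision-resistant, so it should not be relied on against deliberate tampering.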
3, Managing archives in Python
1. tarfile
Alongside the zipfile compression module, Python provides the tarfile archive module. tarfile reads and writes tar archives, including those compressed with gzip, bz2 or lzma. For .zip files, use the zipfile module instead; for higher-level archive operations, use the shutil module.
Defined classes and exceptions
tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
Returns a TarFile object. In essence, this opens a file-like object, a design you will see throughout Python, so it should feel familiar.
name is the file name or path.
bufsize specifies the block size, which defaults to 20 * 512 = 10240 bytes.
mode is the open mode, a string of the form 'filemode[:compression]'. The possible combinations are shown in the following table; the default is 'r'.
Mode | Description |
---|---|
'r' or 'r:*' | Open for reading with transparent compression (recommended) |
'r:' | Open for reading without compression |
'r:gz' | Open for reading with gzip compression |
'r:bz2' | Open for reading with bzip2 compression |
'r:xz' | Open for reading with lzma compression |
'x' or 'x:' | Create a new archive without compression; raises FileExistsError if it already exists |
'x:gz' | Create a new archive with gzip compression |
'x:bz2' | Create a new archive with bzip2 compression |
'x:xz' | Create a new archive with lzma compression |
'a' or 'a:' | Open for appending without compression; the file is created if it does not exist |
'w' or 'w:' | Open for uncompressed writing |
'w:gz' | Open for gzip-compressed writing |
'w:bz2' | Open for bzip2-compressed writing |
'w:xz' | Open for lzma-compressed writing |

Note: the modes 'a:gz', 'a:bz2' and 'a:xz' are not supported.
If the mode is not suitable for opening a particular (compressed) file for reading, a ReadError exception is raised; use mode 'r' to avoid this. If the specified compression method is not supported, a CompressionError exception is raised.
In the mode of w:gz,r:gz,w:bz2,r:bz2,x:gz,x:bz2, the tarfile.open() method accepts an additional compression level parameter, compresslevel, with the default value of 9.
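A short sketch of these rules in a throwaway directory: it writes a gzip-compressed archive with an explicit compresslevel, reopens it with automatic detection, and shows that asking for the wrong compression raises ReadError. File names are illustrative.

```python
import os
import tarfile
import tempfile


def tar_modes_demo():
    """Write a .tar.gz with an explicit compresslevel, then reopen it in two read modes."""
    with tempfile.TemporaryDirectory() as tmp:
        member = os.path.join(tmp, 'read.txt')
        with open(member, 'w') as f:
            f.write('hello tar\n' * 100)
        archive = os.path.join(tmp, 'demo.tar.gz')
        # compresslevel ranges from 1 (fastest) to 9 (smallest, the default)
        with tarfile.open(archive, 'w:gz', compresslevel=6) as t:
            t.add(member, arcname='read.txt')
        # 'r' (same as 'r:*') detects the gzip compression by itself
        with tarfile.open(archive, 'r') as t:
            names = t.getnames()
        # Naming the wrong compression makes open() raise ReadError
        try:
            with tarfile.open(archive, 'r:bz2'):
                outcome = 'opened'
        except tarfile.ReadError:
            outcome = 'ReadError'
        return names, outcome


print(tar_modes_demo())  # (['read.txt'], 'ReadError')
```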
(1) Read a tar archive
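```python
import tarfile

with tarfile.open('tengine-2.3.2.tar.gz') as t:
    # getmembers() returns the archive members as TarInfo objects
    for member in t.getmembers():
        print(member.name)

with tarfile.open('tengine-2.3.2.tar.gz') as t:
    # members must be TarInfo objects from getmembers(), not a path string
    man = [m for m in t.getmembers() if m.name.startswith('tengine-2.3.2/man')]
    t.extractall('a', members=man)
    # extract() takes a single member name (or TarInfo) and a target path
    t.extract('tengine-2.3.2/man', 'b')
```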
Common method description:
- getmembers(): returns the members of the tar archive as a list of TarInfo objects
- member.name: the name (path) of a member inside the archive
- extract(member, path): extracts a single member into path
- extractall(path, members): extracts all members, or only those in members (a subset of getmembers()), into path
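getmembers() returns TarInfo objects, which carry more than the name (size, type checks such as isfile(), and so on), and extractfile() reads a member without writing it to disk. A self-contained sketch that builds a small archive in memory; the member name docs/hello.txt is made up.

```python
import io
import tarfile


def build_and_list():
    """Build a tar archive in memory, then list its members and read one back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as t:
        data = b'hello'
        info = tarfile.TarInfo(name='docs/hello.txt')  # made-up member name
        info.size = len(data)                          # size must be set before addfile()
        t.addfile(info, io.BytesIO(data))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode='r') as t:
        listing = [(m.name, m.size, m.isfile()) for m in t.getmembers()]
        content = t.extractfile('docs/hello.txt').read()  # read without writing to disk
    return listing, content


print(build_and_list())  # ([('docs/hello.txt', 5, True)], b'hello')
```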
(2) Create a tar archive
Remember to create the read.txt file
```python
import tarfile

with tarfile.open('readme.tar', mode='w') as out:
    out.add('read.txt')
```
Check that readme.tar has been created in the current directory.
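(3) Create and read compressed archives
```python
import tarfile
# The compression named in the mode must match the file: create each archive, then read it back
with tarfile.open('tarfile_add.tar.gz', mode='w:gz') as out:
    out.add('read.txt')
with tarfile.open('tarfile_add.tar.gz', mode='r:gz') as t:
    print(t.getnames())
with tarfile.open('tarfile_add.tar.bz2', mode='w:bz2') as out:
    out.add('read.txt')
with tarfile.open('tarfile_add.tar.bz2', mode='r:bz2') as t:
    print(t.getnames())
```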
(4) Back up files matching given patterns to a compressed archive
```python
import os
import fnmatch
import tarfile
import datetime

def is_file_match(filename, patterns):
    '''Return True if filename matches any of the shell-style patterns'''
    for pattern in patterns:
        if fnmatch.fnmatch(filename, pattern):
            return True
    return False

def find_files(root, patterns=['*']):
    '''Walk root recursively and yield the paths of matching files'''
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if is_file_match(filename, patterns):
                yield os.path.join(dirpath, filename)

patterns = ['*.txt', '*.md']
now = datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
filename = 'backup_all_file_{0}.tar.gz'.format(now)
# 'w:gz' so the archive really is gzip-compressed, matching its .tar.gz name
with tarfile.open(filename, 'w:gz') as f:
    for item in find_files('.', patterns):
        f.add(item)
```
Check that a backup_all_file_YYYY_MM_DD_HH_MM_SS.tar.gz file has been created in the current directory.
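To exclude unwanted files while adding a whole directory, TarFile.add() accepts a filter callable that receives each TarInfo and returns it (possibly modified), or None to skip it. A sketch of a directory backup that leaves out *.pyc files; the function names and paths are hypothetical.

```python
import os
import tarfile
import tempfile


def skip_pyc(tarinfo):
    """Filter for TarFile.add(): returning None leaves the member out of the archive."""
    if tarinfo.name.endswith('.pyc'):
        return None
    return tarinfo


def backup_dir(src_dir, archive_path):
    """Archive src_dir as a gzip-compressed tar, skipping *.pyc, and return the stored names."""
    with tarfile.open(archive_path, 'w:gz') as tar:
        # arcname stores paths relative to the directory's own name, not the full path
        tar.add(src_dir, arcname=os.path.basename(src_dir), filter=skip_pyc)
    with tarfile.open(archive_path, 'r:gz') as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, 'project')  # hypothetical directory to back up
    os.mkdir(project)
    for name in ('a.txt', 'b.pyc'):
        open(os.path.join(project, name), 'w').close()
    print(backup_dir(project, os.path.join(tmp, 'backup.tar.gz')))  # ['project', 'project/a.txt']
```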
2. zipfile
zipfile is Python's module for creating and extracting ZIP archives. Because ZIP is such a common format, this module is used very frequently.
The zipfile module has two important classes, ZipFile and ZipInfo. In most cases, these two classes are all you need.
- ZipFile is the main class, used to create and read ZIP files;
- ZipInfo holds the information about each member stored in a ZIP file.
(1) Read a ZIP file
```python
import zipfile

with zipfile.ZipFile('read.zip') as demo_zip:
    print(demo_zip.namelist())  # names of all members
    demo_zip.extractall('1')    # extract every member into directory 1
    # a.jpg must be a member of read.zip; directory 2 is created if it does not exist
    demo_zip.extract('a.jpg', '2')
```
Common method description:
- namelist(): returns a string list of all files and folders contained in the zip file
- extract(filename, path): extract a single file from a zip file
- extractall(path): extract all files from the zip file
(2) Create a ZIP file
```python
import zipfile

# mode='w' creates (or overwrites) new.zip; members are stored uncompressed by default
with zipfile.ZipFile('new.zip', mode='w') as newZip:
    newZip.write('a.jpg')  # the file must exist
```
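ZipFile stores members uncompressed (ZIP_STORED) unless told otherwise. A minimal sketch, using an in-memory buffer instead of a file on disk, that passes compression=zipfile.ZIP_DEFLATED and checks the result through ZipInfo; the member name notes/a.txt is made up.

```python
import io
import zipfile


def zip_roundtrip():
    """Write one member with DEFLATE compression into memory, then read it back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr('notes/a.txt', 'hello zip ' * 50)  # 500 bytes of text
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        info = zf.getinfo('notes/a.txt')
        return (zf.namelist(), info.file_size,
                info.compress_size < info.file_size, zf.read('notes/a.txt')[:9])


print(zip_roundtrip())  # (['notes/a.txt'], 500, True, b'hello zip')
```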
(3) Use zipfile from the command line
```
# Create a zip file
python -m zipfile -c new1.zip b.txt

# List the contents of the zip file
python -m zipfile -l new1.zip
File Name                                Modified             Size
b.txt                             2020-04-26 14:35:12            0

# Extract the zip file into the specified directory (here /)
python -m zipfile -e new1.zip /
```
Options included in the command line interface provided by the zipfile module:
- -l: list the files in a ZIP archive
- -e: extract a ZIP archive
- -c: create a ZIP archive
- -t: test whether the file is a valid ZIP archive
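The same checks are available from Python code: zipfile.is_zipfile() tells whether data looks like a ZIP archive, and ZipFile.testzip() reads every member and verifies its CRC. A hypothetical check_zip() helper that roughly approximates the -t option:

```python
import io
import zipfile


def check_zip(data):
    """Roughly what `python -m zipfile -t` does: open the archive and verify member CRCs."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return 'not a zip file'
    with zipfile.ZipFile(buf) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
    return 'ok' if bad is None else 'corrupt member: ' + bad


good = io.BytesIO()
with zipfile.ZipFile(good, 'w') as zf:
    zf.writestr('b.txt', 'some text')
print(check_zip(good.getvalue()))                              # ok
print(check_zip(b'plain text, definitely not a zip archive'))  # not a zip file
```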
(4) ZipInfo attributes
```python
import os
import zipfile

zipFile = zipfile.ZipFile(os.path.join(os.getcwd(), 'duoduo.zip'))
zipInfo = zipFile.getinfo('Files in files.txt')     # ZipInfo for one member of the archive
print('filename:', zipInfo.filename)                # File name
print('date_time:', zipInfo.date_time)              # Last modification time: (year, month, day, hour, minute, second)
print('compress_type:', zipInfo.compress_type)      # Compression type
print('comment:', zipInfo.comment)                  # Comment for this member
print('extra:', zipInfo.extra)                      # Extra field data
print('create_system:', zipInfo.create_system)      # System that created the ZIP archive
print('create_version:', zipInfo.create_version)    # PKZIP version that created the archive
print('extract_version:', zipInfo.extract_version)  # PKZIP version needed to extract it
print('reserved:', zipInfo.reserved)                # Reserved field; must be zero
print('flag_bits:', zipInfo.flag_bits)              # ZIP flag bits
print('volume:', zipInfo.volume)                    # Volume number of the file header
print('internal_attr:', zipInfo.internal_attr)      # Internal attributes
print('external_attr:', zipInfo.external_attr)      # External file attributes
print('header_offset:', zipInfo.header_offset)      # Byte offset to the file header
print('CRC:', zipInfo.CRC)                          # CRC-32 of the uncompressed file
print('compress_size:', zipInfo.compress_size)      # Size of the compressed data
print('file_size:', zipInfo.file_size)              # Size of the uncompressed file
zipFile.close()
```
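Most of these attributes are low-level. In practice, members are usually walked with infolist(), which returns one ZipInfo per member. A sketch that summarizes name, sizes and modification time, using an in-memory archive with a made-up member b.txt and date:

```python
import datetime
import io
import zipfile


def zip_summary(zf):
    """Return (name, uncompressed size, compressed size, modified) for every member."""
    rows = []
    for info in zf.infolist():  # one ZipInfo per member
        modified = datetime.datetime(*info.date_time)
        rows.append((info.filename, info.file_size, info.compress_size, modified))
    return rows


buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    # Passing a ZipInfo lets us set date_time explicitly (ZIP stores even seconds only)
    info = zipfile.ZipInfo('b.txt', date_time=(2020, 4, 26, 14, 35, 12))
    zf.writestr(info, 'hello')
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    for row in zip_summary(zf):
        print(row)
```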
3. Creating and extracting archives with shutil
The name shutil can be read as sh + util, i.e. shell utilities. The shutil module complements the os module and is mainly used for copying, deleting, moving, compressing and extracting files.
To list the archive formats supported on your system:
```python
import shutil

print(shutil.get_archive_formats())
```
The output results are as follows:
[('bztar', "bzip2'ed tar-file"), ('gztar', "gzip'ed tar-file"), ('tar', 'uncompressed tar file'), ('xztar', "xz'ed tar-file"), ('zip', 'ZIP file')]
(1) Create an archive
```python
import shutil

# Parameter 1 (base_name): name of the archive to create, without the extension
# Parameter 2 (format): archive format, e.g. 'gztar'
# Parameter 3 (root_dir): directory to archive
shutil.make_archive('a.jpg', 'gztar', 'ddd')  # creates a.jpg.tar.gz
```
You can check whether there are generated files in the corresponding location
(2) Extract an archive
```python
import shutil

# Parameter 1 (filename): the archive to extract
# Parameter 2 (extract_dir): directory to extract into
# unpack_archive() returns None, so its result is not printed
shutil.unpack_archive('a.jpg.tar.gz', 'jpg')
```
You can check whether there are generated files in the corresponding location
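make_archive() and unpack_archive() pair up naturally: make_archive() returns the path of the file it wrote, and unpack_archive() picks the format from the file extension. A round-trip sketch in a temporary directory; ddd, note.txt and backup are illustrative names.

```python
import os
import shutil
import tempfile


def archive_roundtrip(fmt='gztar'):
    """Pack a directory with make_archive(), unpack it elsewhere, and read a file back."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'ddd')
        os.mkdir(src)
        with open(os.path.join(src, 'note.txt'), 'w') as f:
            f.write('archived')
        # base_name has no extension; make_archive adds one and returns the full path
        archive = shutil.make_archive(os.path.join(tmp, 'backup'), fmt, root_dir=src)
        out = os.path.join(tmp, 'restored')
        shutil.unpack_archive(archive, out)  # format guessed from the extension
        with open(os.path.join(out, 'note.txt')) as f:
            return os.path.basename(archive), f.read()


print(archive_roundtrip())       # ('backup.tar.gz', 'archived')
print(archive_roundtrip('zip'))  # ('backup.zip', 'archived')
```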