Python has several built-in modules and functions for working with files. They are spread across modules such as os, os.path, shutil, and pathlib. This article covers the most commonly used file operations and methods in Python.
In this article, you will learn how to:
- Get file properties
- Create directories
- Match file name patterns
- Traverse the directory tree
- Create temporary files and directories
- Delete files and directories
- Copy, move, and rename files and directories
- Create and extract ZIP and TAR archives
- Use the fileinput module to open multiple files
Reading and writing file data in Python
Reading and writing files in Python is straightforward. To do this, you must first open the file in the appropriate mode. Here is an example of how to open a text file and read its contents:

```python
with open('data.txt', 'r') as f:
    data = f.read()
    print('content: {}'.format(data))
```

open() takes a file name and a mode as its parameters. 'r' opens the file in read-only mode. To write data to a file, pass 'w' instead:

```python
with open('data.txt', 'w') as f:
    data = 'some data to be written to the file'
    f.write(data)
```

In the examples above, open() opens a file for reading or writing and returns a file handle (f in this case) that provides methods for reading or writing the file's data. Read Working With File I/O in Python for more information on how to read and write files.
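Reading an entire file with .read() loads it all into memory, which can be wasteful for large files. As a small self-contained sketch (it creates its own data.txt so it can run as-is), you can iterate over the file object line by line instead:

```python
# Create a small sample file so the example is self-contained
with open('data.txt', 'w') as f:
    f.write('first line\nsecond line\n')

# Iterating over the file object yields one line at a time,
# so the whole file never has to fit in memory
with open('data.txt', 'r') as f:
    for line in f:
        print(line.rstrip('\n'))  # strip the trailing newline before printing

import os
os.remove('data.txt')  # clean up the sample file
```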
Get directory list
Suppose your current working directory has a directory called my_directory, which contains the following contents:
```
.
├── file1.py
├── file2.csv
├── file3.txt
├── sub_dir
│   ├── bar.py
│   └── foo.py
├── sub_dir_b
│   └── file4.txt
└── sub_dir_c
    ├── config.py
    └── file5.txt
```
Python's built-in os module has many useful methods for listing directory contents and filtering the results. To get a list of all files and folders in a particular directory in the file system, you can use os.listdir() in legacy versions of Python, or os.scandir() in Python 3.x. os.scandir() is the preferred method if you also want file and directory attributes (such as file size and modification date).
Get directory list using legacy Python
```python
import os

entries = os.listdir('my_directory')
```

os.listdir() returns a Python list containing the names of the files and subdirectories in the directory given by the path argument:

```python
['file1.py', 'file2.csv', 'file3.txt', 'sub_dir', 'sub_dir_b', 'sub_dir_c']
```

A directory listing like this isn't easy to read. Printing the output of os.listdir() with a loop helps:

```python
for entry in entries:
    print(entry)
"""
file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c
"""
```
Get a list of directories using a modern version of Python
In modern versions of Python, you can use os.scandir() and pathlib.Path() instead of os.listdir().

os.scandir() was introduced in Python 3.5; its documentation is PEP 471.

Calling os.scandir() returns an iterator instead of a list:

```python
import os

entries = os.scandir('my_directory')
print(entries)  # <posix.ScandirIterator at 0x105b4d4b0>
```

The ScandirIterator points to all the entries in the current directory. You can loop over the iterator and print the file names:

```python
import os

with os.scandir('my_directory') as entries:
    for entry in entries:
        print(entry.name)
```

Here, os.scandir() is used with the with statement because it supports the context manager protocol. Using a context manager closes the iterator and frees the acquired resources automatically after the iterator is exhausted. Printing the file names in my_directory gives the same result as the os.listdir() example:

```
file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c
```
Another way to get the directory list is to use the pathlib module:
```python
from pathlib import Path

entries = Path('my_directory')
for entry in entries.iterdir():
    print(entry.name)
```
pathlib.Path() returns a PosixPath or WindowsPath object, depending on the operating system.
The pathlib.Path() object has an .iterdir() method for creating an iterator over all the files and directories in that directory. Each entry yielded by .iterdir() contains information about the file or directory, such as its name and file attributes. pathlib was first introduced in Python 3.4 and is a great addition to Python: it provides an object-oriented interface to the file system.

In the example above, you call pathlib.Path() and pass in a path argument, then call .iterdir() to get a list of all the files and directories under my_directory.

pathlib offers a set of classes that provide most of the common operations on paths in a simple, object-oriented way. Using pathlib is more efficient than using the functions in os. Another advantage of pathlib over os is that it reduces the number of packages or modules you need to import to manipulate file system paths. For more information, read Python 3's pathlib Module: Taming the File System.
Running the above code will get the following results:
```
file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c
```
Using pathlib.Path() or os.scandir() instead of os.listdir() is the preferred way to get a directory listing, especially when you need file type and file attribute information. pathlib.Path() provides much of the file- and path-handling functionality found in os and shutil, and its methods are more efficient than those modules'. We will discuss how to get file attributes shortly.
Function | Description
---|---
os.listdir() | Returns a list of all files and folders in a directory
os.scandir() | Returns an iterator of all the objects in a directory, including file attribute information
pathlib.Path().iterdir() | Returns an iterator of all the objects in a directory, including file attribute information
These functions return a list of everything in the directory, including subdirectories. That may not always be the result you want. The next section shows you how to filter the results of a directory listing.
List all files in a directory
This section shows how to print the names of the files in a directory using os.listdir(), os.scandir(), and pathlib.Path(). To filter out directories and list only the files produced by os.listdir(), use os.path:

```python
import os

basepath = 'my_directory'
for entry in os.listdir(basepath):
    # Use os.path.isfile to determine whether the path is a file
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)
```

Here, the call to os.listdir() returns a list of everything in the given path, and then os.path.isfile() filters the list to show only files, not directories. The code produces the following output:

```
file1.py
file2.csv
file3.txt
```
A simpler way to list all the files in a directory is to use os.scandir() or pathlib.Path():

```python
import os

basepath = 'my_directory'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)
```

Using os.scandir() is clearer and easier to understand than os.listdir(). entry.is_file() is called on each item of the ScandirIterator; if it returns True, the item is a file. The output of the code above is as follows:

```
file1.py
file3.txt
file2.csv
```
Next, here is how to list the files in a directory using pathlib.Path():

```python
from pathlib import Path

basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_file():
        print(entry.name)
```

Here, .is_file() is called on each entry yielded by .iterdir(). The output is the same as above:

```
file1.py
file3.txt
file2.csv
```
If the for loop and if statement are combined into a single generator expression, the code above becomes more concise. For more about generator expressions, I recommend this Dan Bader article.
The revised version is as follows:
```python
from pathlib import Path

basepath = Path('my_directory')
files_in_basepath = (entry for entry in basepath.iterdir() if entry.is_file())
for item in files_in_basepath:
    print(item.name)
```
This code produces the same result as before. This section showed that filtering files or directories with os.scandir() and pathlib.Path() is more intuitive and looks cleaner than doing so with os.listdir() and os.path.
List subdirectories
If you want to list subdirectories instead of files, use one of the methods below. Here is how to use os.listdir() and os.path():

```python
import os

basepath = 'my_directory'
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)
```

Manipulating file system paths this way quickly becomes cumbersome when you call os.path.join() repeatedly. Running this code on my computer produces the following output:

```
sub_dir
sub_dir_b
sub_dir_c
```
Here is how to use os.scandir():

```python
import os

basepath = 'my_directory'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)
```

As in the file-listing example, .is_dir() is called on each entry returned by os.scandir(). If the entry is a directory, .is_dir() returns True and the directory's name is printed. The output is the same as above:

```
sub_dir_c
sub_dir_b
sub_dir
```
Here is how to use pathlib.Path():

```python
from pathlib import Path

basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_dir():
        print(entry.name)
```

.is_dir() is called on each entry returned by the .iterdir() iterator to check whether it is a file or a directory. If the entry is a directory, its name is printed, and the output is the same as the previous example:

```
sub_dir_c
sub_dir_b
sub_dir
```
Get file properties
Python makes it easy to retrieve file attributes such as file size and modification time. This can be done with os.stat(), os.scandir(), or pathlib.Path().

os.scandir() and pathlib.Path() return directory listings that include file attributes directly. This can be more efficient than listing files with os.listdir() and then retrieving the file attribute information for each file separately.

The following example shows how to get the last modification time of the files in my_directory, output as timestamps:

```python
import os

with os.scandir('my_directory') as entries:
    for entry in entries:
        info = entry.stat()
        print(info.st_mtime)
"""
1548163662.3952665
1548163689.1982062
1548163697.9175904
1548163721.1841028
1548163740.765162
1548163769.4702623
"""
```

os.scandir() returns a ScandirIterator object. Each entry in the ScandirIterator has a .stat() method that retrieves information about the file or directory it points to. .stat() provides information such as file size and last modification time. The example above prints the st_mtime attribute, which is the time the file's contents were last modified.
The pathlib module has corresponding methods that retrieve the same file information:

```python
from pathlib import Path

basepath = Path('my_directory')
for entry in basepath.iterdir():
    info = entry.stat()
    print(info.st_mtime)
"""
1548163662.3952665
1548163689.1982062
1548163697.9175904
1548163721.1841028
1548163740.765162
1548163769.4702623
"""
```
The example above loops over the iterator returned by .iterdir() and retrieves file attributes by calling .stat() on each entry. The st_mtime attribute is a floating-point value representing a timestamp. To make the value returned by st_mtime easier to read, you can write a helper function that converts it into a datetime object:

```python
import datetime
from pathlib import Path

def timestamp2datetime(timestamp, convert_to_local=True, utc=8, is_remove_ms=True):
    """
    Convert a UNIX timestamp to a datetime object
    :param timestamp: the timestamp
    :param convert_to_local: whether to convert to local time
    :param utc: time zone offset (UTC+8 for China)
    :param is_remove_ms: whether to drop milliseconds
    :return: a datetime object
    """
    if is_remove_ms:
        timestamp = int(timestamp)
    dt = datetime.datetime.utcfromtimestamp(timestamp)
    if convert_to_local:
        dt = dt + datetime.timedelta(hours=utc)
    return dt

def convert_date(timestamp, format='%Y-%m-%d %H:%M:%S'):
    dt = timestamp2datetime(timestamp)
    return dt.strftime(format)

basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_file():
        info = entry.stat()
        print('{} last modified on {}'.format(entry.name, timestamp2datetime(info.st_mtime)))
```

This first gets the list of files in my_directory along with their attributes, then calls timestamp2datetime() to display each file's last modification time in a human-readable way. convert_date() uses .strftime() to convert the datetime object to a string.

The output of the code above is:

```
file3.txt last modified on 2019-01-24 09:04:39
file2.csv last modified on 2019-01-24 09:04:39
file1.py last modified on 2019-01-24 09:04:39
```
The syntax for converting dates and times to strings can be confusing.
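As a quick illustration of those format codes (a sketch, with the timestamp value chosen arbitrarily):

```python
import datetime

# %Y = 4-digit year, %m = month, %d = day, %H:%M:%S = time of day
timestamp = 0  # the UNIX epoch, 1970-01-01 00:00:00 UTC
dt = datetime.datetime.utcfromtimestamp(timestamp)
print(dt.strftime('%Y-%m-%d %H:%M:%S'))  # → 1970-01-01 00:00:00
```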
Create directory
Sooner or later, the program you write needs to create a directory in which to store data. os and pathlib contain functions to create directories. We will consider the following methods:
Method | Description
---|---
os.mkdir() | Creates a single subdirectory
os.makedirs() | Creates multiple directories, including intermediate directories
pathlib.Path.mkdir() | Creates single or multiple directories
Create a single directory
To create a single directory, pass the directory path as an argument to os.mkdir():

```python
import os

os.mkdir('example_directory')
```

If the directory already exists, os.mkdir() raises a FileExistsError exception. Alternatively, you can create a directory using pathlib:

```python
from pathlib import Path

p = Path('example_directory')
p.mkdir()
```

If the path already exists, mkdir() raises a FileExistsError exception:

```
FileExistsError: [Errno 17] File exists: 'example_directory'
```

To avoid errors like this, catch the error when it occurs and let your users know:

```python
from pathlib import Path

p = Path('example_directory')
try:
    p.mkdir()
except FileExistsError as e:
    print(e)
```

Alternatively, you can pass the exist_ok=True argument to mkdir() to ignore the FileExistsError exception:

```python
from pathlib import Path

p = Path('example_directory')
p.mkdir(exist_ok=True)
```

If the directory already exists, no error is raised.
Create multiple directories
os.makedirs() is similar to os.mkdir(). The difference between the two is that os.makedirs() can create not only a single directory but also a directory tree recursively. In other words, it creates any intermediate folders needed for the full path to exist.

os.makedirs() is similar to running mkdir -p in bash. For example, to create a group of directories like 2018/10/05, you can do the following:

```python
import os

os.makedirs('2018/10/05', mode=0o770)
```

The code above creates the 2018/10/05 directory structure and grants read, write, and execute permissions to the owner and group users. The default mode is 0o777, which also grants permissions to other user groups.

Run the tree command to confirm that the permissions were applied:

```
$ tree -p -i .
.
[drwxrwx---]  2018
[drwxrwx---]  10
[drwxrwx---]  05
```

This prints the directory tree of the current directory. tree is normally used to list the contents of a directory in a tree structure. Passing the -p and -i arguments prints the directory names and their file permissions in a vertical list: -p outputs the file permissions, and -i makes tree produce a vertical list without indentation.

As you can see, all the directories have 770 permissions. Another way to create multiple directories is to use pathlib.Path.mkdir():

```python
from pathlib import Path

p = Path('2018/10/05')
p.mkdir(parents=True, exist_ok=True)
```

Passing parents=True to Path.mkdir() makes it create the 05 directory and any parent directories needed to make the path valid.

By default, os.makedirs() and pathlib.Path.mkdir() raise OSError if the target directory already exists. Passing exist_ok=True as a keyword argument overrides this behavior (starting with Python 3.2).
Running the above code will get a structure like the following:
```
└── 2018
    └── 10
        └── 05
```
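os.makedirs() accepts the same exist_ok flag, which is handy to know; a minimal sketch using the 2018/10/05 path from above:

```python
import os
import shutil

# exist_ok=True makes the call idempotent: running it twice
# does not raise FileExistsError (available since Python 3.2)
os.makedirs('2018/10/05', exist_ok=True)
os.makedirs('2018/10/05', exist_ok=True)  # second call is a no-op
print(os.path.isdir('2018/10/05'))  # → True

shutil.rmtree('2018')  # clean up the example tree
```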
I prefer to use pathlib when creating directories because I can use the same method to create either one directory or several.
File name pattern matching
After using one of the above methods to get the list of files in the directory, you may want to search for files that match a specific pattern.
Here are the methods and functions you can use:
- The endswith() and startswith() string methods
- fnmatch.fnmatch()
- glob.glob()
- pathlib.Path.glob()
These methods and functions are discussed below. The examples in this section use a directory named some_directory with the following structure:

```
.
├── admin.py
├── data_01_backup.txt
├── data_01.txt
├── data_02_backup.txt
├── data_02.txt
├── data_03_backup.txt
├── data_03.txt
├── sub_dir
│   ├── file1.py
│   └── file2.py
└── tests.py
```
If you are using the Bash shell, you can create the above directory structure using the following command:
```bash
mkdir some_directory
cd some_directory
mkdir sub_dir
touch sub_dir/file1.py sub_dir/file2.py
touch data_{01..03}.txt data_{01..03}_backup.txt admin.py tests.py
```

This creates the some_directory directory and enters it, then creates sub_dir. The next line creates file1.py and file2.py in sub_dir, and the last line creates all the other files using brace expansion.
Use string method
Python has several built-in methods for modifying and manipulating strings. Two of them, .startswith() and .endswith(), are very useful when matching file names. To use them, first get a directory listing and then iterate over it:

```python
import os

for f_name in os.listdir('some_directory'):
    if f_name.endswith('.txt'):
        print(f_name)
```

The code above finds all the files in some_directory, iterates over them, and uses .endswith() to print the names of all files that have a .txt extension. On my computer the output is as follows:

```
data_01.txt
data_01_backup.txt
data_02.txt
data_02_backup.txt
data_03.txt
data_03_backup.txt
```
Simple file name pattern matching using fnmatch
The matching abilities of the string methods are limited. fnmatch has more advanced functions and methods for pattern matching. We will consider fnmatch.fnmatch(), which supports the use of wildcards such as * and ?. For example, to find all .txt files in a directory using fnmatch, you can do the following:

```python
import os
import fnmatch

for f_name in os.listdir('some_directory'):
    if fnmatch.fnmatch(f_name, '*.txt'):
        print(f_name)
```

This iterates over some_directory and uses fnmatch() to perform a wildcard search for files that have a .txt extension.
More advanced pattern matching
Suppose you want to find .txt files that match a particular pattern. For example, you might want to find .txt files that contain the word data, a set of numbers between underscores, and the word backup in the file name, similar to data_01_backup, data_02_backup, or data_03_backup.

You can use fnmatch.fnmatch():

```python
import os
import fnmatch

for f_name in os.listdir('some_directory'):
    if fnmatch.fnmatch(f_name, 'data_*_backup.txt'):
        print(f_name)
```

Here, only file names matching the data_*_backup.txt pattern are printed. The * in the pattern matches any number of characters, so running this code finds file names that start with data and end with backup.txt, as shown in the output below:

```
data_01_backup.txt
data_02_backup.txt
data_03_backup.txt
```
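If a glob-style pattern is not precise enough, say you need exactly two digits between the underscores, the standard re module is the next step up. This is a sketch going beyond the original example, with a hypothetical list of names:

```python
import re

# Match data_<exactly two digits>_backup.txt
pattern = re.compile(r'^data_\d{2}_backup\.txt$')

names = ['data_01_backup.txt', 'data_1_backup.txt', 'notes.txt']
for name in names:
    if pattern.match(name):
        print(name)  # only data_01_backup.txt matches
```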
File name pattern matching using glob
Another useful pattern matching module is glob.
.glob() in the glob module works just like fnmatch.fnmatch(), but unlike fnmatch.fnmatch(), it treats files beginning with a period (.) as special files.

UNIX and related systems use wildcards such as ? and * in file listings to express pattern matching.

For example, running mv *.py python_files in a UNIX shell moves all files with a .py extension from the current directory to python_files. The * is a wildcard representing any number of characters, and *.py is the glob pattern. This shell feature is not available on the Windows operating system, but the glob module adds this capability to Python so that Windows programs can use it.
Here is an example of using the glob module to query all Python code files in the current directory:
```python
import glob

print(glob.glob('*.py'))
```

glob.glob('*.py') searches the current directory for files with a .py extension and returns them as a list. glob also supports shell-style wildcards for matching:

```python
import glob

for name in glob.glob('*[0-9]*.txt'):
    print(name)
```

This finds all text files (.txt) whose file names contain numbers:

```
data_01.txt
data_01_backup.txt
data_02.txt
data_02_backup.txt
data_03.txt
data_03_backup.txt
```
glob also makes it easy to recursively search for files in subdirectories:
```python
import glob

for name in glob.iglob('**/*.py', recursive=True):
    print(name)
```

This example uses glob.iglob() to search the current directory and its subdirectories for all .py files. Passing recursive=True as an argument to .iglob() makes it search for .py files in the current directory and any subdirectories. The difference between glob.glob() and glob.iglob() is that iglob() returns an iterator instead of a list.

Running the code above produces the following result:

```
admin.py
tests.py
sub_dir/file1.py
sub_dir/file2.py
```
pathlib also contains similar methods for flexibly building file listings. The following example shows how to use .glob() on a Path to list files whose extension starts with the letter p:

```python
from pathlib import Path

p = Path('.')
for name in p.glob('*.p*'):
    print(name)
```

Calling p.glob('*.p*') returns a generator object pointing to all files in the current directory whose extension begins with the letter p.

Path.glob() is similar to glob.glob(). As you can see, pathlib blends many of the best features of the os, os.path, and glob modules into one module, which makes it a pleasure to use.
To review, here is a table of the functions we covered in this section:

Function | Description
---|---
startswith() | Tests whether a string starts with a particular pattern; returns True or False
endswith() | Tests whether a string ends with a particular pattern; returns True or False
fnmatch.fnmatch(filename, pattern) | Tests whether the file name matches the pattern; returns True or False
glob.glob() | Returns a list of file names that match a pattern
pathlib.Path.glob() | Returns a generator object matching a pattern
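One related convenience worth knowing: fnmatch.filter() applies a pattern to a whole list of names at once instead of testing them one by one. A minimal sketch with a hypothetical list:

```python
import fnmatch

names = ['data_01.txt', 'data_01_backup.txt', 'admin.py', 'tests.py']

# fnmatch.filter(names, pattern) returns the subset of names matching pattern
print(fnmatch.filter(names, 'data_*.txt'))
# → ['data_01.txt', 'data_01_backup.txt']
```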
Traversing directories and processing files
A common programming task is walking a directory tree and processing the files in it. Let's explore how to do that with the built-in Python function os.walk(). os.walk() generates the file names in a directory tree by walking the tree either top-down or bottom-up. For the purposes of this section, we will work with the following directory tree:

```
├── folder_1
│   ├── file1.py
│   ├── file2.py
│   └── file3.py
├── folder_2
│   ├── file4.py
│   ├── file5.py
│   └── file6.py
├── test1.txt
└── test2.txt
```

The following is an example that shows how to list all the files and directories in the directory tree using os.walk().

By default, os.walk() traverses the directory top-down:

```python
import os

for dirpath, dirnames, files in os.walk('.'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)
```
os.walk() returns three values in each loop:
- The name of the current folder
- List of subfolders in the current folder
- List of files in the current folder
In each iteration, the names of subdirectories and files it finds are printed:
```
Found directory: .
test1.txt
test2.txt
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py
```
To traverse the directory tree bottom-up, pass the topdown=False keyword argument to os.walk():

```python
for dirpath, dirnames, files in os.walk('.', topdown=False):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)
```

Passing topdown=False makes os.walk() print the files it finds in the subdirectories first:

```
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py
Found directory: .
test1.txt
test2.txt
```

As you can see, the program lists the contents of the subdirectories before the contents of the root directory. This is useful when you want to delete files and directories recursively; you will learn how in the sections below. By default, os.walk() does not descend into directories created as symbolic links. You can override the default behavior with the followlinks=True argument.
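As a sketch of putting os.walk() to practical use (the helper name and the root directory are my own choices, not part of the original article), the following totals the size of every .py file under a root:

```python
import os

def total_py_bytes(root):
    """Sum the sizes of all .py files under root, recursively."""
    total = 0
    for dirpath, dirnames, files in os.walk(root):
        for file_name in files:
            if file_name.endswith('.py'):
                # os.path.join rebuilds the full path for os.path.getsize
                total += os.path.getsize(os.path.join(dirpath, file_name))
    return total

print(total_py_bytes('.'))
```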
Create temporary files and directories
Python provides the tempfile module to easily create temporary files and directories.
tempfile can open and store temporary data in a file or directory when your program is running. tempfile will delete these temporary files after your program stops running.
Now let's see how to create a temporary file:
```python
from tempfile import TemporaryFile

# Create a temporary file and write some data to it
fp = TemporaryFile('w+t')
fp.write('Hello World!')

# Go back to the beginning and read the data from the file
fp.seek(0)
data = fp.read()
print(data)

# Close the file, after which it will be deleted
fp.close()
```

The first step is to import TemporaryFile from the tempfile module. Next, create a file-like object by calling TemporaryFile() and passing it the mode in which you want to open the file. This creates and opens a file that can be used as a temporary storage area.

In the example above, the mode is 'w+t', which makes tempfile create a temporary text file in write mode. There is no need to give the temporary file a name, because it is destroyed after the script finishes running.

After writing to the file, you can read from it and close it when you are done processing. Once the file is closed, it is deleted from the file system. If you need to name a temporary file produced with tempfile, use tempfile.NamedTemporaryFile().
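A minimal sketch of tempfile.NamedTemporaryFile(): unlike TemporaryFile, the object exposes a .name attribute with a real path on the file system (delete=False is used here so the file can be reopened by name):

```python
import os
from tempfile import NamedTemporaryFile

# delete=False keeps the file after the with block so it can be reopened by name
with NamedTemporaryFile('w+t', delete=False, suffix='.txt') as fp:
    print('Temporary file name:', fp.name)
    fp.write('Hello temp!')

# The file still exists because delete=False was passed
with open(fp.name) as f:
    print(f.read())  # → Hello temp!

os.remove(fp.name)  # clean up manually
```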
Temporary files and directories created with tempfile are stored in a special system directory for temporary files. Python searches a standard list of directories to find one in which the user can create files.

On Windows, the directories are C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order. On all other platforms, the directories are /tmp, /var/tmp, and /usr/tmp, in that order. If none of those directories exist, tempfile stores temporary files and directories in the current directory.

TemporaryFile() is also a context manager, so it can be used with the with statement. Using the context manager automatically closes and deletes the file after it has been read:

```python
with TemporaryFile('w+t') as fp:
    fp.write('Hello universe!')
    fp.seek(0)
    fp.read()
# The temporary file has now been closed and deleted
```

This creates a temporary file and reads data from it. Once the contents of the file have been read, the temporary file is closed and deleted from the file system.

tempfile can also be used to create temporary directories. Let's look at how to do this using tempfile.TemporaryDirectory():

```python
import tempfile
import os

tmp = ''
with tempfile.TemporaryDirectory() as tmpdir:
    print('Created temporary directory ', tmpdir)
    tmp = tmpdir
    print(os.path.exists(tmpdir))

print(tmp)
print(os.path.exists(tmp))
```

Calling tempfile.TemporaryDirectory() creates a temporary directory in the file system and returns an object representing that directory. In the example above, the directory is created using a context manager, and the name of the directory is stored in the tmpdir variable. The third line prints the name of the temporary directory, and os.path.exists(tmpdir) confirms whether the directory was actually created in the file system.

After the context manager exits, the temporary directory is deleted and calling os.path.exists(tmpdir) returns False, which means the directory was successfully removed.
Delete files and directories
You can delete individual files, directories, and the entire directory tree using the methods in the os, shutil, and pathlib modules. The following describes how to delete files and directories you no longer need.
Deleting files in Python
To delete a single file, use pathlib.Path.unlink(), os.remove(), or os.unlink().

os.remove() and os.unlink() are semantically identical. To delete a file using os.remove():

```python
import os

data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.remove(data_file)
```

Deleting a file using os.unlink() is similar to how you do it using os.remove():

```python
import os

data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.unlink(data_file)
```

Calling .unlink() or .remove() on a file deletes the file from the file system. If the path passed to them points to a directory instead of a file, both functions raise OSError. To avoid this, you can either check that what you want to delete is actually a file and only delete it when it is, or use exception handling to handle the OSError:

```python
import os

data_file = 'home/data.txt'

# Delete only if the path is a file
if os.path.isfile(data_file):
    os.remove(data_file)
else:
    print(f'Error: {data_file} not a valid filename')
```

os.path.isfile() checks whether data_file is actually a file. If it is, it is deleted by the call to os.remove(). If data_file points to a folder, an error message is printed to the console.
The following example shows how to use exception handling to handle errors when deleting files:
```python
import os

data_file = 'home/data.txt'

# Use exception handling
try:
    os.remove(data_file)
except OSError as e:
    print(f'Error: {data_file} : {e.strerror}')
```

The code above attempts to delete the file before checking its type. If data_file is not actually a file, the raised OSError is handled in the except clause and an error message is printed to the console. The printed error message is formatted using Python f-strings.

Finally, you can also delete a file using pathlib.Path.unlink():

```python
from pathlib import Path

data_file = Path('home/data.txt')
try:
    data_file.unlink()
except IsADirectoryError as e:
    print(f'Error: {data_file} : {e.strerror}')
```

This creates a Path object named data_file that points to a file. Calling .unlink() on data_file deletes home/data.txt. If data_file points to a directory, IsADirectoryError is raised. It is worth noting that the Python program above has the same permissions as the user running it; if the user does not have permission to delete the file, PermissionError is raised.
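Since Python 3.8, Path.unlink() also accepts a missing_ok flag, so deleting a file that may not exist no longer needs a try/except; a minimal sketch:

```python
from pathlib import Path

# missing_ok=True suppresses FileNotFoundError if the file is already gone
# (requires Python 3.8+)
Path('definitely_missing.txt').unlink(missing_ok=True)
print('no exception, even though the file did not exist')
```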
Delete directory
The standard library provides the following functions to delete directories:
- os.rmdir()
- pathlib.Path.rmdir()
- shutil.rmtree()
To delete a single directory or folder, use os.rmdir() or pathlib.Path.rmdir(). These two functions only work when the directory you are deleting is empty; if the directory is not empty, OSError is raised. Here is how to delete a folder:

```python
import os

trash_dir = 'my_documents/bad_dir'
try:
    os.rmdir(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
```

Here, trash_dir is deleted by the call to os.rmdir(). If the directory is not empty, an error message is printed to the screen:

```
Traceback (most recent call last):
  File '<stdin>', line 1, in <module>
OSError: [Errno 39] Directory not empty: 'my_documents/bad_dir'
```

Similarly, you can delete a directory using pathlib:

```python
from pathlib import Path

trash_dir = Path('my_documents/bad_dir')
try:
    trash_dir.rmdir()
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
```

Here, a Path object is created to point to the directory to be deleted. Calling the .rmdir() method on the Path object deletes the directory if it is empty.
Delete the complete directory tree
To delete non-empty directories and complete directory trees, Python provides shutil.rmtree():
```python
import shutil

trash_dir = 'my_documents/bad_dir'

try:
    shutil.rmtree(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')
```
When shutil.rmtree() is called, trash_dir and everything in it are deleted. In some cases, you may want to delete empty folders recursively. You can combine one of the methods discussed above with os.walk() to do this:
```python
import os

for dirpath, dirnames, files in os.walk('.', topdown=False):
    try:
        os.rmdir(dirpath)
    except OSError as ex:
        pass
```
This will traverse the directory tree and try to delete every directory it finds. If the directory is not empty, an OSError is raised and the directory is skipped. The following table lists the functions covered in this section:
Function | Description |
---|---|
os.remove() | Delete a single file, not a directory |
os.unlink() | Like os.remove(), deletes a single file |
pathlib.Path.unlink() | Delete a single file, not a directory |
os.rmdir() | Delete an empty directory |
pathlib.Path.rmdir() | Delete an empty directory |
shutil.rmtree() | Deletes a complete directory tree; can be used on non-empty directories |
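To make the difference between these functions concrete, here is a small self-contained sketch (run inside a temporary directory, with illustrative names) showing that os.rmdir() refuses a non-empty directory while shutil.rmtree() removes it:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
full = os.path.join(base, 'full')
os.makedirs(full)
with open(os.path.join(full, 'note.txt'), 'w') as f:
    f.write('keep me')

try:
    os.rmdir(full)           # fails: directory is not empty
    removed_by_rmdir = True
except OSError:
    removed_by_rmdir = False

shutil.rmtree(full)          # deletes the directory and its contents
print(removed_by_rmdir, os.path.exists(full))  # False False
```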
Copy, move, and rename files and directories
Python ships with the shutil module. shutil is short for shell utilities. It provides a number of high-level operations on files to support copying, archiving, and removing files and directories. In this section, you will learn how to move and copy files and directories.
Copy file
shutil provides several functions for copying files. The most commonly used are shutil.copy() and shutil.copy2(). Use shutil.copy() to copy a file from one location to another:
```python
import shutil

src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy(src, dst)
```
shutil.copy() is comparable to the cp command in UNIX-based systems. shutil.copy(src, dst) copies the file src to the location specified in dst. If dst is a file, its contents are replaced with the contents of src. If dst is a directory, src is copied into that directory. shutil.copy() copies only the file's contents and permissions. Other metadata, such as the file's creation and modification times, is not preserved.
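A quick, safe way to see the copy-into-a-directory behavior is to run it in a temporary directory (the file and directory names here are just for illustration):

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'file.txt')
with open(src, 'w') as f:
    f.write('some data')

dest_dir = os.path.join(tmp, 'dest_dir')
os.mkdir(dest_dir)

# Copying to a directory keeps the source file name;
# shutil.copy() returns the path of the new file
copied = shutil.copy(src, dest_dir)
print(os.path.basename(copied))  # file.txt
```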
To preserve all file metadata when copying, use shutil.copy2():
```python
import shutil

src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy2(src, dst)
```
Using .copy2() preserves details about the file such as the last access time, permission bits, last modification time, and flags.
Copy directories
While shutil.copy() only copies a single file, shutil.copytree() copies an entire directory and everything contained in it. shutil.copytree(src, dest) takes two arguments: a source directory and the destination directory where files and folders will be copied to.
The following is an example of how to copy the contents of a folder to another location:
```python
import shutil

dst = shutil.copytree('data_1', 'data1_backup')
print(dst)  # data1_backup
In this example, .copytree() copies the contents of data_1 to a new location data1_backup and returns the destination directory. The destination directory must not already exist; it will be created, along with any missing parent directories. shutil.copytree() is a good way to back up files.
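The same pattern can be exercised safely end to end in a temporary directory (all names below are illustrative):

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'data_1')
os.makedirs(os.path.join(src, 'sub'))
with open(os.path.join(src, 'sub', 'a.txt'), 'w') as f:
    f.write('payload')

# copytree() recreates the whole tree at the destination
dst = shutil.copytree(src, os.path.join(tmp, 'data1_backup'))
print(os.path.exists(os.path.join(dst, 'sub', 'a.txt')))  # True
```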
Move files and directories
To move a file or directory to another location, use shutil.move(src, dst).
src is the file or directory to be moved, and dst is the destination:
```python
import shutil

dst = shutil.move('dir_1/', 'backup/')
print(dst)  # 'backup'
```
If backup/ exists, shutil.move('dir_1/', 'backup/') moves dir_1/ into backup/. If backup/ does not exist, dir_1/ is renamed to backup.
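The move-into-an-existing-directory case can be sketched safely in a temporary directory (names are illustrative):

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
src_dir = os.path.join(tmp, 'dir_1')
os.mkdir(src_dir)

backup = os.path.join(tmp, 'backup')
os.mkdir(backup)

# backup/ exists, so dir_1 is moved *into* it;
# shutil.move() returns the new path
dst = shutil.move(src_dir, backup)
print(dst.endswith(os.path.join('backup', 'dir_1')))  # True
```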
Rename files and directories
Python includes os.rename(src, dst) for renaming files and directories:
```python
import os

os.rename('first.zip', 'first_01.zip')
```
The line above renames first.zip to first_01.zip. If the destination path points to a directory, an OSError is raised.
Another way to rename a file or directory is to use rename() in the pathlib module:
```python
from pathlib import Path

data_file = Path('data_01.txt')
data_file.rename('data.txt')
```
To rename a file using pathlib, first create a pathlib.Path() object containing the path of the file you want to replace. The next step is to call rename() on that Path object, passing in the new name for the file or directory.
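Here is a minimal, safe-to-run sketch of the pathlib approach in a temporary directory; note that on Python 3.8+ .rename() also returns a Path pointing at the new location:

```python
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
data_file = tmp / 'data_01.txt'
data_file.write_text('contents')

# Python 3.8+ returns the new Path; earlier versions return None
new_path = data_file.rename(tmp / 'data.txt')
print(new_path.name, new_path.exists(), data_file.exists())
# data.txt True False
```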
Archive files
Archiving is a convenient way to package multiple files into a single file. The two most common archive types are ZIP and TAR. The Python programs you write can create archives, read archives, and extract data from them. In this section, you will learn how to read and write both archive formats.
Read ZIP file
The zipfile module is a low-level module that is part of the Python standard library. zipfile has functions that make it easy to open and extract ZIP files. To read the contents of a ZIP file, the first thing to do is to create a ZipFile object. ZipFile objects are similar to the file objects created using open(). ZipFile is also a context manager, so it supports the with statement:
```python
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    pass
```
Here, you create a ZipFile object, passing in the name of the ZIP file and opening it in read mode. After opening the ZIP file, information about the archive can be accessed through functions provided by the zipfile module. The data.zip archive in the example above was created from a directory named data that contains a total of 5 files and 1 subdirectory:
```
.
│
├── sub_dir/
│   ├── bar.py
│   └── foo.py
│
├── file1.py
├── file2.py
└── file3.py
```
To get a list of files in the archive file, call namelist() on the ZipFile object:
```python
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    print(zipobj.namelist())
```
This generates a list of files:
```
['file1.py', 'file2.py', 'file3.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
```
.namelist() returns a list of names of the files and directories in the archive. To retrieve information about the files in the archive, use .getinfo():
```python
import zipfile

with zipfile.ZipFile('data.zip', 'r') as zipobj:
    bar_info = zipobj.getinfo('sub_dir/bar.py')
    print(bar_info.file_size)
```
This will output:
```
15277
```
.getinfo() returns a ZipInfo object that stores information about a single member of the archive. To get information about a file in the archive, pass its path as an argument to .getinfo(). Using getinfo(), you can retrieve details about archive members such as the date a file was last modified, its compressed size, and its full file name. Accessing .file_size retrieves the file's original size in bytes.
The following example shows how to retrieve more details about archived files in the Python REPL. Assume that the zipfile module has been imported and that bar_info is the same object created in the previous example:
```
>>> bar_info.date_time
(2018, 10, 7, 23, 30, 10)
>>> bar_info.compress_size
2856
>>> bar_info.filename
'sub_dir/bar.py'
```
bar_info contains details about bar.py, such as its compressed size and its full path.
The first line shows how to retrieve the file's last modified date. The next line shows how to get the file's size after compression. The last line shows the full path of bar.py within the archive.
ZipFile supports the context manager protocol, which is why you can use it with the with statement. The ZipFile object is automatically closed after the operation is completed. Attempting to open or extract a file from a closed ZipFile object will result in an error.
Extract ZIP file
The zipfile module lets you extract one or more files from ZIP archives through extract() and extractall().
By default, these methods extract files to the current directory. Both accept an optional path parameter that lets you specify a different directory to extract files to. If that directory does not exist, it is created automatically. To extract files from an archive:
```
>>> import zipfile
>>> import os

>>> os.listdir('.')
['data.zip']

>>> data_zip = zipfile.ZipFile('data.zip', 'r')

>>> # Extract a single file to the current directory
>>> data_zip.extract('file1.py')
'/home/test/dir1/zip_extract/file1.py'

>>> os.listdir('.')
['file1.py', 'data.zip']

>>> # Extract all files into the specified directory
>>> data_zip.extractall(path='extract_dir/')

>>> os.listdir('.')
['file1.py', 'extract_dir', 'data.zip']

>>> os.listdir('extract_dir')
['file1.py', 'file3.py', 'file2.py', 'sub_dir']

>>> data_zip.close()
```
The third line of code calls os.listdir(), which shows that the current directory contains only one file, data.zip.
Next, data.zip is opened in read mode and extract() is called to extract file1.py. .extract() returns the full file path of the extracted file. Because no path was specified, .extract() extracts file1.py into the current directory.
The next line prints a directory listing showing that the current directory now includes the extracted file in addition to the original archive. The lines after that show how to extract the entire archive into a specified directory: .extractall() creates extract_dir and extracts the contents of data.zip into it. The last line closes the ZIP archive.
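The whole create-then-extract round trip can be sketched in a temporary directory so it runs anywhere; writestr() is used here only as a convenient way to build a sample archive, and the member names are illustrative:

```python
import os
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, 'data.zip')

# Build a small archive to work with
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('file1.py', 'print("one")\n')
    zf.writestr('sub_dir/bar.py', 'print("bar")\n')

out = os.path.join(tmp, 'extract_dir')
with zipfile.ZipFile(archive, 'r') as zf:
    zf.extractall(path=out)   # creates out/ automatically

print(sorted(os.listdir(out)))  # ['file1.py', 'sub_dir']
```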
Extract data from password-protected archives
zipfile supports extracting password-protected ZIPs. To extract a password-protected ZIP file, pass the password as bytes to the extract() or extractall() method:
```
>>> import zipfile

>>> with zipfile.ZipFile('secret.zip', 'r') as pwd_zip:
...     # Extract data from a password-protected archive
...     pwd_zip.extractall(path='extract_dir', pwd=b'Quish3@o')
```
This opens the secret.zip archive in read mode. The password is supplied to extractall(), and the archive's contents are extracted into extract_dir. Because of the with statement, the archive is closed automatically after extraction is complete.
Create a new archive file
To create a new ZIP archive, open a ZipFile object in write mode (w) and add the files you want to archive:
```
>>> import zipfile

>>> file_list = ['file1.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
>>> with zipfile.ZipFile('new.zip', 'w') as new_zip:
...     for name in file_list:
...         new_zip.write(name)
```
In this example, new_zip is opened in write mode and each file in file_list is added to the archive. When the with statement ends, new_zip is closed. Opening a ZIP file in write mode erases the archive's existing contents and creates a new archive.
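As a side note, ZipFile can also write to in-memory buffers and apply compression; the sketch below (assuming zlib is available, as it is in standard CPython builds, with an illustrative member name) builds a compressed archive with writestr() and confirms the entry really shrank:

```python
import io
import zipfile

# Build an archive entirely in memory with writestr()
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('notes.txt', 'zip me ' * 100)

buf.seek(0)
with zipfile.ZipFile(buf, 'r') as zf:
    info = zf.getinfo('notes.txt')
    print(info.compress_size < info.file_size)  # True
```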
To add a file to an existing archive file, open the ZipFile object in append mode and add the file:
```
>>> with zipfile.ZipFile('new.zip', 'a') as new_zip:
...     new_zip.write('data.txt')
...     new_zip.write('latin.txt')
```
Here, you open the new.zip archive created in the previous example in append mode. Opening a ZipFile object in append mode lets you add new files to the ZIP file without deleting its current contents. After the files are added, the with statement goes out of context and closes the ZIP file.
Open TAR Archive
TAR files are file archives like ZIP, but they are uncompressed by default. They can be compressed using the gzip, bzip2, and lzma compression methods. The TarFile class allows reading and writing TAR archives.
Here's how to read from the Archive:
```python
import tarfile

with tarfile.open('example.tar', 'r') as tar_file:
    print(tar_file.getnames())
```
tarfile objects open like most file-like objects: they have an open() function that takes a mode to determine how the file is opened.
Use the 'r', 'w', or 'a' modes to open an uncompressed TAR file for reading, writing, or appending, respectively. To open a compressed TAR file, pass a mode argument to tarfile.open() in the form filemode[:compression]. The following table lists the possible modes in which TAR files can be opened:
Mode | Behavior |
---|---|
r | Open archive in uncompressed read mode |
r:gz | Open the archive in gzip compressed read mode |
r:bz2 | Open the archive in bzip2 compressed read mode |
w | Open archive in uncompressed write mode |
w:gz | Open the archive in gzip compressed write mode |
w:xz | Open archive in lzma compressed write mode |
a | Open archive in uncompressed append mode |
.open() defaults to 'r' mode. To read an uncompressed TAR file and retrieve the names of the files in it, use .getnames():
```
>>> import tarfile

>>> tar = tarfile.open('example.tar', mode='r')
>>> tar.getnames()
['CONTRIBUTING.rst', 'README.md', 'app.py']
```
This returns the name of the content in the archive as a list.
Note: to show you how to use the different tarfile object methods, the TAR file in these examples is opened and closed manually in an interactive REPL session. Interacting with the TAR file this way lets you view the output of each command. Generally, you will want to use a context manager to open file-like objects.
In addition, you can access metadata for each entry in the archive using special properties:
```
>>> import time
>>> for entry in tar.getmembers():
...     print(entry.name)
...     print(' Modified:', time.ctime(entry.mtime))
...     print(' Size    :', entry.size, 'bytes')
...     print()
CONTRIBUTING.rst
 Modified: Sat Nov 1 09:09:51 2018
 Size    : 402 bytes

README.md
 Modified: Sat Nov 3 07:29:40 2018
 Size    : 5426 bytes

app.py
 Modified: Sat Nov 3 07:29:13 2018
 Size    : 6218 bytes
```
In this example, the loop iterates over the list of files returned by getmembers() and prints each file's attributes. The objects returned by getmembers() have attributes that can be accessed programmatically, such as the name, size, and last modified time of each file in the archive. After reading or writing to the archive, it must be closed to free system resources.
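The same create-and-inspect cycle can be exercised safely in a temporary directory (file names are illustrative; arcname is used so the temp path is not embedded in the archive):

```python
import os
import tarfile
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'app.py')
with open(src, 'w') as f:
    f.write('print("hello")\n')

archive = os.path.join(tmp, 'example.tar')
with tarfile.open(archive, 'w') as tar:
    tar.add(src, arcname='app.py')  # arcname controls the stored member name

with tarfile.open(archive, 'r') as tar:
    names = [m.name for m in tar.getmembers()]
    sizes = [m.size for m in tar.getmembers()]
print(names, sizes)  # ['app.py'] [15]
```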
Extract files from TAR Archive
In this section, you will learn how to extract files from the TAR archive using the following methods:
- .extract()
- .extractfile()
- .extractall()
To extract a single file from a TAR archive, use extract(), passing in the file name:
```
>>> tar.extract('README.md')
>>> os.listdir('.')
['README.md', 'example.tar']
```
The README.md file is extracted from the archive to the file system. Calling os.listdir() confirms that README.md was successfully extracted into the current directory. To extract everything from the archive, use extractall():
>>> tar.extractall(path="extracted/") Copy code
. extractall() has an optional path parameter to specify the destination of the extracted file. Here, the archive is extracted into the extracted directory. The following command shows that the archive was successfully extracted:
```
$ ls
example.tar  extracted  README.md

$ tree
.
├── example.tar
├── extracted
│   ├── app.py
│   ├── CONTRIBUTING.rst
│   └── README.md
└── README.md

1 directory, 5 files

$ ls extracted/
app.py  CONTRIBUTING.rst  README.md
```
To extract a file object for reading or writing, use extractfile(), which takes a filename or TarInfo object as a parameter. .extractfile() returns a file-like object that can be read and used:
```
>>> f = tar.extractfile('app.py')
>>> f.read()
>>> tar.close()
```
Opened archives should always be closed after reading or writing. To close an archive, call close() on the archive file handle, or use the with statement when creating the tarfile object so the archive is closed automatically when you are done. This frees system resources and writes any changes you made to the archive to the file system.
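For a fully self-contained sketch of extractfile(), the snippet below builds a tiny TAR archive in memory with addfile() and a TarInfo header (the member name is illustrative), then reads the member back as a file-like object:

```python
import io
import tarfile

# Build a small TAR archive in memory, then read a member back
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tar:
    data = b'print("app")\n'
    info = tarfile.TarInfo(name='app.py')
    info.size = len(data)          # size must be set before addfile()
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode='r') as tar:
    f = tar.extractfile('app.py')  # file-like object
    contents = f.read()
print(contents)  # b'print("app")\n'
```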
Create a new TAR Archive
To create a new TAR archive, you can:
```
>>> import tarfile

>>> file_list = ['app.py', 'config.py', 'CONTRIBUTORS.md', 'tests.py']
>>> with tarfile.open('packages.tar', mode='w') as tar:
...     for file in file_list:
...         tar.add(file)

>>> # Read the contents of the newly created archive
>>> with tarfile.open('packages.tar', mode='r') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
```
First, you need to create a list of files to add to the archive so that you don't have to add each file manually.
The next line uses the with context manager to open a new archive called packages.tar in write mode. Opening an archive in write mode ('w') lets you write new files to it. All existing files in the archive are deleted and a new archive is created.
When an archive is created and populated, the with context manager automatically closes it and saves it to the file system. The last three lines open the archive you just created and print out the name of the file it contains.
To add new files to an existing archive, open the archive in append mode ('a'):
```
>>> with tarfile.open('packages.tar', mode='a') as tar:
...     tar.add('foo.bar')

>>> with tarfile.open('packages.tar', mode='r') as tar:
...     for member in tar.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
foo.bar
```
Opening the archive in append mode allows you to add new files to it without deleting existing files.
Working with compressed archives
tarfile can also read and write TAR archives compressed with gzip, bzip2, and lzma. To read or write to a compressed archive, use tarfile.open(), passing the appropriate mode for the compression type.
For example, to read or write data from TAR archives compressed using gzip, use the 'r:gz' or 'w:gz' modes, respectively:
```
>>> files = ['app.py', 'config.py', 'tests.py']
>>> with tarfile.open('packages.tar.gz', mode='w:gz') as tar:
...     tar.add('app.py')
...     tar.add('config.py')
...     tar.add('tests.py')

>>> with tarfile.open('packages.tar.gz', mode='r:gz') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
tests.py
```
The 'w:gz' mode opens a gzip-compressed archive for writing, and 'r:gz' opens one for reading. Compressed archives cannot be opened in append mode; to add files to a compressed archive, you must create a new archive.
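A gzip round trip can be verified end to end in a temporary directory (the file name is illustrative):

```python
import os
import tarfile
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'config.py')
with open(src, 'w') as f:
    f.write('DEBUG = True\n')

archive = os.path.join(tmp, 'packages.tar.gz')
with tarfile.open(archive, 'w:gz') as tar:      # write gzip-compressed
    tar.add(src, arcname='config.py')

with tarfile.open(archive, 'r:gz') as tar:      # read gzip-compressed
    names = tar.getnames()
print(names)  # ['config.py']
```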
An easier way to create an archive
The Python standard library also supports creating TAR and ZIP archives using the high-level functions in the shutil module. The archiving utilities in shutil allow you to create, read, and extract ZIP and TAR archives. These utilities rely on the lower-level tarfile and zipfile modules.
Create an archive with shutil.make_archive()
shutil.make_archive() accepts at least two parameters: the name of the archive and the archive format.
By default, it archives all files in the current directory in the archive format specified by the format argument. You can pass in an optional root_dir argument to archive files from a different directory. .make_archive() supports the zip, tar, bztar, and gztar archive formats.
The following is how to create a TAR archive using shutil:
```python
import shutil

# shutil.make_archive(base_name, format, root_dir)
shutil.make_archive('data/backup', 'tar', 'data/')
```
This copies everything in data/ into an archive called backup.tar in the file system and returns its name. To extract the archive, call unpack_archive():
```python
shutil.unpack_archive('backup.tar', 'extract_dir/')
```
Calling unpack_archive() with the archive name and a target directory extracts the contents of backup.tar into extract_dir/. ZIP archives can be created and extracted the same way.
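The make/unpack round trip can be sketched safely in a temporary directory (all names are illustrative); note that make_archive() takes a base name without an extension and returns the full archive path it created:

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
data_dir = os.path.join(tmp, 'data')
os.mkdir(data_dir)
with open(os.path.join(data_dir, 'report.txt'), 'w') as f:
    f.write('quarterly numbers')

# base_name has no extension; make_archive appends '.tar' and returns the name
archive = shutil.make_archive(os.path.join(tmp, 'backup'), 'tar', data_dir)

out = os.path.join(tmp, 'extract_dir')
os.mkdir(out)
shutil.unpack_archive(archive, out)
print(os.listdir(out))  # ['report.txt']
```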
Read multiple files
Python supports reading data from multiple input streams or file lists through the fileinput module. This module allows you to cycle through the contents of one or more text files quickly and easily. The following is a typical way to use fileinput:
```python
import fileinput

for line in fileinput.input():
    process(line)
```
By default, fileinput gets its input from the command-line arguments passed in sys.argv.
Use fileinput to loop over multiple files
Let's use fileinput to build a crude version of the common UNIX tool cat. The cat tool reads files sequentially and writes them to standard output. When more than one file is given as a command-line argument, cat concatenates the text files and displays the result in the terminal:
```python
# File: fileinput-example.py
import fileinput
import sys

files = fileinput.input()
for line in files:
    if fileinput.isfirstline():
        print(f'\n--- Reading {fileinput.filename()} ---')
    print(' -> ' + line, end='')
print()
```
There are two text files in the current directory. Running this command will produce the following output:
```
$ python3 fileinput-example.py bacon.txt cupcake.txt

--- Reading bacon.txt ---
 -> Spicy jalapeno bacon ipsum dolor amet in in aute est qui enim aliquip,
 -> irure cillum drumstick elit.
 -> Doner jowl shank ea exercitation landjaeger incididunt ut porchetta.
 -> Tenderloin bacon aliquip cupidatat chicken chuck quis anim et swine.
 -> Tri-tip doner kevin cillum ham veniam cow hamburger.
 -> Turkey pork loin cupidatat filet mignon capicola brisket cupim ad in.
 -> Ball tip dolor do magna laboris nisi pancetta nostrud doner.

--- Reading cupcake.txt ---
 -> Cupcake ipsum dolor sit amet candy I love cheesecake fruitcake.
 -> Topping muffin cotton candy.
 -> Gummies macaroon jujubes jelly beans marzipan.
```
fileinput allows you to retrieve more information about each line, such as whether it is the first line (. isfirstline()), line number (. lineno()), and file name (. filename()).
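fileinput does not have to read from sys.argv: you can pass an explicit list of files instead, which makes the behavior easy to test. The sketch below (file names and contents are illustrative) builds two temporary files and uses the FileInput instance's methods directly:

```python
import fileinput
import os
import tempfile

tmp = tempfile.mkdtemp()
paths = []
for name, text in [('bacon.txt', 'line a\nline b\n'),
                   ('cupcake.txt', 'line c\n')]:
    p = os.path.join(tmp, name)
    with open(p, 'w') as f:
        f.write(text)
    paths.append(p)

# Passing an explicit file list sidesteps sys.argv entirely
seen = []
with fileinput.input(files=paths) as stream:
    for line in stream:
        if stream.isfirstline():
            seen.append(os.path.basename(stream.filename()))
print(seen)  # ['bacon.txt', 'cupcake.txt']
```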
summary
You now know how to perform the most common operations on files and filegroups using Python. You already know how to use different built-in modules to read, find and manipulate files.