Using the pathlib module for file operations in Python 3

Posted by sc00tz on Thu, 10 Feb 2022 07:22:47 +0100

For more information, refer to the official documentation: pathlib, object-oriented filesystem paths.

In this tutorial, you will learn how to use the pathlib module to work with the names of directories and files. You will learn how to read and write files, new ways to join paths and manipulate the underlying file system, and some examples of how to list files and iterate over them. Most people reach for the os module or plain string methods to process file paths, with operations such as the following:

>>> path.rsplit('\\', maxsplit=1)[0]

Or they write long-winded code like the following:

>>> os.path.isfile(os.path.join(os.path.expanduser('~'), 'realpython.txt'))

Using the pathlib module, you can rewrite the two examples above as elegant, readable, and Pythonic code:

>>> path.parent
>>> (pathlib.Path.home() / 'realpython.txt').is_file()
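To make the first comparison concrete, here is a minimal sketch that runs on any operating system. The Windows-style path is a hypothetical value chosen for illustration, and PureWindowsPath is used so that the path is parsed with Windows rules without touching the file system.

```python
import pathlib

# Hypothetical Windows-style path, used only for illustration.
raw = r"C:\Users\demo\realpython.txt"

# String approach: split on the last backslash by hand.
parent_by_hand = raw.rsplit("\\", maxsplit=1)[0]

# pathlib approach: PureWindowsPath parses Windows paths on any OS
# and never accesses the file system.
parent_by_pathlib = pathlib.PureWindowsPath(raw).parent

print(parent_by_hand)
print(parent_by_pathlib)
```

Both approaches give the same directory here, but the pathlib version does not depend on knowing which separator the string uses.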

The problem with Python file path handling

Working with files and interacting with the file system are important for many different reasons. The simplest cases may only involve reading or writing a file, but sometimes there are more complex tasks. You may need to list all files of a given type in a directory, find the parent directory of a given file, or create a unique file name that does not yet exist. Traditionally, Python has represented file paths as regular text strings, and paths are joined with the help of os, glob, shutil, and other libraries. Joining paths with these modules is somewhat cumbersome. The following example needs three import statements just to move all text files to an archive directory:

import glob
import os
import shutil

for file_name in glob.glob('*.txt'):
    new_path = os.path.join('archive', file_name)
    shutil.move(file_name, new_path)

It is possible to join paths with ordinary string operations, but because different operating systems use different separators, this is error-prone, so we usually use os.path.join(). Python 3.4 introduced the pathlib module (PEP 428) to make path handling easier still. With the Path class of the pathlib library, you can convert an ordinary string into a pathlib.Path object. In the early days, other packages still expected strings as file paths, but since Python 3.6 pathlib has been supported throughout the standard library, partly thanks to the addition of the file system path protocol. If you are stuck on legacy Python, a backport is also available for Python 2. OK, with all that said, let's see how pathlib works in practice.
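As a first taste, the archive task from earlier can be sketched with pathlib alone. This is an illustrative sketch rather than a drop-in replacement: the function name archive_text_files and the sample file names are assumptions, the demo runs in a temporary directory, and Path.replace() (unlike shutil.move) is only expected to work within a single file system.

```python
import tempfile
from pathlib import Path


def archive_text_files(directory):
    """Move every *.txt file in `directory` into `directory/archive`."""
    archive = directory / "archive"
    archive.mkdir(exist_ok=True)
    moved = []
    # sorted() materializes the glob results before any file is moved.
    for txt_file in sorted(directory.glob("*.txt")):
        target = archive / txt_file.name
        txt_file.replace(target)  # overwrites an existing target
        moved.append(target)
    return moved


# Demo in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp)
    (work_dir / "a.txt").write_text("first")
    (work_dir / "b.txt").write_text("second")
    (work_dir / "notes.md").write_text("kept in place")
    moved_names = [p.name for p in archive_text_files(work_dir)]
    remaining = sorted(p.name for p in work_dir.iterdir())

print(moved_names)
print(remaining)
```

Only one import is needed for the path handling itself; tempfile is there purely to keep the demo self-contained.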

Create path

To get started, import the class we will be working with:

from pathlib import Path

What you really need to know is the pathlib.Path class. There are several different ways to create paths. First, there are class methods such as cwd() (current working directory) and home() (the user's home directory):

from pathlib import Path

now_path = Path.cwd()
home_path = Path.home()

print("Current working directory", now_path, type(now_path))
print("Home directory", home_path, type(home_path))

Output:

Current working directory /Users/chennan/pythonproject/demo <class 'pathlib.PosixPath'>
Home directory /Users/chennan <class 'pathlib.PosixPath'>

As you can see, the type of the path is pathlib.PosixPath, which is what Unix-like systems show. The type differs between systems: on Windows it is WindowsPath. Whichever type is shown, the operations below work the same way. As mentioned earlier, you can convert a string path into a pathlib.Path object. In my testing, many standard library modules accept this type directly (broadly since Python 3.6, as noted above), so there is usually no need to convert it back to a string. Compared with os.path.join(), pathlib makes joining paths more convenient. An example of creating a path from a string:

import pathlib
DIR_PATH = pathlib.Path("/Users/chennan/CDM")
print(DIR_PATH,type(DIR_PATH))

Output content:

/Users/chennan/CDM <class 'pathlib.PosixPath'>

Paths can then be joined with the "/" operator. Convenient, isn't it?
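A short sketch of the "/" operator in action. PurePosixPath is used here as an assumption so that the printed result looks the same on every operating system; with pathlib.Path the code is identical but the separator follows the host system.

```python
import pathlib

# PurePosixPath keeps the output identical on every OS (assumption for the demo).
base = pathlib.PurePosixPath("/Users/chennan/CDM")

# Each "/" appends one more component to the path.
joined = base / "2000" / "hehe.txt"

# joinpath() is the method-call spelling of the same operation.
same = base.joinpath("2000", "hehe.txt")

print(joined)
print(joined == same)
```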

Read and write files

When we use the built-in open() to read and write files, we can pass it not only a path in string form but also a path created with pathlib:

import pathlib

path = pathlib.Path.cwd() / 'test.md'
with open(path, mode='r') as fid:
    headers = [line.strip() for line in fid if line.startswith('#')]
print('\n'.join(headers))

Alternatively, we recommend calling open() directly on the Path object, as follows:

import pathlib
DIR_PATH = pathlib.Path("/Users/chennan/CDM") / "2000" / "hehe.txt"
with DIR_PATH.open("r") as fs:
    data = fs.read()
print(data)

The advantage of writing it this way is that we don't need to pass the path to open(); we only specify the read/write mode. In fact, Path.open() calls the built-in open() behind the scenes. Which form you use is a matter of preference. pathlib also provides several convenience methods for reading and writing files without a with open(...) block:

.read_text(): opens the file at the path in text mode and returns its contents as a str. Equivalent to opening the file with open() in "r" mode.
.read_bytes(): opens the file in binary mode and returns its contents as a bytes object. Equivalent to opening the file with open() in "rb" mode.
.write_text(): opens the file and writes text to it. Equivalent to opening the file with open() in "w" mode.
.write_bytes(): opens the file in binary mode and writes bytes to it. Equivalent to opening the file with open() in "wb" mode.
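The four methods can be tried out with a quick round trip. This is a minimal sketch: the file name note.md and its contents are made up for the demo, and a temporary directory is used so that nothing is left behind.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.md"  # hypothetical file name

    # write_text() creates (or truncates) the file and closes it again.
    note.write_text("# Title\nbody\n", encoding="utf-8")
    text = note.read_text(encoding="utf-8")

    # write_bytes() replaces the previous contents with raw bytes.
    note.write_bytes(b"\x00\x01")
    binary = note.read_bytes()

print(repr(text))
print(binary)
```

Each call opens and closes the file on its own, so these methods suit small files that are read or written in one go.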

Use resolve() to turn a file name or relative path into a full, absolute path. The usage is as follows:

import pathlib
py_path = pathlib.Path("superdemo.py")
print(py_path.resolve())

output

/Users/chennan/pythonproject/demo/superdemo.py

Note that a relative path is resolved against the current working directory, so the output above assumes the program was run from the directory that contains "superdemo.py".
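A small check of where the resolved path comes from. The sketch assumes Python 3.6 or later, where resolve() does not require the file to exist; the result is built from the current working directory either way.

```python
import pathlib

relative = pathlib.Path("superdemo.py")
absolute = relative.resolve()

# The result is absolute and ends with the original file name.
print(absolute.is_absolute())
print(absolute.name)

# resolve() does not check that the file is really there.
print(absolute.exists())
```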

Select the different components of the path

pathlib also provides many attributes for selecting the different parts of a path, such as:

.name: the file name, including the extension
.parent: the directory that contains the file (or the parent directory, if the path is a directory)
.stem: the file name without the extension
.suffix: the file extension
.anchor: the part of the path before the directories, similar to a drive letter

import pathlib

now_path = pathlib.Path.cwd() / "demo.txt"
print("name",now_path.name)
print("stem",now_path.stem)
print("suffix",now_path.suffix)
print("parent",now_path.parent)
print("anchor",now_path.anchor)

The output is as follows

name demo.txt
stem demo
suffix .txt
parent /Users/chennan/pythonproject/demo
anchor /

Move and delete files

Of course, pathlib also supports other file operations, such as moving, updating, or even deleting files. Be careful when using these methods: for the most part they do not warn you or wait for confirmation before files or information are lost. Use the replace() method to move a file; if the destination already exists, it is overwritten. To avoid overwriting a file by accident, the simplest approach is to test whether the destination exists before replacing.

import pathlib

destination = pathlib.Path.cwd() / "target" 
source = pathlib.Path.cwd() / "demo.txt"
if not destination.exists():
    source.replace(destination)

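However, the approach above leaves room for a race condition when several processes work on the same destination: another process may create a file at the destination path between the exists() check and the replace() call. In other words, the check above is only safe when a single process is operating on the file. If that is a concern, you can open the destination for exclusive creation and copy the source data explicitly: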

import pathlib

destination = pathlib.Path.cwd() / "target" 
source = pathlib.Path.cwd() / "demo.txt"
with destination.open(mode='xb') as fid:
    # 'xb' opens the file for exclusive creation in binary mode;
    # it raises FileExistsError if the destination already exists
    fid.write(source.read_bytes())

If the destination file already exists, the code above raises a FileExistsError. Technically, this copies the file. To perform a move, simply delete the source after the copy is complete. Use with_name() and with_suffix() to change the file name or the suffix of a path.

import pathlib
source = pathlib.Path.cwd() / "demo.py"
source.replace(source.with_suffix(".txt"))  # change the suffix and move the file, i.e. rename it
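Note that with_name() and with_suffix() only build new path objects; nothing on disk changes until a method such as replace() is called. A minimal sketch, using PurePosixPath and a hypothetical new name main.py so the output is the same on every system:

```python
import pathlib

source = pathlib.PurePosixPath("/Users/chennan/pythonproject/demo/demo.py")

# Both methods return a new path and leave `source` unchanged.
as_text_file = source.with_suffix(".txt")
renamed = source.with_name("main.py")  # hypothetical new file name

print(as_text_file)
print(renamed)
print(source)
```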

Use rmdir() to delete an empty directory and unlink() to delete a file.

import pathlib

destination = pathlib.Path.cwd() / "target" 
source = pathlib.Path.cwd() / "demo.txt"
source.unlink()
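Putting the pieces of this section together, a move that refuses to overwrite can be sketched as an exclusive-creation copy followed by unlink(). The helper name move_without_overwrite is an assumption made for this example, and because read_bytes() loads the whole file into memory, the sketch is only meant for small files.

```python
import tempfile
from pathlib import Path


def move_without_overwrite(source, destination):
    """Copy `source` to `destination` with exclusive creation, then delete `source`."""
    # 'xb' raises FileExistsError if the destination already exists.
    with destination.open(mode="xb") as fid:
        fid.write(source.read_bytes())
    # Only reached when the copy succeeded.
    source.unlink()


with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp)
    source = work_dir / "demo.txt"
    destination = work_dir / "target"

    # First move: the destination does not exist yet, so it succeeds.
    source.write_text("payload")
    move_without_overwrite(source, destination)
    moved_ok = destination.read_text() == "payload" and not source.exists()

    # Second move: the destination now exists, so the move is refused.
    source.write_text("second attempt")
    try:
        move_without_overwrite(source, destination)
        refused = False
    except FileExistsError:
        refused = True
    source_kept = source.exists()

print(moved_ok, refused, source_kept)
```

Because the source is deleted only after the copy has been written, a refused move leaves both files untouched.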

Several examples of using pathlib

Counting files

We can use the iterdir() method to get all the entries in the current directory:

import pathlib
from collections import Counter
now_path = pathlib.Path.cwd()
gen = (i.suffix for i in now_path.iterdir())
print(Counter(gen))

Output content

Counter({'.py': 16, '': 11, '.txt': 1, '.png': 1, '.csv': 1})

By using the Counter class from the collections module, we get a count of each file type in the folder. We mentioned the glob module earlier (an introduction can be found at https://www.cnblogs.com/c-x-a/p/9261832.html). Similarly, pathlib has glob() and rglob() methods. The difference is that glob.glob() in the glob module returns a list (glob.iglob() returns an iterator), whereas the glob() method in pathlib returns a generator. pathlib also has an rglob() method, which matches recursively. In the following example, I use the glob() method with a pattern to match files:

import pathlib
from collections import Counter
gen = (p.suffix for p in pathlib.Path.cwd().glob('*.py'))
print(Counter(gen))
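The difference between glob() and rglob() is easiest to see on a small directory tree. The sketch below builds one in a temporary directory; the names pkg, top.py, and inner.py are made up for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "top.py").write_text("")
    (root / "pkg" / "inner.py").write_text("")

    # glob() only looks at the directory itself.
    top_level = sorted(p.name for p in root.glob("*.py"))

    # rglob() also descends into subdirectories.
    recursive = sorted(p.name for p in root.rglob("*.py"))

    # Both return lazy iterators, not lists.
    returns_list = isinstance(root.glob("*.py"), list)

print(top_level)
print(recursive)
print(returns_list)
```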

Display directory tree

The next example defines a function, tree(), that prints a visual tree of the file hierarchy rooted at a given directory. Because we want to list subdirectories as well, we use the rglob() method:

import pathlib


def tree(directory):
    print(f'+ {directory}')
    for path in sorted(directory.rglob('*')):
        depth = len(path.relative_to(directory).parts)
        spacer = '    ' * depth
        print(f'{spacer}+ {path.name}')

now_path = pathlib.Path.cwd()

if __name__ == '__main__':
    tree(now_path)

Here, the relative_to() method returns the path of path relative to directory, and the parts attribute returns the components of a path as a tuple. For example:

import pathlib
now_path = pathlib.Path.cwd()
if __name__ == '__main__':
    print(now_path.parts)

Output:

('/', 'Users', 'chennan', 'pythonproject', 'demo')
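The depth calculation used in tree() can be checked in isolation. This is a small sketch with PurePosixPath (an assumption that keeps the output identical on every system); the path /tmp/elsewhere.txt is a made-up example of a path outside the directory.

```python
import pathlib

directory = pathlib.PurePosixPath("/Users/chennan/pythonproject")
path = directory / "demo" / "superdemo.py"

# Strip the leading directory, then count the remaining components.
relative = path.relative_to(directory)
depth = len(relative.parts)

print(relative)
print(relative.parts)
print(depth)

# relative_to() raises ValueError when the path is not inside the directory.
outside = pathlib.PurePosixPath("/tmp/elsewhere.txt")
try:
    outside.relative_to(directory)
    raised = False
except ValueError:
    raised = True
print(raised)
```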

Get the last modification time of the file

The iterdir(), glob(), and rglob() methods are well suited to generator expressions and list comprehensions. The stat() method returns basic information about a file, and stat().st_mtime gives the time at which the file was last modified:

import pathlib
from datetime import datetime

now_path = pathlib.Path.cwd()
time, file_path = max((f.stat().st_mtime, f) for f in now_path.iterdir())
print(datetime.fromtimestamp(time), file_path)

You can even use a similar expression to get the contents of the most recently modified file:

import pathlib
from datetime import datetime
now_path = pathlib.Path.cwd()
result = max((f.stat().st_mtime, f) for f in now_path.iterdir())[1]
print(result.read_text())

stat().st_mtime returns the modification time of the file as a timestamp (seconds since the epoch). You can use the datetime or time module to convert it into a readable format.
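As a sketch of that conversion, the example below picks the newest file with a key function instead of comparing tuples, then formats its timestamp with strftime(). The file names and the fixed timestamps set through os.utime() are assumptions that make the demo deterministic; the formatted string still depends on the local time zone.

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    old_file = folder / "old.txt"
    new_file = folder / "new.txt"
    old_file.write_text("old")
    new_file.write_text("new")

    # Set explicit (access, modification) times so the result is predictable.
    os.utime(old_file, (1_000_000_000, 1_000_000_000))
    os.utime(new_file, (1_600_000_000, 1_600_000_000))

    # key= compares modification times only, never the paths themselves.
    newest = max(folder.iterdir(), key=lambda p: p.stat().st_mtime)
    newest_name = newest.name
    stamp = newest.stat().st_mtime

# Convert the raw timestamp into a readable local time.
modified = datetime.fromtimestamp(stamp)
formatted = modified.strftime("%Y-%m-%d %H:%M:%S")
print(newest_name, formatted)
```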

Other contents

Converting a pathlib.Path to a string

Because a path created with pathlib is not a string, string operations cannot be applied to it directly, so it sometimes needs to be converted. The str() function can do the conversion, but it is less safe, because str() converts any object without complaint. It is recommended to use os.fspath() instead: it raises a TypeError if the argument is not a valid path-like object, which str() cannot do.
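The difference shows up as soon as something that is not a path slips through. A minimal sketch, where the integer 42 stands in for any accidental non-path value:

```python
import os
import pathlib

path = pathlib.PurePosixPath("/Users/chennan/CDM")

# For a real path object, both conversions give the same string.
as_text = os.fspath(path)
print(as_text == str(path))

# Hypothetical mistake: a value that is not a path at all.
not_a_path = 42

# str() converts it silently ...
silently_converted = str(not_a_path)

# ... while os.fspath() refuses with a TypeError.
try:
    os.fspath(not_a_path)
    rejected = False
except TypeError:
    rejected = True

print(silently_converted, rejected)
```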

The secret behind the "/" join operator

The / operator is defined by the __truediv__ method. In fact, if you look at the source code of pathlib, you will see something similar to this:

class PurePath(object):

    def __truediv__(self, key):
        return self._make_child((key,))
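To see the mechanism without reading the pathlib source, here is a toy class that supports "/" in the same way. ToyPath is a hypothetical stand-in written for this example, not how pathlib is actually implemented; the last lines confirm that calling __truediv__ on a real path object gives the same result as the operator.

```python
import pathlib


class ToyPath:
    """Hypothetical stand-in for illustration; not pathlib's implementation."""

    def __init__(self, *parts):
        self.parts = tuple(parts)

    def __truediv__(self, key):
        # `toy / key` calls this method and returns a new, longer path.
        return ToyPath(*self.parts, key)

    def __str__(self):
        return "/".join(self.parts)


toy = ToyPath("Users") / "chennan" / "CDM"
print(toy)

# On a real path object, the operator and the method are the same thing.
real = pathlib.PurePosixPath("Users")
explicit = real.__truediv__("chennan")
print(explicit == real / "chennan")
```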

Postscript

pathlib has been available in the standard library since Python 3.4. With pathlib, file paths can be represented by proper Path objects instead of plain strings as before. These objects make code that deals with file paths:

  • Easier to read, especially because you can use "/" to join paths together
  • More powerful, with most of the necessary methods and properties available directly on the object
  • More consistent across operating systems, because the Path object hides the peculiarities of the different systems

In this tutorial, you have learned how to create Path objects, read and write files, manipulate paths and the underlying file system, and iterate over multiple file paths. Finally, I suggest you keep practicing. I have run the code in this article and it executes without errors.

Original text: https://realpython.com/python-pathlib/
Translator: Chen Xiangan
