Top ten file operations in Python office automation

Posted by gls2ro on Sun, 05 Dec 2021 09:14:57 +0100

Batch-processing files is a common everyday task. Writing scripts in Python makes it convenient, but you inevitably have to deal with file operations along the way. The first time you meet many of these operations, you usually end up searching the web (Baidu, colloquially "Du Niang") for the answer.

I have sorted out the 10 file operations most commonly used in Python, all implemented with the standard library. They come up both in batch processing and in reading files, and I hope this summary is helpful to you.

1. Display current directory

When we want to know what the current working directory is, we can simply use the getcwd() function of the os module or the cwd() class method of pathlib's Path, as shown below.

>>> # The first method: display the current directory
... import os
... print("Current working directory:", os.getcwd())
...
Current working directory: /Users/ycui1/PycharmProjects/Medium_Python_Tutorials



>>> # The second method: or we can use pathlib
... from pathlib import Path
... print("Current working directory:", Path.cwd())
...
Current working directory: /Users/ycui1/PycharmProjects/Medium_Python_Tutorials

pathlib was added in Python 3.4, so if you are using an older version of Python (< 3.4), you must use the os module.

2. Create a new directory

To create a directory, you can use the mkdir() function of the os module. This function creates a directory at the specified path. If you pass only a directory name, it is treated as a relative path and the folder is created in the current working directory; an absolute path creates the folder at exactly that location.

>>> # Create a new directory in the current folder
... os.mkdir("test_folder")
... print("Does the directory exist:", os.path.exists("test_folder"))
...
Does the directory exist: True
>>> # Create a new directory in a specific folder
... os.mkdir('/Users/ycui1/PycharmProjects/tmp_folder')
... print("Does the directory exist:", os.path.exists('/Users/ycui1/PycharmProjects/tmp_folder'))
...
Does the directory exist: True

However, if you want to create a multi-level directory, such as a folder inside a folder that does not exist yet, you need to use the makedirs() function.

>>> # Create a directory that contains subdirectories
... os.makedirs('tmp_level0/tmp_level1')
... print("Does the directory exist:", os.path.exists("tmp_level0/tmp_level1"))
...
Does the directory exist: True

If you use a recent version of Python (≥ 3.4), you can consider using the pathlib module to create a new directory. With parents=True, it not only creates the subdirectory but also creates any missing parent directories in the path.

# Using pathlib
from pathlib import Path
Path("test_folder").mkdir(parents=True, exist_ok=True)

One thing to watch out for: if you run some of the code above more than once, you will hit an error saying the directory cannot be created because it already exists. Setting the exist_ok parameter to True handles this problem (with the default value of False, trying to create an existing directory raises FileExistsError).

>>> # Using pathlib
... from pathlib import Path
... Path("test_folder").mkdir(parents=True, exist_ok=False)
...
Traceback (most recent call last):
  File "<input>", line 3, in <module>
  File "/Users/ycui1/.conda/envs/Medium/lib/python3.8/pathlib.py", line 1284, in mkdir
    self._accessor.mkdir(self, mode)
FileExistsError: [Errno 17] File exists: 'test_folder'
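
As a minimal sketch of the point above, the following wraps the pathlib call in a small helper that can be run any number of times. The helper name ensure_dir and the temporary workspace are assumptions made for illustration; they are not part of the original examples.

```python
import tempfile
from pathlib import Path


def ensure_dir(path):
    """Create path (and any missing parents) if needed; safe to call repeatedly."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Work inside a throwaway directory so the demo leaves nothing behind.
with tempfile.TemporaryDirectory() as workspace:
    target = Path(workspace) / "tmp_level0" / "tmp_level1"
    ensure_dir(target)
    ensure_dir(target)  # the second call does not raise FileExistsError
    print("Does the directory exist:", target.is_dir())

    # With exist_ok=False (the default), the same call fails on an existing directory.
    try:
        target.mkdir(exist_ok=False)
    except FileExistsError as error:
        print("Caught:", type(error).__name__)
```

Catching FileExistsError is an alternative when you want to know that the directory was already there; exist_ok=True is simpler when you do not care.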

3. Delete directories and files

After we finish working with some files or folders, we may want to delete them. To delete a file, we can use the remove() function in the os module. To delete a folder, we should use rmdir() instead; note that rmdir() only removes empty directories.

>>> # Delete a file
... print(f"* Before deleting files {os.path.isfile('tmp.txt')}")
... os.remove('tmp.txt')
... print(f"* After deleting a file {os.path.exists('tmp.txt')}")
...
* Before deleting files True
* After deleting a file False
>>> # Delete a folder
... print(f"* Before deleting a folder {os.path.isdir('tmp_folder')}")
... os.rmdir('tmp_folder')
... print(f"* After deleting a folder {os.path.exists('tmp_folder')}")
...
* Before deleting a folder True
* After deleting a folder False

If you use the pathlib module, you can delete a file with the unlink() method of the Path object and delete an (empty) directory with its rmdir() method.
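
A minimal sketch of the pathlib version might look like the following. The function name delete_with_pathlib and the scratch names tmp.txt and tmp_folder inside a temporary workspace are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def delete_with_pathlib(base_dir):
    """Create a scratch file and folder under base_dir, then delete both."""
    base = Path(base_dir)
    scratch_file = base / "tmp.txt"    # hypothetical file name
    scratch_dir = base / "tmp_folder"  # hypothetical folder name
    scratch_file.write_text("temporary data")
    scratch_dir.mkdir()

    scratch_file.unlink()  # delete the file
    scratch_dir.rmdir()    # delete the directory (it must be empty)
    return scratch_file.exists(), scratch_dir.exists()


with tempfile.TemporaryDirectory() as workspace:
    file_left, dir_left = delete_with_pathlib(workspace)
    print("File still exists:", file_left)
    print("Folder still exists:", dir_left)
```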

4. Get file list

In data-processing work or a machine learning project, we often need to obtain the list of files in a specific directory.

Typically, the file names follow a pattern. Suppose we want to find all .txt files in the directory; this can be done with the glob() method of the Path object. The glob() method returns a generator that we can iterate over.

>>> txt_files = list(Path('.').glob("*.txt"))
... print("Txt files:", txt_files)
...
Txt files: [PosixPath('hello_world.txt'), PosixPath('hello.txt')]

Alternatively, it is also convenient to use the glob module directly. As shown below, it offers similar functionality but returns a list of file names as strings. In most cases, such as reading and writing files, either approach works.

>>> from glob import glob
... files = glob('h*')
... print("Files starting with h:", files)
...
Files starting with h: ['hello_world.txt', 'hello.txt']
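
To make the difference between the two approaches concrete, here is a small sketch that lists the same files both ways. The helper list_txt_files and the sample file names are assumptions for illustration; sorted() is used only so the order is predictable, since neither glob guarantees one.

```python
import tempfile
from glob import glob
from pathlib import Path


def list_txt_files(folder):
    """Return the .txt files in folder as Path objects and as plain strings."""
    folder = Path(folder)
    as_paths = sorted(folder.glob("*.txt"))           # Path objects
    as_strings = sorted(glob(str(folder / "*.txt")))  # plain str file names
    return as_paths, as_strings


with tempfile.TemporaryDirectory() as workspace:
    # Hypothetical sample files; only the two .txt files should match.
    for name in ("hello.txt", "hello_world.txt", "notes.md"):
        (Path(workspace) / name).touch()

    paths, strings = list_txt_files(workspace)
    print("From Path.glob():", [p.name for p in paths])
    print("From glob.glob():", [type(s).__name__ for s in strings])
```

Path objects carry attributes such as name and suffix, while the strings from the glob module can be passed straight to functions that expect a file name.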

5. Move and copy files

Move file

One of the most common file management tasks is moving and copying files, and Python makes both easy. To move a file, rename it so that its old directory is replaced by the target directory. Suppose we need to move all .txt files to another folder; the example below does this with Path.

>>> target_folder = Path("target_folder")
... target_folder.mkdir(parents=True,exist_ok=True)
... source_folder = Path('.')
...
... txt_files = source_folder.glob('*.txt')
... for txt_file in txt_files:
...     filename = txt_file.name
...     target_path = target_folder.joinpath(filename)
...     print(f"** move file {filename}")
...     print("Target file exists:", target_path.exists())
...     txt_file.rename(target_path)
...     print("Target file exists:", target_path.exists(), '\n')
...
** move file hello_world.txt
Target file exists: False
Target file exists: True

** move file hello.txt
Target file exists: False
Target file exists: True

Copy file

We can use the functions available in the shutil module, another useful standard-library module for file operations. Its copy() function takes the source and target files as strings. A simple example is shown below. Of course, you can combine copy() with glob() to handle a batch of files that match the same pattern.

>>> import shutil
...
... source_file = "target_folder/hello.txt"
... target_file = "hello2.txt"
... target_file_path = Path(target_file)
... print("* File exists before copying:", target_file_path.exists())
... shutil.copy(source_file, target_file)
... print("* File exists after copying:", target_file_path.exists())
...
* File exists before copying: False
* File exists after copying: True
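
The combination of copy() and glob() mentioned above could be sketched as follows. The helper copy_matching, the folder names source and backup, and the sample files are all assumptions for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def copy_matching(source_dir, target_dir, pattern="*.txt"):
    """Copy every file in source_dir that matches pattern into target_dir."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source_file in sorted(source_dir.glob(pattern)):
        if source_file.is_file():  # skip directories that happen to match
            shutil.copy(source_file, target_dir / source_file.name)
            copied.append(source_file.name)
    return copied


with tempfile.TemporaryDirectory() as workspace:
    source = Path(workspace) / "source"  # hypothetical folder names
    backup = Path(workspace) / "backup"
    source.mkdir()
    for name in ("hello.txt", "hello_world.txt", "data.csv"):
        (source / name).write_text("sample")

    print("Copied:", copy_matching(source, backup))
    print("Originals still present:", (source / "hello.txt").exists())
```

Unlike the move example, copying leaves the source files in place.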

6. Check directories / files

The examples above already used the exists() method to check whether a specific path exists. It returns True if the path exists and False if it does not. This check is available in both the os and pathlib modules, and is used as follows.

# Usage of exists() in os module
os.path.exists('path_to_check')

# Usage of exists() in pathlib module
Path('directory_path').exists()


Using os.path or pathlib, we can also check whether the path is a directory or a file.

# Check if the path is a directory
os.path.isdir('Path to check')
Path('Path to check').is_dir()

# Check if the path is a file
os.path.isfile('Path to check')
Path('Path to check').is_file()
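
These checks are often combined before acting on a path. A minimal sketch, assuming a hypothetical helper named classify_path and throwaway sample paths:

```python
import tempfile
from pathlib import Path


def classify_path(path):
    """Return 'missing', 'directory', 'file', or 'other' for the given path."""
    path = Path(path)
    if not path.exists():
        return "missing"
    if path.is_dir():
        return "directory"
    if path.is_file():
        return "file"
    return "other"  # e.g. a socket or device file


with tempfile.TemporaryDirectory() as workspace:
    sample_file = Path(workspace) / "hello.txt"  # hypothetical file name
    sample_file.touch()
    print(classify_path(workspace))
    print(classify_path(sample_file))
    print(classify_path(Path(workspace) / "no_such_path"))
```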

7. Get file information

File name

When working with files, you often need to extract the file name. With Path this is very simple: read the name attribute (path.name) of the Path object. If you don't want the suffix, read the stem attribute (path.stem).

>>> for py_file in Path().glob('c*.py'):
...     print('Name with extension:', py_file.name)
...     print('Name only:', py_file.stem)
...
Name with extension: closures.py
Name only: closures
Name with extension: counter.py
Name only: counter
Name with extension: context_management.py
Name only: context_management

File extension

If you want to extract the suffix of the file separately, you can read the suffix attribute of the Path object.

>>> file_path = Path('closures.py')
... print("File extension:", file_path.suffix)
...
File extension: .py
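
File names with more than one suffix behave slightly differently, which the following sketch illustrates. The path reports/summary.tar.gz is a hypothetical example; these attributes work on the path text alone, so the file does not need to exist.

```python
from pathlib import Path

archive_path = Path("reports/summary.tar.gz")  # hypothetical path, never opened

print(archive_path.name)      # summary.tar.gz
print(archive_path.stem)      # summary.tar  (only the last suffix is removed)
print(archive_path.suffix)    # .gz          (only the last suffix)
print(archive_path.suffixes)  # ['.tar', '.gz']

# with_suffix() replaces the last suffix and returns a new Path object.
renamed = archive_path.with_suffix(".zip")
print(renamed.name)           # summary.tar.zip
```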

More file information

If you want to get more information about the file, such as file size and modification time, you can use the stat() method, which returns the same information as os.stat().

>>> # Path object
... current_file_path = Path('iterable_usages.py')
... file_stat = current_file_path.stat()
...
>>> # Get file size:
... print("File size (Bytes):", file_stat.st_size)
File size (Bytes): 3531
>>> # Get latest access time
... print("Last access time:", file_stat.st_atime)
Last access time: 283442956.3531
>>> # Get last modified time
... print("Last modified:", file_stat.st_mtime)
Last modified: 283442956.3531
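
The timestamps returned by stat() are seconds since the epoch, which are hard to read. One possible way to make them readable is to convert them with datetime.fromtimestamp(), as in this sketch. The helper describe_file and the sample file are assumptions for illustration.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def describe_file(path):
    """Return the size and human-readable timestamps of a file."""
    info = Path(path).stat()
    return {
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime),  # local time
        "accessed": datetime.fromtimestamp(info.st_atime),  # local time
    }


with tempfile.TemporaryDirectory() as workspace:
    sample = Path(workspace) / "hello.txt"  # hypothetical file name
    sample.write_bytes(b"Hello World!\n")   # 13 bytes
    details = describe_file(sample)
    print("File size (Bytes):", details["size_bytes"])
    print("Last modified:", details["modified"].strftime("%Y-%m-%d %H:%M:%S"))
```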

8. Read file

One of the most important file operations is to read data from a file. The most common way to read a file is to create a file object using the built-in open() function. By default, this function opens the file in read mode and treats the data in the file as text.

>>> # Read all text
... with open("hello2.txt", 'r') as file:
...     print(file.read())
...
Hello World!
Hello Python!
>>> # Line by line reading
... with open("hello2.txt", 'r') as file:
...     for i, line in enumerate(file, 1):
...         print(f"* Read row #{i}: {line}")
...
* Read row #1: Hello World!

* Read row #2: Hello Python!

If the file does not contain much data, you can use the read() method to read all the contents at once. However, if the file is large, you should iterate over the file object instead; it works like a generator and processes the data line by line without loading the whole file into memory.

By default, the contents of the file are treated as text. If you want to work with binary files, you should specify the mode explicitly: r for text or rb for binary.

Another thorny problem is file encoding. The default encoding used by open() depends on the platform; it is usually utf-8 on macOS and Linux but may be different elsewhere, for example on Windows. To read files reliably, especially files in another encoding, set the encoding parameter explicitly.
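
To illustrate the mode and encoding points, here is a minimal sketch that reads the same file once as text and once as bytes. The helper read_text_and_bytes and the file name greeting.txt are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def read_text_and_bytes(path, encoding="utf-8"):
    """Read the same file in text mode and in binary mode."""
    with open(path, "r", encoding=encoding) as file:
        text = file.read()  # str, decoded with the given encoding
    with open(path, "rb") as file:
        raw = file.read()   # bytes, exactly as stored on disk
    return text, raw


with tempfile.TemporaryDirectory() as workspace:
    sample = Path(workspace) / "greeting.txt"  # hypothetical file name
    with open(sample, "w", encoding="utf-8") as file:
        file.write("caf\u00e9")  # "café": one non-ASCII character

    text, raw = read_text_and_bytes(sample)
    print(len(text), "characters:", text)
    print(len(raw), "bytes:", raw)
```

The text has four characters but the file holds five bytes, because the accented letter takes two bytes in utf-8; this is why reading with the wrong encoding produces garbled text or errors.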

9. Write file

We still use the open() function, but change the mode to w or a when opening the file to create the file object. In w mode, the old data is overwritten by the new data. In a mode, new data is appended after the existing data.

>>> # Write new data to file
... with open("hello3.txt", 'w') as file:
...     text_to_write = "Hello Files From Writing"
...     file.write(text_to_write)
...
>>> # Add some data
... with open("hello3.txt", 'a') as file:
...     text_to_write = "\nHello Files From Appending"
...     file.write(text_to_write)
...
>>> # Check whether the file data is correct
... with open("hello3.txt") as file:
...     print(file.read())
...
Hello Files From Writing
Hello Files From Appending

Every example above uses the with statement when opening a file.

The with statement creates a context for working with the file and closes the file object automatically when the block ends, even if an error occurs inside it. This is very important: if an open file object is not closed in time, buffered data may never be written to disk, and the file may end up incomplete or corrupted.
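
The closing behaviour can be observed through the closed attribute of the file object. A small sketch, assuming hypothetical helper names and a temporary file:

```python
import tempfile
from pathlib import Path


def write_and_check(path):
    """Report whether the file is closed inside and after the with block."""
    with open(path, "w", encoding="utf-8") as file:
        file.write("Hello Files From Writing")
        closed_inside = file.closed  # still open here
    closed_after = file.closed       # closed once the block ends
    return closed_inside, closed_after


def closed_after_error(path):
    """Show that the file is closed even when the block raises an exception."""
    try:
        with open(path, "w", encoding="utf-8") as file:
            raise ValueError("simulated failure")
    except ValueError:
        return file.closed


with tempfile.TemporaryDirectory() as workspace:
    sample = Path(workspace) / "hello3.txt"
    print("Closed inside / after:", write_and_check(sample))
    print("Closed after an error:", closed_after_error(sample))
```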

10. Compress and decompress files

Compressed file

The zipfile module provides file compression. Using the ZipFile class to create a zip file object is similar to what we did with the open() function: both involve creating a file object managed by a context manager.

>>> from zipfile import ZipFile
...
... # Create compressed file
... with ZipFile('text_files.zip', 'w') as file:
...     for txt_file in Path().glob('*.txt'):
...         print(f"* Add file: {txt_file.name} to compressed file")
...         file.write(txt_file)
...
* Add file: hello3.txt to compressed file
* Add file: hello2.txt to compressed file

Decompress file

>>> # Unzip file
... with ZipFile('text_files.zip') as zip_file:
...     zip_file.printdir()
...     zip_file.extractall()
...
File Name                                             Modified             Size
hello3.txt                                     2021-06-10 20:29:50           51
hello2.txt                                     2021-06-10 18:29:52
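
When you do not want the extracted files to land in the current directory, extractall() accepts a target folder. The following is a minimal round-trip sketch; the helper zip_round_trip, the folder names, and the file contents are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from zipfile import ZipFile


def zip_round_trip(source_dir, archive_path, extract_dir):
    """Zip the .txt files in source_dir, then extract them into extract_dir."""
    source_dir = Path(source_dir)
    with ZipFile(archive_path, "w") as archive:
        for txt_file in sorted(source_dir.glob("*.txt")):
            # arcname stores only the file name, not the full source path.
            archive.write(txt_file, arcname=txt_file.name)
    with ZipFile(archive_path) as archive:
        names = archive.namelist()       # list the archive members
        archive.extractall(extract_dir)  # extract into a chosen folder
    return names


with tempfile.TemporaryDirectory() as workspace:
    source = Path(workspace) / "source"        # hypothetical folder names
    extracted = Path(workspace) / "extracted"
    source.mkdir()
    (source / "hello2.txt").write_text("Hello World!")
    (source / "hello3.txt").write_text("Hello Files From Writing")

    members = zip_round_trip(source, Path(workspace) / "text_files.zip", extracted)
    print("Archive members:", members)
    print("Extracted text:", (extracted / "hello2.txt").read_text())
```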

These are the ten most common file operations. Of course, you can also use the pandas library for some reading operations.

Well, that's all for today
