Python 08 - file reading and writing
Reference: Files in Python | geek tutorial (geek-docs.com)
Introduction
This article describes how Python handles files and standard input and output. We will show how to read from files and how to write to them.
Everything in Python is an object, and everything in UNIX is a file.
Disk file
open function
open() is a built-in function; io.open() is an alias for it. It returns a file object whose type depends on the mode, and standard file operations are performed through this object.
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
- file is the path of the file to open.
- mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r', which means the file is opened for reading in text mode.
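mode | meaning |
---|---|
'r' | Open for reading (default) |
'w' | Open for writing, truncating the file first |
'x' | Create a new file and open it for writing; fails if the file already exists |
'a' | Open for writing, appending to the end of the file if it exists |
'b' | Binary mode |
't' | Text mode (default) |
'+' | Open for updating (reading and writing) |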
- buffering is an optional integer used to set the buffering policy: 0 turns buffering off (binary mode only), 1 selects line buffering (text mode only), and an integer greater than 1 sets the buffer size in bytes. The default, -1, lets Python choose a suitable policy.
- encoding is the name of the encoding used to decode or encode the file. It applies to text mode only and defaults to the platform's locale encoding.
- errors is an optional string that specifies how encoding and decoding errors are handled.
- newline controls how line endings are handled in text mode. It can be None, '', '\n', '\r', or '\r\n'.
- closefd: if it is False, the underlying file descriptor stays open when the file object is closed. This is only valid when file is a file descriptor; when a file name is given, closefd must be True.
- opener: a custom opener can be used by passing a callable. The underlying file descriptor for the file object is then obtained by calling opener with (file, flags). The opener must return an open file descriptor (passing os.open as opener behaves much like passing None).
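The mode characters can be illustrated with a short sketch. It assumes a scratch file in a temporary directory; the name `modes_demo.txt` and the variable names are made up for this example.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, used only for this demo.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes_demo.txt")

# 'x' creates the file and refuses to overwrite an existing one.
with open(path, "x", encoding="utf-8") as f:
    f.write("first line\n")

try:
    open(path, "x", encoding="utf-8")
except FileExistsError:
    exclusive_failed = True   # expected: the file already exists
else:
    exclusive_failed = False

# 'a' appends to the end instead of truncating the way 'w' does.
with open(path, "a", encoding="utf-8") as f:
    f.write("second line\n")

# 'r+' opens an existing file for both reading and writing.
with open(path, "r+", encoding="utf-8") as f:
    lines = f.readlines()

# Adding 'b' returns raw bytes; no decoding takes place.
with open(path, "rb") as f:
    raw = f.read()

print(exclusive_failed, lines, type(raw).__name__)
```

Passing encoding explicitly, as done here, keeps the result independent of the platform default.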
When reading in text mode, platform-specific line terminators (\n on Unix, \r\n on Windows) are converted to \n by default. When writing in text mode, each \n is converted back to the platform-specific terminator by default. This behind-the-scenes modification of file data is harmless for text files, but it corrupts binary data such as JPEG or EXE files. Always use binary mode when reading and writing such files.
You can also use an in-memory string or bytes buffer as a file for reading and writing. For strings, io.StringIO can be used like a file opened in text mode, and for bytes, io.BytesIO can be used like a file opened in binary mode.
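The translation can be made visible on any platform by forcing the terminator with the newline parameter. This is a minimal sketch; the file name `newline_demo.txt` is an assumption made for the demo.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# newline='\r\n' forces Windows-style terminators on any platform:
# every '\n' written is stored as '\r\n'.
with open(path, "w", encoding="utf-8", newline="\r\n") as f:
    f.write("one\ntwo\n")

# Binary mode shows the bytes exactly as they are stored on disk.
with open(path, "rb") as f:
    on_disk = f.read()

# Text mode with the default newline=None translates '\r\n' back to '\n'.
with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()

# newline='' disables translation on reading, so '\r\n' is left untouched.
with open(path, "r", encoding="utf-8", newline="") as f:
    untranslated = f.read()

print(on_disk, repr(as_text), repr(untranslated))
```

The difference between `on_disk` and `as_text` is exactly the kind of silent change that would damage an image or executable read in text mode.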
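A brief sketch of both in-memory file types follows. The sample contents are chosen for illustration only.

```python
import io

# StringIO behaves like a file opened in text mode.
text_file = io.StringIO("hello world\nHello, China")
first_line = text_file.readline()   # reads up to and including the newline
rest = text_file.read()             # reads whatever is left

# BytesIO behaves like a file opened in binary mode.
byte_file = io.BytesIO()
written = byte_file.write(b"\x89PNG")  # returns the number of bytes written
byte_file.seek(0)                      # rewind before reading back
payload = byte_file.read()

print(repr(first_line), repr(rest), written, payload)
```

These objects are handy in tests, because code that expects a file object can be exercised without touching the disk.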
    >>> # Default encoding
    >>> f = open("openpyxl 01 install.md")
    >>> f
    <_io.TextIOWrapper name='openpyxl 01 install.md' mode='r' encoding='cp936'>
    >>> # Specify encoding
    >>> f = open("openpyxl 01 install.md", encoding="utf8")
    >>> f
    <_io.TextIOWrapper name='openpyxl 01 install.md' mode='r' encoding='utf8'>
with statement
When dealing with file objects, it is best to use the with keyword.
- The advantage is that the file is closed properly when the block finishes, even if an exception is raised at some point.
- Using with is much shorter than the equivalent try-finally block, and file handling is prone to errors.
- The with statement simplifies exception handling by encapsulating common preparation and cleanup tasks.
    with open('workfile') as f:
        read_data = f.read()
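For comparison, here is a sketch of the try-finally code that the with statement replaces, followed by a check that the file really is closed after an exception. The scratch file and variable names are assumptions made for the demo.

```python
import os
import tempfile

# Hypothetical scratch file so the example can run anywhere.
path = os.path.join(tempfile.mkdtemp(), "workfile")
with open(path, "w", encoding="utf-8") as f:
    f.write("some data\n")

# Manual version: the file must be closed explicitly in a finally clause.
f = open(path, encoding="utf-8")
try:
    read_data_manual = f.read()
finally:
    f.close()

# Equivalent with statement: closing happens automatically.
with open(path, encoding="utf-8") as f:
    read_data = f.read()
closed_after_with = f.closed

# The file is closed even when the block raises an exception.
try:
    with open(path, encoding="utf-8") as g:
        raise ValueError("simulated failure")
except ValueError:
    closed_after_error = g.closed

print(read_data == read_data_manual, closed_after_with, closed_after_error)
```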
File read and write methods
- read(n=-1) method
  Reads up to n characters from the file in text mode (n bytes in binary mode). If n is omitted or negative, it reads the rest of the file.

      >>> with open('hello.txt', 'r') as f:
      ...     f.read(3)
      ...     f.read()
      ...
      'hel'
      'lo world\nHello, China'
- readline() method
  Reads one line from the file. The trailing newline character is kept in the string. The method returns an empty string when it reaches the end of the file.

      >>> with open('hello.txt', 'r') as f:
      ...     f.readline()
      ...     f.readline()
      ...
      'hello world\n'
      'Hello, China'
- readlines() method
  Reads until the end of the file and returns a list of lines.

      >>> with open('hello.txt', 'r') as f:
      ...     content = f.readlines()  # a list of lines, each keeping its newline character
      ...
      >>> for x in content:
      ...     print(x.strip())  # str.strip() removes surrounding whitespace, including the newline
      ...
      hello world
      Hello, China
- write(s) method
  Writes a string to the file and returns the number of characters written (the number of bytes in binary mode).

      >>> with open('hello.txt', 'w') as f:
      ...     f.write("hello world\n")   # 12 characters
      ...     f.write("Hello, China")    # 12 characters
      ...
      12
      12
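Besides the methods listed above, a file object can be iterated over directly, which yields one line at a time without loading the whole file. The following sketch writes a small file and reads it back that way; the temporary path and the sample lines are assumptions made for the demo.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

with open(path, "w", encoding="utf-8") as f:
    count = f.write("hello world\n")                 # number of characters written
    f.writelines(["Hello, China\n", "last line"])    # writelines() adds no newlines itself

stripped = []
with open(path, "r", encoding="utf-8") as f:
    for line in f:                        # the file object yields lines lazily
        stripped.append(line.rstrip("\n"))

print(count, stripped)
```

For large files this loop is usually preferable to readlines(), which builds the full list in memory.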
File position
The file position is the offset in the file at which the next read or write takes place.
- The tell() method returns the current position in the file.

      >>> with open('hello.txt', 'r') as f:
      ...     f.read(5)   # read 5 characters
      ...     f.tell()    # the current position is now 5
      ...
      'hello'
      5
- The seek(offset, whence=0, /) method moves the position in the file.
  whence options:
  * 0: start of the stream (default); offset should be zero or positive
  * 1: current stream position; offset may be negative
  * 2: end of the stream; offset is usually negative

  Files opened in text mode only support seeking relative to the start, plus seek(0, 2) to jump to the end; nonzero offsets with whence 1 or 2 require binary mode.

      >>> with open('hello.txt', 'r') as f:
      ...     f.read(5)
      ...     f.tell()
      ...     f.seek(10)
      ...     f.read()
      ...
      'hello'
      5
      10
      'd\nHello, China'
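The relative forms of seek() are easiest to try in binary mode. This is a minimal sketch that uses the named constants os.SEEK_CUR (1) and os.SEEK_END (2); the scratch file is an assumption made for the demo.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "hello.bin")
with open(path, "wb") as f:
    f.write(b"hello world\nHello, China")

with open(path, "rb") as f:
    f.seek(6)                    # whence=0: absolute offset from the start
    word = f.read(5)             # the 5 bytes starting at offset 6
    f.seek(-5, os.SEEK_CUR)      # whence=1: step back 5 bytes from here
    again = f.read(5)            # the same 5 bytes once more
    f.seek(-5, os.SEEK_END)      # whence=2: 5 bytes before the end
    tail = f.read()
    size = f.tell()              # at the end, tell() equals the file size

print(word, again, tail, size)
```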
Standard I/O
There are three basic I/O connections: standard input, standard output and standard error.
The standard inputs and outputs in Python are objects in the sys module.
object | description |
---|---|
sys.stdin | Standard input is the data entering the program. It usually comes from the keyboard. |
sys.stdout | Standard output is where the print() function writes its data. |
sys.stderr | Standard error is the stream to which the program writes error messages. It is usually the text terminal. |
In line with the UNIX philosophy, the standard I/O streams are file objects.
Standard input
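stdin is used for all interactive input (including calls to input()).

    import sys

    print('Enter your name: ', end='')
    sys.stdout.flush()  # make sure the prompt is shown before waiting for input

    name = ''
    while True:
        c = sys.stdin.read(1)   # read a single character
        if c in ('\n', ''):     # stop at the end of the line or at end of input
            break
        name = name + c

    print('Your name is:', name)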
However, input is usually obtained with a higher-level function: input().

    >>> data = input("What's your name ? ")
    What's your name ? Peter
    >>> print(f"Welcome {data}")
    Welcome Peter
    >>> print(f"Welcome {data:^10}")  # center the name in a 10-character field
    Welcome   Peter
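Because sys.stdin is a file object, it can also be iterated line by line, which suits non-interactive input such as piped data. The sketch below wraps this in a hypothetical helper, `read_lines`, and feeds it an io.StringIO stand-in so that it runs without a keyboard.

```python
import io
import sys


def read_lines(stream=None):
    """Return every line from `stream` (sys.stdin by default), without newlines."""
    if stream is None:
        stream = sys.stdin
    return [line.rstrip("\n") for line in stream]


# Simulated standard input; in a real script, call read_lines() with no argument.
fake_stdin = io.StringIO("Peter\nLucy\n")
names = read_lines(fake_stdin)
print(names)
```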
Standard output
- stdout is used for the output of print() and expression statements, and for the prompts of input().

      >>> import sys
      >>> sys.stdout.write('Honore de Balzac, Father Goriot\n')
      Honore de Balzac, Father Goriot
      32
- The print() function is what is normally used:

      print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

  By default, the print() function writes text to sys.stdout.

      print('Honore de Balzac')
      print('The Splendors and Miseries of Courtesans', 'Gobseck', 'Father Goriot', sep=":")

      vals = [1, 2, 3, 4, 5]

      for e in vals:
          print(e, end=' ')

      print()
- The print() function has a file parameter that specifies where the data is written, so print() can also be used to write to a file.

      with open('works.txt', 'w') as f:
          print('Beatrix', file=f)
          print('Honorine', file=f)
          print('The firm of Nucingen', file=f)
- Standard error works almost the same way as standard output: it is also an output stream. It is not covered in detail here.
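As a brief illustration of that similarity, the sketch below sends a message to sys.stderr through the file parameter of print(). The helper name `warn` is hypothetical, and contextlib's redirect helpers are used only to capture the two streams so the result can be inspected.

```python
import contextlib
import io
import sys


def warn(message):
    """Write a warning to standard error instead of standard output."""
    print(f"warning: {message}", file=sys.stderr)


# Capture both streams in memory to show which text went where.
out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    print("normal result")
    warn("disk almost full")

captured_out = out.getvalue()
captured_err = err.getvalue()
print(repr(captured_out), repr(captured_err))
```

Keeping diagnostics on stderr means that redirecting a program's normal output to a file does not swallow its error messages.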
Redirecting and restoring standard I/O
Standard output can be redirected. In the following example, we redirect the standard output to a regular file.
- In the script, we redirect the standard output to the regular file output.txt.
- Then we restore the original standard output. The original value of sys.stdout is kept in the special sys.__stdout__ variable.
    import sys

    with open('output.txt', 'w') as f:
        sys.stdout = f                 # redirect standard output to the file
        print('Lucien')
        sys.stdout.write('Rastignac\n')
        sys.stdout.writelines(['Camusot\n', 'Collin\n'])
        sys.stdout = sys.__stdout__    # restore the original standard output

    print('Bianchon')
    sys.stdout.write('Lambert\n')
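The standard library also offers contextlib.redirect_stdout, which restores the previous stream automatically, even if the block raises an exception. The sketch below is an alternative to the manual assignment, not the approach used in the example above; the temporary path is an assumption made for the demo.

```python
import contextlib
import os
import sys
import tempfile

# Hypothetical scratch file used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
original = sys.stdout

with open(path, "w", encoding="utf-8") as f:
    with contextlib.redirect_stdout(f):
        print("Lucien")                    # goes to the file
        sys.stdout.write("Rastignac\n")    # sys.stdout is the file inside the block

restored = sys.stdout is original          # the old stream is back in place
print("Bianchon")                          # goes to the original standard output

with open(path, encoding="utf-8") as f:
    redirected = f.read()
```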
Serialization and deserialization of objects
The pickle module implements binary serialization and deserialization of a Python object structure.
- "Pickling" is the process of converting a Python object hierarchy into a byte stream.
- "Unpickling" is the inverse operation: it converts a byte stream (from a binary file or bytes-like object) back into an object hierarchy.
Pickling (and unpickling) is also called "serialization", "marshalling", or "flattening". To avoid confusion, this article uses the terms "pickling" and "unpickling".
Note
Serialization is a lower-level concept than persistence. Although pickle reads and writes file objects, it does not deal with the naming of persistent objects, nor with the even more complex issue of concurrent access to them. The pickle module can convert a complex object into a byte stream, and convert the byte stream back into an object with the same internal structure. The most common thing to do with these byte streams is to write them to a file, but they can also be sent over a network or stored in a database. The shelve module provides a simple interface for pickling and unpickling objects on DBM-style database files.
pickle: literally, to preserve food in brine or vinegar; here, the name of the module
pickling: serializing an object into a byte stream
unpickling: restoring an object from a byte stream
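Methods
- Use the dump() function to pickle an object to a file, or dumps() to get the result as bytes.

      pickle.dump(obj, file)
      pickle.dumps(obj) -> bytes

- Use the load() function to unpickle an object from a file, or loads() to unpickle it from bytes.

      pickle.load(file) -> object
      pickle.loads(data) -> object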
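The in-memory variants can be sketched as a round trip through dumps() and loads(). The sample dictionary is made up for illustration.

```python
import pickle

# Hypothetical sample data.
record = {"name": "Monica", "age": 15, "tags": ["a", "b"]}

blob = pickle.dumps(record)    # serialize to a bytes object
clone = pickle.loads(blob)     # rebuild an equal, independent object

print(type(blob).__name__, clone == record, clone is record)
```

Only unpickle data from sources you trust, because unpickling can execute arbitrary code.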
Example

    #!/usr/bin/env python
    # pickle_ex.py

    import pickle


    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

        def get_name(self):
            return self.name

        def get_age(self):
            return self.age


    person = Person('Monica', 15)
    print(person.get_name())
    print(person.get_age())

    # Pickle the object to a binary file
    with open('monica', 'wb') as f:
        pickle.dump(person, f)

    # Unpickle it again from the same file
    with open('monica', 'rb') as f2:
        monica = pickle.load(f2)

    print(monica.get_name())
    print(monica.get_age())