Python 08 - file reading and writing

Posted by nmarisco on Sun, 23 Jan 2022 21:34:12 +0100

Reference: Files in Python | geek tutorial (geek-docs.com)

Introduction

This article describes how Python handles files and standard input and output. We will show how to read from and write to files.

Everything in Python is an object, and everything in UNIX is a file.

Disk file

open function

open() is a built-in function (io.open is an alias for it). It returns a file object whose type depends on the mode, and standard file operations are performed through this object.

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
  • file is the path of the file to open.

  • mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r', which means open for reading in text mode.

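Mode   Meaning
'r'    Open for reading (default)
'w'    Open for writing, truncating the file first
'x'    Create a new file and open it for writing; fails if the file already exists
'a'    Open for writing, appending to the end of the file if it exists
'b'    Binary mode
't'    Text mode (default)
'+'    Open for updating (reading and writing)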
  • buffering is an optional integer used to set the buffering policy.

  • encoding is the name of the encoding used to decode or encode the file. It applies only in text mode, and the default is platform dependent.

  • errors is an optional string that specifies how encoding and decoding errors are handled.

  • newline controls how line endings are handled. It can be None, '', '\n', '\r', or '\r\n'.

  • If closefd is False, the underlying file descriptor is kept open when the file is closed. This applies only when file is a file descriptor; when a file name is given, closefd must be True.

  • A custom opener can be used by passing a callable as opener. The underlying file descriptor for the file object is then obtained by calling opener with (file, flags). opener must return an open file descriptor (passing os.open as opener results in functionality similar to passing None).

    When reading in text mode, platform-specific line endings (\n on Unix, \r\n on Windows) are converted to \n by default. When writing in text mode, occurrences of \n are converted back to the platform-specific line ending by default. This behind-the-scenes modification of file data is harmless for text files, but it corrupts binary data such as JPEG or EXE files. Use binary mode when reading and writing such files.
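The line-ending translation described above can be made visible with the newline parameter. The following is a minimal sketch, assuming a throwaway file in a temporary directory (the name newline_demo.txt is made up for illustration). It forces \r\n on write so the result is the same on every platform, then reads the file back in three ways.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "newline_demo.txt")  # hypothetical file name

    # Force Windows-style line endings on write, whatever the platform.
    with open(path, "w", newline="\r\n") as f:
        f.write("first\nsecond\n")

    # Binary mode returns the bytes exactly as they are stored.
    with open(path, "rb") as f:
        raw = f.read()

    # Default text mode (newline=None) translates "\r\n" back to "\n".
    with open(path, "r") as f:
        translated = f.read()

    # newline="" switches the translation off when reading.
    with open(path, "r", newline="") as f:
        untranslated = f.read()

print(repr(raw))
print(repr(translated))
print(repr(untranslated))
```

Only the binary read and the newline='' read show the \r characters that are actually in the file, which is why binary data must not go through default text mode.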

You can also use an in-memory string or bytes buffer as a file for reading and writing. io.StringIO can be used like a file opened in text mode, and io.BytesIO like a file opened in binary mode.
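As a small illustration of in-memory files, the sketch below writes to and reads from an io.StringIO and an io.BytesIO. The sample contents are arbitrary and chosen only for the demonstration.

```python
import io

# StringIO behaves like a file opened in text mode.
text_buffer = io.StringIO()
text_buffer.write("hello world\n")
text_buffer.write("Hello, China")
text_buffer.seek(0)                    # rewind before reading back
first_line = text_buffer.readline()
rest = text_buffer.read()

# BytesIO behaves like a file opened in binary mode.
byte_buffer = io.BytesIO(b"\x89PNG\r\n")
header = byte_buffer.read(4)           # first four bytes, untouched

print(repr(first_line))
print(repr(rest))
print(header)
```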

# Default encoding
>>> f = open("openpyxl 01 install.md")
>>> f
<_io.TextIOWrapper name='openpyxl 01 install.md' mode='r' encoding='cp936'>

# Specify encoding
>>> f = open("openpyxl 01 install.md",encoding="utf8")
>>> f
<_io.TextIOWrapper name='openpyxl 01 install.md' mode='r' encoding='utf8'>
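Because the default encoding differs between platforms (cp936 in the session above), it is usually safer to pass encoding explicitly. The sketch below is only an illustration on a temporary file with a made-up name. It writes one non-ASCII character as UTF-8, then reads it back correctly, and finally reads it with the wrong codec together with errors='replace'.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "encoding_demo.txt")  # hypothetical file name

    # Write text containing one non-ASCII character (U+00E9) as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("caf\u00e9")

    # In UTF-8 that character occupies two bytes on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading with the same encoding restores the original text.
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Reading with the wrong codec: errors="replace" substitutes U+FFFD
    # for each undecodable byte instead of raising UnicodeDecodeError.
    with open(path, encoding="ascii", errors="replace") as f:
        lossy = f.read()

print(raw)
print(repr(text))
print(repr(lossy))
```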

with statement

When dealing with file objects, it is best to use the with keyword.

  • The advantage is that the file is closed properly when the block finishes, even if an exception is raised along the way.
  • Using with is much shorter than the equivalent try-finally block, and file handling is a common source of errors.
  • The with statement simplifies exception handling by encapsulating common preparation and cleanup tasks.
with open('workfile') as f:
    read_data = f.read()
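For comparison, here is a rough sketch of the try-finally code that with replaces. The file name workfile comes from the example above; creating it in a temporary directory is an assumption made only so the sketch runs on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "workfile")
    with open(path, "w") as f:
        f.write("some data\n")

    # Manual version: close() has to sit in a finally clause
    # so that it runs even if read() raises.
    f = open(path)
    try:
        manual_data = f.read()
    finally:
        f.close()
    manual_closed = f.closed

    # Equivalent version using with: closing is implicit.
    with open(path) as f:
        with_data = f.read()
    with_closed = f.closed

print(manual_data == with_data, manual_closed, with_closed)
```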

File read and write methods

  • read(n=-1) method
    Reads at most n characters from the file (n bytes in binary mode). If n is omitted or negative, it reads the entire file.

    >>> with open('hello.txt', 'r') as f:
    ...     f.read(3)
    ...     f.read()
    ...
    'hel'
    'lo world\nHello, China'
    
  • readline() method
    Reads one line from the file. The trailing newline character is kept in the string. The method returns an empty string when it reaches the end of the file.

    >>> with open('hello.txt', 'r') as f:
    ...     f.readline()
    ...     f.readline()
    ...
    'hello world\n'
    'Hello, China'
    
  • readlines() method
    Reads until the end of the file and returns a list of lines.

    >>> with open('hello.txt', 'r') as f:
    ...     content = f.readlines()   # Returns a list of lines, each keeping its trailing newline (if present)
    ...
    >>> for x in content:
    ...     print(x.strip())   # Print each line; str.strip() removes surrounding whitespace, including the newline
    ...
    hello world
    Hello, China
    
  • write(s) method
    Writes the string s to the file and returns the number of characters written (bytes in binary mode).

    >>> with open('hello.txt', 'w') as f:
    ...     f.write("hello world\n")   # 12
    ...     f.write("Hello, China")    # 12
    ...
    12
    12
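readlines() loads every line into a list at once. A file object can also be iterated directly, which yields one line at a time and is usually preferable for large files. The sketch below assumes the same two-line hello.txt content used above, recreated in a temporary directory so that it is self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.txt")
    with open(path, "w") as f:
        f.write("hello world\n")
        f.write("Hello, China")

    stripped = []
    with open(path) as f:
        for line in f:                          # one line per iteration
            stripped.append(line.rstrip("\n"))  # drop only the newline

print(stripped)
```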
    

File position

The file position is the offset in the file at which the next read or write takes place.

  • The tell() method returns the current position in the file.

    >>> with open('hello.txt', 'r') as f:
    ...     f.read(5) # Read 5 characters
    ...     f.tell()  # The current position is 5
    ...
    'hello'
    5
    
  • The seek(offset, whence=0, /) method moves the position in the file.
    Values of whence:

    * 0 -- Start of the stream (default); the offset should be zero or positive
    * 1 -- Current stream position; the offset may be negative
    * 2 -- End of the stream; the offset is usually negative
    
    >>> with open('hello.txt', 'r') as f:
    ...     f.read(5)
    ...     f.tell()
    ...     f.seek(10)
    ...     f.read()
    ...
    'hello'
    5
    10
    'd\nHello, China'
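The example above only seeks from the start of the file. Text-mode files generally allow only that kind of seek (apart from an offset of zero with the other whence values), so the sketch below uses a binary stream to show all three. io.BytesIO stands in for a file opened with mode 'rb', and the os.SEEK_* constants are the named equivalents of 0, 1 and 2.

```python
import io
import os

# BytesIO stands in for a file opened in binary mode.
f = io.BytesIO(b"hello world\nHello, China")

f.seek(6, os.SEEK_SET)      # 6 bytes from the start
word = f.read(5)

f.seek(-5, os.SEEK_CUR)     # step back 5 bytes from the current position
again = f.read(5)

f.seek(-5, os.SEEK_END)     # 5 bytes before the end
tail = f.read()
end_pos = f.tell()          # position after reading to the end

print(word, again, tail, end_pos)
```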
    

Standard I/O

There are three basic I/O connections: standard input, standard output and standard error.

The standard inputs and outputs in Python are objects in the sys module.

Object       Description
sys.stdin    Standard input is the data entering the program. By default it comes from the keyboard.
sys.stdout   Standard output is where the print() function writes its data.
sys.stderr   Standard error is the stream to which the program writes error messages. It is usually the text terminal.

In keeping with the UNIX philosophy, the standard I/O streams are file objects.

Standard input

stdin is used for all interactive input (including calls to input()).

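import sys

print('Enter your name: ', end='')
sys.stdout.flush()  # make the prompt visible before blocking on input

name = ''
while True:
    c = sys.stdin.read(1)  # read one character at a time
    if c in ('', '\n'):    # stop at end of line or at end of input
        break
    name = name + c

print('Your name is:', name)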
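Since sys.stdin is a file object, the same result can be had with readline() instead of a character loop. The helper read_name below is a hypothetical function written for this sketch, and an io.StringIO stands in for the keyboard so that the example runs without user input. In a real script you would pass sys.stdin.

```python
import io

def read_name(stream):
    """Return the first line of `stream` without its trailing newline."""
    line = stream.readline()     # returns '' at end of input
    return line.rstrip("\n")

# In a script this would be: name = read_name(sys.stdin)
fake_stdin = io.StringIO("Peter\nignored second line\n")
name = read_name(fake_stdin)
print("Your name is:", name)
```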

However, in order to obtain input, a higher-level function is usually used: input().

>>> data = input("What's your name ? ")
What's your name ? Peter
>>> print(f"Welcome {data}")
Welcome Peter
>>> print(f"Welcome {data:^10}")
Welcome   Peter

Standard output

  • stdout is used for the output of print() and expression statements, and for the prompt of input().
>>> import sys
>>> sys.stdout.write('Honore de Balzac, Father Goriot\n')
Honore de Balzac, Father Goriot
32
  • In practice, the print() function is usually used:
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

    • By default, the print() function writes text to sys.stdout.

      print('Honore de Balzac')
      print('The Splendors and Miseries of Courtesans', 'Gobseck', 'Father Goriot', sep=":")
      
      vals = [1, 2, 3, 4, 5]
      
      for e in vals:
          print(e, end=' ')
      print()
      
    • The print() function has a file parameter that specifies where the data is written, so print() can also be used to write to a file.

      with open('works.txt', 'w') as f:
          print('Beatrix', file=f)
          print('Honorine', file=f)
          print('The firm of Nucingen', file=f)
      

Standard error works almost the same way as standard output: it is also an output stream, so it is not covered in detail here.
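As a brief illustration, writing to standard error only requires passing sys.stderr as the file argument of print(). The helper warn below is a hypothetical function for this sketch. Its optional stream argument is an assumption added so that the output can be captured and checked.

```python
import io
import sys

def warn(message, stream=None):
    """Write a diagnostic to standard error, or to `stream` if given."""
    if stream is None:
        stream = sys.stderr          # looked up at call time
    print("warning:", message, file=stream)

print("result: 42")                  # normal output goes to sys.stdout
warn("input file was empty")         # diagnostics go to sys.stderr

# Any file-like object works as the target.
captured = io.StringIO()
warn("disk almost full", stream=captured)
```

Keeping diagnostics on stderr means they still reach the terminal when stdout is redirected to a file.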

Redirection and recovery of standard IO

Standard output can be redirected. In the following example, we redirect the standard output to a regular file.

  • In the script, we redirect the standard output to the regular file output.txt.
  • Then we restore the original standard output. The original value of sys.stdout is saved in the special variable sys.__stdout__.
import sys

with open('output.txt', 'w') as f:

    sys.stdout = f

    print('Lucien')
    sys.stdout.write('Rastignac\n')
    sys.stdout.writelines(['Camusot\n', 'Collin\n'])

    sys.stdout = sys.__stdout__

    print('Bianchon')
    sys.stdout.write('Lambert\n')
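The standard library also offers contextlib.redirect_stdout, which performs the assignment and the restoration for you. The sketch below is an alternative to the manual approach, not the method used in this article. It redirects into an io.StringIO rather than output.txt so that the result can be inspected. Note that it restores whatever sys.stdout was before the block, rather than sys.__stdout__.

```python
import contextlib
import io
import sys

buffer = io.StringIO()   # an open file object would work the same way

with contextlib.redirect_stdout(buffer):
    print("Lucien")                      # goes to the buffer
    sys.stdout.write("Rastignac\n")      # sys.stdout is the buffer here

# Outside the block sys.stdout is restored, even if an exception occurred.
print("Bianchon")
captured = buffer.getvalue()
print(repr(captured))
```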

Serialization and deserialization of objects

The pickle module implements binary serialization and deserialization of a Python object structure.

  • "Pickling" is the process of converting a Python object hierarchy into a byte stream.
  • "Unpickling" is the inverse operation. It converts a byte stream (from a binary file or bytes-like object) back into an object hierarchy.

Pickling (and unpickling) is also called "serialization", "marshalling", or "flattening". To avoid confusion, the terms "pickling" and "unpickling" are used here.

Note

Serialization is a lower-level concept than persistence. Although pickle reads and writes file objects, it does not handle the naming of persistent objects, nor the (even more complicated) issue of concurrent access to persistent objects. The pickle module can convert a complex object into a byte stream, and it can convert the byte stream back into an object with the same internal structure. The most common thing to do with these byte streams is to write them to a file, but they can also be sent over a network or stored in a database. The shelve module provides a simple interface for pickling and unpickling objects on DBM-style database files.

pickle: literally, to preserve food in brine or vinegar; here, to serialize an object
pickling: the process of serializing an object
unpickling: the reverse process, restoring the object from the byte stream
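To make the remark about the shelve module more concrete, here is a minimal sketch of storing and retrieving a pickled object by key. The database name people and the stored dictionary are made up for illustration, and the temporary directory keeps the example self-contained.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "people")   # hypothetical database name

    # Values are pickled transparently when they are assigned.
    with shelve.open(db_path) as db:
        db["monica"] = {"name": "Monica", "age": 15}

    # Reopening the shelf unpickles the value on access.
    with shelve.open(db_path) as db:
        restored = db["monica"]
        keys = sorted(db.keys())

print(keys, restored)
```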

Methods

  • Pickle an object using the dump() method.

    dump(object, file)
    dumps(object) -> bytes
    
  • Unpickle an object using the load() method.

    load(file) -> object
    loads(bytes) -> object
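The in-memory variants dumps() and loads() work on bytes objects instead of files. A short round-trip sketch, with an arbitrary sample dictionary:

```python
import pickle

data = {"name": "Monica", "age": 15, "tags": ["a", "b"]}

blob = pickle.dumps(data)       # serialize to a bytes object
restored = pickle.loads(blob)   # rebuild an equal, independent object

print(type(blob).__name__)
print(restored == data)
print(restored is data)
```

The restored object compares equal to the original but is a separate copy, so changing one does not affect the other.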
    

Example

#!/usr/bin/env python

# pickle_ex.py

import pickle

class Person:

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def get_name(self):
        return self.name

    def get_age(self):
        return self.age

person = Person('Monica', 15)
print(person.get_name())
print(person.get_age())

with open('monica', 'wb') as f:
    pickle.dump(person, f)

with open('monica', 'rb') as f2:
    monica = pickle.load(f2)

print(monica.get_name())
print(monica.get_age())

Topics: Python