Introduction to Python: reading and writing files

Posted by jlarson on Fri, 26 Nov 2021 13:34:21 +0100

Reading and writing files

Reading

The built-in function open() opens a file and returns a file object. If the file cannot be opened, open() raises OSError (for example its subclass FileNotFoundError when the file does not exist).
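
As a minimal sketch of the failure case, the snippet below tries to open a file that cannot exist and catches the resulting exception. The helper name try_open and the file name no_such_file.txt are made up for illustration.

```python
import os
import tempfile

def try_open(path):
    """Return the file's text, or None if the file cannot be opened."""
    try:
        with open(path, mode="r", encoding="utf-8") as f:
            return f.read()
    except OSError as exc:  # FileNotFoundError, PermissionError, ... are subclasses
        print(f"could not open {path!r}: {type(exc).__name__}")
        return None

# Hypothetical path inside a fresh temporary directory, so it cannot exist.
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")
result = try_open(missing)
```

Catching OSError rather than a bare except keeps unrelated bugs visible while still covering the ways opening a file can fail.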

File content

Hello
 Zhang San
 Outlaw maniac
 Lin Daiyu Fengxue mountain temple

read

If the file is small, read() is the most convenient way to read it in one go

file = open("222.txt", mode="r", encoding="utf-8")
print(type(file))  # Print the type of the file object
print(file.read())  # Read the whole file into memory at once, so this is unsuitable for very large files
file.close()  # Always close the file when done; otherwise it keeps holding system resources (a file descriptor)

result

<class '_io.TextIOWrapper'>
Hello
 Zhang San
 Outlaw maniac
 Lin Daiyu Fengxue mountain temple

What's left after read?

with open("222.txt", mode="r", encoding="utf-8") as f:
    print("First read")
    print(f.read())
    print("Second read")
    print(f.read())
First read
 Hello
 Zhang San
 Outlaw maniac
 Lin Daiyu Fengxue mountain temple

Tang Monk vs Decepticons
Second read



The second read() returned nothing. A file object keeps track of its current position: the first read() consumed everything and left the position at the end of the file, so the second read() has nothing left and returns an empty string. Think of a pot holding N pancakes: once they have all been tipped into the basin, the pot is empty
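
If the content really is needed a second time, the position can be moved back to the start with seek(0). The following is a small sketch that uses a scratch file (the name seek_demo.txt is made up) so that 222.txt is not touched.

```python
import os
import tempfile

# Hypothetical scratch file so the example does not touch 222.txt.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("Hello\nZhang San\n")

with open(path, mode="r", encoding="utf-8") as f:
    first = f.read()    # reads everything; the position is now at end of file
    second = f.read()   # nothing left, so this is ""
    f.seek(0)           # move the position back to the start
    third = f.read()    # the full content again

print(repr(second))
print(first == third)
```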

with

Now that we have a basic understanding of reading files, a short detour. The approach above has a drawback: we must remember to close the file at the end. Two situations can leave a file unclosed and waste system resources:

1. We simply forget to write close()

2. The file was opened, but the program raised an error before reaching close()

The first is easy to spot by inspection; the second is harder. One solution is try... finally, but that is cumbersome. The with statement was introduced to solve this: the following form calls close() for us automatically, even if an error occurs inside the block

with open("222.txt", mode="r", encoding="utf-8") as f:
    print(type(f))
    print(f.read())
<class '_io.TextIOWrapper'>
Hello
 Zhang San
 Outlaw maniac
 Lin Daiyu Fengxue mountain temple

The output is the same as in the first example. From now on we will always write it this way rather than calling close() by hand.
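
For comparison, here is a rough sketch of the try... finally version next to the with version. It uses a scratch file with a made-up name; both variants end with the file closed.

```python
import os
import tempfile

# Hypothetical scratch file standing in for 222.txt.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("Hello\n")

# Manual version: close() runs in finally, even if read() raises.
f1 = open(path, mode="r", encoding="utf-8")
try:
    manual = f1.read()
finally:
    f1.close()

# with version: the file is closed when the block exits.
with open(path, mode="r", encoding="utf-8") as f2:
    managed = f2.read()

print(f1.closed, f2.closed)
```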

read(size)

Calling read() reads the entire file at once. If the file is 10 GB, memory will run out. To be safe, call read(size) repeatedly; each call reads at most size characters (or size bytes in binary mode)

If the file size cannot be determined in advance, calling read(size) repeatedly is the safer choice

To be honest, I have not needed this in practice, because I mostly read small configuration files, but here is an answer I copied

def readlines(f, separator):
  '''
  Read a large file piece by piece, yielding one record at a time
  :param f: file object opened in text mode
  :param separator: string that separates the records
  :return: generator of records (without the separator)
  '''
  buf = ''
  while True:
    while separator in buf:
      position = buf.index(separator)  # Position of the first separator
      yield buf[:position]  # Everything before the separator is one record
      buf = buf[position + len(separator):]  # Drop the yielded record and its separator, keep the rest

    chunk = f.read(4096)  # Read at most 4096 characters
    if not chunk:  # An empty string means end of file
      yield buf  # Yield whatever is left in the buffer
      break
    buf += chunk  # Otherwise append the chunk and look for separators again


with open('text.txt', encoding='utf-8') as f:
  for line in readlines(f, '|||'):
    # readlines() can be used in a for loop because it contains yield, which makes it a generator function
    print(line)
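
Stripped of the separator handling, the core of the read(size) loop is quite small. The sketch below is a simplified illustration, not the copied answer: the helper name iter_chunks is made up, and io.StringIO stands in for a real file.

```python
import io

def iter_chunks(f, size=4096):
    """Yield successive pieces of at most `size` characters until end of file."""
    while True:
        chunk = f.read(size)
        if not chunk:  # "" means end of file
            break
        yield chunk

# io.StringIO stands in for a real text file here (an assumption for the demo).
fake_file = io.StringIO("abcdefghij")
chunks = list(iter_chunks(fake_file, size=4))
print(chunks)  # ['abcd', 'efgh', 'ij']
```

Only one chunk is held in memory at a time, which is the point of reading with a size limit.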

readline

Call readline() to read one line at a time

with open("222.txt", mode="r", encoding="utf-8") as f:
    print(type(f))
    print(f.readline())
<class '_io.TextIOWrapper'>
Hello

readline() reads only one line per call. How do we read multiple lines?

Before reading the whole file, let's extend the file by adding a blank line and one more line of text

Hello
 Zhang San
 Outlaw maniac
 Lin Daiyu Fengxue mountain temple

Tang Monk vs Decepticons
with open("222.txt", mode="r", encoding="utf-8") as f:
    while True:
        line = f.readline()
        if line == "":  # readline() returns "" only at end of file
            break
        print(line.strip())  # strip() removes leading and trailing whitespace, including the newline
Hello
 Zhang San
 Outlaw maniac
 Lin Daiyu Fengxue mountain temple

Tang Monk vs Decepticons

A common question: will a blank line be mistaken for the end of the file? No. Every line in the file, including a blank one, ends with a line separator, so readline() returns "\n" for a blank line rather than "". It returns the empty string only when the end of the file has been reached, so the loop does not stop early.
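
A file object can also be iterated over directly, which gives the same line-by-line behaviour without an explicit readline() loop. In this small sketch io.StringIO stands in for 222.txt, and the raw lines are collected to show that the blank line arrives as "\n".

```python
import io

# io.StringIO stands in for 222.txt (an assumption for the demo).
f = io.StringIO("Hello\n\nTang Monk vs Decepticons")

raw_lines = []
for line in f:  # yields one line at a time until end of file
    raw_lines.append(line)

print(raw_lines)  # ['Hello\n', '\n', 'Tang Monk vs Decepticons']
```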

readlines

Call readlines() to read everything at once and return a list of lines

For a small file such as a configuration file, readlines() is the most convenient

with open("222.txt", mode="r", encoding="utf-8") as f:
    for i in f.readlines():
        print(i.strip())
    print(type(f.readlines()))
Hello
 Zhang San
 Outlaw maniac
 Lin Daiyu Fengxue mountain temple

Tang Monk vs Decepticons
<class 'list'>
with open("222.txt", mode="r", encoding="utf-8") as f:
    print(f.readlines())
['Hello\n', 'Zhang San\n', 'Outlaw maniac\n', 'Lin Daiyu Fengxue mountain temple\n', '\n', 'Tang Monk vs Decepticons']

We can see that each line ends with \n, and the blank line is read as "\n", not ""

with open("222.txt", mode="r", encoding="utf-8") as f:
    for line in f.readlines():
        print(line.strip())  # Remove the trailing '\n'

rb and encoding

Strings in Python 3 are Unicode, and text files are very often stored as UTF-8. What if we do not know the encoding when opening a file?

If encoding is not specified, open() falls back to a platform-dependent default (UTF-8 on many systems), which does not help when the file is in some other encoding or is not text at all. In that case we change the mode instead. r is text mode, which decodes the bytes and returns strings. If we do not know the file format, we can leave out encoding and open the file in rb mode, which returns the raw bytes exactly as they are stored on disk

Error demonstration

with open("one.jpg", mode="r") as f:
    print(f.readlines())

Error

Traceback (most recent call last):
  File "/Users/zc/PycharmProjects/pythonProject1/test/MyOne.py", line 2, in <module>
    print(f.readlines())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Correct opening mode

with open("one.jpg", mode="rb") as f:
    print(f.readlines())

A picture read in binary mode produces a lot of output, so only part of it is shown

d7\xc4\xa5\x92\xbe]4H\x06\x9d8\xa6Aa\x8d\x15b%\xa3tD\x84\x8fUL\xa1F\xb7\x95\xc7\xf6G\xf4\xf6\xa0\x96T\x0b\xe7\xd5Y\xdbN:Vm\xac\xbd2\xe4`\xa6\x9eS\xea\x93J>\xb0\xaa\xd5\x04\x1d1)\xfa\xf8\xd3\x9b\xff\x00\xaf\xed|\x12+\xfe\x93\xf9\x0c}\xbd*\x82M/W\xe1\xe5\xf6\xf4\xae\x97r\xd6\xe2e\x9a\xb9*\x8a\x1aZ\x91\x03\x95vT}\x12i\x88\xd8\x8e}#\x8f\xf6\xfe\xdcm\xd2{5\x0c~ ~\xdf\xe5\xd2\xcf\x0c:\x96\x93$\xff\x00\x83\xfc\xfd\x0f;
. . . . . . . 
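
Printing a whole image is rarely useful; often the first few bytes are enough. As a rough sketch, the snippet below writes a tiny made-up file that starts with the JPEG start-of-image marker (bytes FF D8) and reads the header back in rb mode. The file name and its contents are invented for the demo, and the file is not a valid picture.

```python
import os
import tempfile

# Made-up stand-in for one.jpg: the JPEG marker FF D8 followed by filler bytes.
path = os.path.join(tempfile.mkdtemp(), "fake.jpg")
with open(path, mode="wb") as f:
    f.write(b"\xff\xd8\xff\xe0" + b"\x00" * 16)

with open(path, mode="rb") as f:
    header = f.read(4)  # in binary mode, read(size) counts bytes

print(type(header))                     # <class 'bytes'>
print(header.startswith(b"\xff\xd8"))   # True
```

The leading 0xff byte is the same one that the UnicodeDecodeError above complained about at position 0.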

How do you know the file encoding format?

chardet!

Installation command

pip3 install chardet

Usage

import chardet

with open("222.txt", mode="rb") as f:
    result = chardet.detect(f.read())
print(result)
{'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
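
chardet returns a guess together with a confidence value, not a guarantee. If installing it is not an option, a cruder fallback is to try a short list of candidate encodings in order. The function name guess_encoding and the candidate list below are assumptions for illustration; they are not part of chardet or the standard library.

```python
def guess_encoding(data, candidates=("utf-8", "gbk")):
    """Return the first candidate that decodes `data` without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None

utf8_bytes = "\u4f60\u597d".encode("utf-8")  # two Chinese characters as UTF-8
gbk_bytes = "\u4f60\u597d".encode("gbk")     # the same text as GBK

print(guess_encoding(utf8_bytes))
print(guess_encoding(gbk_bytes))
```

The order of the candidates matters: a successful decode only shows that the bytes are legal in that encoding, not that the result is the intended text.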

What if you encounter non-standard files?

Some text files contain a few illegally encoded bytes, and reading them raises UnicodeDecodeError. For this case open() also accepts an errors parameter that says how to handle decoding errors. The simplest option is to ignore them

Counterexample: reading a UTF-8 encoded file with the gbk encoding

with open("222.txt", mode="r", encoding="gbk") as f:
    print(f.read())

Error

Traceback (most recent call last):
  File "/Users/zc/PycharmProjects/pythonProject1/test/MyOne.py", line 2, in <module>
    print(f.read())
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 18: illegal multibyte sequence

What if we ignore the errors?

with open("222.txt", mode="r", encoding="gbk", errors='ignore') as f:
    print(f.read())

result

Youソ
For three years
 Legal check
 Increase in support area

Panel lightфLong dark manuscriptぉPlutonium

Although the text is decoded incorrectly and comes out garbled, at least no error is raised and the program is not interrupted
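
Besides 'ignore', the errors parameter also accepts 'replace', which marks each bad byte instead of silently dropping it. The sketch below compares the two on a made-up file that contains one byte that is never valid in UTF-8.

```python
import os
import tempfile

# Made-up file: valid ASCII with one byte (0xff) that is never valid in UTF-8.
path = os.path.join(tempfile.mkdtemp(), "broken.txt")
with open(path, mode="wb") as f:
    f.write(b"abc\xffdef")

with open(path, mode="r", encoding="utf-8", errors="ignore") as f:
    ignored = f.read()   # the bad byte is silently dropped

with open(path, mode="r", encoding="utf-8", errors="replace") as f:
    replaced = f.read()  # the bad byte becomes U+FFFD, the replacement character

print(ignored)                       # abcdef
print(replaced == "abc\ufffddef")    # True
```

With 'replace' it stays visible that something was lost, which can make damaged files easier to spot later.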

write

write

with open("222.txt", mode="r") as f:
    print("Before the file is written")
    print(f.read())

with open("222.txt", mode="w") as f:
    f.write("Why do meteorites always fall in craters? So accurate. Who dug this crater")

with open("222.txt", mode="r") as f:
    print("After the file is written")
    print(f.read())

result

Before the file is written
 Hello
 Zhang San
 Outlaw maniac
 Lin Daiyu Fengxue mountain temple

Tang Monk vs Decepticons
 After the file is written
 Why do meteorites always fall in craters? So accurate. Who dug this crater

We found that the file was overwritten. In fact,

mode w does not modify a file in place. If a file with that name already exists, it is emptied (truncated) before writing; if not, a new file is created. So use w with care
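
When overwriting by accident would be costly, open() also has mode x (exclusive creation), which refuses to touch an existing file. A minimal sketch, using a scratch file with a made-up name:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "precious.txt")  # hypothetical file name

with open(path, mode="x", encoding="utf-8") as f:  # file does not exist yet: created
    f.write("original content")

try:
    with open(path, mode="x", encoding="utf-8") as f:  # file exists now: refused
        f.write("this would have overwritten it")
    overwritten = True
except FileExistsError:
    overwritten = False

with open(path, mode="r", encoding="utf-8") as f:
    content = f.read()

print(overwritten, content)  # False original content
```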

append mode

Ah, it overwrote my file, and I don't want that. I just want to add content at the end of the file. For that we use mode="a"

with open("222.txt", mode="r") as f:
    print("Before the file is written")
    print(f.read())

with open("222.txt", mode="a") as f:
    f.write("This is an addition")

with open("222.txt", mode="r") as f:
    print("After the file is written")
    print(f.read())
Before the file is written
 Why do meteorites always fall in craters? So accurate. Who dug this crater
 After the file is written
 Why do meteorites always fall in craters? So accurate. Who dug this craterThis is an addition

The new text is appended directly at the end of the file, on the same line. To put it on a new line, change

```
f.write("This is an addition")
```

to

f.write("\nThis is an addition")

writelines

a = ["\n", "I'm grandma Liu\n", "Grandma Liu's Baoyu fell in love with me"]
with open("222.txt", mode="r") as f:
    print("Before the file is written")
    print(f.read())
with open("222.txt", mode="a") as f:
    f.writelines(a)
with open("222.txt", mode="r") as f:
    print("After the file is written")
    print(f.read())
Before the file is written
 Why do meteorites always fall in craters? So accurate. Who dug this crater
 After the file is written
 Why do meteorites always fall in craters? So accurate. Who dug this crater
 I'm grandma Liu
 Grandma Liu's Baoyu fell in love with me
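
Note that writelines() does not add line separators; it writes the strings back to back, which is why the list above carries its own \n characters. When the items have no newline, one can be added on the fly. A small sketch with a scratch file (the name names.txt is made up):

```python
import os
import tempfile

names = ["Zhang San", "Grandma Liu", "Baoyu"]
path = os.path.join(tempfile.mkdtemp(), "names.txt")  # hypothetical scratch file

with open(path, mode="w", encoding="utf-8") as f:
    # writelines() writes the strings back to back, so append "\n" ourselves.
    f.writelines(name + "\n" for name in names)

with open(path, mode="r", encoding="utf-8") as f:
    lines = f.readlines()

print(lines)  # ['Zhang San\n', 'Grandma Liu\n', 'Baoyu\n']
```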

File path

A file path is either relative or absolute. A relative path is interpreted relative to another location, normally the current working directory, while an absolute path gives the full location starting from the root of the file system (/ on macOS and Linux, a drive letter such as C:\ on Windows).

Determine whether the path of a file or folder is an absolute path

import os
print(os.path.isabs("222.txt"))
print(os.path.isabs("/Users/zc/PycharmProjects/pythonProject1/test/222.txt"))
False
True

Get file absolute path

import os
print(os.path.abspath("222.txt"))
/Users/zc/PycharmProjects/pythonProject1/test/222.txt

Get current path

import os
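path1 = os.getcwd()  # Current working directory (where the program was started from)
path2 = os.path.dirname(__file__)  # Directory containing this script; more commonly used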
print(path1)
print(path2)
/Users/zc/PycharmProjects/pythonProject1/test
/Users/zc/PycharmProjects/pythonProject1/test

Determine whether the path exists

import os
print(os.path.exists("/Users/zc/PycharmProjects/pythonProject1/test"))
print(os.path.exists("/Users/zc/PycharmProjects/pythonProject1/tes2"))
True
False

Get the parent directory

If the argument is a directory path, dirname() returns its parent directory

import os
path1 = os.getcwd()
print(path1)
path2 = os.path.dirname(path1)
print(path2)
/Users/zc/PycharmProjects/pythonProject1/test
/Users/zc/PycharmProjects/pythonProject1

If the argument is a file path, dirname() returns the directory containing the file

import os
path1 = "/Users/zc/PycharmProjects/pythonProject1/test/MyOne.py"
path2 = os.path.dirname(path1)
print(path2)
/Users/zc/PycharmProjects/pythonProject1/test

Joining paths

import os

path1 = os.path.dirname(__file__)
path2 = os.path.join(path1, "112.txt")
path3 = os.path.join(path1, "222.txt")
print(path2)
print(path3)
/Users/zc/PycharmProjects/pythonProject1/test/112.txt
/Users/zc/PycharmProjects/pythonProject1/test/222.txt
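
The standard library's pathlib module offers an object-oriented alternative to the os.path functions used above. The sketch below is an addition for comparison, not part of the original examples; it uses PurePosixPath, which only manipulates path strings, so the output is the same on any operating system.

```python
from pathlib import PurePosixPath

# PurePosixPath only manipulates path strings, so this runs the same on any OS.
script = PurePosixPath("/Users/zc/PycharmProjects/pythonProject1/test/MyOne.py")

folder = script.parent        # like os.path.dirname(...)
target = folder / "222.txt"   # like os.path.join(folder, "222.txt")

print(folder)                                  # /Users/zc/PycharmProjects/pythonProject1/test
print(target)                                  # /Users/zc/PycharmProjects/pythonProject1/test/222.txt
print(target.is_absolute())                    # True
print(PurePosixPath("222.txt").is_absolute())  # False
```

For real files on disk, pathlib.Path provides the same operations plus methods such as exists() that touch the file system.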
