Text and binary files
- Text file
The text file stores ordinary "character" text, which can be opened with the Notepad program.
- Binary file
Binary files store the data content in "bytes" and cannot be opened with Notepad.
File operation related modules
name | explain |
---|---|
io module | Input and output operations on file streams |
os module | Basic operating system functions, including file operation |
glob module | Find file pathnames that match specific rules |
fnmatch module | Use patterns to match file pathnames |
fileinput module | Process multiple input files |
filecmp module | For file comparison |
csv module | For CSV file processing |
pickle (and cPickle in Python 2) | Used for serialization and deserialization |
xml package | For XML data processing |
bz2,gzip,zipfile,zlib,tarfile | Used to process compressed and decompressed files (corresponding to different algorithms) |
open() creates a file object
The open() function is used to create a file object
open(filename[, mode])
To avoid doubling every "\" in a Windows path, you can use a raw string: r"d:\b.txt"
pattern | describe |
---|---|
r | read mode |
w | Write mode. If the file does not exist, create it; if the file exists, its existing content is discarded and replaced by the new content |
a | Append mode. If the file does not exist, create it; if the file exists, append content to the end of the file |
b | Binary mode (can be combined with other modes) |
+ | Read and write mode (can be combined with other modes) |
Note: creation of text file object and binary file object:
If the mode "b" is not added, the text file object is created by default, and the basic unit of processing is "character".
Add binary mode "b", the binary file object is created, and the basic unit of processing is "byte".
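The difference between the two kinds of file object can be seen in a minimal sketch. The scratch path created with `tempfile` and the sample string are assumptions made for illustration only.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode (no "b"): write() takes str and read() returns str.
with open(path, "w", encoding="utf-8") as f:
    f.write("a中")

with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode ("b" added): read() returns the raw bytes.
with open(path, "rb") as f:
    raw = f.read()

print(type(text).__name__, len(text))  # str 2   (two characters)
print(type(raw).__name__, len(raw))    # bytes 4 (1 byte for "a", 3 for "中")
```

The same file yields two characters in text mode and four bytes in binary mode, which is what "basic unit of processing" refers to.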
Common properties and methods of file objects
File object properties:
attribute | explain |
---|---|
name | Returns the name of the file |
mode | Returns the open mode of the file |
closed | Returns True if the file is closed |
File object open mode:
pattern | explain |
---|---|
r | Read mode |
w | Write mode |
a | append mode |
b | Binary mode (can be combined with other modes) |
+ | Read / write mode (other modes can be combined) |
Common methods for file objects:
Method name | explain |
---|---|
read([size]) | Read the contents of size bytes or characters from the file and return. If [size] is omitted, it will be read to the end of the file, that is, all contents of the file will be read at one time |
readline() | Read a line from a text file |
readlines() | Each line in the text file is treated as an independent string object, and these objects are returned in the list |
write(str) | Writes the string str contents to a file |
writelines(s) | Writes the string list s to the file without adding line breaks |
seek(offset[,whence]) | Move the file pointer to a new position. offset is the number of bytes to move relative to the position given by whence: a positive offset moves toward the end of the file, a negative offset moves toward the start. whence values: 0: calculated from the start of the file (default); 1: calculated from the current position; 2: calculated from the end of the file |
tell() | Returns the current position of the file pointer |
truncate([size]) | Resize the file to size bytes and discard the rest, no matter where the pointer is. If size is omitted, the file is cut at the current pointer position |
flush() | Writes the contents of the buffer to the file without closing the file |
close() | Write the contents of the buffer to the file, close the file at the same time, and release the resources related to the file object |
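A short sketch of seek(), tell() and truncate() on a binary file follows. The file name and the ten-byte sample content are hypothetical; binary mode is used because offsets are counted in bytes.

```python
import os
import tempfile

# Hypothetical scratch file holding ten known bytes.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "r+b") as f:
    f.seek(3)                  # whence=0: 3 bytes from the start
    first = f.read(2)          # b"34"
    pos_after_read = f.tell()  # 5
    f.seek(-2, 2)              # whence=2: 2 bytes before the end
    last = f.read()            # b"89"
    f.seek(4)
    f.truncate()               # no size given: cut at the current position

with open(path, "rb") as f:
    remaining = f.read()       # b"0123"

print(first, pos_after_read, last, remaining)
```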
pickle serialization
Serialization refers to converting objects into "serialized" data form, storing them on hard disk or transmitting them to other places through network. Deserialization refers to the reverse process of converting the read "serialized data" into objects.
The functions in pickle module are used to realize serialization and deserialization.
Serialize & deserialize:
pickle.dump(obj, file): obj is the object to be serialized, and file is the file object (opened in binary write mode) that stores the data
pickle.load(file): reads data from file (opened in binary read mode) and deserializes it into an object
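A minimal round trip with these two functions might look like the sketch below. The dictionary contents and the temporary file path are made up for the example.

```python
import os
import pickle
import tempfile

# Hypothetical object and scratch file for the round trip.
path = os.path.join(tempfile.mkdtemp(), "data.pkl")
record = {"name": "Wang Ming", "age": 18, "scores": [90, 85]}

# Serialize: pickle writes bytes, so the file is opened in "wb" mode.
with open(path, "wb") as f:
    pickle.dump(record, f)

# Deserialize: read the bytes back and rebuild an equal object.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == record)  # True: same content
print(restored is record)  # False: a new object was created
```

Unpickling can execute arbitrary code, so pickle.load() should only be used on data from a trusted source.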
Text file reading and writing
Text file writing steps
There are three steps to writing:
1. Create file object
2. Write data
3. Close the file object
write()/writelines() writes data
write(a): write the string a to the file. writelines(b): write the string list to the file without adding line breaks
close() closes the file stream
An open file object must explicitly call the close() method to close the file object. When the close() method is called, the buffer data will be written to the file first (or the flush() method can be called directly), and then the file will be closed to release the file object.
In order to ensure that the open file object is closed normally, it is generally implemented in combination with the finally or with keyword of the exception mechanism.
with statement (context manager)
The with statement manages context resources automatically. No matter how the with block is exited, including through an exception, the file is closed correctly when the block ends.
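The three writing steps and the two ways of guaranteeing close() can be sketched as follows. The file name and the lines written are hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# Variant 1: try/finally, close() runs even if write() raises.
f = open(path, "w", encoding="utf-8")  # 1. create the file object
try:
    f.write("first line\n")            # 2. write data
finally:
    f.close()                          # 3. close the file object

# Variant 2: with, the file is closed when the block is left.
with open(path, "a", encoding="utf-8") as g:
    # writelines() adds no line breaks, so each string carries its own "\n".
    g.writelines(["second line\n", "third line\n"])

print(f.closed, g.closed)  # True True
```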
Reading of text files
Generally, there are three methods:
1. read([size]) reads size characters from the file and returns them as results. If there is no size parameter, the entire file is read. Reading to the end of the file returns an empty string.
2. readline() reads a line and returns it as a result. Reading to the end of the file returns an empty string.
3. readlines() reads all remaining lines of the text file and returns them as a list, with each line stored as one string
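The three reading methods can be compared on one small file. The sketch below assumes a hypothetical three-line file; the final loop over the file object is a common alternative that is not among the three methods listed above.

```python
import os
import tempfile

# Hypothetical three-line sample file.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path, "r", encoding="utf-8") as f:
    head = f.read(3)             # "alp": three characters
    rest_of_line = f.readline()  # "ha\n": up to and including the line break
    remaining = f.readlines()    # ["beta\n", "gamma\n"]
    at_eof = f.read()            # "": nothing left to read

# A file object can also be iterated line by line without loading everything.
with open(path, "r", encoding="utf-8") as f:
    stripped = [line.rstrip("\n") for line in f]

print(head, repr(rest_of_line), remaining, repr(at_eof))
print(stripped)
```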
Binary file reading and writing
The processing flow of binary files is consistent with that of text files. However, you need to specify a binary mode to create a binary file object.

    f = open(r"d:\a.txt", 'wb')  # Writable binary object; existing content is overwritten
    f = open(r"d:\a.txt", 'ab')  # Writable binary object in append mode
    f = open(r"d:\a.txt", 'rb')  # Readable binary object
After creating binary file objects, you can still use write() and read() to read and write files.
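As an illustration, a binary file can be copied by reading and writing fixed-size chunks. The helper name `copy_binary`, the chunk size and the sample data are assumptions for this sketch, not part of the original notes.

```python
import os
import tempfile


def copy_binary(src, dst, chunk_size=4096):
    """Copy src to dst in binary mode, chunk_size bytes at a time."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            fout.write(chunk)


# Hypothetical source file: 10240 bytes covering every byte value.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "a.bin")
dst = os.path.join(workdir, "a_copy.bin")
with open(src, "wb") as f:
    f.write(bytes(range(256)) * 40)

copy_binary(src, dst)
print(os.path.getsize(dst))  # 10240
```

Reading in chunks keeps memory use constant, which matters for large files such as images or videos.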
CSV file reading and writing
CSV (Comma-Separated Values) is a comma-separated text format commonly used for data exchange and for importing and exporting Excel files and database data. Unlike Excel files, CSV files have the following properties:
1. The value has no type, and all values are strings
2. Font color and other styles cannot be specified
3. The width and height of cells cannot be specified, and cells cannot be merged
4. There are no multiple worksheets
5. Image chart cannot be embedded
For example, an Excel table saved in CSV format and opened with Notepad looks like this:
Name, telephone, address
Xiaoming, 18889303000, Jinfeng Road
Xiaohong, 18829920000, Wuyuan Road
Wang Ming, 16668829922, Fengtian Road
csv module
The module csv of Python standard library provides objects for reading and writing csv format files
csv.reader object (csv file reading)

    import csv

    with open(r"e:\a.csv", newline="") as a:
        a_csv = csv.reader(a)   # Create a reader object that yields one list per row
        headers = next(a_csv)   # Get the title row as a list
        print(headers)
        for row in a_csv:       # Loop over the remaining rows
            print(row)

    # Output:
    # ['full name', 'Telephone', 'address']
    # ['Xiao Ming', '18889303000', 'Jinfeng Road']
    # ['Xiao Hong', '18829920000', 'Wuyuan Road']
    # ['Wang Ming', '16668829922', 'Fengtian Road']
csv.writer object (csv file write)

    import csv

    headers = ["Job number", "full name", "Age", "address", "a monthly salary"]
    rows = [
        ("1001", "Wang Ming", 18, "Xisanqi No. 1 hospital", "50000"),
        ("1002", "Gao Ba", 19, "Xisanqi No. 1 hospital", "30000"),
    ]

    # newline="" prevents blank lines between rows on Windows
    with open(r"d:\b.csv", "w", newline="") as b:
        b_csv = csv.writer(b)      # Create the csv writer object
        b_csv.writerow(headers)    # Write one row (title)
        b_csv.writerows(rows)      # Write multiple rows (data)
os module
os module can help us operate the operating system directly.
os.system (execute a system command)

    import os

    os.system("ping www.baidu.com")

Note: Chinese characters in the command output may appear garbled; if so, set the IDE's encoding to GBK.
os.startfile (directly call the executable)

    # Start WeChat (os.startfile is available on Windows only)
    import os

    os.startfile(r"C:\Program Files (x86)\Tencent\WeChat\WeChat.exe")
os module - file and directory related operations
Common file operations:
Method name | describe |
---|---|
remove(path) | Delete the specified file |
rename(src,dest) | Rename a file or directory |
stat(path) | Returns all properties of the file |
listdir(path) | Returns the list of files and directories in the path directory |
Common directory operations:
Method name | describe |
---|---|
mkdir(path) | Create directory |
makedirs(path1/path2/path3/... ) | Create multi-level directory |
rmdir(path) | Delete directory |
removedirs(path1/path2...) | Delete multi-level directory |
getcwd() | Return the current working directory |
chdir(path) | Set path to the current working directory |
walk() | Traverse the directory tree |
sep | The path separator used by the current operating system |

    # coding=gbk
    # Test the file and directory operations in the os module
    import os

    ############# Get information about files and folders #############
    print(os.name)             # Windows -> nt; Linux and UNIX -> posix
    print(os.sep)              # Windows -> \; Linux and UNIX -> /
    print(repr(os.linesep))    # Windows -> '\r\n'; Linux -> '\n'
    print(os.stat("main.py"))  # Assumes main.py exists in the working directory

    ############# Working directory operations #############
    # Note: relative paths are relative to the current working directory
    print(os.getcwd())         # Current working directory
    # os.chdir("d:")           # Change the current working directory to drive d:
    # os.mkdir("book".encode("GBK"))   # Create directory
    # os.rmdir("book")                 # Delete directory
    # os.makedirs("film/Hong Kong and Taiwan/Zhou Xingchi")    # Create multi-level directory
    # os.removedirs("film/Hong Kong and Taiwan/Zhou Xingchi")  # Only empty directories can be deleted
    # os.rename("movie", "movie")      # Rename a file or directory
    # dirs = os.listdir("movie")       # List the entries in a directory
    # print(dirs)
os.path module
os.path module provides directory related operations (path judgment, path segmentation, path connection, folder traversal).
method | describe |
---|---|
isabs(path) | Determine whether the path is an absolute path |
isdir(path) | Determine whether the path is a directory |
isfile(path) | Determine whether the path is a file |
exists(path) | Judge whether the file in the specified path exists |
getsize(filename) | Returns the size of the file in bytes |
abspath(path) | Return absolute path |
dirname(path) | Returns the path to the directory |
getatime(filename) | Returns the last access time of the file |
getmtime(filename) | Returns the last modification time of the file |
walk(top,func,arg) | Traverse directories recursively (Python 2 only; in Python 3 use os.walk() instead) |
join(path,*paths) | Join multiple path components |
split(path) | Split the path into directory and file name and return them as a tuple |
splitext(path) | Splits the file extension from the path |

    # encoding: utf-8
    # Test common os.path functions
    import os.path

    ################# Basic information about directories and files
    print(os.path.isabs("d:/a.txt"))    # Is it an absolute path?
    print(os.path.isdir("d:/a.txt"))    # Is it a directory?
    print(os.path.isfile("d:/a.txt"))   # Is it a file?
    print(os.path.exists("a.txt"))      # Does the file exist?
    print(os.path.getsize("a.txt"))     # File size in bytes
    print(os.path.abspath("a.txt"))     # Output absolute path
    print(os.path.dirname("d:/a.txt"))  # Output directory

    ######## Creation time, access time and last modification time ########
    print(os.path.getctime("a.txt"))    # Creation time on Windows (metadata change time on Unix)
    print(os.path.getatime("a.txt"))    # Last access time
    print(os.path.getmtime("a.txt"))    # Last modification time

    ################ Split and join paths ############
    path = os.path.abspath("a.txt")     # Return absolute path
    print(os.path.split(path))          # Return tuple: directory, file
    # prints ('E:\\PythonProject', 'a.txt')
    print(os.path.splitext(path))       # Return tuple: path, extension
    # prints ('E:\\PythonProject\\a', '.txt')
    print(os.path.join("aa", "bb", "cc"))  # Return path: aa\bb\cc on Windows, aa/bb/cc elsewhere
walk() recursively traverses all files and directories
os.walk() method:
For each directory in the tree, yields a tuple of 3 elements (dirpath, dirnames, filenames)
dirpath: the path of the directory currently being visited
dirnames: all folders in the directory
filenames: all files in the directory

    # coding=utf-8
    import os

    all_files = []
    path = os.getcwd()
    list_files = os.walk(path)

    for dirpath, dirnames, filenames in list_files:
        for dirname in dirnames:
            all_files.append(os.path.join(dirpath, dirname))
        for name in filenames:
            all_files.append(os.path.join(dirpath, name))

    for file_path in all_files:
        print(file_path)
shutil module (copy and compression)
shutil module is mainly used to copy, move and delete files and folders; You can also compress and decompress files and folders.
The os module provides general operations on directories and files. As a supplement, the shutil module provides operations such as moving, copying, compressing and decompressing, which the os module does not provide.

    # encoding=gbk
    import shutil
    import zipfile

    # Copy file content
    # shutil.copyfile("a.txt", "a_copy.txt")

    # Copy the contents of the "movie/RTHK" folder to the "music" folder,
    # ignoring all html and htm files. The "music" folder must not exist yet!
    # shutil.copytree("movie/RTHK", "music", ignore=shutil.ignore_patterns("*.html", "*.htm"))

    # Compress all contents of the "movie/RTHK" folder into movie.zip inside the "music" folder
    # shutil.make_archive("music/movie", "zip", "movie/RTHK")

    # Compress: pack the specified files into one zip file
    # z = zipfile.ZipFile("a.zip", "w")
    # z.write("1.txt")
    # z.write("2.txt")
    # z.close()

    # Decompress:
    # z2 = zipfile.ZipFile("a.zip", "r")
    # z2.extractall("d:/")   # Set the extraction directory
    # z2.close()
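Because every call in the listing above is commented out, a runnable variant may help. The sketch below performs the same operations inside a temporary directory; all folder and file names ("movie", "music", "movie_backup", "unpacked") are hypothetical.

```python
import os
import shutil
import tempfile
import zipfile

# Hypothetical layout: workdir/movie contains one text and one html file.
workdir = tempfile.mkdtemp()
src_dir = os.path.join(workdir, "movie")
os.makedirs(src_dir)
for name in ("a.txt", "b.html"):
    with open(os.path.join(src_dir, name), "w", encoding="utf-8") as f:
        f.write(name)

# Copy a single file.
shutil.copyfile(os.path.join(src_dir, "a.txt"), os.path.join(workdir, "a_copy.txt"))

# Copy a whole folder, skipping html/htm files; the target must not exist yet.
dst_dir = os.path.join(workdir, "music")
shutil.copytree(src_dir, dst_dir, ignore=shutil.ignore_patterns("*.html", "*.htm"))

# Compress the folder; make_archive appends ".zip" and returns the archive path.
archive = shutil.make_archive(os.path.join(workdir, "movie_backup"), "zip", src_dir)

# Decompress into another folder; the with statement closes the archive.
with zipfile.ZipFile(archive, "r") as z:
    names = z.namelist()
    z.extractall(os.path.join(workdir, "unpacked"))

print(os.listdir(dst_dir))  # ['a.txt']
print(names)
```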
Common character coding
ASCII
ASCII code is represented by 7 bits and can only represent 128 characters. The highest bit of one byte ASCII encoding is always 0.
ISO8859-1
ISO-8859-1, also known as Latin-1, is an 8-bit single-byte character set. It uses the highest bit that ASCII leaves unused, which adds 128 code positions (not all of them are used), and it is backward compatible with ASCII. The added positions hold the letters and symbols of Western European languages; Greek, Thai, Arabic and Hebrew are covered by other parts of the ISO 8859 family, not by ISO-8859-1 itself.
GB2312,GBK,GB18030
GB2312
GB2312, fully known as the Chinese character coded character set for information exchange, was released in China in 1980 and is mainly used for Chinese character processing in computer systems. Covering most Chinese characters, it can not deal with special rare words such as ancient Chinese, so later codes such as GBK and GB18030 appeared.
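GB2312 is compatible with ASCII. It is not compatible with ISO-8859-1, because bytes with the highest bit set are used as parts of two-byte Chinese characters.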
GBK
GBK, the Chinese character internal code extension specification, mainly extends GB2312. It was formulated in 1995.
GB18030
GB18030 is a later national character set standard, first released in 2000. It uses single-byte, double-byte and four-byte character encodings and is backward compatible with GB2312 and GBK. In practice, GBK and GB2312 are used most.
Unicode
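Unicode was originally designed as a fixed two-byte encoding in which every character uses 16 bits. The code space was later extended beyond 16 bits, so characters outside the first 65,536 code points cannot be stored in a single two-byte unit.
Unicode was designed from scratch. Its two-byte form is not byte-compatible with ISO-8859-1 or any other earlier encoding, although its first 256 code points are the same characters as ISO-8859-1.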
UTF-8
In the two-byte form, even English letters need two bytes, which makes Unicode inconvenient for transmission and storage. The UTF-8 encoding was created to address this.
UTF-8 is compatible with ASCII and can represent characters in all languages. It is a variable-length encoding in which each character takes 1 to 4 bytes: English letters are represented by one byte, while most common Chinese characters are represented by three bytes.
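The variable length of UTF-8 can be checked directly with str.encode(). The four sample characters are chosen for illustration; GBK is included only for comparison.

```python
# Sample characters chosen to need 1, 2, 3 and 4 bytes in UTF-8.
samples = ("A", "é", "中", "😀")

utf8_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    utf8_lengths[ch] = len(encoded)
    print(repr(ch), len(encoded), encoded.hex())

# ASCII characters have the same single byte in ASCII and UTF-8.
same_as_ascii = "A".encode("ascii") == "A".encode("utf-8")

# The same Chinese character takes two bytes in GBK.
gbk_length = len("中".encode("gbk"))

print(same_as_ascii, gbk_length)  # True 2
```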
Chinese garbled code problem
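The default encoding of Chinese-language Windows is GBK, and the default encoding of Linux is usually UTF-8. When open() is called in text mode without an encoding argument, Python uses the operating system's default encoding, which is GBK on such Windows systems. Reading a UTF-8 file this way produces garbled Chinese text or a decoding error; passing encoding="utf-8" to open() avoids the problem.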
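A small sketch of the problem and the usual fix: write a file as UTF-8, then decode the same bytes once with the matching encoding and once as GBK. The file name and sample text are hypothetical, and errors="replace" is used only so the mismatch does not raise an exception.

```python
import os
import tempfile

# Hypothetical scratch file written explicitly as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "cn.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("中文")

# Matching encoding: the text comes back unchanged.
with open(path, "r", encoding="utf-8") as f:
    good = f.read()

# Mismatched encoding: the same bytes decoded as GBK give garbled text.
with open(path, "rb") as f:
    raw = f.read()
garbled = raw.decode("gbk", errors="replace")

print(good)
print(garbled)
```

Stating the encoding explicitly on every open() call makes the result independent of the operating system's default.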