[Python tutorial] Chapter 72: Reading CSV files

Posted by dinno2 on Sun, 06 Mar 2022 11:20:23 +0100

In this article, we introduce how to use Python's built-in csv module to read CSV files.

CSV file

CSV stands for comma-separated values. A CSV file is a plain text file that uses commas to separate data.

A CSV file contains one or more rows of data, and each row represents a record. Each record contains one or more values separated by commas. Normally, every row in a CSV file contains the same number of values.

We usually use CSV files to store tabular data. Many applications support this format, such as Microsoft Excel and Google Sheets.
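To make the record structure concrete, here is a minimal sketch that parses a small CSV snippet held in a string and checks that every record has the same number of values. The sample data and the helper name has_consistent_width are hypothetical; io.StringIO simply stands in for a real file.

```python
import csv
import io

# A tiny CSV document: a header row followed by two records (made-up data).
SAMPLE = "name,age\nAlice,30\nBob,25\n"

def has_consistent_width(rows):
    """Return True if every row has the same number of values (hypothetical helper)."""
    widths = {len(row) for row in rows}
    return len(widths) <= 1

# io.StringIO behaves like an open text file, so csv.reader can consume it.
rows = list(csv.reader(io.StringIO(SAMPLE)))
print(rows)                        # [['name', 'age'], ['Alice', '30'], ['Bob', '25']]
print(has_consistent_width(rows))  # True
```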

Read CSV file

The steps to read CSV files in Python code are as follows:

First, import the csv module:

import csv

Second, use the built-in open() function to open the file for reading:

f = open('path/to/csv_file')

If the CSV file is UTF-8 encoded (for example, because it contains non-ASCII characters), specify the encoding parameter:

f = open('path/to/csv_file', encoding='UTF8')
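A minimal sketch of reading a UTF-8 file with non-ASCII text, assuming a throwaway file created in a temporary directory (the file name cities.csv and its contents are made up). It also passes newline='', which the csv module documentation recommends for files handed to csv.reader. Note that 'UTF8', 'utf8' and 'utf-8' are all accepted spellings of the same codec.

```python
import csv
import os
import tempfile

# Made-up sample containing non-ASCII characters.
text = "city,country\nMünchen,Deutschland\n東京,日本\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cities.csv")  # hypothetical file name
    # Write the sample as UTF-8 so there is a real file to read.
    with open(path, "w", encoding="utf-8", newline="") as out:
        out.write(text)

    # Read it back, naming the encoding explicitly instead of
    # relying on the platform's default encoding.
    with open(path, encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f))

# rows == [['city', 'country'], ['München', 'Deutschland'], ['東京', '日本']]
```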

Then, pass the file object f to the csv module's reader() function, which returns a csv reader object:

csv_reader = csv.reader(f)

csv_reader is an iterable object that yields the rows of the CSV file one by one. Therefore, we can use a for loop to iterate over the rows:

for line in csv_reader:
    print(line)
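One detail worth knowing: the object returned by csv.reader() is an iterator, so it can be looped over only once. A small sketch using an in-memory file (io.StringIO, with made-up data) shows that a second loop produces nothing:

```python
import csv
import io

data = io.StringIO("a,b\n1,2\n")
csv_reader = csv.reader(data)

first_pass = [line for line in csv_reader]
# The reader is an iterator: once exhausted, a second loop yields nothing.
second_pass = [line for line in csv_reader]

print(first_pass)   # [['a', 'b'], ['1', '2']]
print(second_pass)  # []
```

To loop over the rows several times, store them first, for example with rows = list(csv_reader).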

Each row is a list. To access a specific value, put its index in square brackets ([]). The first value has index 0, the second has index 1, and so on.

For example, the following expression accesses the first value in a row:

line[0]
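Note that csv.reader() does not convert types by default: every value in a row is a string, so numbers must be converted with int() or float() before doing arithmetic. A sketch using a single record in the score.csv layout shown later (read from an in-memory string):

```python
import csv
import io

# One record in the same layout as the score.csv example below.
line = next(csv.reader(io.StringIO("1,1,English,100\n")))

first_value = line[0]   # '1' (always a string)
last_value = line[-1]   # negative indexes count from the end: '100'
score = int(line[3])    # convert to a number before doing arithmetic

# Unpacking names every field at once (the row must have exactly 4 values).
record_id, stu_id, coursename, coursescore = line

print(first_value, last_value, score + 1)  # 1 100 101
print(coursename)                          # English
```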

Finally, call the close() method to close the file:

f.close()    
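If an exception is raised between open() and close(), the close() call is never reached. A sketch of the manual approach made safe with try/finally, which is roughly what the with statement described next does for you; the helper read_rows and the file demo.csv are hypothetical:

```python
import csv
import os
import tempfile

def read_rows(path):
    """Open, read and close a CSV file manually, closing it even on errors."""
    f = open(path, encoding="utf8", newline="")
    try:
        return list(csv.reader(f))
    finally:
        f.close()  # runs whether the try block returned normally or raised

# Demo on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    with open(path, "w", encoding="utf8", newline="") as out:
        out.write("x,y\n1,2\n")
    rows = read_rows(path)

print(rows)  # [['x', 'y'], ['1', '2']]
```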

Alternatively, you can use the with statement to close the file automatically. The following is the complete code to read the CSV file:

import csv

with open('path/to/csv_file', 'r') as f:
    csv_reader = csv.reader(f)
    for line in csv_reader:
        # process each line
        print(line)
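Why not just split each line on commas? A quick comparison on made-up data shows the difference: in CSV, a value that contains a comma is wrapped in double quotes, and csv.reader understands that quoting, while str.split() does not.

```python
import csv
import io

# A field containing a comma must be quoted in CSV (made-up data).
text = 'name,comment\nAlice,"Good, but late"\n'

naive = [row.split(",") for row in text.splitlines()]
parsed = list(csv.reader(io.StringIO(text)))

print(naive[1])   # ['Alice', '"Good', ' but late"']
print(parsed[1])  # ['Alice', 'Good, but late']
```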

Examples

The sample file score.csv contains the following:

id,stu_id,coursename,coursescore
1,1,English,100
2,1,Math,95
3,2,English,96
4,2,Math,95
5,3,English,100
6,3,Math,99
7,4,English,98
8,4,Math,97
9,5,English,99
10,5,Math,95
11,6,English,96
12,6,Math,94
13,7,English,92
14,7,Math,100
15,8,English,97
16,8,Math,95
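To follow along, you need score.csv on disk. A minimal sketch that writes the sample above to a file; the function name create_sample is hypothetical, and by default it writes score.csv to the current working directory:

```python
from pathlib import Path

# The contents of score.csv shown above.
SCORE_CSV = """\
id,stu_id,coursename,coursescore
1,1,English,100
2,1,Math,95
3,2,English,96
4,2,Math,95
5,3,English,100
6,3,Math,99
7,4,English,98
8,4,Math,97
9,5,English,99
10,5,Math,95
11,6,English,96
12,6,Math,94
13,7,English,92
14,7,Math,100
15,8,English,97
16,8,Math,95
"""

def create_sample(path="score.csv"):
    """Write the sample data to path (hypothetical helper) and return the Path."""
    target = Path(path)
    target.write_text(SCORE_CSV, encoding="utf8")
    return target
```

Calling create_sample() once before running the examples below should be enough; they open the file with open('score.csv', encoding="utf8").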

The following example reads the file and prints the contents of the file:

import csv

with open('score.csv', encoding="utf8") as f:
    csv_reader = csv.reader(f)
    for line in csv_reader:
        print(line)

The output is as follows:

['id', 'stu_id', 'coursename', 'coursescore']
['1', '1', 'English', '100']
['2', '1', 'Math', '95']
['3', '2', 'English', '96']
['4', '2', 'Math', '95']
['5', '3', 'English', '100']
...

The first line of score.csv is the header. To distinguish the header from the data, we can use the enumerate() function to number each row:

import csv

with open('score.csv', encoding="utf8") as f:
    csv_reader = csv.reader(f)
    for line_no, line in enumerate(csv_reader, 1):
        if line_no == 1:
            print('Header:')
            print(line)  # header
            print('Data:')
        else:
            print(line)  # data

In the above example, we call enumerate() with a start value of 1, so the first line is numbered 1. Inside the loop, if line_no is 1, the current row is the header; otherwise, it is data. The output is as follows:

Header:
['id', 'stu_id', 'coursename', 'coursescore']
Data:
['1', '1', 'English', '100']
['2', '1', 'Math', '95']
['3', '2', 'English', '96']
['4', '2', 'Math', '95']
['5', '3', 'English', '100']
...

Another way to skip the header line is to call the next() function on the reader, which reads the next row (here, the first one) and returns it. For example:

import csv

with open('score.csv', encoding="utf8") as f:
    csv_reader = csv.reader(f)
    # skip the first row
    next(csv_reader)
    # show the data
    for line in csv_reader:
        print(line)
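Since next() returns the row it reads, the header does not have to be thrown away. A sketch on a small in-memory subset of score.csv that keeps the header and uses it to look up the score column by name instead of hard-coding index 3:

```python
import csv
import io

# A few rows in the score.csv layout (in-memory subset).
f = io.StringIO("id,stu_id,coursename,coursescore\n1,1,English,100\n2,1,Math,95\n")
csv_reader = csv.reader(f)

# next() returns the row it reads, so the header can be kept.
header = next(csv_reader)
score_col = header.index("coursescore")  # look up the column by name

scores = [int(line[score_col]) for line in csv_reader]
print(header)  # ['id', 'stu_id', 'coursename', 'coursescore']
print(scores)  # [100, 95]
```

If the file might be empty, next(csv_reader) raises StopIteration; next(csv_reader, None) returns None instead.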

The following example reads score.csv and calculates the sum of all scores:

import csv

total_score = 0

with open('score.csv', encoding="utf8") as f:
    csv_reader = csv.reader(f)
    # skip the header
    next(csv_reader)
    # calculate total
    for line in csv_reader:
        total_score += int(line[3])

print(total_score)

The output is as follows:

1548
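The same loop extends naturally to per-course figures. A sketch, assuming the score.csv contents above are inlined as a string so it runs without the file, that groups scores by course with collections.defaultdict and derives per-course averages:

```python
import csv
import io
from collections import defaultdict

# The contents of score.csv shown above, inlined so the sketch runs on its own.
SCORE_CSV = """\
id,stu_id,coursename,coursescore
1,1,English,100
2,1,Math,95
3,2,English,96
4,2,Math,95
5,3,English,100
6,3,Math,99
7,4,English,98
8,4,Math,97
9,5,English,99
10,5,Math,95
11,6,English,96
12,6,Math,94
13,7,English,92
14,7,Math,100
15,8,English,97
16,8,Math,95
"""

totals = defaultdict(int)  # course name -> sum of its scores
counts = defaultdict(int)  # course name -> number of scores for that course

csv_reader = csv.reader(io.StringIO(SCORE_CSV))
next(csv_reader)  # skip the header
for line in csv_reader:
    course, score = line[2], int(line[3])
    totals[course] += score
    counts[course] += 1

overall = sum(totals.values())
averages = {course: totals[course] / counts[course] for course in totals}

print(dict(totals))  # {'English': 778, 'Math': 770}
print(overall)       # 1548
print(averages)      # {'English': 97.25, 'Math': 96.25}
```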

DictReader class

When we use the csv.reader() function, we access values in the CSV file by index, such as line[0], line[1], and so on. However, this approach has two main limitations:

  • First, the meaning of each access is not obvious. For example, line[3] holds the course score; writing line['coursescore'] instead would make that much clearer.
  • Second, if the order of the fields in the CSV file changes, or new fields are inserted, we have to update the indexes in our code.

The DictReader class, which also comes from the csv module, solves both problems.

A DictReader object works like an ordinary CSV reader, but it maps each row of data to a dictionary (dict) whose keys are taken from the first row of the file.
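A small sketch of what DictReader produces, using an in-memory copy of the first two lines of score.csv. The fieldnames attribute holds the keys read from the header; on Python 3.8 and later each row is a plain dict (Python 3.6 and 3.7 returned OrderedDict):

```python
import csv
import io

f = io.StringIO("id,stu_id,coursename,coursescore\n1,1,English,100\n")
reader = csv.DictReader(f)

row = next(reader)          # the header line is consumed automatically
print(reader.fieldnames)    # ['id', 'stu_id', 'coursename', 'coursescore']
print(row["coursescore"])   # 100
print(row)  # {'id': '1', 'stu_id': '1', 'coursename': 'English', 'coursescore': '100'}
```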

With the DictReader class, we can access the data in score.csv with expressions such as line['stu_id'] and line['coursename']. For example:

import csv

with open('score.csv', encoding="utf8") as f:
    csv_reader = csv.DictReader(f)
    # show the data
    for line in csv_reader:
        print(f"The {line['coursename']} score of {line['stu_id']} is {line['coursescore']}")

The output is as follows:

The English score of 1 is 100
The Math score of 1 is 95
The English score of 2 is 96
The Math score of 2 is 95
...
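Because values are looked up by name, the second limitation listed earlier also goes away. A sketch with made-up data in which the same records are stored with their columns in a different order; the hypothetical helper total_score returns the same result for both:

```python
import csv
import io

# The same two records, with the columns in a different order in the second text.
ORIGINAL = "id,stu_id,coursename,coursescore\n1,1,English,100\n2,1,Math,95\n"
REORDERED = "coursescore,coursename,id,stu_id\n100,English,1,1\n95,Math,2,1\n"

def total_score(text):
    """Sum the coursescore column, wherever it appears in the header."""
    reader = csv.DictReader(io.StringIO(text))
    # No next() call is needed: DictReader reads the header row itself.
    return sum(int(row["coursescore"]) for row in reader)

print(total_score(ORIGINAL), total_score(REORDERED))  # 195 195
```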

If we want to use custom field names instead of the ones in the first line of the CSV file, we can pass them to the DictReader() constructor. DictReader then treats the first line as an ordinary data row, so the example skips it with next():

import csv

fieldnames = ['id', 'student_no', 'course', 'score']

with open('score.csv', encoding="utf8") as f:
    csv_reader = csv.DictReader(f, fieldnames=fieldnames)
    # skip the header row stored in the file
    next(csv_reader)
    for line in csv_reader:
        print(f"The {line['course']} score of {line['student_no']} is {line['score']}")
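Custom field names are also the natural fit for a file that has no header row at all; in that case there is nothing to skip. A sketch with made-up, headerless data (the field names are hypothetical):

```python
import csv
import io

# Made-up headerless data: no title row, just records.
f = io.StringIO("1,1,English,100\n2,1,Math,95\n")

# With fieldnames supplied, DictReader treats the first line as data,
# so no next() call is needed here.
reader = csv.DictReader(f, fieldnames=["id", "student_no", "course", "score"])
rows = list(reader)

print(rows[0]["course"], rows[0]["score"])  # English 100
print(len(rows))                            # 2
```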

Summary

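  • The csv.reader() function and the csv.DictReader class can both be used to read CSV files. csv.reader() returns each row as a list accessed by index, while DictReader returns each row as a dictionary keyed by field name.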
