python handles excel (openpyxl, xlrd, xlwt, openpyxl, panda)

Posted by vidyashankara on Fri, 08 May 2020 22:30:08 +0200

1, openpyxl of excel for python operation

Preface

According to the official documents, openpyxl is a third-party library that can handle Excel files in xlsx/xlsm format (A Python library to read/write Excel 2010 xlsx/xlsm files).

There are three main concepts in openpyxl: Workbook, sheet and cell.

Main operations in openpyxl: open Workbook, locate Sheet, and operate Cell.

(1) Support excel format

  • xlsx
  • xlsm
  • xltx
  • xltm

(2) Basic usage

First of all, it introduces some basic concepts of Excel. Workbook is equivalent to a file. WorkSheet is every specific table in the file. For example, 'Sheet1' in a new EXCEL file, and one or more worksheets in a workbook

  • Workbook: a workbook. An excel file contains multiple sheet s.
  • worksheet: there are multiple worksheets in a workbook, with table name identification, such as "sheet1", "sheet2", etc.
  • Cell: cell, storing data objects

1. Install openpyxl

pip install openpyxl

2. Parameter introduction

(1) Common situations

As far as I am concerned, the common situations may be as follows:

  • Read the data of the whole excel sheet page.
  • Read data of specified row and column
  • Write data to a blank excel document
  • Append data to an excel document that already has data

Here are some examples.

(2) Description of modules and functions involved

As far as I know, there are three modules that can operate excel documents, and three modules can be installed directly through pip.

  • xlrd: read data
  • xlwt: write data
  • openpyxl: can read or write data

This is just openpyxl, because this module can meet the above needs.

openpyxl function

function Explain
load_workbook(filename) Open excel and return all sheet pages to access the specified sheet page: * ා open the excel document WB = openpyxl.load ᦇ Workbook (file ᦇ) ා access the sheet Page * sheet = wb ['sheet page name'] ා close the excel document wb.close()
Workbook() Create the excel document wb = openpyxl.Workbook(). Save the excel document wb.save ('filename. xlsx ')
The following functions are used to access the specified cells for sheet = wb ['sheet page name'], sheet['B1 ']
min_row Returns the minimum row index containing the data, starting with 1. For example: sheet.min row
max_row Returns the maximum row index containing data, starting at 1
min_column Returns the minimum column index containing data, starting at 1
max_column Returns the maximum column index containing data, starting at 1
values Get all the data of excel document and return a generator object
iter_rows(min_row=None, max_row=None, min_col=None, max_col=None) Min row: minimum row index Max row: maximum row index min col: minimum column index Max col
title The name of the WorkSheet

Now I have such an excel. Let's use this Excel to explain.

3. Read file properties

import openpyxl

# Open an EXcel document
wb = openpyxl.load_workbook('test.xlsx')
sheet2 = wb['Sheet2']

print('The table name is:',sheet2.title)

print('Rows and columns of data:',sheet2.dimensions)

print('The minimum number of rows is:',sheet2.max_row)

print('The maximum number of rows is:',sheet2.min_row)

print('The maximum number of columns is:',sheet2.max_column)

print('The minimum number of columns is:',sheet2.min_column)

# Get the specified cell
print('Cell A1: ',sheet2['A1'])
print('Cell B1: ',sheet2.cell(row=1, column=2))

# Get row
print('That's ok:',sheet2.rows)

# Get column
print('Column:',sheet2.columns)

# Get all the data
print('All data:',sheet2.values)

# View Excel document properties
print('Read only:',wb.read_only)
print('Document properties:',wb.properties)

#Character set format of document
print('Character set format:',wb.encoding)

# Get active worksheet
print('Active worksheet:',wb.active)

# Get all worksheets
print('All worksheets:',wb.worksheets)

# Get the names of all worksheets
print('Output file all sheet names:', wb.sheetnames)

# Get the worksheet object according to the table name, case sensitive
print('Table name:',wb['Sheet2'])
The output result is:
Table name: Sheet2
 Number of rows and columns of data: A1:C8
 The minimum number of rows is: 8
 The maximum number of rows is: 1
 Maximum number of columns: 3
 The minimum number of columns is: 1
 Cell A1: & lt; cell 'SHEET2'. A1 & gt;
Cell B1: & lt; cell 'SHEET2'. B1 & gt;
Line: & lt; generator object worksheet.] cells by row at 0x000002612197f2e0 & gt;
Column: & lt; generator object worksheet. \;
All data: & lt; generator object worksheet.values at 0x000002612197f2e0 & gt;
Read only: False
 Document properties: & lt; openpyxl.packaging.core.documentproperties object & gt;
Parameters:
creator = 'small steel gun', title = none, description = none, subject = none, identifier = none, language = none, created = datetime.datetime (2015, 6, 5, 18, 17, 20), modified = datetime.datetime (2020, 5, 7, 4, 9, 19), LastModified by =, category = none, contentstatus = none, version = none, revision = none, keywords = none, lastprinted = none
 Character set format: utf-8
 Active worksheet: & lt; worksheet "score sheet" & gt;
All worksheets: [& lt; worksheet "SHEET2" & gt;, & lt; worksheet "sheet3" & gt;, & lt; worksheet "gradesheet" & gt;]
Output file all sheet names: ['sheet2 ',' sheet3 ',' grade sheet ']
Table name: & lt; worksheet "SHEET2" & gt;

4. Read file contents

import openpyxl

wb = openpyxl.load_workbook('test.xlsx')
sheet2 = wb['Sheet2']

# Get cell contents
# Mode 1
print('Method 1')
for row in sheet2.values:
    print(*row)
print('=========================')
# Mode 2
print('Method 2')
for row in sheet2.rows:
    print(*[cell.value for cell in row])
print('=========================')
# Mode 3
print('Method 3')
for row in sheet2.iter_rows():
    print(*[cell.value for cell in row])
print('=========================')
# Mode 4 (most complex and original)
print('Method 4')
for i in range(sheet2.min_row, sheet2.max_row + 1):
    for j in range(sheet2.min_column, sheet2.max_column + 1):
        print(sheet2.cell(row=i,column=j).value,end=' ')
    print()
The output result is:
Method 1
 No. Name Age
 1 what happened 25
 2 floating vision 26
 3 okay 27
 4 Reply 28
 5 good words 29
 6 sets of nets 30
 7 too gentle 31
=========================
Method 2
 No. Name Age
 1 what happened 25
 2 floating vision 26
 3 okay 27
 4 Reply 28
 5 good words 29
 6 sets of nets 30
 7 too gentle 31
=========================
Method 3
 No. Name Age
 1 what happened 25
 2 floating vision 26
 3 okay 27
 4 Reply 28
 5 good words 29
 6 sets of nets 30
 7 too gentle 31
=========================
Method 4
 No. Name Age 
1 what happened 25 
2 floating vision 26 
3 okay 27 
4 Reply 28 
5 good words 29 
6 sets of nets 30 
7 too gentle 31 

5. Delete and create tables

import openpyxl

# Open an EXcel document
wb = openpyxl.load_workbook('test.xlsx')
sheet2 = wb['Sheet2']

# Delete table
sheet1 = wb.get_sheet_by_name('Sheet1')
wb.remove_sheet(sheet1)

# Save changes to workbook
wb.save('test.xlsx')

# Create a new worksheet
wb.create_sheet('Score sheet')

# Save changes to workbook
wb.save('test.xlsx')

6. Store student grades in Excel

import openpyxl

# Open workbook
wb = openpyxl.load_workbook("Score sheet.xlsx")

# #Create a score sheet
# WB. Create "sheet
#
# # Delete table
# wb.remove_sheet(wb['Sheet1'])

# Get "student transcript"
score = wb["Student transcript"]
# title = ['serial number', 'name', 'language', 'Mathematics']
# no = range(6)
# names = ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu', 'Tian Qi']
# wen = [80,88,85,81,89]

# First row of data
score['A1'].value = 'Serial number'
score['B1'].value = 'Full name'
score['C1'].value = 'Chinese'
score['D1'].value = 'Mathematics'
# Second row of data
score['A2'].value = int('1')
score['B2'].value = 'Zhang San'
score['C2'].value = int('52')
score['D2'].value = int('64')

# Third row of data
score['A3'].value = int('2')
score['B3'].value = 'Li Si'
score['C3'].value = int('28')
score['D3'].value = int('95')

# Save workbook
wb.save('Score sheet.xlsx')

7. Insert sheet contents

#coding=utf-8
import openpyxl

def process_worksheet(sheet):
    # Column with total score
    sum_column = sheet.max_column + 2
    # Column with average score
    avg_column = sheet.max_column + 1
    # Save total and average scores to the last two columns
    for row in sheet.iter_rows(min_row=2, min_col=3):
        # Cell
        score = [cell.value for cell in row]
        # Total score
        sum_score = sum(score)
        # average
        avg_score = sum_score / len(score)
        # Save total and average scores to the last two columns
        sheet.cell(row=row[0].row, column=avg_column).value = avg_score
        sheet.cell(row=row[0].row, column=sum_column).value = sum_score

    # Set the title of the average score and total score
    sheet.cell(row=1, column=avg_column).value = "average"
    sheet.cell(row=1, column=sum_column).value = "Total score"

def main():
    # Open Excel document
    wb = openpyxl.load_workbook("achievement.xlsx")
    # Get a worksheet
    sheet = wb["Score sheet"]
    #    Insert the sheet in the external excel file (score table) into the current Excel
    process_worksheet(sheet)
    # Save exercise.xlsx
    wb.save("Practice-copy.xlsx")

if __name__ == '__main__':
    main()
View data for exercise - copy.xlsx

2, Python reading and writing Excel files (three modules and three ways

There are many ways for python to read and write excel. Different modules have slightly different ways of speaking in reading and writing:

  • Using xlrd and xlwt to read and write excel;
  • Using openpyxl to read and write excel;
  • Using pandas to read and write excel;

pandas introduction

  • Pandas is a data analysis package of Python, which is created to solve data analysis tasks.
  • Pandas integrates a large number of libraries and standard data models to provide the tools needed to operate data sets efficiently.
  • Pandas provides a large number of functions and methods that enable us to process data quickly and easily.
  • Pandas is a dictionary form, which is created based on NumPy, making NumPy centric applications easier.

1. xlrd module

Xlrd is used to read and write data from excel, but I usually only use it to read and write data. There are some problems in writing data. It is convenient to read with xlrd. The process is the same as usual manual operation of Excel. Open workbook, select sheets, and then operate cells. For example, to open an excel file named "data.xlsx" in the current directory, select the first worksheet, read all the contents of the first row and print it out.

#Open excel file
data=xlrd.open_workbook('data.xlsx')     
#Get the first sheet (by index)
table=data.sheets()[0] 
#Data list is used to store data
data_list=[]    
#Read and add the data in the first row of the table to the data menu list
data_list.extend(table.row_values(0))
#Print out all data of the first line
for item in data_list:
    print item

In the above code, table.row'u values (number) is used to read a row, and table.column'u values (number) is used to read a column. Number is the row index. In xlrd, both rows and columns are indexed from 0. Therefore, cell A1 in the top left corner of Excel is row 0 and column 0.

In xlrd, you can use table.cell(row,col) to read a cell, where row and col are the corresponding rows and columns of the cell.
Here is a brief summary of the use of xlrd

(1) Install xlrd module

Download the http://pypi.python.org/cmdpypi/xlrd module on the python official website for installation, provided that the python environment has been installed.

pip install xlrd

(2) Tips

  • sheet.name: the name of sheet
  • sheet.nrows: number of sheet rows
  • sheet.ncols: number of sheet columns
  • Sheet. Get rows(): returns an iterator, traverses all rows, and gives a list of values for each row
  • Sheet. Row? Values (index): returns a list of values for a row
  • sheet.row(index): returns a row object. You can get the cell object in this row through row[index]
  • sheet.col_values(index): returns a list of values in a column
  • sheet.cell(row,col): get a cell object (row and col start from 0
table = data.sheets()[0]          #Get by index order
table = data.sheet_by_index(0) #Get by index order
table = data.sheet_by_name(u'Sheet1')#Get by name

# Get values (arrays) for entire rows and columns
table.row_values(i)
table.col_values(i)
 
# Get the number of rows and columns
nrows = table.nrows 
ncols = table.ncols
       
# Circular row list data
for i in range(nrows):
       print table.row_values(i)
 
# Cell
cell_A1 = table.cell(0,0).value
cell_C4 = table.cell(2,3).value
 
# Use row and column index
cell_A1 = table.row(0)[0].value
cell_A2 = table.col(1)[0].value
 
# Simple write
row = 0
col = 0
 
# Type 0 empty,1 string, 2 number, 3 date, 4 boolean, 5 error
ctype = 1 value = 'Cell value'
xf = 0 # Extended formatting
table.put_cell(row, col, ctype, value, xf)
table.cell(0,0)  #Cell value '
table.cell(0,0).value #Cell value '

(3) View file data

import xlrd

book = xlrd.open_workbook('Practice-copy.xlsx')
sheet1 = book.sheets()[0]

nrows = sheet1.nrows
print('Total number of table lines:', nrows)

ncols = sheet1.ncols
print('Total columns:', ncols)

row3_values = sheet1.row_values(2)
print('Value in line 3:', row3_values)

col3_values = sheet1.col_values(2)
print('Values in column 3:', col3_values)

cell_2_2 = sheet1.cell(2,2).value
print('3 Value of row 3 column',cell_2_2)
The output is as follows
Total number of table lines: 8
 Total number of columns in the table: 3
 Value in line 3: [2.0, 'floating view', 26.0]
Values in column 3: ['age ', 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0]
Value of 3 rows and 3 columns 26.0

2. xlwt module

pip install xlwt

If xlrd is not a simple Reader (if the last two characters in xlrd are regarded as readers, then the last two characters in xlwt are similar to writers), then xlwt is a pure Writer, because it can only write to Excel. Xlwt and xlrd are not only names, but also many functions and operation formats. Here is a brief summary of common operations

(1) xlwt common operations

Create a new Excel file (can only be written by creating a new one)

data=xlwt.Workbook()

Create a new sheet

table=data.add_sheet('name')

Write data to cell A1

table.write(0,0,u'Ha-ha')

Note: if you repeat the operation on the same cell, an overwrite Exception will be raised. To cancel this function, you need to specify it as overwritable when adding a worksheet, as follows

table=data.add_sheet('name',cell_overwrite_ok=True)

Save file

data.save('test.xls')

Only the extension xls can be saved here. The format of xlsx is not supported

xlwt supports certain styles. The operations are as follows

#Initialize style
style=xlwt.XFStyle()

#Create fonts for styles
font=xlwt.Font()

#Specify font name
font.name='Times New Roman'

#Bold font
font.bold=True

#Set the font to style font
style.font=font

#Use this style when writing to a file
sheet.write(0,1,'just for test',style)

(2) Examples

import xlwt  # It seems that Excel 2007's xlsx format is not supported

wb = xlwt.Workbook()
wb_sheet = wb.add_sheet('ddd')
wb_sheet.write(0,0,'Test content')
wb.save('d.xls')
View d.xls file

3. openpyxl module

pip install openpyxl

This module supports the latest version of Excel file format, and has read and write operations that respond to Excel files. There are two special classes for this, Reader and Writer, which are convenient for the operation of Excel files. However, I usually use the default workbook to operate. Common operations are summarized as follows:

(1) openpyxl common operations

Read Excel file

from openpyxl.reader.excel import load_workbook

wb=load_workbook(filename)

Show the index range of the sheet

wb.get_named_ranges()

Show all sheet names

wb.get_sheet_names()

Get the first sheet

sheetnames = wb.get_sheet_names()  
ws = wb.get_sheet_by_name(sheetnames[0])

Get table name

ws.title

Get the number of rows in the table

ws.get_highest_row()

Get the number of columns in the table

ws.get_highest_column()

Cell reading is very similar to that of xlrd, which is read through the indexes of rows and columns

#Read from cell B1
ws.cell(0,1).value

Of course, it also supports reading data through Excel coordinates. The code is as follows

#Read from cell B1
ws.cell("B1").value

(2) Examples

import pandas as pd
from pandas import DataFrame

# df = pd.read_excel(r 'exercise - copy.xlsx',sheet_name = 'student transcript')
# print(df.head())

data = {
        'name':['Zhang San','Li Si','Wang Wu'],
        'age':[11,12,13],
        'sex':['male','female','Unknown'],
}

df = DataFrame(data)
df.to_excel('new.xlsx')
View the new.xlsx file

4. Cases

Extract two tables from the original Excel document provided into another new Excel document, and arrange them horizontally

The code is as follows:

import openpyxl
import pandas
from pandas import DataFrame

wb = openpyxl.load_workbook('Live Database and table structure.xlsx')
sheet = wb['Sheet1']

aa = []
for row1 in sheet.iter_rows(min_row=3,max_row=6):
    score1 = [cell.value for cell in row1]
    aa.append(score1)
df = DataFrame(aa)
df.to_excel('xgp.xlsx')

bb = []
for row2 in sheet.iter_rows(min_row=8,max_row=14):
    score2 = [cell.value for cell in row2]
    bb.append(score2)

df = pandas.DataFrame(bb)
book = openpyxl.load_workbook('xgp.xlsx')
with pandas.ExcelWriter('xgp.xlsx')as E:
    E.book = book
    E.sheets = dict((ws.title, ws) for ws in book.worksheets)
    df.to_excel(E,sheet_name='Sheet1',index=False,startcol=12)
View the xgp.xlsx file

5. Summary

When reading Excel, there is little difference between openpyxl and xlrd, which can meet the requirements
When writing a small amount of data and saving it as xls format file, it is more convenient to use xlwt
When you write a large amount of data (exceeding the xls format limit) or you have to save it as an xlsx format file, you need to use openpyxl.

Topics: Python Excel pip encoding