Python operation Excel library summary

Posted by radi8 on Mon, 17 Jan 2022 11:38:24 +0100

Several common Excel libraries openpyxl, xlrd/xlwt, xlwings and xlsxwriter are explained in detail.

In order to further understand the similarities and differences of each library, so that it can be used flexibly in different scenarios, this paper will horizontally compare 7 common modules that can operate Excel files, and consolidate learning while comparing the common operations of each module!

First, let's grasp the characteristics of different libraries as a whole

"
  1. Xlrd, xlwt and xlutils have their own limitations, but they complement each other and cover Excel files, especially xls file operation. Xlwt# can be generated xls , file, xlrd , can read existing files xls , file, xlutils , connects two modules , xlrd , and , xlwt , so that users can read and write one at the same time xls file. In short, xlrd is responsible for reading, xlwt is responsible for writing, and xlutils is responsible for providing assistance and connection

  2. xlwings , can easily read and write the data in Excel files, and can modify the cell format

  3. XlsxWriter , is a tool for writing xlsx} file format module. It can be used to write text, numbers and formulas, and supports cell formatting, pictures, charts, document configuration, automatic filtering and other features. But it cannot be used to read and modify Excel files

  4. openpyxl} through the workbook "workbook - sheet - cell" mode xlsx# files can be read, written and changed, and the style can be adjusted

  5. pandas is no stranger to us. It is a powerful module for data processing and analysis. Sometimes it can also be used to automate Excel processing

"

If you are too lazy to look at the detailed comparison process, you can directly look at the final summary chart, and then pull it to the end of the article to collect some praise, even if you learn it

1, Installation

All 7 modules are non-standard libraries, so they need to be installed in {pip} on the command line:

pip install xlrd
pip install xlwt
pip install xlutils
pip install xlwings
pip install XlsxWriter
pip install openpyxl
pip install pandas

Two. Module import

Most modules can be imported directly by name, and some modules use abbreviations:

import xlrd
import xlwt
import xlwings as xw
import xlsxwriter
import openpyxl
import pandas as pd

The xlutils  module is the bridge between  xlrd  and  xlwt  and its core function is to copy a copy read into memory through  xlrd  Xls , object, and then copy the object through , xlwt , modification The contents of the xls# table. Xlutils , can copy and convert the Book object of , xlrd , into the Workbook object of , xlwt , and usually import the , copy , sub module in the module:

import xlutils.copy

3, Read Excel file

3.1 obtaining documents

Not all 7 modules can read Excel files, and even if they can read Excel files, they should be discussed by different suffixes, as follows:

"
  1. xlwt, xlutils, XlsxWriter cannot read the file

  2. xlrd# can be read xls # and xlsx file

  3. xlwings # can be read xls # and xlsx file

  4. openpyxl} can read xlsx file

  5. pandas} can be read xls # and xlsx file

"

The following uses two 10MB xls # and xlsx} file to test:

xls_path = r'C:\xxx\Desktop\test.xls'
xlsx_path = r'C:\xxx\Desktop\test.xlsx'

3.1.1}xlrd# reading files

xlrd# can be read xls # and xlsx file

xls = xlrd.open_workbook(xls_path)
xlsx = xlrd.open_workbook(xlsx_path)

3.1.2 # xlwings # reading files

xlwings , directly connects to apps, that is, excel applications, and then work books and worksheet sheets. xlwings , requires an environment with Excel applications installed. xlwings , can read xls # and xlsx file

app = xw.App(visible=True, add_book=False) #The program is visible. Only open without creating a new workbook
app.display_alerts = False #Warning off
app.screen_updating = False #Screen update off
# wb = app.books.open(xls_path)
wb = app.books.open(xlsx_path)
wb.save() #Save file
wb.close() #Close file
app.quit() #Close program

3.1.3 openpyxl read file

openpyxl} can read xlsx file

wb = openpyxl.load_workbook(xlsx_path)

If read xls# file will report an error:

wb = openpyxl.load_workbook(xls_path)

openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.

3.1.4 # pandas # read files

pandas} can be read xls # and xlsx file

xls = pd.read_excel(xls_path, sheet_name='Sheet1')
xlsx = pd.read_excel(xlsx_path, sheet_name='Sheet1')

Next, compare the four modules and read 10MB under the same configuration computer The time of xlsx# file (running for 3 times to get the average value), and the code used is:

import time
import xxx

time_start = time.time()
xxx
time_end = time.time()
print('time cost: ', time_end-time_start, 's')

The final test result is that xlwings , is the fastest to read 10MB files, xlrd , is the second, and openpyxl , is the slowest (depending on the computer, the results are for reference only)

The table read into the Excel file is summarized as follows:

3.2 obtaining worksheets

For the above four modules that can read Excel files, the way to obtain worksheet sheet is further discussed

3.2.1}xlrd) get worksheet

You can find by sheet name:

sheet = xlsx.sheet_by_name("Sheet1")

You can also find by index:

sheet = xlsx.sheet_by_index(0)

3.2.2 # xlwings # get worksheet

xlwings # worksheets are divided into active worksheets and specific worksheets under the specified Workbook:

sheet = xw.sheets.active  #In the active workbook
sheet = wb.sheets.active  #In a specific Workbook

3.2.3 openpyxl get worksheet

The. active method gets the first worksheet of the workbook by default

sheet = wb.active

In addition, you can also obtain the worksheet by specifying the worksheet name:

sheet = wb['Sheet1']

3.2.4 # pandas # get worksheet

There is no {pandas} thing about getting the worksheet alone, because the worksheet has been and must be specified to read when reading the file:

xlsx = pd.read_excel(xlsx_path, sheet_name='Sheet1')

4, Create Excel file

Briefly summarize the creation of Excel files:

"
  1. xlrd, xlutils , cannot create Excel file

  2. xlwt} can only be created xls file, cannot be created xlsx file

  3. xlwings can be created xls # and xlsx file

  4. XlsxWriter , can create xlsx file

  5. openpyxl} can create xls # and xlsx file

  6. pandas} does not have the concept of creating Excel, but can be generated when stored xls or xlsx file

"

4.1}xlwt} creating files

xlwt} can only be created xls file, cannot be created xlsx file

xls = xlwt.Workbook(encoding= 'ascii')
#Create a new sheet table
worksheet = xls.add_sheet("Sheet1")

4.2 # xlwings # create files

xlwings can be created xls # and xlsx# file, you only need to write the suffix clearly when saving it last. Use the following command:

wb = app.books.add()

Whether you create or open a new workbook, you need to save the workbook, close the workbook, and close the program, that is:

wb.save(path + r'\new_practice.xlsx') 
wb.close() 
app.quit() 

4.3. XlsxWriter create file

XlsxWriter , can create xlsx} file:

xlsx = xlsxwriter.Workbook()   
#Add sheet
sheet = xlsx .add_worksheet('Sheet1')

4.4 openpyxl file creation

openpyxl can create xls # and xlsx# file, you only need to write the suffix clearly when saving it last. Use the following command:

wb = Workbook()
#The worksheet is created as specified in the new workbook
sheet = wb.active

4.5. pandas} create file

Pandas # only need to write the suffix clearly at the last transfer. In fact, pandas} does not need to create an excel file at the beginning. It can be used after various operations around the data frame to_ Use Excel command again xls # or xlsx# as file suffix. If you must generate a blank excel file, you can use the following command:

df = pd.DataFrame([])
df.to_excel(r'C:\xxx\test1.xlsx')

5, Save file

Briefly summarize the saving of Excel files:

"
  1. xlrd# cannot save Excel file

  2. xlwt} can be saved xls file

  3. xlutils , can copy , xlrd , objects as , xlwt , objects and save them xls file

  4. xlwings can be saved xls # and xlsx file

  5. XlsxWriter} can save xlsx file

  6. openpyxl} can be saved xlsx file

  7. pandas # can be saved xls # or xlsx file

"

5.1}xlwt} saving documents

xlwt} can be saved xls file

# xls = xlwt.Workbook(encoding= 'ascii')
# worksheet = xls.add_sheet("Sheet1")
xls.save("new_table.xls")

5.2 xlutils save files

xlutils , can copy , xlrd , objects as , xlwt , objects and save them xls file

# xls_path = r'C:\xxxx\test.xls'
# xls = xlrd.open_workbook(xls_path)
xls_xlutils = xlutils.copy.copy(xls)
xls_xlutils.save('new_text.xls')

5.3 # xlwings# save files

xlwings can be saved xls # and xlsx file

# wb = app.books.open(xls_path)
wb = app.books.open(xlsx_path)
wb.save() #Save file
wb.close() #Close file
app.quit() #Close program

5.4 XlsxWriter save file

XlsxWriter} can save xlsx# file After the close command is executed, the file is saved while it is closed:

# xlsx = xlsxwriter.Workbook()
# sheet = xlsx .add_worksheet('Sheet1')
xlsx.close()

5.5 openoyxl saving files

openpyxl} can be saved xlsx file

# wb = openpyxl.load_workbook(xlsx_path)
# wb = Workbook()
# sheet = wb.active
wb.save('new_test.xlsx')

6. pandas} save files

pandas # can be saved xls # or xlsx file

df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1, 2, 4])
df1.to_excel(r'C:\xxxx\test1.xls')
df2.to_excel(r'C:\xxxx\test2.xlsx')

6, Gets the value of the cell

The basic premise of obtaining cell values is to be able to read files, so it is basically introduced around {xlrd, xlwings, openpyxl and pandas}. xlutils due to the ability to copy one Xls , therefore, you can use exactly the same method of reading cells as xlrd , xls , xlrd ,.

6.1. xlrd/xlutils get cells

xlutils , because it is a direct copy of an object that , xlrd , applies to, the method used to read cells is exactly the same as , xlrd ,. xlwt# does not have the ability to read cells

# xls = xlrd.open_workbook(xls_path)
# sheet = xlsx.sheet_by_name("Sheet1")
value = sheet.cell_value(4, 6) #Cell in row 5, column 7
print(value)
rows = table.row_values(4)
cols = table.col_values(6)
for cell in rows:
    print(cell)

6.2. xlwings , get cells

# app = xw.App(visible=True, add_book=False) 
# app.display_alerts = False 
# app.screen_updating = False 
# wb = app.books.open(xls_path)
# sheet = wb.sheets.active

#Gets the value of a single cell
A1 = sheet.range('A1').value
print(A1)
#Gets the value of multiple horizontal or vertical cells and returns a list
A1_A3 = sheet.range('A1:A3').value
print(A1_A3)
#Get the values of multiple cells in a given range, return a nested list, and list by behavior
A1_C4 = sheet.range('A1:C4').value
print(A1_C4)
#Gets the value of a single cell
A1 = sheet.range('A1').value
print(A1)
#Gets the value of multiple horizontal or vertical cells and returns a list
A1_A3 = sheet.range('A1:A3').value
print(A1_A3)
#Get the values of multiple cells in a given range, return a nested list, and list by behavior
A1_C4 = sheet.range('A1:C4').value
print(A1_C4)

6.3 openpyxl get cells

# wb = openpyxl.load_workbook(xlsx_path)
# wb = Workbook()
# sheet = wb.active

#I. value of specified coordinate range
cells = sheet['A1:B5']
#II. Specify the value of the column
cells = sheet['A']
cells = sheet['A:C']
#III. value of specified line
cells = sheet[5]
cells = sheet[5:7]
#Gets the value of the cell
for cell in cells:
    print(cell.value)

6.4 , pandas , get the value of the cell

Pandas: after reading the Excel file, convert it into a data frame object. The method of parsing the content is basically the knowledge points in the pandas system, such as iloc() . loc() . (IX) etc.:

print(df1.iloc[0:1, [1]])
print(df1.loc['b'])
print(df2.ix['a', 'a']) #Some versions cancel ix and can use iat

7, Write data

Let's briefly summarize the writing of data to Excel files:

"
  1. xlrd# cannot write data

  2. xlwt# can write data

  3. xlutils , can borrow the , xlwt , method to write data

  4. xlwings can write data

  5. XlsxWriter # can write data

  6. openpyxl} can write data

  7. pandas} after reading Excel files as data frames, it abstracts the data frame level for operation, and there is no concept of writing and modifying excel cells

"

7.1. xlwt/xlutils write data

# xls = xlrd.open_workbook(xls_path)
# xls_xlutils = xlutils.copy.copy(xls)
# sheet = xls_xlutils.sheet_by_name("Sheet1")
# value = sheet.cell_value(4, 6)
# print(value)
sheet.write(4, 6, "New content")

7.2 # xlwings # write data

# app = xw.App(visible=True, add_book=False) 
# app.display_alerts = False 
# app.screen_updating = False 
# wb = app.books.open(xls_path)
# sheet = wb.sheets.active

#Write 1 cell
sheet.range('A2').value = 'Daming'
#Write one row or column to multiple cells
#Horizontal write A1:C1
sheet.range('A1').value = [1,2,3]
#Longitudinal write A1:A3
sheet.range('A1').options(transpose=True).value = [1,2,3]
#Write multiple cells in range
sheet.range('A1').options(expand='table').value = [[1,2,3], [4,5,6]]

7.3 , XlsxWriter , write data

New in code_ Format , is the preset style, which will be described below

# xlsx = xlsxwriter.Workbook()
# sheet = xlsx .add_worksheet('Sheet1')

#I. write a single cell
sheet.write(row, col, data, new_format)
#A1: insert data from cell A1 and insert by row
sheet.write_row('A1', data, new_format)
#A1: insert data from cell A1 and insert by column
sheet.write_column('A1', data, new_format)

7.4. openpyxl} write data

# wb = openpyxl.load_workbook(xlsx_path)
# wb = Workbook()
# sheet = wb.active

#I. write cell
cell = sheet['A1']
cell.value = 'Business requirements'
#II. Write one or more lines of data
data1 = [1, 2, 3]
sheet.append(data1)
data2 = [[1, 2, 3], [4, 5, 6]]
sheet.append(data2)

8, Style adjustment

Simply summarize the adjustment of Excel file style:

"
  1. xlrd and xlutils , can't adjust the style (it can also be said that xlutils , can, but it just borrows the method of , xlwt ,)

  2. xlwt# you can adjust the style

  3. xlwings can adjust the style

  4. XlsxWriter} can adjust the style

  5. openpyxl# can adjust the style

  6. pandas} cannot adjust style

"

8.1}xlwt} adjust style

xlwt# supports adjusting font, border, color and other styles

#Font part
#Initialize style
style1 = xlwt.XFStyle()
#Create fonts for styles
font = xlwt.Font()
font.name = 'Times New Roman'   #typeface
font.bold = True                #Bold
font.underline = True           #Underline
font.italic = True              #Italics
#Set style
style1.font = font
#Use style
sheet.write(4, 6, "New content 1", style1)

#Border part
borders = xlwt.Borders()
#Set Linetype
borders.left = xlwt.Borders.DASHED
borders.right = xlwt.Borders.DASHED
borders.top = xlwt.Borders.DASHED
borders.bottom = xlwt.Borders.DASHED
#Set Swatch
borders.left_colour = 0x40
borders.right_colour = 0x40
borders.top_colour = 0x40
borders.bottom_colour = 0x40
# 
style2 = xlwt.XFStyle()
style2.borders = borders
#Use style
sheet.write(5, 8, "New content 2", style2)

8.2 # xlwings # adjust style

Briefly introduce # xlwings # the adjustment of color:

#Get color
print(sheet.range('C1').color)
#Set color
sheet.range('C1').color = (255, 0, 120)
#Clear color
sheet.range('C1').color = None

8.3 XlsxWriter adjustment style

XlsxWriter , contains a large number of functions. You can modify the style of high definition customization on the worksheet after creating the worksheet:

new_format = xlsx.add_format({
        'bold':  True,  #Bold font
        'border': 1,  #Cell border width
        'align': 'left',  #Horizontal alignment
        'valign': 'vcenter',  #Vertical alignment
        'fg_color': '#F4B084',  #Cell background color
        'text_wrap': True  #Auto wrap
    })

sheet.write(row, col, data, new_format)

8.4 openpyxl adjust style

openpyxl} styles mainly include font, border, paragraph alignment style, etc

#Font style
from openpyxl.styles import Font 
cell = sheet['A1'] 
font = Font(name='Arial', size=12, bold=True, italic=True, color='FF0000') 
cell.font = font

#Paragraph alignment
from openpyxl.styles import Alignment 
cell = sheet['B2'] 
alignment = Alignment(horizontal='center', vertical='center',                     text_rotation=45, wrap_text=True) 
cell.alignment = alignment 

#Border style
from openpyxl.styles import Side, Border 
cell = sheet['B2'] 
side1 = Side(style='thin', color='FF0000') 
side2 = Side(style='dashed') 
border = Border(left=side1, right=side1, top=side2, bottom=side2) 
cell.border = border

9, Insert picture

Briefly summarize the situation of inserting pictures into Excel files:

"
  1. xlrd and xlutils , can't adjust the style (it can also be said that xlutils , can, but it just borrows the method of , xlwt ,)

  2. xlwt# can be inserted bmp} picture

  3. xlwings can insert pictures

  4. XlsxWriter , can insert pictures

  5. openpyxl# can insert pictures

  6. pandas # cannot insert picture

"

9.1 # xlwt# insert picture

xlwt# inserting a picture requires that the picture format must be bmp} format can be inserted successfully

sheet.insert_bitmap("test.bmp", 2, 3, 2, 2, 0.5, 0.5)

insert_bitmap(img, x, y, x1, y1, scale_x, scale_y)img_represents the image address to be inserted, x_represents the row, y_represents the column x1, y1_represents the pixel scale shifted downward and right from the original position_ x scale_ Y represents the ratio of width to height relative to the original image, and the image can be enlarged or reduced

9.2 # xlwings # insert picture

The following is the code for inserting pictures with xlwings , where you can specify the location

sheet.pictures.add(r'C:\\xxx.jpg')
#You can also insert at a given location
sheet.pictures.add(r'C:\\xxx.jpg', left=sheet.range('A2').left, top=sheet.range('A2').top, width=100, height=100)

9.3 XlsxWriter insert picture

The first parameter is the starting cell to insert, and the second parameter is the absolute path of the picture file

sheet.insert_image('A1', r'C:\\xxx.jpg')

9.4 openpyxl insert picture

openpyxl can also insert specified pictures into Excel and modify the size

from openpyxl.drawing.image import Image
img = Image('test.jpg')
newsize = (180, 360) 
img.width, img.height = newsize #Set the width and height of the picture
sheet.add_image(img, 'A2') #Insert picture into A2 cell

Summary

The above is all about the comparison of common Excel operations according to different Python modules. The final results are summarized in the table below

Topics: Python