Generate 300,000 excel data in Python 3 (xlsx format)

Posted by anon_login_001 on Mon, 11 May 2020 03:09:31 +0200

In system testing of B/S architecture, it is sometimes necessary to import Excel files to generate some data records. When the amount of data is small, there is usually no problem. When the amount of data imported is large, the performance of the system is a test.To verify the performance of your system, sometimes you need to import a large amount of data, such as 300,000 data records, but you don't have that much data at hand.Copy and paste one by one, or drag excel down to generate records?This will kill the tester, and the tedious, repetitive work will be very interesting.With python, we can easily generate large amounts of data, free people from monotonous, repetitive work, and do more meaningful things.

For example, we need to generate the following data.

This is sample data of user information, which contains several information such as number, name, gender, hobby, favorite movie, music and so on, among which only different numbers in columns A, I and L need to be checked.Other data is not of much concern in performance testing, so we can use rules to generate some data, where only column A, I, L are different and other information remains unchanged.

Idea: Save the header and sample data in a sample file, read the sample content by xlrd, and write the header and different information by rule by xlsxwriter.

Say nothing more, let's go directly to the code:

 1 import xlrd
 2 import xlsxwriter
 3 #File name and path, preceded by one r Prevent unnecessary escaping.
 4 filename=r'D:\001\example.xlsx'
 5 data = xlrd.open_workbook(filename)
 6 # Get the first sheet page
 7 table = data.sheets()[0]
 8 # Get row 2 (article 1) data
 9 content=table.row(1)
10 print('Article 1 Sample data are:',content)
11 # Pass above print You can see what is read directly, although it is list,But each data is preceded by
12 # text:Word, not directly strong conversion tuple For our use, we customize a conversion method to make it usable list So that the later strong transition tuple
13 def convertRowToTuple(rowNum):
14     data=[]
15     # len(content)Is the number of columns in the row obtained above
16     for i in range(len(content)):
17         data.append(table.cell_value(rowNum,i))
18     # Read out the data in each cell and add it to the data this list Medium and Strong Tuple Return
19     return tuple(data)
20 # Row 0 (i.e. excel Data is read and tupled for assignment to header
21 header=convertRowToTuple(0)
22 # Line 1 (i.e. excel Sample data is read and tupled for assignment to data1
23 data1=convertRowToTuple(1)
24 # print(header)
25 # print(data1)
26 # File name to write data to
27 filename2=r'D:\001\TestDatas2.xlsx'
28 # If you have a very large amount of data, you can enable constant_memory,This is a sequential write mode, where you get a row of data and write it immediately, without keeping all the data in memory.
29 # If this mode is not enabled, the program will most likely get stuck when there is a large amount of data
30 workbook = xlsxwriter.Workbook(filename2, {'constant_memory': True})
31 # Create a new sheet page
32 worksheet = workbook.add_worksheet()
33 # startNum Indicates the line from which to start writing, where the number starts at 1, because it is followed by a combination of letters excel Medium, such as A1 Represents line 1 A column
34 startNum=1
35 # Initial values, followed by numbers that increase on top of them
36 startValues=['000001','245353','24289796']
37 # Columns corresponding to initial values
38 col=['A','I','L']
39 #Write header to file first
40 worksheet.write_row('A'+str(startNum), header)
41 # Formally start writing data
42 for i in range(300000):
43     # To keep the build process from getting bored, add this print information to view the progress
44     print('Creating #',i+1,'Bar data')
45     # The header occupies the first row, so the first data starts at the second row when i=0 When writing from A2 start
46     # take data1 Write data in turn A2,B2,C2......
47     worksheet.write_row('A'+str(startNum+i+1), data1)
48     # Previously it was equivalent to copying everything from the first data, but A,I,L The three columns need to be in the lower row, so we'll rewrite each row A I L Values in Cells
49     for m in col:
50         length=len(startValues[col.index(m)])
51         # Numbers are added with initial values i
52         content=str(int(startValues[col.index(m)])+i)
53         # Because a number like 000001 will lose the previous zero in the calculation, to keep the number of digits, the lost zero will be made up for it
54         if(len(content)<length):
55             content='0'*(length-len(content))+content
56         # Write the calculated value into the corresponding cell
57         worksheet.write(m+str(startNum+i+1), content)
58 # Close after writing workbook,Otherwise, an error will be made
59 workbook.close()


Next, let's make a cup of tea and wait for the data to be generated successfully.

After testing, it only takes about 4 minutes to generate 300,000 pieces of data!

Topics: Excel Python