[JAVA] poi parsing Excel table -- batch importing data

Posted by brooksh on Tue, 23 Nov 2021 20:57:38 +0100

1. Brief overview 😵

It's better to teach people to fish than to teach people to fish. I also spent a lot of time querying information and learning this. This article only represents my personal views and positions. If there are mistakes, please comment and point out the exchange. In order to take care of novices and understand the tutorials, all tutorials will be wordy

At present, there are two popular frameworks for excel processing in the market: poi and jxl. We can make a simple comparison between the two frameworks:

  1. Development team: poi is an open source project under Apache, which is officially maintained by Apache. jxl seems to be an open source project maintained by individuals.
  2. Respective advantages: poi supports formulas well, jxl is not good. jxl provides support for images (only PNG format), poi does not support it. (from this point of view, poi should be selected for data processing and jxl should be selected for media processing)
  3. Memory consumption: because jxl has done considerable work in resource recycling, jxl is slightly better than poi in memory consumption. Therefore, it is more cost-effective to choose jxl for software import with large amount of data. Of course, there is basically no difference for small amount of data.
  4. Running speed: it is estimated that it is due to high memory consumption. poi seems to do a lot better than jxl for the function of reading and writing speed, and supports compressing excel.

Conclusion: after so much comparison, it should be very obvious which framework should be used in your project (of course, these are also collected from the Internet, and there are no mistakes).

tips: let me record the use of poi here. It's very simple. The basic principle is to extract excel data to form a list. Then, you can cycle the data corresponding to the list and the data corresponding to your own data table. It should be noted that if it is a number type, the read data is generally returned to you in double type. For example, if you write 100 in excel, the read data is 100.0. This is annoying. Of course, you can handle it yourself.

  Scan VX for Java data, front-end, test, python and so on

2. Start learning 😘

POI also has two different jar packages, which deal with Excel 2003 and Excel 2007 +, respectively, corresponding to POI and poi OOXML. After all, POI OOXML is an upgraded version of POI, and the amount of single page data processed is millions, so we also choose poi OOXML. Well, here's the code. There are more comments, so I won't be wordy. If the package is imported:

<!--introduce poi-->
<dependency>
	<groupId>org.apache.poi</groupId>
	<artifactId>poi-ooxml</artifactId>
	<version>3.17</version>
</dependency>

Core class & syntax:

 
//Write to the Workbook object by reading the file stream. Here, file is the imported Excel file
Workbook workbook = WorkbookFactory.create(file.getInputStream());

In this way, we can get the Excel table data. Then, we can traverse it without brain and write it into the object. For convenience of understanding, I assume that the table data is:

idnameage
1Zhang San18
2Li Si24

So our entity class is:

 
@Data
public class User{
    private Long id
    private String name;
    private int age;
}

tips: in the above code@ Data It is a lombok dependency. If not, please add the corresponding set and get methods

Then back to the Workbook, let's first understand the structure of the excel table and then traverse it. In this way, the idea will be much clearer! Excel Sheet is composed of multiple Workbook sheets. When we open excel, we can see:

274 x 244

Each workbook Sheet is composed of each Row of Row and each Row of Row is composed of each column of Cell

Learned: wrokbook > Sheet > row > cell ✌

So we traverse the code, which is: 👇

  Scan VX for Java data, front-end, test, python and so on

//Circular Workbook
for (int numSheet = 0; numSheet < workbook.getNumberOfSheets(); numSheet++) {
	Sheet hssfSheet = workbook.getSheetAt(numSheet);//Each Workbook
	if (hssfSheet == null) {
        	continue;
        }
	List<User> lists=new ArrayList()<>;
	//Circular row
	for (int rowNum = 0; rowNum <= hssfSheet.getLastRowNum(); rowNum++) {
		Row hssfRow = hssfSheet.getRow(rowNum);//Each line
		if (hssfRow== null) {
        		continue;
        	}
		User user=new User();
		//Cyclic column
		for (int cellNum = 0; cellNum < hssfRow.getLastCellNum(); cellNum++) {
			Cell hssCell = hssfRow.getCell(cellNum);//Each column
			if (hssCell== null) {
        			continue;
        		}
			String cellVal = getCellVal(hssCell);
			//The getCellVal method here will be discussed shortly below
			System.out.println("The first"+numSheet+"Workbook, page"+rowNum+"OK, No"+cellNum+"The data for the column is"+cellVal);
			user.setId(Double.valueOf(cellVal).longValue());
			user.setName(cellVal);
			user.setAge(Double.valueOf(cellVal).intValue());
		}
		lists.add(user);
	}
	System.out.println("The first"+numSheet+"The data of workbook is"+lists);
}

Because the data types in each column are different, I encapsulated a getCellVal method to pass the cell and return the string of the corresponding value. Code paste: 👇

 
public static String getCellVal(Cell cel) {
        if(cel.getCellType() == Cell.CELL_TYPE_STRING) {
            return cel.getRichStringCellValue().getString();
        }
        if(cel.getCellType() == Cell.CELL_TYPE_NUMERIC) {
            return cel.getNumericCellValue() + "";
        }
        if(cel.getCellType() == Cell.CELL_TYPE_BOOLEAN) {
            return cel.getBooleanCellValue() + "";
        }
        if(cel.getCellType() == Cell.CELL_TYPE_FORMULA) {
            return cel.getCellFormula() + "";
        }
        return cel.toString();
}

After we get the list data, we can add data to the database in batches. I use the method of mybatis here. Other methods are similar, so we can study by ourselves. Attach Code: 👇

 
<!--mapper-->
<insert id="addUserList" parameterType="java.util.List">
        insert into user
        (
        id,
        name,
        age
        )
        values
        <foreach collection="lists" item="item" index="index" separator=",">
            (
            #{item.id},
            #{item.name},
            #{item.age}
            )
        </foreach>
</insert>

//dao
int addUserList(@Param("lists")List<User> lists);

You can call it and save it into the database in batches

3. Summarize and sprinkle flowers 😍

Understand poi parsing Excel format wrokbook > Sheet > row > cell, and use Mybatis to add it in batch

  Scan VX for Java data, front-end, test, python and so on

 

Topics: Java Spring Boot Back-end