Encapsulating a lightweight Excel parsing framework with POI

Posted by deffe on Wed, 05 Jun 2019 00:57:33 +0200

This article is the fourth in this series.
The first one is: Java POI Operation Excel(User Model)
The second part is: Java POI Operation Excel(Event Model)
The third part is: Java POI Operation Excel(Event User Model)

Preface

Through the previous three articles, we have a good understanding of POI parsing Excel. In this article, we encapsulate an Excel parsing framework ourselves.

So why do you make an analytical framework for yourself? The essence of this problem, I think, should start with the individual business model.

We go to work every day to earn wages, essentially in exchange for money with the time and attention that we just don't return. If we want to improve the return we get, the obvious way is to raise the hourly salary. Besides, there is another way to upgrade, which is to sell a lot of copies of time. For example, best-seller writers, write a book. Time is only used once, but it is not. It can still pay off after writing.

As programmers, can we also use this idea to solve problems in our work? Of course, for example, what we need to do today is to encapsulate an Excel parsing framework. In our expected follow-up work, the function of Excel importing data will certainly be rewritten. But if this is finished, I will check the data and rewrite it next time. It's just duplication of effort. The pits we encounter this time may inevitably be trampled on next time. And if we encapsulate our own library this time, we can use it directly next time. Not only can we save time, but we won't trample on the same pit. So let's get started.~

Analysis needs

In our work, for Excel upload, the scenario we will encounter is usually to parse the uploaded Excel, assemble it into an object, verify the data, convert it into Po and import it into the database. In this process, what our Excel parsing framework does is actually to parse Excel and assemble objects. We hope that we can parse Excel with a little bit of code, and And you can freely choose whether to parse in Dom or Sax mode. You even want to be unaware of the version of the uploaded Excel.

Interface definition

The interface that provides parsing functions can be understood as a facade.

public interface IExcelParser<T> {
    List<T> parse(IParserParam parserParam);
}

On the parametric specification of analytical methods.
In the process of uploading, we need Excel streams, to parse the type of objects assembled after completion, how many columns of data in Excel, Sheet to parse, and header data.

Excel is uploaded externally, so in general, we will check the header data to achieve convergence of functions and prevent misoperation, which will affect the system. Of course, if we do not want to check, it should also be supported in our analytical framework.

public interface IParserParam {

    Integer FIRST_SHEET = 0;

    InputStream getExcelInputStream();

    Class getTargetClass();

    Integer getColumnSize();

    Integer getSheetNum();

    List<String> getHeader();
}

Integral design


class diagram

IExcel ParseHandler interface provides specific parsing services. Parser on the upper layer is shielded from parsing details.

Client Code

We analyze the code of the caller to achieve the effect of the leopard.

     @Test
    public void testDomXlsx() {

        parser = new ExcelDomParser<>();

        IParserParam parserParam = DefaultParserParam.builder()
                .excelInputStream(Thread.currentThread().getContextClassLoader()
                        .getResourceAsStream("test01.xlsx"))
                .columnSize(4)
                .sheetNum(IParserParam.FIRST_SHEET)
                .targetClass(User.class)
                .header(User.getHeader())
                .build();

        List<User> user = parser.parse(parserParam);
        System.out.println(user);
    }

Class User:

public class User {

    @ExcelField(index = 0)
    private String name;
    @ExcelField(index = 1)
    private String age;
    @ExcelField(index = 2)
    private String gender;
    @ExcelField(index = 3, type = ExcelField.ExcelFieldType.Date)
    private String dateStr;

The client code is very simple. We just need to assemble a default object of IParserParam, DefaultParserParam. Then we can parse it into Parser.

Look at the Excel Field annotation on the fields of User class. User class. We all know that to convert a row of data into an object, using reflection is the easiest way, so Excel Field is the corresponding field and the number of columns in Excel.

As for why all fields are defined as String, because the object will be converted to Po later. It is most convenient to use String type in Excel upload and parse.

Thread security issues

Using our framework in Web projects is bound to integrate with Spring. Spring will default to create a singleton's parsing class when integrating. What we need to do is to ensure that the singleton's parsing class does not have thread security problems. So how do we achieve this?

Let's first look at how dom parsing works.

public class ExcelDomParser<T> extends AbstractExcelParser<T> {

    private IExcelParseHandler<T> excelParseHandler;

    public ExcelDomParser() {
        this.excelParseHandler = new ExcelDomParseHandler<>();
    }

    @Override
    protected IExcelParseHandler<T> createHandler(InputStream excelInputStream) {
        return this.excelParseHandler;
    }
}

Above is the code of the upper DomParser. According to the code, we can find that excelParseHandler is a member variable. It has always been used. Then let's look at the implementation of DomparseHandler.

public class ExcelDomParseHandler<T> extends BaseExcelParseHandler<T> {

    @Override
    public List<T> process(IParserParam parserParam) throws Exception {
        Workbook workbook = generateWorkBook(parserParam);
        Sheet sheet = workbook.getSheetAt(parserParam.getSheetNum());
        Iterator<Row> rowIterator = sheet.rowIterator();
        if (parserParam.getHeader() != null && parserParam.getHeader().size() != 0) {
            checkHeader(rowIterator, parserParam);
        }
        return parseRowToTargetList(rowIterator, parserParam);
    }

    private void checkHeader(Iterator<Row> rowIterator, IParserParam parserParam) {
        while (true) {
            Row row = rowIterator.next();
            List<String> rowData = parseRowToList(row, parserParam.getColumnSize());
            boolean empty = isRowDataEmpty(rowData);
            if (!empty) {
                validHeader(parserParam, rowData);
                break;
            }
        }
    }


    private Workbook generateWorkBook(IParserParam parserParam) throws IOException, InvalidFormatException {
        return WorkbookFactory.create(parserParam.getExcelInputStream());
    }

    private List<T> parseRowToTargetList(Iterator<Row> rowIterator, IParserParam parserParam) throws InstantiationException, IllegalAccessException {
        List<T> result = new ArrayList<>();
        for (; rowIterator.hasNext(); ) {
            Row row = rowIterator.next();
            List<String> rowData = parseRowToList(row, parserParam.getColumnSize());
            Optional<T> d = parseRowToTarget(parserParam, rowData);
            d.ifPresent(result::add);
        }
        return result;
    }

    private List<String> parseRowToList(Row row, int size) {
        List<String> dataRow = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            if (row.getCell(i) != null) {
                DataFormatter formatter = new DataFormatter();
                String formattedCellValue = formatter.formatCellValue(row.getCell(i));
                dataRow.add(formattedCellValue.trim());
            } else {
                dataRow.add("");
            }
        }
        return dataRow;
    }
}

We can see from the code that DomParseHandler itself does not use any member variables, and a member variable head in the parent BaseExcel ParseHandler is not used in this class. So this class is safe in multi-threaded environment. There will be no problem.

Next, let's look at Parser for Sax parsing

public class ExcelSaxParser<T> extends AbstractExcelParser<T> {

    public IExcelParseHandler<T> createHandler(InputStream excelInputStream) {
        try {
            byte[] header8 = IOUtils.peekFirst8Bytes(excelInputStream);
            if (NPOIFSFileSystem.hasPOIFSHeader(header8)) {
                return new Excel2003ParseHandler<>();
            } else if (DocumentFactoryHelper.hasOOXMLHeader(excelInputStream)) {
                return new Excel2007ParseHandler<>();
            } else {
                throw new IllegalArgumentException("Your InputStream was neither an OLE2 stream, nor an OOXML stream");
            }
        } catch (Exception e) {
            logger.error("getParserInstance Error!", e);
            throw new RuntimeException(e);
        }
    }
    
}

Through the code, we found that each time a new Handler is created, and different handlers are used according to different judgments. This way, there is no problem in multi-threaded environment. Spring can be used to manage the singleton.

Integration with Spring

Using Dom

<bean id = "excelParser" class="com.snakotech.excelhelper.ExcelDomparser">

@Autowire
private IExcelParser excelParser;

Using Sax

<bean id = "excelParser" class="com.snakotech.excelhelper.ExcelSaxparser">

@Autowire
private IExcelParser excelParser;

summary

Because of the large amount of code, you can't explain all the details in detail, but look at the whole article, I believe you have some ideas about how to encapsulate, you can try to implement your own Excel parsing framework. In the process of doing this, I believe you will benefit a lot.

Full quantity code

https://github.com/amlongjie/ExcelParser

Topics: Excel Spring Java Database